The compressed feature matrix—a fast method for feature based substructure search

  • Original Paper
  • Journal of Molecular Modeling

Abstract

The compressed feature matrix (CFM) is a feature-based molecular descriptor for the fast processing of pharmacochemical applications such as adaptive similarity search, pharmacophore development and substructure search. Depending on the purpose, the descriptor can be generated from either topological or Euclidean molecular data. To keep the descriptor flexible, the user decides which structural patterns are assigned to which feature types. This assignment step relies on a graph algorithm for substructure search and in that respect resembles the common substructure descriptors. Whereas those descriptors only allow screening for the predefined patterns, the CFM permits a true substructure (subgraph) search, provided that all desired elements of the query substructure are described by the selected feature set. In this work, the CFM-based substructure search is evaluated with regard to both the different outputs that result from varying feature sets and the search speed. The programmable atom typer (PATTY) graph algorithm serves as the benchmark. In this comparison, the CFM-based matrix algorithm is up to several hundred times faster than PATTY, and when the CFM is used as a basis for substructure screening, the search is accelerated by three orders of magnitude. The CFM-based substructure search thus meets the requirements for interactive use, even when several hundred thousand compounds are evaluated. The concept of the CFM is implemented in the software COFEA.

Figure: CFM-based substructure search using the compounds dopamine and benzene-1,2-diol
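The difference between screening and a true feature-based search can be illustrated with a small sketch. It is not the COFEA implementation: the feature type names (`OH`, `NH2`), the tuple layout `(types, distance matrix)` and the helper functions `screen` and `match` are all assumptions made for illustration, and the matching step is a simplified exhaustive comparison of topological distances rather than the matrix algorithm evaluated in the paper. The distances are bond counts between the feature atoms, worked out by hand for dopamine, benzene-1,2-diol and, as a counterexample, benzene-1,3-diol.

```python
from collections import Counter
from itertools import permutations

# Each molecule is reduced to (feature types, pairwise topological distances).
# Feature names and layout are hypothetical, chosen only for this sketch.
DOPAMINE = (["OH", "OH", "NH2"],      # O at C3, O at C4, amine N
            [[0, 3, 6],
             [3, 0, 7],
             [6, 7, 0]])
BENZENE_1_2_DIOL = (["OH", "OH"], [[0, 3], [3, 0]])
BENZENE_1_3_DIOL = (["OH", "OH"], [[0, 4], [4, 0]])


def screen(query_types, target_types):
    """Cheap pre-filter: the target needs at least as many features of each type."""
    query, target = Counter(query_types), Counter(target_types)
    return all(target[ftype] >= count for ftype, count in query.items())


def match(query, target):
    """Return a mapping of query features onto target features, or None.

    A mapping is accepted when feature types agree and every pairwise
    distance in the query equals the corresponding distance in the target.
    Exact equality is a simplification assumed for this sketch.
    """
    q_types, q_dist = query
    t_types, t_dist = target
    if not screen(q_types, t_types):
        return None
    n = len(q_types)
    for cand in permutations(range(len(t_types)), n):
        same_types = all(q_types[i] == t_types[cand[i]] for i in range(n))
        same_dists = all(q_dist[i][j] == t_dist[cand[i]][cand[j]]
                         for i in range(n) for j in range(i + 1, n))
        if same_types and same_dists:
            return cand
    return None


if __name__ == "__main__":
    print(match(BENZENE_1_2_DIOL, DOPAMINE))            # a valid mapping
    print(screen(BENZENE_1_3_DIOL[0], DOPAMINE[0]))     # screening alone passes
    print(match(BENZENE_1_3_DIOL, DOPAMINE))            # distance check rejects
```

In this toy setting, benzene-1,3-diol passes the screening step because dopamine also carries two hydroxyl features, but it is rejected once the distances between the features are compared. That is the kind of false positive a pure pattern screen cannot exclude.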

Figures: Fig. 1a–c; Fig. 2a, b; Fig. 3; Fig. 4a, b; Fig. 5a, b; Fig. 6.

Abbreviations

  • CFM: compressed feature matrix
  • MCS: maximum common substructure
  • HSCS: highest scoring common substructure
  • SSSR: smallest set of smallest rings
  • ESER: essential set of essential rings
  • ESSR: extended set of smallest rings
  • GSCE: graph of smallest cycles at edges
  • PATTY: programmable atom typer
  • HTS: high throughput screening

Acknowledgments

This work was carried out within the SOL project (Search and Optimization of Lead structures), which is supported by the German Federal Ministry of Education and Research (bmb+f) under contract number 311681.

Author information

Corresponding author

Correspondence to S. F. Badreddin Abolmaali.

About this article

Cite this article

Abolmaali, S.F.B., Wegner, J.K. & Zell, A. The compressed feature matrix—a fast method for feature based substructure search. J Mol Model 9, 235–241 (2003). https://doi.org/10.1007/s00894-003-0126-0

  • DOI: https://doi.org/10.1007/s00894-003-0126-0
