Abstract
The compressed feature matrix (CFM) is a feature-based molecular descriptor for fast processing in pharmacochemical applications such as adaptive similarity search, pharmacophore development and substructure search. Depending on the purpose, the descriptor can be generated from either topological or Euclidean molecular data. To keep the descriptor broadly applicable, the user freely defines which structural patterns are assigned to which feature types. This assignment step relies on a graph algorithm for substructure search and in that respect resembles the common substructure descriptors. While those descriptors only allow screening for predefined patterns, the CFM permits a true substructure/subgraph search, provided that all desired elements of the query substructure are described by the selected feature set. In this work, the CFM-based substructure search is evaluated with regard to both the different outputs resulting from varying feature sets and the search speed. As a benchmark we use the programmable atom typer (PATTY) graph algorithm. In this comparison, the CFM-based matrix algorithm is up to several hundred times faster than PATTY, and when the CFM is used as a basis for substructure screening, the search is accelerated by three orders of magnitude. The CFM-based substructure search thus meets the requirements for interactive use, even when evaluating several hundred thousand compounds. The concept of the CFM is implemented in the software COFEA.
Figure: CFM-based substructure search using the compounds dopamine and benzene-1,2-diol
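The idea of searching on a feature matrix rather than on the full atom graph can be illustrated with a small Python sketch. It is not the CFM algorithm of the paper or the COFEA implementation, since the abstract does not specify the matrix layout. It assumes a hypothetical `FeatureMatrix` holding a list of user-chosen feature types and a symmetric matrix of topological distances (bond counts between the nearest atoms of two features), and a hypothetical `find_embedding` that backtracks over type-compatible assignments. The feature names and the hand-counted distances for benzene-1,2-diol and dopamine are illustrative assumptions as well.

```python
from typing import Dict, List, Optional


class FeatureMatrix:
    """Toy descriptor (assumption): feature types plus pairwise topological distances."""

    def __init__(self, types: List[str], dist: List[List[int]]):
        # The distance matrix must be square and match the number of features.
        assert len(dist) == len(types)
        assert all(len(row) == len(types) for row in dist)
        self.types = types
        self.dist = dist


def find_embedding(query: FeatureMatrix, target: FeatureMatrix) -> Optional[Dict[int, int]]:
    """Return a map query feature index -> target feature index, or None.

    A mapping is accepted only if feature types agree and every pairwise
    distance in the query equals the corresponding distance in the target.
    """
    mapping: Dict[int, int] = {}
    used = set()

    def extend(q: int) -> bool:
        if q == len(query.types):
            return True  # every query feature has been placed
        for t in range(len(target.types)):
            if t in used or target.types[t] != query.types[q]:
                continue
            # Check distances against all query features placed so far.
            if all(query.dist[q][p] == target.dist[t][mapping[p]] for p in mapping):
                mapping[q] = t
                used.add(t)
                if extend(q + 1):
                    return True
                del mapping[q]  # backtrack
                used.remove(t)
        return False

    return dict(mapping) if extend(0) else None


# Illustrative, hand-counted bond distances (assumptions, not data from the paper).
catechol = FeatureMatrix(
    ["aromatic_ring", "hydroxyl", "hydroxyl"],
    [[0, 1, 1],
     [1, 0, 3],
     [1, 3, 0]],
)
dopamine = FeatureMatrix(
    ["aromatic_ring", "hydroxyl", "hydroxyl", "amine"],
    [[0, 1, 1, 3],
     [1, 0, 3, 7],
     [1, 3, 0, 6],
     [3, 7, 6, 0]],
)

print(find_embedding(catechol, dopamine))  # benzene-1,2-diol features embed in dopamine
print(find_embedding(dopamine, catechol))  # the reverse search has no amine to match
```

Because only the selected features are compared, such a search can confirm a query substructure only to the extent that the feature set describes it, which mirrors the caveat stated in the abstract.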
Abbreviations
- CFM: compressed feature matrix
- MCS: maximum common substructure
- HSCS: highest scoring common substructure
- SSSR: smallest set of smallest rings
- ESER: essential set of essential rings
- ESSR: extended set of smallest rings
- GSCE: graph of smallest cycles at edges
- PATTY: programmable atom typer
- HTS: high throughput screening
Acknowledgments
This work was carried out within the SOL project (Search and Optimization of Lead structures), which is supported by the German Federal Ministry of Education and Research (bmb+f) under contract number 311681.
Cite this article
Abolmaali, S.F.B., Wegner, J.K. & Zell, A. The compressed feature matrix—a fast method for feature based substructure search. J Mol Model 9, 235–241 (2003). https://doi.org/10.1007/s00894-003-0126-0