The compressed feature matrix—a fast method for feature based substructure search

Abolmaali, S. F. Badreddin; Wegner, Jörg K.; Zell, Andreas

doi:10.1007/s00894-003-0126-0

Original Paper
Published: 26 April 2003

Volume 9, pages 235–241, (2003)

Journal of Molecular Modeling

S. F. Badreddin Abolmaali¹,
Jörg K. Wegner¹ &
Andreas Zell¹

Abstract

The compressed feature matrix (CFM) is a feature-based molecular descriptor for the fast processing of pharmacochemical applications such as adaptive similarity search, pharmacophore development and substructure search. Depending on the purpose, the descriptor can be generated from either topological or Euclidean molecular data. To keep the descriptor flexible, the user freely decides which structural patterns are assigned to which feature types. This step is based on a graph algorithm for substructure search and is thus similar to the common substructure descriptors. Whereas those descriptors only allow screening for the predefined patterns, the CFM permits a true substructure (subgraph) search, provided that all desired elements of the query substructure are described by the selected feature set. In this work, the CFM-based substructure search is evaluated with regard to both the different outputs that result from varying feature sets and the search speed. The programmable atom typer (PATTY) graph algorithm serves as the benchmark. In this comparison, the CFM-based matrix algorithm is up to several hundred times faster than PATTY, and when the CFM is used as a basis for substructure screening, the search is accelerated by three orders of magnitude. The CFM-based substructure search thus meets the requirements for interactive use, even when several hundred thousand compounds are evaluated. The concept of the CFM is implemented in the software COFEA.

Figure: CFM-based substructure search using the compounds dopamine and benzene-1,2-diol
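
The difference between screening for predefined patterns and a true substructure search can be illustrated with a small, generic sketch. It is not the CFM matrix algorithm or COFEA code, neither of which is specified on this page. The feature labels (`Car`, `OH`, `C`, `N`), the helper names and the hand-written heavy-atom graphs for dopamine and benzene-1,2-diol are all assumptions made for illustration. The sketch first applies a cheap feature-count screen and then a plain backtracking subgraph match.

```python
from collections import Counter

def make_mol(labels, bonds):
    """Build a feature-labelled heavy-atom graph (labels are illustrative)."""
    adj = {i: set() for i in range(len(labels))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return {"labels": labels, "adj": adj}

def passes_screen(query, target):
    """Necessary condition only: the target has at least as many atoms of each feature."""
    q, t = Counter(query["labels"]), Counter(target["labels"])
    return all(t[f] >= n for f, n in q.items())

def find_match(query, target):
    """Backtracking subgraph search; returns {query atom: target atom} or None."""
    n_query = len(query["labels"])
    mapping, used = {}, set()

    def extend(qa):
        if qa == n_query:
            return True
        for ta in range(len(target["labels"])):
            if ta in used or target["labels"][ta] != query["labels"][qa]:
                continue
            # Every already-mapped neighbour of qa must be bonded to ta in the target.
            if all(mapping[qn] in target["adj"][ta]
                   for qn in query["adj"][qa] if qn in mapping):
                mapping[qa] = ta
                used.add(ta)
                if extend(qa + 1):
                    return True
                del mapping[qa]
                used.discard(ta)
        return False

    return dict(mapping) if extend(0) else None

RING = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]

# Benzene-1,2-diol: hydroxyl groups on adjacent ring carbons 0 and 1.
catechol = make_mol(["Car"] * 6 + ["OH", "OH"], RING + [(0, 6), (1, 7)])

# Dopamine: the same diol ring plus a 2-aminoethyl chain on ring carbon 3.
dopamine = make_mol(
    ["Car"] * 6 + ["OH", "OH", "C", "C", "N"],
    RING + [(0, 6), (1, 7), (3, 8), (8, 9), (9, 10)],
)

# Benzene-1,3-diol: same feature counts as the 1,2-diol, different connectivity.
resorcinol = make_mol(["Car"] * 6 + ["OH", "OH"], RING + [(0, 6), (2, 7)])

for name, query in [("benzene-1,2-diol", catechol), ("benzene-1,3-diol", resorcinol)]:
    screened = passes_screen(query, dopamine)
    matched = find_match(query, dopamine) is not None
    print(f"{name}: screen={screened}, substructure={matched}")
```

In this sketch both diols pass the count-based screen against dopamine, but only benzene-1,2-diol survives the subgraph match. The screen can rule candidates out cheaply, whereas confirming a hit requires the structural comparison.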

Abbreviations

CFM: compressed feature matrix
MCS: maximum common substructure
HSCS: highest scoring common substructure
SSSR: smallest set of smallest rings
ESER: essential set of essential rings
ESSR: extended set of smallest rings
GSCE: graph of smallest cycles at edges
PATTY: programmable atom typer
HTS: high throughput screening

Acknowledgments

This work was carried out within the scope of the SOL project (Search and Optimization of Lead structures), which is supported by the German Federal Ministry of Education and Research (bmb+f) under contract number 311681.

Author information

Authors and Affiliations

Department of Computer Science, University of Tuebingen, Sand 1, 72076, Tübingen, Germany
S. F. Badreddin Abolmaali, Jörg K. Wegner & Andreas Zell

Corresponding author

Correspondence to S. F. Badreddin Abolmaali.

About this article

Cite this article

Abolmaali, S.F.B., Wegner, J.K. & Zell, A. The compressed feature matrix—a fast method for feature based substructure search. J Mol Model 9, 235–241 (2003). https://doi.org/10.1007/s00894-003-0126-0

Received: 29 October 2002
Accepted: 03 February 2003
Published: 26 April 2003
Issue Date: August 2003
DOI: https://doi.org/10.1007/s00894-003-0126-0
