Academic Journals Database
Disseminating quality controlled scientific knowledge

A tree-based method for the rapid screening of chemical fingerprints

Author(s): Kristensen Thomas | Nielsen Jesper | Pedersen Christian

Journal: Algorithms for Molecular Biology
ISSN 1748-7188

Volume: 5
Issue: 1
Start page: 9
Date: 2010

ABSTRACT
Background: The fingerprint of a molecule is a bitstring based on its structure, constructed such that structurally similar molecules will have similar fingerprints. Molecular fingerprints can be used in an initial phase of drug development for identifying novel drug candidates by screening large databases for molecules with fingerprints similar to a query fingerprint.

Results: In this paper, we present a method which efficiently finds all fingerprints in a database whose Tanimoto coefficient with the query fingerprint is above a user-defined threshold. The method is based on two novel data structures for rapid screening of large databases: the kD grid and the Multibit tree. The kD grid is based on splitting the fingerprints into k shorter bitstrings and utilising these to compute bounds on the similarity of the complete bitstrings. The Multibit tree uses hierarchical clustering and similarity within each cluster to compute similar bounds. We have implemented our method and tested it on a large real-world data set. Our experiments show that our method yields approximately a three-fold speed-up over previous methods.

Conclusions: Using the novel kD grid and Multibit tree significantly reduces the time needed to search databases of fingerprints. This will allow researchers to (1) perform more searches than previously possible and (2) easily search large databases.
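To make the search problem concrete, the following is a minimal sketch of a threshold search over fingerprints, not the authors' kD grid or Multibit tree. It assumes fingerprints are stored as Python integers used as bitstrings, and the names `popcount`, `tanimoto`, and `threshold_search` are hypothetical. The pruning step uses the well-known bit-count bound (the Tanimoto coefficient of two fingerprints cannot exceed the smaller bit count divided by the larger), which is a simpler bound than the ones the abstract describes. Treating "above the threshold" as "greater than or equal to" is also an assumption.

```python
from typing import List, Tuple


def popcount(bits: int) -> int:
    """Number of set bits in a fingerprint stored as a non-negative int."""
    return bin(bits).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |a AND b| / |a OR b| of two bitstrings."""
    union = popcount(a | b)
    if union == 0:
        # Assumption: two all-zero fingerprints are treated as identical.
        return 1.0
    return popcount(a & b) / union


def threshold_search(
    query: int, database: List[int], threshold: float
) -> List[Tuple[int, float]]:
    """Return (index, score) for every fingerprint scoring >= threshold."""
    query_bits = popcount(query)
    hits = []
    for index, fingerprint in enumerate(database):
        fp_bits = popcount(fingerprint)
        smaller, larger = min(query_bits, fp_bits), max(query_bits, fp_bits)
        # Bit-count bound: the score can never exceed smaller / larger,
        # so the full comparison is skipped when the bound is too low.
        if larger > 0 and smaller / larger < threshold:
            continue
        score = tanimoto(query, fingerprint)
        if score >= threshold:
            hits.append((index, score))
    return hits


if __name__ == "__main__":
    toy_database = [0b1111, 0b1100, 0b0011, 0b1110]
    print(threshold_search(0b1110, toy_database, 0.7))
```

A bound of this kind only discards candidates that cannot reach the threshold, so the result is the same as an exhaustive scan. The data structures described in the abstract aim at tighter bounds than this one.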