The NIH Catalyst, May–June 2004

NCBI To Launch PubChem

BRINGING THE SECRETS OF SMALL MOLECULES OUT TO THE PUBLIC

 

by James Swyers


Coming this fall, courtesy of the NIH Roadmap and NCBI, is PubChem, a new database that seeks to do for small molecules what GenBank has done for nucleic acid sequences.

"PubChem will significantly improve researchers’ abilities to explore and discover the biological properties of small molecules," says Stephen Bryant, a senior investigator in NCBI’s Computational Biology Branch and PubChem team leader.

PubChem is an initiative within the Molecular Libraries and Imaging component of the Roadmap.

At the bench end of the bench-to-bedside panorama of the Roadmap, the Molecular Libraries component provides funding and infrastructure for small-molecule screening and probe generation, an informatics platform for archiving and utilizing small-molecule data in the public sector, and technology development to expand the diversity and robustness of chemical libraries, assays, and detection technologies.

The cheminformatics aspect of the initiative calls for "a database of chemical structures, properties, and activities" to be established at NCBI. That database is PubChem.

Screening Centers

According to Bryant, much of the data archived in PubChem will be generated by the NIH-funded small-molecule screening centers. The screening centers (one to be based at NHGRI and several others at academic institutions around the country) will analyze thousands of chemically diverse small molecules via high-throughput screening to identify compounds that are biologically active against a range of molecules, cells, or genes.

Chris Austin is a senior advisor on translational research at NHGRI and director of the NHGRI intramural screening center, which is also due to begin operations in the fall of 2004. He says that PubChem will make available to the general research community small-molecule compounds and information that traditionally have been proprietary within the private sector.

"People in the pharmaceutical industry have had access to this kind of information for some time. This is the first time that [such] comprehensive information on the chemical structures and biological activities of thousands of small molecules will be freely available to the public sector. . . . [It’s] a tremendous step forward," Austin says.

Cross-linking

Bryant notes that PubChem will be cross-linked to NCBI’s other databases, such as PubMed, in ways that can further enable research.

"Chemical structures in PubChem will automatically be neighbored, or compared to one another, and this will allow users to make new connections between articles in the literature, such as those concerning biological activity, toxicology, and animal or clinical studies. These articles might refer to the same or chemically similar compounds, but since compounds have many names, the connection can only be made by linking through chemical structure and structural similarity. We expect that these new cross-links will make PubChem an extremely powerful research tool," he says.

PubChem will initially contain "legacy" data, such as that from NCI’s Developmental Therapeutics Program (DTP), a decades-old program that plans, conducts, and facilitates development of therapeutic agents for cancer and AIDS. DTP maintains a repository of synthetic compounds and fully characterized pure natural products that have been evaluated as potential anticancer and anti-HIV agents. It has an inventory of more than 140,000 nondiscreet compounds that have been submitted to DTP from a variety of sources worldwide.

Small Molecules as Chemical Probes

Another initiative of the Molecular Libraries component is the creation of a repository to collect and house the small molecules that will be analyzed by the NIH-funded screening centers.

This "Small Molecule Repository" will provide the centers with large sets of chemical compounds to be screened and will provide the biomedical research community with access to small-molecule probes generated by the screening centers.

Now being created, the repository has a mandate to acquire, maintain, and distribute a collection of approximately 1 million chemically diverse small molecules with known and unknown activities. Over time, this collection will be expanded and modified to include compounds that are capable of interacting with an increasing number and diversity of biomolecular target domains.

Bryant and Austin expect that the chemical probes generated will be used mainly as research tools for the study of genetic and cellular pathways in health and disease. But these tools should also give researchers developing new drugs a leg up, they note, and in selected cases may even be used directly as starting points for diagnostic tests or drug development, particularly for rare and orphan diseases.

"PubChem will be a huge cross-referencing resource for hundreds of thousands of small, biologically active molecules. It cannot help but speed up the drug development process," Bryant says.

 

 

