Drug molecules consist of a few tens of atoms connected by covalent bonds. How many such molecules are possible in total and what is their structure? This question is of pressing interest in medicinal chemistry to help solve the problems of drug potency, selectivity, and toxicity and reduce attrition rates by pointing to new molecular series. To better define the unknown chemical space, we have enumerated 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens forming the chemical universe database GDB-17, covering a size range containing many drugs and typical for lead compounds. GDB-17 contains millions of isomers of known drugs, including analogs with high shape similarity to the parent drug. Compared to known molecules in PubChem, GDB-17 molecules are much richer in nonaromatic heterocycles, quaternary centers, and stereoisomers, densely populate the third dimension in shape space, and represent many more scaffold types.
GDB-13 enumerates small organic molecules containing up to 13 atoms of C, N, O, S, and Cl following simple chemical stability and synthetic feasibility rules. With 977 468 314 structures, GDB-13 is the largest publicly available small organic molecule database to date.
The degree and the origins of quantitative variability of most human plasma proteins are largely unknown. Because the twin study design provides a natural opportunity to estimate the relative contribution of heritability and environment to different traits in human populations, we applied the highly accurate and reproducible SWATH mass spectrometry technique to quantify 1,904 peptides defining 342 unique plasma proteins in 232 plasma samples collected longitudinally from pairs of monozygotic and dizygotic twins at intervals of 2–7 years, and apportioned the observed total quantitative variability among its root causes: genes, environment, and longitudinal factors. The data indicate that different proteins show vastly different patterns of abundance variability among humans and that genetic control and longitudinal variation affect protein levels and biological processes to different degrees. The data further strongly suggest that the plasma concentrations of clinical biomarkers need to be calibrated against genetic and temporal factors. Moreover, we identified 13 cis‐SNPs significantly influencing the level of specific plasma proteins. These results therefore have immediate implications for the effective design of blood‐based biomarker studies.
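The abstract does not say which variance-decomposition model was used. As a rough illustration of how a twin design separates heritable from environmental variability, the sketch below applies the classic Falconer estimates to one protein's abundance measured in monozygotic (MZ) and dizygotic (DZ) pairs. This is a textbook approximation, not the authors' method; it ignores the longitudinal component, and the function and key names are hypothetical.

```python
import numpy as np


def falconer_decomposition(mz_pairs, dz_pairs):
    """Classic Falconer variance split from twin-pair correlations.

    Each argument has shape (n_pairs, 2): one row per twin pair, one column
    per twin. Illustrative only; not the model used in the study.
    """
    mz = np.asarray(mz_pairs, dtype=float)
    dz = np.asarray(dz_pairs, dtype=float)

    # Within-pair Pearson correlations for each zygosity group.
    r_mz = np.corrcoef(mz[:, 0], mz[:, 1])[0, 1]
    r_dz = np.corrcoef(dz[:, 0], dz[:, 1])[0, 1]

    return {
        "r_mz": r_mz,
        "r_dz": r_dz,
        "heritability": 2.0 * (r_mz - r_dz),      # additive genetic share
        "shared_env": 2.0 * r_dz - r_mz,          # environment common to a pair
        "unique_env": 1.0 - r_mz,                 # individual environment + noise
    }
```

By construction the three shares sum to 1; with small samples the individual estimates can fall outside the 0–1 range, which is one reason full variance-component models are usually preferred in practice.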
Synopsis
The degree and origins of the abundance variability of 342 human plasma proteins are analyzed by a longitudinal twin design and SWATH mass spectrometry. The results suggest genetic control and longitudinal variation affect protein levels and biological processes to different degrees.
We used the highly accurate and reproducible SWATH mass spectrometry technique to quantify 342 unique plasma proteins in 232 plasma samples collected longitudinally from pairs of monozygotic and dizygotic twins at intervals of 2–7 years.
The observed total quantitative variability of the human plasma proteome is dissected into its root causes: genes, environment and longitudinal factors.
The roles of the heritable, environmental and longitudinal determinants in controlling plasma protein levels are different for different proteins and functional clusters, strongly suggesting that the plasma concentrations of clinical biomarkers need to be calibrated against genetic and temporal factors.
We further identified 13 cis‐SNPs significantly influencing the level of specific plasma proteins as protein quantitative trait loci (pQTLs), and five of them are associated with gene expression QTLs (eQTLs) in human tissues.
The chemical universe database GDB-17 contains 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens obeying rules for chemical stability, synthetic feasibility, and medicinal chemistry. GDB-17 was analyzed using 42 integer value descriptors of molecular structure which we term “Molecular Quantum Numbers” (MQN). Principal component analysis and representation of the (PC1, PC2)-plane provided a graphical overview of the GDB-17 chemical space. Rapid ligand-based virtual screening (LBVS) of GDB-17 using the city-block distance CBD_MQN as a similarity search measure was enabled by a hashed MQN-fingerprint. LBVS of the entire GDB-17 and of selected subsets identified shape-similar, scaffold-hopping analogs (ROCS > 1.6 and T_SF < 0.5) of 15 drugs. Over 97% of these analogs occurred within CBD_MQN ≤ 12 from each drug, a constraint which might help focus advanced virtual screening. An MQN-searchable 50 million subset of GDB-17 is publicly available at www.gdb.unibe.ch.
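To make the similarity measure concrete: the city-block (Manhattan) distance between two MQN vectors is the sum of absolute differences over the 42 integer descriptors. The sketch below is a brute-force neighbor search under that definition. It does not reproduce the hashed MQN-fingerprint mentioned in the abstract, the function names are hypothetical, and the MQN vectors are assumed to have been computed elsewhere.

```python
import numpy as np


def city_block_distance(query, library):
    """Sum of absolute descriptor differences between one query and each row."""
    query = np.asarray(query, dtype=np.int64)
    library = np.asarray(library, dtype=np.int64)
    return np.abs(library - query).sum(axis=1)


def mqn_neighbors(query, library, max_distance=12):
    """Return (indices, distances) of library rows within max_distance.

    Results are ordered by increasing distance. The default threshold of 12
    mirrors the CBD_MQN <= 12 constraint quoted in the abstract.
    """
    distances = city_block_distance(query, library)
    within = np.flatnonzero(distances <= max_distance)
    order = np.argsort(distances[within], kind="stable")
    return within[order], distances[within][order]
```

A linear scan like this is adequate for thousands to millions of vectors; a database of billions of molecules needs an index, which is presumably the role of the hashed fingerprint.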
Clinical specimens are each inherently unique, limited and nonrenewable. Small samples such as tissue biopsies are often completely consumed after a limited number of analyses. Here we present a method that enables fast and reproducible conversion of a small amount of tissue (approximating the quantity obtained by a biopsy) into a single, permanent digital file representing the mass spectrometry (MS)-measurable proteome of the sample. The method combines pressure cycling technology (PCT) and sequential window acquisition of all theoretical fragment ion spectra (SWATH)-MS. The resulting proteome maps can be analyzed, re-analyzed, compared and mined in silico to detect and quantify specific proteins across multiple samples. We used this method to process and convert 18 biopsy samples from nine patients with renal cell carcinoma into SWATH-MS fragment ion maps. From these proteome maps we detected and quantified more than 2,000 proteins with a high degree of reproducibility across all samples. The measured proteins clearly distinguished tumorous kidney tissues from healthy tissues and differentiated distinct histomorphological kidney cancer subtypes.
Ozonation of secondary wastewater effluents can reduce the discharge of micropollutants by transforming their chemical structures. Therefore, a better understanding of the formation of transformation products during ozonation is important. In this study, a computer-based prediction platform for the kinetics and mechanisms of the reactions of ozone with organic compounds was developed to enable in silico predictions of transformation products. With the developed prediction platform, reaction kinetics expressed as second-order rate constants for the reactions of ozone with selected organic compounds (k_O3, M⁻¹ s⁻¹) can be predicted based on an adapted k_O3 prediction model from a previous study (Lee et al., Environ. Sci. Technol., 2015, 49, 9925-9935) (average model error of about a factor of 6 for 14 compound classes with 284 model compounds). Ozone reaction mechanisms reported in the literature have been reviewed and, using chemoinformatics tools, encoded into about 340 individual reaction rules that can be generally applied to predict the transformation products of micropollutants. Predictions for k_O3 and/or transformation products were overall consistent with the experimental data for three micropollutants used as validation compounds (e.g., carbamazepine, tramadol, and triclosan). However, limitations of the current k_O3 prediction platform were also identified: ambiguous assignment of the n-th highest occupied molecular orbital energy (E_HOMO−n) to the reactive sites, potential errors associated with the use of a gas-phase geometry, and a poor k_O3 prediction for certain compounds (cetirizine). Therefore, the current prediction tool should not be considered as a substitute for experimental studies and experimental data are still required in the future to obtain a more robust prediction model. Nonetheless, the developed prediction platform, made available as a stand-alone graphical user interface (GUI) application, will provide useful information about aqueous ozone chemistry to various groups of end-users such as environmental chemists, engineers, or toxicologists.
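To give a sense of what a factor-of-6 uncertainty in a second-order rate constant means in practice, the sketch below uses the standard second-order kinetic relation: the fraction of a compound remaining after direct reaction with ozone is exp(−k × ozone exposure), where ozone exposure is the time-integrated ozone concentration in M s. This is not part of the described platform; it assumes direct ozone reaction only (no hydroxyl-radical pathway), and the function names are hypothetical.

```python
import math


def fraction_remaining(k_o3, ozone_exposure):
    """Fraction of a compound left after direct reaction with ozone.

    k_o3: second-order rate constant in M^-1 s^-1.
    ozone_exposure: time-integrated ozone concentration in M s.
    """
    return math.exp(-k_o3 * ozone_exposure)


def fraction_remaining_range(k_o3, ozone_exposure, error_factor=6.0):
    """Bracket the prediction when k_o3 is only known within a factor.

    Returns (lowest, central, highest) remaining fraction, obtained from
    k_o3 * error_factor, k_o3, and k_o3 / error_factor respectively.
    """
    lowest = fraction_remaining(k_o3 * error_factor, ozone_exposure)
    central = fraction_remaining(k_o3, ozone_exposure)
    highest = fraction_remaining(k_o3 / error_factor, ozone_exposure)
    return lowest, central, highest
```

Because the rate constant sits in the exponent, a factor-of-6 error can move a prediction from near-complete abatement to modest abatement, which is consistent with the abstract's caution that the tool does not replace experiments.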
The chemical space is the ensemble of all possible molecules, which is believed to contain at least 10^60 organic molecules below 500 Da of possible interest for drug discovery. This review summarizes the development of the chemical space concept from enumerating acyclic hydrocarbons in the 1800s to the recent assembly of the chemical universe database GDB. Chemical space travel algorithms can be used to explore defined regions of chemical space by generating focused virtual libraries. Maps of the chemical space are produced from property spaces visualized by principal component analysis or by self-organizing maps, and from structural analyses such as the scaffold-tree or the MQN-system. Virtual screening of virtual chemical space followed by synthesis and testing of the best hits leads to the discovery of new drug molecules.
Methods to enumerate, visualise and virtually screen the ensemble of all possible organic molecules, illustrated by the MQN-map of the chemical universe database GDB-11, offer promising opportunities for drug discovery.
Classifying organic molecules using counts for simple structural features, such as atom, bond and ring types, called molecular quantum numbers (MQNs), defines a universal chemical space for analyzing large molecular databases such as ZINC and GDB. The organization of MQN space is revealed by principal component analysis (PCA), as shown for the GDB‐11 database (26.4 million structures, up to 11 atoms of C, N, O, F).
A Searchable Map of PubChem. Deursen, Ruud van; Blum, Lorenz C; Reymond, Jean-Louis. Journal of Chemical Information and Modeling, 11/2010, Volume 50, Issue 11. Journal article, peer reviewed.
The database PubChem was classified using 42 integer value descriptors of molecular structure, here called molecular quantum numbers (MQNs), which count atoms and bond types, polar groups, and topological features. Principal component analysis of the MQN data set shows that PubChem compounds occupy a partially filled elliptical cone in the (PC1,PC2,PC3) space whose axis is the first principal component PC1 (65% variability) representing molecular size, and the ellipse axes are PC2 (18% variability, representing structural flexibility) and PC3 (7% variability, representing polarity). A visual overview of PubChem is provided by color-coded representations of the (PC2,PC3) plane. The MQNs form a scalar fingerprint which can be used to measure the similarity between pairs of molecules and enable ligand-based virtual screening, as illustrated for the enrichment of bioactives from the DUD data set from PubChem. An MQN-annotated version of PubChem with an MQN-similarity search tool is available at www.gdb.unibe.ch.
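The percentages quoted for PC1, PC2 and PC3 are explained-variance ratios. The sketch below shows how such ratios and the projected coordinates can be obtained from an (n_molecules × 42) matrix of integer descriptors using a plain SVD. It assumes the descriptors are mean-centered but not rescaled; the abstract does not state the preprocessing, so treat that as an assumption. The function name is hypothetical.

```python
import numpy as np


def pca_scores(descriptors, n_components=3):
    """Project a descriptor matrix onto its leading principal components.

    descriptors: array of shape (n_molecules, n_descriptors).
    Returns (scores, explained_ratio, components) for the first
    n_components components. Centered, unscaled PCA (an assumption).
    """
    x = np.asarray(descriptors, dtype=float)
    centered = x - x.mean(axis=0)

    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

    variances = singular_values ** 2 / (x.shape[0] - 1)
    explained_ratio = variances / variances.sum()

    components = vt[:n_components]
    scores = centered @ components.T
    return scores, explained_ratio[:n_components], components
```

Plotting two columns of `scores` against each other gives the kind of plane view described above, for example PC2 versus PC3.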
The chemical universe database GDB-13, which enumerates 977 million organic molecules up to 13 atoms of C, N, O, S and Cl following simple chemical stability and synthetic feasibility rules, represents a vast reservoir for new fragments. GDB-13 was classified using the MQN-system discussed in the preceding paper for the analysis of PubChem fragments. Two hundred and fifty-five subsets of GDB-13 were generated by the combinatorial use of eight restrictive criteria, including fragment-like (“rule of three”) and scaffold-like (no acyclic carbon atoms) filters. Virtual screening for analogs of 15 commercial drugs of 13 non-hydrogen atoms or less shows that retrieving MQN-neighbors of a query molecule from GDB-13 or its subsets provides on average a 38-fold enrichment in structural analogs (Daylight-type substructure fingerprint Tanimoto T_SF > 0.7), and a 75-fold enrichment in shape-similar analogs (ROCS TanimotoCombo score > 1.4). An MQN-searchable version of GDB-13 is provided at www.gdb.unibe.ch.
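The abstract does not define how the fold enrichment was computed. A common definition in virtual screening, assumed here, is the hit rate among the retrieved molecules divided by the hit rate in the whole database. The sketch below implements that ratio; the function name is hypothetical.

```python
def fold_enrichment(hits_retrieved, n_retrieved, hits_total, n_total):
    """Hit rate in the retrieved set relative to the database-wide hit rate.

    A value of 1 means the selection is no better than random picking.
    Assumed definition; the source does not spell out its formula.
    """
    if n_retrieved <= 0 or n_total <= 0:
        raise ValueError("set sizes must be positive")
    if hits_total <= 0:
        raise ValueError("the database must contain at least one hit")
    if hits_retrieved > n_retrieved or hits_total > n_total:
        raise ValueError("hit counts cannot exceed set sizes")

    retrieved_rate = hits_retrieved / n_retrieved
    background_rate = hits_total / n_total
    return retrieved_rate / background_rate
```

Under this definition, a 38-fold enrichment means that an analog is 38 times more likely to be found among the MQN-neighbors of the query than among molecules drawn at random from the database.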