Abstract
Background
High-throughput screening (HTS) and virtual screening (VS) have been widely used to identify potential hits from large chemical libraries. However, the frequent occurrence of ‘noisy compounds’ in screened libraries, such as compounds with poor drug-likeness, poor selectivity or potential toxicity, has greatly weakened the enrichment capability of HTS and VS campaigns. Therefore, the development of comprehensive and credible tools to detect noisy compounds in chemical libraries is urgently needed in the early stages of drug discovery.
Results
In this study, we developed a freely available integrated Python library for negative design, called Scopy, which supports data preparation, calculation of descriptors, scaffolds and screening filters, and data visualization. The current version of Scopy can calculate 39 basic molecular properties, 3 comprehensive molecular evaluation scores, 2 types of molecular scaffolds, 6 types of substructure descriptors and 2 types of fingerprints. A number of important screening rules are also provided, including 15 drug-likeness rules (13 drug-likeness rules and 2 building-block rules), 8 frequent-hitter rules (4 assay-interference substructure filters and 4 promiscuous-compound substructure filters), and 11 toxicophore filters (5 human-related, 3 environment-related and 3 comprehensive toxicity substructure filters). Moreover, the library supports four visualization functions to help users gain a better understanding of the screened data: a basic feature radar chart, a feature-feature-related scatter diagram, a functional group marker gram and a cloud gram.
Conclusion
Scopy provides a comprehensive Python package to filter out compounds with undesirable properties or substructures, which will benefit the design of high-quality chemical libraries for drug design and discovery. It is freely available at https://github.com/kotori-y/Scopy.
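The kind of rule-based negative design described above can be sketched in plain Python. The rule-of-five thresholds below are the standard Lipinski cutoffs, but the compound records are hypothetical and the functions are an illustrative sketch, not Scopy's actual API (which computes these properties from structures directly).

```python
# Minimal sketch of rule-based negative design (illustrative, not Scopy's API).
# Each compound is a dict of precomputed properties; Lipinski's rule of five
# flags compounds that violate more than one threshold.

def lipinski_violations(props):
    """Count rule-of-five violations for one compound."""
    rules = [
        props["MW"] > 500,    # molecular weight
        props["LogP"] > 5,    # lipophilicity
        props["HBD"] > 5,     # hydrogen-bond donors
        props["HBA"] > 10,    # hydrogen-bond acceptors
    ]
    return sum(rules)

def screen(library, max_violations=1):
    """Keep compounds with at most `max_violations` rule-of-five violations."""
    return [c for c in library if lipinski_violations(c) <= max_violations]

# Hypothetical compound records with precomputed descriptors.
library = [
    {"name": "cpd-1", "MW": 342.4, "LogP": 2.1, "HBD": 2, "HBA": 5},
    {"name": "cpd-2", "MW": 612.7, "LogP": 6.3, "HBD": 4, "HBA": 11},
]
print([c["name"] for c in screen(library)])  # → ['cpd-1']
```

In practice a substructure-based filter (toxicophores, frequent-hitter alerts) works the same way, with SMARTS matching replacing the numeric thresholds.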
Thiazoles exhibit a wide range of biological activities and therefore represent useful and attractive building blocks. To evaluate their usefulness and pinpoint their liabilities in fragment screening campaigns, we assembled a focused library of 49 fragment-sized thiazoles and thiadiazoles with various substituents, namely amines, bromides, carboxylic acids, and nitriles. The library was profiled in a cascade of biochemical inhibition assays, redox activity, thiol reactivity, and stability assays. Our study indicates that when thiazole derivatives are identified as screening hits, their reactivity should be carefully addressed and correlated with specific on-target engagement. Importantly, nonspecific inhibition should be excluded using experimental approaches and in silico predictions. To help with validation of hits identified in fragment screening campaigns, we can apply our high-throughput profiling workflow to focus on the most tractable compounds with a clear mechanism of action.
Compounds interfering with high-throughput screening (HTS) assay technologies (also known as “badly behaving compounds”, “bad actors”, “nuisance compounds” or “PAINS”) pose a major challenge to early-stage drug discovery. Many of these problematic compounds are “frequent hitters”, and we have recently published a set of machine learning models (“Hit Dexter 2.0”) for flagging such compounds.
Here we present a new generation of machine learning models derived from a large, manually curated and annotated data set. For the first time, these models cover cell-based assays in addition to target-based assays. Our experiments show that cell-based assays indeed behave differently from target-based assays with respect to hit rates and frequent hitters, and that dedicated models are required to produce meaningful predictions. In addition to these extensions and refinements, we explored a variety of modeling setups, combining four machine learning classifiers (k-nearest neighbors (KNN), extra trees, random forest and multilayer perceptron) with four sets of descriptors (Morgan2 fingerprints, Morgan3 fingerprints, MACCS keys and 2D physicochemical property descriptors).
Testing on holdout data as well as on data sets of “dark chemical matter” (i.e. compounds that have been extensively tested in biological assays but have never shown activity) and of known bad actors shows that multilayer perceptron classifiers combined with Morgan2 fingerprints outperform the other setups in most cases. The best multilayer perceptron classifiers reached Matthews correlation coefficients of up to 0.648 on holdout data. These models are available via a free web service.
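The Matthews correlation coefficient quoted above is computed directly from confusion-matrix counts. A minimal stdlib-only sketch (the counts below are invented for illustration, not taken from the paper):

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when any marginal sum is zero (the usual convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Invented holdout counts for a hypothetical frequent-hitter classifier.
print(round(mcc(tp=80, tn=850, fp=50, fn=20), 3))  # → 0.664
```

Unlike plain accuracy, MCC stays informative on the heavily imbalanced class distributions typical of frequent-hitter data, which is why it is the headline metric here.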
A significant challenge in high-throughput screening (HTS) campaigns is the identification of assay-technology interference compounds. A Compound Interfering with an Assay Technology (CIAT) gives false readouts in many assays. CIATs are often considered viable hits and investigated in follow-up studies, thus impeding research and wasting resources. In this study, we developed a machine-learning (ML) model to predict CIATs for three assay technologies. The model was trained on known CIATs and non-CIATs (NCIATs) identified in artefact assays and described by their 2D structural descriptors. Existing methods for identifying CIATs are based on statistical analysis of historical primary-screening data and do not consider experimental assays that identify CIATs. Our results show successful prediction of CIATs for existing and novel compounds and provide a complementary and wider set of predicted CIATs compared to BSF, a published structure-independent model, and to the PAINS substructural filters. Our analysis is an example of how well-curated datasets can yield powerful predictive models despite their relatively small size.
Prediction of compounds that interfere with HTS technology through a machine‐learning model: In this work, compounds that interfere with a high‐throughput screening technology are identified in counter‐screen assays and their chemical structures are used to train a random‐forest model to predict the behavior of new compounds. The model performs well, with respective ROC AUC values of 0.70, 0.62, and 0.57 in AlphaScreen, FRET, and TR‐FRET technologies and outperforms another published statistical method.
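The ROC AUC values quoted above have a direct probabilistic reading: the chance that a randomly chosen interfering compound receives a higher model score than a randomly chosen clean one. A stdlib-only sketch of that pairwise definition (labels and scores are invented, not from the study):

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison: the fraction of (positive, negative)
    pairs in which the positive scores higher; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented interference probabilities from a hypothetical model:
# 1 = CIAT, 0 = NCIAT.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
print(roc_auc(labels, scores))  # ≈ 0.833
```

On this reading, the reported 0.70 for AlphaScreen means the model ranks a true CIAT above a true NCIAT 70% of the time, while 0.57 for TR-FRET is only modestly better than chance (0.5).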
Abstract
Background
Fluorescent detection methods are indispensable tools for chemical biology. However, the frequent appearance of potentially fluorescent compounds has greatly interfered with the recognition of compounds with genuine activity. Such fluorescence interference is especially difficult to identify because it is reproducible and concentration-dependent. Therefore, the development of a credible screening tool to detect fluorescent compounds in chemical libraries is urgently needed in the early stages of drug discovery.
Results
In this study, we developed a webserver, ChemFLuo, for fluorescent compound detection, based on two large, high-quality training datasets containing 4906 blue and 8632 green fluorescent compounds. These molecules were used to construct a group of prediction models based on the combination of three machine learning algorithms and seven types of molecular representations. The best blue-fluorescence prediction model achieved a balanced accuracy (BA) of 0.858 and an area under the receiver operating characteristic curve (AUC) of 0.931 on the validation set, and a BA of 0.823 and an AUC of 0.903 on the test set. The best green-fluorescence prediction model achieved a BA of 0.810 and an AUC of 0.887 on the validation set, and a BA of 0.771 and an AUC of 0.852 on the test set. Besides the prediction models, 22 blue and 16 green representative fluorescent substructures were summarized for screening potential fluorescent compounds. Comparison with other fluorescence detection tools and application to external validation sets and large molecule libraries demonstrated the reliability of the models for fluorescent compound detection.
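The balanced accuracy reported above is simply the mean of sensitivity and specificity, which keeps the metric honest when fluorescent and non-fluorescent compounds are unequally represented. A stdlib-only sketch (the counts are made up for illustration):

```python
def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (recall on positives) and specificity
    (recall on negatives); robust to class imbalance."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

# Made-up counts for a hypothetical fluorescent / non-fluorescent classifier.
print(balanced_accuracy(tp=420, fn=80, tn=900, fp=100))  # ≈ 0.87
```

A classifier that labeled everything non-fluorescent would score high plain accuracy on an imbalanced library but only 0.5 balanced accuracy, which is why BA is paired with AUC here.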
Conclusion
ChemFLuo is a public webserver to filter out compounds with undesirable fluorescent properties, which will benefit the design of high-quality chemical libraries for drug discovery. It is freely available at http://admet.scbdd.com/chemfluo/index/.
Graphical Abstract
False-positive assay readouts caused by badly behaving compounds, such as frequent hitters, pan-assay interference compounds (PAINS) and aggregators, continue to pose a major challenge to experimental screening. Only a few in silico methods allow the prediction of such problematic compounds. We report the development of Hit Dexter, two extremely randomized trees classifiers for predicting compounds likely to trigger positive assay readouts either through true promiscuity or through assay interference. The models were trained on a well-prepared dataset extracted from the PubChem Bioassay database, consisting of approximately 311,000 compounds tested for activity on at least 50 proteins. Hit Dexter reached MCC and AUC values of up to 0.67 and 0.96, respectively, on an independent test set. The models are expected to be of high value, in particular to medicinal chemists and biochemists, who can use Hit Dexter to identify compounds whose positive assay readouts warrant extra caution. Hit Dexter is available as a free web service at http://hitdexter.zbh.uni-hamburg.de.
Hit Dexter: False‐positive assay signals triggered by badly behaving compounds continue to pose a major challenge to experimental screening. A free web service, called Hit Dexter, is able to identify such compounds with high accuracy, enabling chemists to make better‐informed decisions on their hit compounds.
AlphaScreen is one of the most widely used assay technologies in drug discovery owing to its versatility, dynamic range and sensitivity. However, the presence of false positives and frequent hitters complicates the interpretation of measured HTS data. Although filters exist to identify frequent hitters for AlphaScreen, they are typically based on privileged scaffolds, and the development of such filters is time consuming and requires deep domain knowledge. Recently, machine learning and artificial intelligence methods have emerged as important tools to advance drug discovery and chemoinformatics, including their application to the identification of frequent hitters in screening assays. However, the relative performance and complementarity of machine-learning and scaffold-based techniques have not yet been comprehensively compared. In this study, we compared filters based on privileged scaffolds with filters built using machine learning. Our results demonstrate that machine-learning methods provide more accurate filters for identifying frequent hitters in AlphaScreen assays than scaffold-based methods and can easily be retrained once new data are measured. We present highly accurate models to identify frequent hitters in AlphaScreen assays.
This perspective attempts to document the problems that medicinal chemists face in drug discovery. It also tries to identify relevant research areas in which academics can have an impact and which should thus be the subject of grant calls. Accordingly, it describes how hit discovery happens, how compounds to be screened are selected from available chemicals, and possible reasons for the recurrent paucity of useful, exploitable results. This is followed by successful hit-to-lead stories leading to recent and original antibacterials that are, or are about to be, used in human medicine. Illustrated considerations and suggestions are then made on the possible contributions of academic medicinal chemists. These start from the observations that discovering a “good” hit in the course of a screening campaign still relies on a great deal of luck (which is within the reach of academics), that the hit-to-lead process requires a lot of chemistry, and that while public–private partnerships can be important throughout these stages, they are absolute requirements for clinical trials. Concerning suggestions to improve the current hit success rate, one academic contribution in organic chemistry would be to identify new and pertinent chemical space, design synthetic routes to reach it, and prepare the corresponding chemical libraries. Concerning hit-to-lead programs on a given target, if no new hits are available, previously reported leads together with new structural data can be pertinent starting points for designing, preparing and assaying original analogues. In conclusion, this text is a plea illustrating that, in many countries, academic research in medicinal chemistry should be better funded, especially in therapeutic areas neglected by industry. At the least, such funds would provide the incentive to secure series of hopefully relevant chemical entities, which often appear to be lacking in the results of academic as well as industrial screening campaigns.