A variety of machine learning methods such as naive Bayesian, support vector machines and, more recently, deep neural networks are demonstrating their utility for drug discovery and development. These methods leverage the generally larger datasets created from high-throughput screening and allow prediction of bioactivities for targets and of molecular properties with increased accuracy. We have only just begun to exploit the potential of these techniques, but they may already be fundamentally changing the research process for identifying new molecules and/or repurposing old drugs. The integrated, end-to-end (E2E) application of such machine learning models is broadly relevant and has considerable implications for developing future therapies and their targeting.
Over the past decade we have witnessed the increasing sophistication of machine learning algorithms applied in daily use, from internet searches, voice recognition and social network software to machine vision software in cameras, phones, robots and self-driving cars. Pharmaceutical research has also seen its fair share of machine learning developments. For example, applying such methods to mine the growing datasets that are created in drug discovery enables us not only to learn from the past but also to predict a molecule’s properties and behavior in the future. The latest machine learning algorithm garnering significant attention is deep learning, which is an artificial neural network with multiple hidden layers. Publications over the last 3 years suggest that this algorithm may have advantages over previous machine learning methods and offer a slight but discernible edge in predictive performance. The time has come for a balanced review of this technique, but also to apply machine learning methods such as deep learning across a wider array of endpoints relevant to pharmaceutical research for which the datasets are growing, such as physicochemical property prediction, formulation prediction, absorption, distribution, metabolism, excretion and toxicity (ADME/Tox), target prediction and skin permeation. We also show that there are many potential applications of deep learning beyond cheminformatics. It will be important to perform prospective testing (which has rarely been carried out to date) in order to convince skeptics that there will be benefits from investing in this technique.
Organic cation transporter (OCT) 2 mediates the entry step for organic cation secretion by renal proximal tubule cells and is a site of unwanted drug-drug interactions (DDIs). But reliance on decision tree-based predictions of DDIs at OCT2 that depend on IC50 values can be suspect because they can be influenced by choice of transported substrate; for example, IC50 values for the inhibition of metformin versus MPP transport can vary by 5- to 10-fold. However, it is not clear whether the substrate dependence of a ligand interaction is common among OCT2 substrates. To address this question, we screened the inhibitory effectiveness of 20 μM concentrations of several hundred compounds against OCT2-mediated uptake of six structurally distinct substrates: MPP, metformin, N,N,N-trimethyl-2-[methyl(7-nitrobenzo[c][1,2,5]oxadiazol-4-yl)amino]ethanaminium (NBD-MTMA), TEA, cimetidine, and 4-4-dimethylaminostyryl-N-methylpyridinium (ASP). Of these, MPP transport was least sensitive to inhibition. IC50 values for 20 structurally diverse compounds confirmed this profile, with IC50 values for MPP averaging 6-fold larger than those for the other substrates. Bayesian machine-learning models of ligand-induced inhibition displayed generally good statistics after cross-validation and external testing. Applying our ASP model to a previously published large-scale screening study for inhibition of OCT2-mediated ASP transport resulted in comparable statistics, with approximately 75% of "active" inhibitors predicted correctly. The differential sensitivity of MPP transport to inhibition suggests that multiple ligands can interact simultaneously with OCT2 and supports the recommendation that MPP not be used as a test substrate for OCT2 screening. Instead, metformin appears to be a comparatively representative OCT2 substrate for both in vitro and in vivo (clinical) use.
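To make the link between a single-concentration screen and IC50 concrete, the following is a minimal sketch, not the authors' analysis. It assumes a simple one-site inhibition model with a Hill slope of 1, which the abstract does not state, and it uses a made-up IC50 of 10 μM for illustration. The function names are hypothetical. Only the 20 μM screening concentration and the 6-fold average difference come from the abstract.

```python
def fraction_remaining(inhibitor_um: float, ic50_um: float, hill: float = 1.0) -> float:
    """Fraction of control transport remaining at a given inhibitor concentration.

    Assumes a one-site model: remaining = 1 / (1 + ([I] / IC50) ** hill).
    This model is an illustrative assumption, not taken from the abstract.
    """
    if inhibitor_um < 0 or ic50_um <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    return 1.0 / (1.0 + (inhibitor_um / ic50_um) ** hill)


def percent_inhibition(inhibitor_um: float, ic50_um: float, hill: float = 1.0) -> float:
    """Percent inhibition of transport under the same one-site assumption."""
    return 100.0 * (1.0 - fraction_remaining(inhibitor_um, ic50_um, hill))


SCREEN_UM = 20.0              # screening concentration reported in the abstract
IC50_OTHER_UM = 10.0          # hypothetical IC50 against a non-MPP substrate
IC50_MPP_UM = 6.0 * IC50_OTHER_UM  # 6-fold larger, matching the reported average

if __name__ == "__main__":
    for label, ic50 in (("other substrate", IC50_OTHER_UM), ("MPP", IC50_MPP_UM)):
        pct = percent_inhibition(SCREEN_UM, ic50)
        print(f"{label}: IC50 = {ic50:.0f} uM, inhibition at 20 uM = {pct:.1f}%")
```

Under these assumptions, the same compound would inhibit roughly two-thirds of transport for the hypothetical non-MPP substrate but only a quarter of MPP transport, which illustrates why a fixed-concentration screen can classify a compound differently depending on the substrate used.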
The growing quantity of public and private data sets focused on small molecules screened against biological targets or whole organisms provides a wealth of drug discovery relevant data. This is matched by the availability of machine learning algorithms such as Support Vector Machines (SVM) and Deep Neural Networks (DNN) that are computationally expensive to perform on very large data sets with thousands of molecular descriptors. Quantum computer (QC) algorithms have been proposed to offer an approach to accelerate quantum machine learning over classical computer (CC) algorithms, albeit with significant limitations. In the case of cheminformatics, which is widely used in drug discovery, one of the challenges to overcome is the need for compression of large numbers of molecular descriptors for use on a QC. Here, we show how to achieve compression with data sets ranging from hundreds of molecules (SARS-CoV-2) to hundreds of thousands of molecules (whole cell screening data sets for plague and M. tuberculosis) with SVM and the data reuploading classifier (a DNN equivalent algorithm) on a QC benchmarked against CC and hybrid approaches. This study illustrates the steps needed to be “quantum computer ready” so that quantum computing can be applied to drug discovery, and provides a foundation on which to build this field.
Drug-induced liver injury (DILI) is one of the most unpredictable adverse reactions to xenobiotics in humans and the leading cause of postmarketing withdrawals of approved drugs. To date, these drugs have been collated by the FDA to form the DILIRank database, which classifies DILI severity and potential. These classifications have been used by various research groups in generating computational predictions for this type of liver injury. Recently, groups from Pfizer and AstraZeneca have collated DILI in vitro data and physicochemical properties for compounds that can be used along with data from the FDA to build machine learning models for DILI. In this study, we have used these data sets, as well as the Biopharmaceutics Drug Disposition Classification System data set, to generate Bayesian machine learning models with our in-house software, Assay Central. The performance of all machine learning models was assessed through both the internal 5-fold cross-validation metrics and prediction accuracy of an external test set of compounds with known hepatotoxicity. The best-performing Bayesian model was based on the DILI-concern category from the DILIRank database with an ROC of 0.814, a sensitivity of 0.741, a specificity of 0.755, and an accuracy of 0.746. A comparison of alternative machine learning algorithms, such as k-nearest neighbors, support vector classification, AdaBoosted decision trees, and deep learning methods, produced similar statistics to those generated with the Bayesian algorithm in Assay Central. This study demonstrates machine learning models grouped in a tool called MegaTox that can be used to predict early-stage clinical compounds, as well as recent FDA-approved drugs, to identify potential DILI.
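For readers less familiar with the reported statistics, the sketch below shows how sensitivity, specificity and accuracy are conventionally derived from a binary confusion matrix. It is not the Assay Central implementation, and the class name and the counts are hypothetical values chosen for easy arithmetic. They are not the data behind the figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a binary classifier (hypothetical helper for illustration)."""
    tp: int  # toxic compounds predicted toxic
    fn: int  # toxic compounds predicted non-toxic
    tn: int  # non-toxic compounds predicted non-toxic
    fp: int  # non-toxic compounds predicted toxic

    @property
    def sensitivity(self) -> float:
        # True positive rate: fraction of actual positives recovered.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # True negative rate: fraction of actual negatives recovered.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Fraction of all predictions that are correct.
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)


# Made-up counts, NOT taken from the study.
example = Confusion(tp=40, fn=10, tn=30, fp=20)

if __name__ == "__main__":
    print(f"sensitivity = {example.sensitivity:.3f}")
    print(f"specificity = {example.specificity:.3f}")
    print(f"accuracy    = {example.accuracy:.3f}")
```

In a 5-fold cross-validation setting, these quantities would typically be computed on each held-out fold and then summarized, while the external test set gives a single confusion matrix. The ROC value is computed differently, from ranked prediction scores rather than from thresholded counts.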
One approach to speed up drug discovery is to examine new uses for existing approved drugs, so-called ‘drug repositioning’ or ‘drug repurposing’, which has become increasingly popular in recent years. Analysis of the literature reveals many examples of US Food and Drug Administration-approved drugs that are active against multiple targets (also termed promiscuity) that can also be used to therapeutic advantage for repositioning for other neglected and rare diseases. Using proof-of-principle examples, we suggest here that with current in silico technologies and databases of the structures and biological activities of chemical compounds (drugs) and related data, as well as close integration with in vitro screening data, improved opportunities for drug repurposing will emerge for neglected or rare/orphan diseases.
Anyone involved in designing or finding molecules in the life sciences over the past few years has witnessed a dramatic change in how we now work due to the COVID-19 pandemic. Computational technologies like artificial intelligence (AI) seemed to become ubiquitous in 2020 and have been increasingly applied as scientists worked from home and were separated from the laboratory and their colleagues. This shift may be more permanent, as the future of molecule design across different industries will increasingly require machine learning models for design and optimization of molecules as they become “designed by AI”. AI and machine learning have essentially become a commodity within the pharmaceutical industry. This perspective will briefly describe our personal opinions of how machine learning has evolved and is being applied to model different molecule properties that are useful across industries, and ultimately suggests the potential for tight integration of AI into equipment and automated experimental pipelines. It will also describe how many groups have implemented generative models, covering different architectures, for de novo design of molecules. We also highlight some of the companies at the forefront of using AI to demonstrate how machine learning has impacted and influenced our work. Finally, we will peer into the future and suggest some of the areas that represent the most interesting technologies that may shape the future of molecule design, highlighting how we can help increase the efficiency of the design-make-test cycle, which is currently a major focus across industries.
► Poor quality structure-based data can impact modeling and interlinking of resources. ► We critique the approaches taken to assemble data into chemical compound databases. ► Approaches to deliver definitive reference data sources for chemists are discussed.
In recent years there has been a dramatic increase in the number of freely accessible online databases serving the chemistry community. The internet provides chemistry data that can be used for data-mining, for computer models, and for integration into systems to aid drug discovery. There is, however, a responsibility to ensure that the data are of high quality so that time is not wasted in erroneous searches, that models are underpinned by accurate data and that improved discoverability of online resources is not marred by incorrect data. In this article we provide an overview of some of the experiences of the authors using online chemical compound databases, critique the approaches taken to assemble data and suggest approaches to deliver definitive reference data sources.
Chemistry databases are widely available on the internet and are potentially of high value to researchers; however, the quality of their content is variable and errors proliferate. We suggest that efforts should be made to improve the situation and to provide a chemistry database as a gold standard.
...it is also in the best interests of conference organizers to provide free Wi-Fi so that international attendees do not have to use their expensive data plans and because the phone signal in many conference venues is generally weak.
Common Twitter Abbreviations
# = hashtag
@ = nametag, a way to reply to someone
.@ = broadcast a tweet that begins with a nametag
RT = retweet, share something already tweeted
HT = hat tip, acknowledge or thank a source
DM = direct message
CX = correction
Tweetup = physical meeting of tweeters
Additional abbreviations can be found elsewhere:
http://socialmediatoday.com/emoderation/512987/top-twitter-abbreviations-you-need-know
http://www.ogawadesign.com/services/twitter-for-your-biz/twitter-abbreviations-and-twitter-acronymns.html
http://www.webopedia.com/quick_ref/Twitter_Dictionary_Guide.asp
Acronyms for common conferences can be found here: http://www.abbreviations.com/acronyms/CONF
At the other extreme, which unfortunately is representative of most scientific conferences we have attended, there are few if any active live tweeters.
The hepatic bile acid uptake transporter sodium taurocholate cotransporting polypeptide (NTCP) is less well characterized than its ileal paralog, the apical sodium dependent bile acid transporter (ASBT), in terms of drug inhibition requirements. The objectives of this study were (a) to identify FDA approved drugs that inhibit human NTCP, (b) to develop pharmacophore and Bayesian computational models for NTCP inhibition, and (c) to compare NTCP and ASBT transport inhibition requirements. A series of NTCP inhibition studies were performed using FDA approved drugs, in concert with iterative computational model development. Screening studies identified 27 drugs as novel NTCP inhibitors, including irbesartan (Ki = 11.9 μM) and ezetimibe (Ki = 25.0 μM). The common feature pharmacophore indicated that two hydrophobes and one hydrogen bond acceptor were important for inhibition of NTCP. From 72 drugs screened in vitro, a total of 31 drugs inhibited NTCP, while 51 drugs (i.e., more than half) inhibited ASBT. Hence, while there was inhibitor overlap, ASBT unexpectedly was more permissive to drug inhibition than was NTCP, and this may be related to NTCP possessing fewer pharmacophore features. Findings reflected that a combination of computational and in vitro approaches enriched the understanding of these poorly characterized transporters and yielded additional chemical probes for possible drug–transporter interaction determinations.