This review covers a novel approach to comparing methods, based on the sum of ranking differences (SRD). Many method-comparison studies suffer from ambiguity or from comparisons not being quite fair. This problem can be avoided by comparing each method's ranking of the objects to an ideal (reference) ranking. The absolute values of the differences between the ideal and actual rankings are summed up, and the procedure is repeated for each (actual) method. The SRD values obtained in this way provide a simple ordering of the methods. If the ideal ranking is not known, it can be replaced by the average (or the maximum or minimum) of all methods, or by a known sequence.
SRD corresponds to the principle of parsimony and provides an easy tool to evaluate the methods: the smaller the sum, the better the method. Models and other items can be ranked similarly.
Validation can be carried out using simulated random numbers for comparison: an empirical histogram (bootstrap-like) shows whether the SRD values are far from random.
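The SRD procedure and its randomization-based validation can be sketched in a few lines of stdlib Python. This is a minimal illustration with hypothetical data; ties are not handled, and the reference vector here stands in for the ideal ranking:

```python
import random

def ranks(values):
    """Rank values from smallest to largest (1-based; ties not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def srd(actual, reference):
    """Sum of absolute differences between the two rank vectors."""
    return sum(abs(a - b) for a, b in zip(ranks(actual), ranks(reference)))

def random_srd_distribution(n_objects, reference, n_sim=10_000, seed=42):
    """Empirical SRD distribution of random rankings (permutation test):
    a method whose SRD falls far below this distribution ranks the
    objects significantly closer to the reference than chance."""
    rng = random.Random(seed)
    base = list(range(n_objects))
    out = []
    for _ in range(n_sim):
        rng.shuffle(base)
        out.append(srd(base, reference))
    return out

# Hypothetical example: reference = row averages, one candidate method
reference = [0.1, 0.4, 0.2, 0.9]
method = [0.3, 0.2, 0.35, 0.8]
print(srd(method, reference))  # 4
null = random_srd_distribution(4, reference)
print(sum(s <= 4 for s in null) / len(null))  # fraction of random SRDs this small
```

Repeating the `srd` call for each method (column) and sorting by the result gives the simple ordering described above.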
Two case studies (clustering of HPLC columns and prediction of retention data) illustrate and validate the applicability of this novel approach to comparing methods.
The technique is entirely general; it can be used in different fields (e.g., for stationary-phase (column) selection in chromatography, model and descriptor selection, comparing analytical and chemometric techniques, determination of panel consistency, etc.). The only prerequisite is that the data can be arranged in matrix form without empty cells.
Since the pioneering works of Kaliszan (R. Kaliszan, Quantitative Structure–Chromatographic Retention Relationships, Wiley, New York, 1987; and R. Kaliszan, Structure and Retention in Chromatography: A Chemometric Approach, Harwood Academic, Amsterdam, 1997), no comprehensive summary has been available in the field. The present review covers the period from 1996 to August 2006. The sources are grouped according to the kind of chromatography: quantitative structure–retention relationships (QSRR) in gas chromatography, planar chromatography, column liquid chromatography, micellar liquid chromatography, and affinity chromatography, as well as quantitative structure–enantioselective retention relationships. General tendencies, misleading practices and conclusions, validation of the models, and suggestions for future work are summarized for each sub-field. Some straightforward, rather than standard, applications are emphasized. The sources and the model compounds, descriptors, predicted retention data, modeling methods and indicators of their performance, validation of models, and stationary phases are collected in tables. Some important conclusions are: Not all physicochemical descriptors correlate strongly with the retention data; the heat of formation is not related to chromatographic retention. It is not appropriate to give the errors of Kovats indices in percentages: the apparently low values (1–3%) can disorient reviewers and readers. Contemporary mean interlaboratory reproducibility of Kovats indices is about 5–10 i.u. for standard non-polar phases and 10–25 i.u. for standard polar phases. The predictive performance of QSRR models deteriorates as the polarity of the GC stationary phase increases. The correlation coefficient alone is not a particularly good indicator of model performance. Plots of residuals are more useful than plots of measured versus calculated values. There is no need to give the retention data in the form of an equation if the number of compounds is small. The domain of applicability of the models should be given in all cases.
Applied datasets can vary from a few hundred to thousands of samples in typical quantitative structure–activity/property (QSAR/QSPR) relationships and classification. However, the size of the datasets and the train/test split ratios can greatly affect the outcome of the models, and thus the classification performance itself. We compared several combinations of dataset sizes and split ratios with five different machine learning algorithms to find the differences or similarities and to select the best parameter settings in nonbinary (multiclass) classification. It is also known that models are ranked differently according to the performance merit(s) used. Here, 25 performance parameters were calculated for each model, then factorial ANOVA was applied to compare the results. The results clearly show the differences not just between the applied machine learning algorithms but also between the dataset sizes and, to a lesser extent, the train/test split ratios. The XGBoost algorithm could outperform the others, even in multiclass modeling. The performance parameters reacted differently to the change of the sample set size; some of them were much more sensitive to this factor than others. Moreover, significant differences could be detected between train/test split ratios as well, exerting a great effect on the test validation of our models.
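That models are ranked differently depending on the performance merit can be illustrated with a small, hypothetical example in stdlib Python. The confusion matrices below are invented for illustration (not taken from the study): under class imbalance, accuracy and macro-F1 put two classifiers in opposite order.

```python
def accuracy(cm):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def macro_f1(cm):
    """Unweighted mean of per-class F1 scores."""
    k = len(cm)
    f1s = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[i][c] for i in range(k)) - tp
        fn = sum(cm[c]) - tp
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / k

# Invented 3-class confusion matrices (90/5/5 class imbalance):
# model A always predicts the majority class; model B trades a little
# accuracy for the minority classes.
cm_a = [[90, 0, 0], [5, 0, 0], [5, 0, 0]]
cm_b = [[80, 5, 5], [1, 4, 0], [1, 0, 4]]

print(accuracy(cm_a), accuracy(cm_b))   # 0.9 vs 0.88 -> A ranks first
print(macro_f1(cm_a) < macro_f1(cm_b))  # True -> B ranks first on macro-F1
```

With 25 such merits per model, disagreements of this kind are exactly what the factorial ANOVA in the study is meant to disentangle.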
Background
Cheminformaticians are equipped with a very rich toolbox when carrying out molecular similarity calculations. A large number of molecular representations exist, and there are several methods (similarity and distance metrics) to quantify the similarity of molecular representations. In this work, eight well-known similarity/distance metrics are compared on a large dataset of molecular fingerprints with sum of ranking differences (SRD) and ANOVA analysis. The effects of molecular size, selection methods and data pretreatment methods on the outcome of the comparison are also assessed.
Results
A supplier database (https://mcule.com/) was used as the source of compounds for the similarity calculations in this study. A large number of datasets, each consisting of one hundred compounds, were compiled, molecular fingerprints were generated and similarity values between a randomly chosen reference compound and the rest were calculated for each dataset. Similarity metrics were compared based on their ranking of the compounds within one experiment (one dataset) using sum of ranking differences (SRD), while the results of the entire set of experiments were summarized on box-and-whisker plots. Finally, the effects of various factors (data pretreatment, molecule size, selection method) were evaluated with analysis of variance (ANOVA).
Conclusions
This study complements previous efforts to examine and rank various metrics for molecular similarity calculations. Here, however, an entirely general approach was taken to neglect any a priori knowledge on the compounds involved, as well as any bias introduced by examining only one or a few specific scenarios. The Tanimoto index, Dice index, Cosine coefficient and Soergel distance were identified to be the best (and in some sense equivalent) metrics for similarity calculations, i.e. these metrics could produce the rankings closest to the composite (average) ranking of the eight metrics. The similarity metrics derived from Euclidean and Manhattan distances are not recommended on their own, although their variability and diversity from other similarity metrics might be advantageous in certain cases (e.g. for data fusion). Conclusions are also drawn regarding the effects of molecule size, selection method and data pretreatment on the ranking behavior of the studied metrics.
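The four metrics singled out above have simple closed forms for binary fingerprints, in terms of the on-bit counts a and b of the two fingerprints and their common on-bit count c. A minimal stdlib Python sketch (the example fingerprints are hypothetical):

```python
from math import sqrt

def bit_counts(fp1, fp2):
    """a, b: on-bit counts of each fingerprint; c: common on-bits."""
    return sum(fp1), sum(fp2), sum(x & y for x, y in zip(fp1, fp2))

def tanimoto(fp1, fp2):
    a, b, c = bit_counts(fp1, fp2)
    return c / (a + b - c)

def dice(fp1, fp2):
    a, b, c = bit_counts(fp1, fp2)
    return 2 * c / (a + b)

def cosine(fp1, fp2):
    a, b, c = bit_counts(fp1, fp2)
    return c / sqrt(a * b)

def soergel_distance(fp1, fp2):
    """For binary fingerprints the Soergel distance equals 1 - Tanimoto."""
    return 1 - tanimoto(fp1, fp2)

# Hypothetical 5-bit fingerprints: a = 3, b = 3, c = 2
fp1, fp2 = [1, 1, 0, 1, 0], [1, 0, 0, 1, 1]
print(tanimoto(fp1, fp2))  # 0.5
print(dice(fp1, fp2))      # 0.666...
```

The Soergel identity above also shows why Tanimoto and Soergel produce identical rankings of compounds: one is a monotone transform of the other.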
Graphical Abstract
A visual summary of the comparison of similarity metrics with sum of ranking differences (SRD).
Significant progress has been achieved since the introduction of the new similarity measure, the sum of (absolute) ranking differences (SRD) [TrAC — Trends in Anal. Chem. 29 (2010) 101–109]. Empirical evidence has accumulated about scaling, selection of the reference (benchmark) vector, cross-validation and grouping of variables (features, models, methods, etc.). The theory has been extended to repeated observations (ties): (i) The exact theoretical (null) distribution of SRD for random numbers has been calculated for 4 < number of objects < 9; this serves as a kind of validation (a permutation test). Each possible reference vector with ties implies a different distribution. (ii) For more than eight objects (n > 8), an approximation has been developed by fitting a Gaussian to the SRD distribution obtained from three million generated n-dimensional random vectors.
The validity and features of the SRD methodology with ties are illustrated using two case studies: evaluation of a sensory panel and ranking of financial indicators.
• The procedure "sum of ranking differences" (SRD) was extended to handle ties as well.
• The theoretical distribution of SRD for random numbers is derived or approximated.
• Scaling, selection of the benchmark for ranking, and cross-validation are discussed.
• Sensory panel and bank indicator case studies illustrate SRD ranking with ties.
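For small numbers of objects, the exact null distribution can be enumerated directly, as in point (i) above. A stdlib Python sketch (without ties, for simplicity; the tied case changes the set of admissible rank vectors and hence the distribution):

```python
from collections import Counter
from itertools import permutations

def exact_srd_distribution(n):
    """Exact null distribution of SRD for n distinct objects (no ties):
    the SRD of every permutation against the identity reference ranking."""
    ref = tuple(range(1, n + 1))
    dist = Counter()
    for perm in permutations(ref):
        dist[sum(abs(p - r) for p, r in zip(perm, ref))] += 1
    return dist

dist = exact_srd_distribution(5)
print(sum(dist.values()))  # 120 permutations (5!)
print(max(dist))           # 12, the maximum SRD for n = 5
print(dist[0])             # 1: only the identity permutation gives SRD = 0
```

Note that SRD is always even here, since the signed rank differences sum to zero; beyond n = 8 this brute-force enumeration becomes impractical, which is what motivates the Gaussian approximation fitted on simulated random vectors.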
Background
Interaction fingerprints (IFP) have been repeatedly shown to be valuable tools in virtual screening to identify novel hit compounds that can subsequently be optimized to drug candidates. As a complementary method to ligand docking, IFPs can be applied to quantify the similarity of predicted binding poses to a reference binding pose. For this purpose, a large number of similarity metrics can be applied, and various parameters of the IFPs themselves can be customized. In a large-scale comparison, we have assessed the effect of similarity metrics and IFP configurations in a number of virtual screening scenarios with ten different protein targets and thousands of molecules. In particular, the effect of considering general interaction definitions (such as Any Contact, Backbone Interaction and Sidechain Interaction), the effect of filtering methods and the different groups of similarity metrics were studied.
Results
The performances were primarily compared based on AUC values, but we have also used the original similarity data for the comparison of similarity metrics with several statistical tests and the novel, robust sum of ranking differences (SRD) algorithm. With SRD, we can evaluate the consistency (or concordance) of the various similarity metrics to an ideal reference metric, which is provided by data fusion from the existing metrics. Different aspects of IFP configurations and similarity metrics were examined based on SRD values with analysis of variance (ANOVA) tests.
Conclusion
A general approach is provided that can be applied for the reliable interpretation and usage of similarity measures with interaction fingerprints. Metrics that are viable alternatives to the commonly used Tanimoto coefficient were identified based on a comparison with an ideal reference metric (consensus). A careful selection of the applied bits (interaction definitions) and IFP filtering rules can improve the results of virtual screening (in terms of their agreement with the consensus metric). The open-source Python package FPKit was introduced for the similarity calculations and IFP filtering; it is available at: https://github.com/davidbajusz/fpkit.
Quantification of the similarity of objects is a key concept in many areas of computational science. This includes cheminformatics, where molecular similarity is usually quantified based on binary fingerprints. While there is a wide selection of available molecular representations and similarity metrics, there have been no previous efforts to extend the computational framework of similarity calculations to the simultaneous comparison of more than two objects (molecules) at a time. The present study bridges this gap by introducing a straightforward computational framework for comparing multiple objects at the same time and providing extended formulas for as many similarity metrics as possible. In the binary case (i.e. when comparing two molecules pairwise) these naturally reduce to their well-known formulas. We provide a detailed analysis of the effects of various parameters on the similarity values calculated by the extended formulas. The extended similarity indices are entirely general and do not depend on the fingerprints used. Two types of analysis of variance (ANOVA) help to understand the main features of the indices: (i) ANOVA of mean similarity indices; (ii) ANOVA of sums of ranking differences (SRD). Practical aspects and applications of the extended similarity indices are detailed in the accompanying paper: Miranda-Quintana et al., J Cheminform, 2021, https://doi.org/10.1186/s13321-021-00504-4. Python code for calculating the extended similarity metrics is freely available at: https://github.com/ramirandaq/MultipleComparisons.
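The reduction to the binary case can be illustrated with a strict-consensus sketch of an n-ary Tanimoto index. This is a simplified illustration, not the published formulas (which use tunable coincidence thresholds and weighting); for two molecules it reduces to the classical c/(a+b−c):

```python
def extended_tanimoto(fingerprints):
    """Strict-consensus n-ary Tanimoto: columns where every fingerprint
    is 1, divided by columns where at least one fingerprint is 1.
    Simplified sketch; the published indices add tunable coincidence
    thresholds and weighting."""
    p = len(fingerprints[0])
    all_on = sum(1 for j in range(p) if all(fp[j] for fp in fingerprints))
    any_on = sum(1 for j in range(p) if any(fp[j] for fp in fingerprints))
    return all_on / any_on

# Hypothetical 5-bit fingerprints
fp1, fp2, fp3 = [1, 1, 0, 1, 0], [1, 0, 0, 1, 1], [1, 1, 0, 0, 0]
# For two fingerprints this equals the classical Tanimoto c/(a+b-c):
print(extended_tanimoto([fp1, fp2]))       # 0.5
print(extended_tanimoto([fp1, fp2, fp3]))  # 0.25, consensus of three molecules
```

For n = 2, "all fingerprints on" counts the common on-bits c and "any fingerprint on" counts a + b − c, so the two definitions coincide exactly.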
Non-negative matrix factorization (NMF) efficiently reduces high dimensionality for many-objective ranking problems. In multi-objective optimization, as long as only three or four conflicting viewpoints are present, an optimal solution can be determined by finding the Pareto front. When the number of objectives increases, the multi-objective problem evolves into a many-objective optimization task, where the Pareto front becomes oversaturated. The key idea is that NMF aggregates the objectives so that the Pareto front can still be applied, while the sum of ranking differences (SRD) method selects the objectives that have a detrimental effect on the aggregation and validates the findings. The applicability of the method is illustrated by the ranking of 1176 universities based on 46 variables of the CWTS Leiden Ranking 2020 database. The performance of NMF is compared to principal component analysis (PCA) and sparse non-negative matrix factorization-based solutions. The results illustrate that PCA incorporates negatively correlated objectives into the same principal component. In contrast, NMF allows only non-negative correlations, which enables the proper use of the Pareto front. With the combination of NMF and SRD, a non-biased ranking of the universities based on 46 criteria is established, with Harvard, Rockefeller and Stanford Universities determined as the first three. To evaluate the ranking capabilities of the methods, measures based on relative entropy (RE) and hypervolume (HV) are proposed. The results confirm that the sparse NMF method provides the most informative ranking. The results highlight that academic excellence can be improved by decreasing the proportion of unknown open-access publications and short-distance collaborations. The proportion of gender indicators barely correlates with scientific impact. More authors, long-distance collaborations, and publications with above-average scientific impact and citations strongly influence the university ranking in a positive direction.
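The Pareto-front step used above can be sketched in a few lines of stdlib Python. The university scores below are hypothetical, and both aggregated objectives are assumed to be maximized:

```python
def dominates(u, v):
    """u dominates v if it is at least as good in every objective
    (larger is better here) and strictly better in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(points):
    """Return the non-dominated points (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical universities scored on two NMF-aggregated objectives
scores = {
    "U1": (0.9, 0.2),
    "U2": (0.6, 0.6),
    "U3": (0.3, 0.9),
    "U4": (0.5, 0.5),  # dominated by U2, so not on the front
}
front = pareto_front(list(scores.values()))
print(front)  # [(0.9, 0.2), (0.6, 0.6), (0.3, 0.9)]
```

With 46 raw objectives, almost every point is non-dominated (the front "oversaturates"); aggregating the objectives with NMF first is what makes this filtering informative again.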
Despite being a central concept in cheminformatics, molecular similarity has so far been limited to the simultaneous comparison of only two molecules at a time, using one index, generally the Tanimoto coefficient. In a recent contribution we not only introduced a complete mathematical framework for extended similarity calculations (i.e. comparisons of more than two molecules at a time) but also defined a series of novel indices. Part 1 is a detailed analysis of the effects of various parameters on the similarity values calculated by the extended formulas. Their features were revealed by sum of ranking differences (SRD) and ANOVA. Here, in addition to characterizing several important aspects of the newly introduced similarity metrics, we highlight their applicability and utility in real-life scenarios, using datasets with popular molecular fingerprints. Remarkably, for large datasets, the use of extended similarity measures provides an unprecedented speed-up over "traditional" pairwise similarity matrix calculations. We also provide illustrative examples of a more direct algorithm based on the extended Tanimoto similarity to select diverse compound sets, resulting in much higher levels of diversity than traditional approaches. We discuss the inner and outer consistency of our indices, which are key in practical applications, showing whether the n-ary and binary indices rank the data in the same way. We demonstrate the use of the new n-ary similarity metrics on t-distributed stochastic neighbor embedding (t-SNE) plots of datasets of varying diversity, or corresponding to ligands of different pharmaceutical targets, which show that our indices provide a better measure of set compactness than standard binary measures. We also present a conceptual example of the applicability of our indices in agglomerative hierarchical algorithms. The Python code for calculating the extended similarity metrics is freely available at: https://github.com/ramirandaq/MultipleComparisons
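The speed-up arises because a set-level extended index can be computed from the fingerprint column sums in a single O(N·p) pass, instead of N(N−1)/2 pairwise comparisons. A simplified stdlib sketch (full-coincidence counting only; the published n-ary indices generalize this with coincidence thresholds):

```python
def set_compactness(fingerprints):
    """Set-level, Tanimoto-like compactness from column sums in one
    O(N*p) pass: columns where all N fingerprints are 1, divided by
    columns that are not all 0. Simplified full-coincidence sketch."""
    n, p = len(fingerprints), len(fingerprints[0])
    col_sums = [sum(fp[j] for fp in fingerprints) for j in range(p)]
    all_on = sum(1 for k in col_sums if k == n)   # columns that are all 1
    all_off = sum(1 for k in col_sums if k == 0)  # columns that are all 0
    return all_on / (p - all_off)

# For two fingerprints this reproduces the pairwise Tanimoto (0.5 here):
print(set_compactness([[1, 1, 0, 1, 0], [1, 0, 0, 1, 1]]))
# One pass scores a whole set at once:
print(set_compactness([[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 1]]))
```

Because the column sums can be updated incrementally when a molecule is added or removed, greedy diversity selection with such a set measure also avoids recomputing a full pairwise matrix at every step.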
Reversed-phase high-performance liquid chromatography (RP-HPLC) is the most popular chromatographic mode, accounting for more than 90% of all separations. HPLC itself owes its immense popularity to its being relatively simple and inexpensive, with the equipment being reliable and easy to operate. Due to extensive automation, it can be run virtually unattended with multiple samples at various separation conditions, even by relatively low-skilled personnel. Currently, there are >600 RP-HPLC columns available to end users for purchase, some of which exhibit very large differences in selectivity and production quality. Often, two similar RP-HPLC columns are not equally suitable for the requisite separation, and to date, there is no universal RP-HPLC column covering a variety of analytes. This forces analytical laboratories to keep a multitude of diverse columns. Therefore, column selection is a crucial segment of RP-HPLC method development, especially since sample complexity is constantly increasing. Rationally choosing an appropriate column is complicated. In addition to the differences in the primary intermolecular interactions with analytes of the dispersive (London) type, individual columns can also exhibit a unique character owing to specific polar, hydrogen bond, and electron pair donor–acceptor interactions. They can also vary depending on the type of packing, amount and type of residual silanols, "end-capping", bonding density of ligands, and pore size, among others. Consequently, the chromatographic performance of RP-HPLC systems is often considerably altered depending on the selected column. Although a wide spectrum of knowledge is available on this important subject, there is still a lack of a comprehensive review for an objective comparison and/or selection of chromatographic columns.
We aim for this review to be a comprehensive, authoritative, critical, and easily readable monograph of the most relevant publications regarding column selection and characterization in RP-HPLC covering the past four decades. Future perspectives, which involve the integration of state-of-the-art molecular simulations (molecular dynamics or Monte Carlo) with minimal experiments, aimed at nearly “experiment-free” column selection methodology, are proposed.