In the context of deep learning, the most expensive computational phase is the full training of the learning methodology. Indeed, its effectiveness depends on the choice of proper values for the so-called hyperparameters, namely the parameters that are not trained during the learning process, and such a selection typically requires an extensive numerical investigation with the execution of a significant number of experimental trials. The aim of the paper is to investigate how to choose the hyperparameters related to both the architecture of a Convolutional Neural Network (CNN), such as the number of filters and the kernel size at each convolutional layer, and the optimisation algorithm employed to train the CNN itself, such as the steplength, the mini-batch size and the potential adoption of variance reduction techniques. The main contribution of the paper consists in introducing an automatic Machine Learning technique to set these hyperparameters in such a way that a measure of the CNN performance can be optimised. In particular, given a set of values for the hyperparameters, we propose a low-cost strategy to predict the performance of the corresponding CNN, based on its behavior after only a few steps of the training process. To achieve this goal, we generate a dataset whose input samples are a limited number of hyperparameter configurations, each paired with the CNN performance measures obtained after only a few steps of training, while the label of each input sample is the performance corresponding to a complete training of the CNN. This dataset is used as the training set for Support Vector Regression and/or Random Forest techniques that predict the performance of the considered learning methodology, given its performance at the initial iterations of its learning process. Furthermore, by a probabilistic exploration of the hyperparameter space, we are able to find, at quite low cost, a setting of the CNN hyperparameters which provides the optimal performance. The results of an extensive numerical experimentation carried out on CNNs, together with the use of our performance predictor on NAS-Bench-101, show that the proposed methodology for hyperparameter setting is very promising.
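A minimal sketch of this predictor pipeline, assuming scikit-learn's RandomForestRegressor (SVR would be a drop-in replacement); `sample_config`, `short_train_accuracy` and `full_train_accuracy` are hypothetical placeholders for the configuration sampler and the few-step and complete CNN trainings, not the paper's actual routines:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR  # alternative regressor

rng = np.random.default_rng(0)

def sample_config():
    # Hypothetical: draw one hyperparameter configuration
    # (n. filters, kernel size, steplength, mini-batch size, ...).
    return rng.uniform(size=5)

def short_train_accuracy(cfg):
    # Hypothetical: CNN accuracy after only a few training steps (cheap).
    return float(rng.uniform())

def full_train_accuracy(cfg):
    # Hypothetical: CNN accuracy after a complete training (expensive).
    return float(rng.uniform())

# Meta-dataset: features = configuration + early-training accuracy,
# label = fully-trained accuracy.
configs = [sample_config() for _ in range(50)]
X = np.array([np.append(c, short_train_accuracy(c)) for c in configs])
y = np.array([full_train_accuracy(c) for c in configs])
predictor = RandomForestRegressor(n_estimators=100).fit(X, y)

# Probabilistic exploration: score many cheap candidates with the
# predictor, then fully train only the most promising one.
candidates = [sample_config() for _ in range(500)]
scores = predictor.predict(
    np.array([np.append(c, short_train_accuracy(c)) for c in candidates]))
best_cfg = candidates[int(np.argmax(scores))]
```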
Finite-sum problems appear as the sample average approximation of a stochastic optimization problem and often arise in machine learning applications with large-scale data sets. A very popular approach to face finite-sum problems is the stochastic gradient method. It is well known that a proper strategy to select the hyperparameters of this method (i.e., the set of a-priori selected parameters) and, in particular, the learning rate, is needed to guarantee convergence properties and good practical performance. In this paper, we analyse standard and line-search-based updating rules to fix the learning rate sequence, also in relation to the size of the mini-batch chosen to compute the current stochastic gradient. An extensive numerical experimentation is carried out to evaluate the effectiveness of the discussed strategies on convex and non-convex finite-sum test problems, highlighting that the line-search-based methods avoid an expensive initial setting of the hyperparameters. The line-search-based approaches have also been applied to train a Convolutional Neural Network, providing very promising results.
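As a concrete illustration, here is a minimal sketch of mini-batch SGD in which the learning rate is set by a backtracking (Armijo-type) line search on the sampled loss; the exact updating rules analysed in the paper may differ, and all names here are illustrative:

```python
import numpy as np

def minibatch_sgd_linesearch(loss, grad, w0, data, batch_size=32,
                             alpha0=1.0, c=1e-4, rho=0.5, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    w, n = w0.copy(), len(data)
    for _ in range(epochs):
        for _ in range(n // batch_size):
            batch = data[rng.choice(n, batch_size, replace=False)]
            g = grad(w, batch)
            f0, alpha = loss(w, batch), alpha0
            # Backtrack until a sufficient decrease holds on the mini-batch.
            while loss(w - alpha * g, batch) > f0 - c * alpha * (g @ g):
                alpha *= rho
            w = w - alpha * g
    return w

# Toy usage: a least-squares finite sum, (1/n) * sum_i (a_i^T w - b_i)^2.
rng = np.random.default_rng(1)
A, b = rng.normal(size=(256, 5)), rng.normal(size=256)
data = np.hstack([A, b[:, None]])
loss = lambda w, d: np.mean((d[:, :-1] @ w - d[:, -1]) ** 2)
grad = lambda w, d: 2 * d[:, :-1].T @ (d[:, :-1] @ w - d[:, -1]) / len(d)
w_opt = minibatch_sgd_linesearch(loss, grad, np.zeros(5), data)
```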
In order to solve constrained optimization problems on convex sets, the class of scaled gradient projection methods is often exploited in combination with non-monotone Armijo-like line search strategies. These techniques are adopted to efficiently select the steplength parameter and can be realized by means of two different approaches: along the arc or along the feasible directions. In this paper we thoroughly analyze the convergence properties of the scaled gradient projection methods equipped with the non-monotone version of both these Armijo-like line searches. To the best of our knowledge, not all the convergence results proved for either the non-scaled or the monotone gradient projection algorithm have also been established for the non-monotone and scaled counterpart. The goal of this paper is to fill this gap by detailing which hypotheses are needed to guarantee both the stationarity of the limit points and the convergence of the sequence generated by the non-monotone scaled gradient projection schemes. Moreover, in the case of a polyhedral constraint set, we discuss the identification of the active set at the solution for the sequence generated by the along-the-arc approach. Several numerical experiments on quadratic and non-quadratic optimization problems have been carried out in order to compare the behaviour of the considered scaled gradient projection methods.
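To fix ideas, a schematic along-the-arc iteration with a non-monotone Armijo-like acceptance test might look as follows; the diagonal scaling matrix and the projection onto the non-negative orthant are purely illustrative choices, not the paper's setting:

```python
import numpy as np

def sgp_along_arc(f, grad, x0, D, alpha0=1.0, beta=1e-4, rho=0.5,
                  M=10, max_iter=200):
    # Example feasible set: the non-negative orthant x >= 0.
    project = lambda z: np.maximum(z, 0.0)
    x, history = x0.copy(), []
    for _ in range(max_iter):
        history.append(f(x))
        f_ref = max(history[-M:])           # non-monotone reference value
        g, alpha = grad(x), alpha0
        while True:
            y = project(x - alpha * D @ g)  # trial point along the arc
            d = y - x
            # Accept when the Armijo-like condition holds w.r.t. f_ref
            # (with a small safeguard on the steplength).
            if f(y) <= f_ref + beta * (g @ d) or alpha < 1e-12:
                break
            alpha *= rho
        x = y
    return x
```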
It is well known that biomedical imaging analysis plays a crucial role in the healthcare sector and produces a huge quantity of data. These data can be exploited to study diseases and their evolution in greater depth or to predict their onset. In particular, image classification represents one of the main problems in the biomedical imaging context. Due to the data complexity, biomedical image classification can be carried out by trainable mathematical models, such as artificial neural networks. When employing a neural network, one of the main challenges is to determine the optimal duration of the training phase to achieve the best performance. This paper introduces a new adaptive early stopping technique that sets the optimal training time based on dynamic selection strategies for the learning rate and the mini-batch size of the stochastic gradient method exploited as the optimizer. The numerical experiments, carried out on different artificial neural networks for image classification, show that the developed adaptive early stopping procedure matches the performance reported in the literature while completing the training in fewer epochs. The numerical examples have been performed on the CIFAR100 dataset and on two distinct MedMNIST2D datasets, the large-scale lightweight benchmark for biomedical image classification.
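The skeleton of a patience-based early stopping loop is sketched below; the paper's adaptive criterion, coupled with its dynamic learning-rate and mini-batch selection, is more elaborate, and `train_one_epoch` and `validate` are hypothetical placeholders:

```python
import copy

def train_with_early_stopping(model, train_one_epoch, validate,
                              patience=5, max_epochs=200):
    best_acc, best_model, wait = float("-inf"), None, 0
    for epoch in range(max_epochs):
        train_one_epoch(model, epoch)   # may also adapt lr / mini-batch size
        acc = validate(model)           # validation accuracy after the epoch
        if acc > best_acc:
            best_acc, best_model, wait = acc, copy.deepcopy(model), 0
        else:
            wait += 1
            if wait >= patience:        # stop once improvement stalls
                break
    return best_model, best_acc
```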
Signal transducer and activator of transcription 3 (STAT3) plays an essential role in cell growth regulation and survival. Aberrant STAT3 activation and/or expression is implicated in various solid and blood tumors as well as in other pathologies like rheumatoid arthritis and pulmonary fibrosis, thus making the search for STAT3 inhibitors a growing field of study. With the aim of identifying new inhibitors of STAT3 dimerization, we screened a database including more than 1 320 000 commercially available compounds using a receptor-based pharmacophore model comprising the key protein-protein interactions identified in the STAT3 dimer, refining the search through docking and molecular dynamics simulation studies. STAT3 binding assays revealed significant STAT3 inhibitory activity and selectivity versus Grb2 for one of the four top-scored compounds, thus verifying the reliability of the virtual screening workflow. Moreover, this compound can already be considered a lead for the development of new and more potent STAT3 dimerization inhibitors.
Due to the continued success of machine learning, and of deep learning in particular, supervised classification problems are ubiquitous in numerous scientific fields. Training these models typically involves the minimization of the empirical risk over large data sets, along with a possibly non-differentiable regularization term. In this paper, we introduce a stochastic gradient method for the considered classification problem. To control the variance of the stochastic gradients, we use an automatic sample size selection along with a variable metric to precondition the stochastic gradient directions. Further, we utilize a non-monotone line search to automate the step size selection. Convergence results are provided for both convex and non-convex objective functions. Extensive numerical experiments verify that the suggested approach performs on par with state-of-the-art methods for training both statistical models for binary classification and artificial neural networks for multi-class image classification. The code is publicly available at https://github.com/koblererich/lisavm.
• Supervised classification problems are ubiquitous in several scientific fields.
• Proximal stochastic gradient algorithms are the gold standard for solving classification problems.
• Variable metric strategies help to control the variance of the stochastic gradients.
• Non-monotone line search procedures allow the learning rate to be adjusted automatically.
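A compact sketch of the preconditioned proximal step underlying such methods, here for an l1-regularised risk with a diagonal variable metric; the sample-size rule and line search are omitted, the names are illustrative, and the authors' actual implementation is the linked repository:

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1, applied componentwise.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def preconditioned_prox_step(w, grad_batch, alpha, lam, d):
    # d > 0 is the diagonal of the variable metric D = diag(d):
    # both the gradient step and the l1 prox are scaled by 1/d.
    z = w - alpha * grad_batch / d
    return soft_threshold(z, alpha * lam / d)
```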
Mycobacterium tuberculosis (Mtb), the main aetiological agent of tuberculosis (TB) in humans, is estimated to cause nearly two million deaths every year. Despite their huge therapeutic value, existing antitubercular drugs have several shortcomings, such as the emergence of drug resistance, which is mostly triggered by lack of compliance during the lengthy treatment. Novel and more effective drugs against Mtb acting on new molecular targets are therefore in demand in order to reduce treatment time and address the severe issue of the progressive loss of antibiotic efficacy. Mtb encodes two low-molecular-weight tyrosine-specific phosphatases (MPtpA and MPtpB) that are crucially involved in Mtb pathogenesis. While MPtpA interferes with phagosome acidification, blocking its maturation, MPtpB disrupts host signal transduction cascades, subverting the host immune response. The important role played by both MPtpA and MPtpB in host-pathogen interaction makes them appealing targets for TB drug discovery. Here, we provide an exhaustive review of the current knowledge on the characterization of MPtpA and MPtpB and their role in TB pathogenesis. In particular, special emphasis is placed on all classes of inhibitors that have been developed and studied to date; their binding modes, design strategies, biological activities and main pharmacophore features, as well as the efforts to overcome the poor druggability of their targets, are summarized in detail.
In this paper we study a stochastic gradient algorithm which increases the mini-batch size according to a predefined schedule and automatically adjusts the learning rate by means of a monotone or non-monotone line search procedure. The mini-batch size is incremented at a suitable a-priori rate throughout the iterative process so that the variance of the stochastic gradients is progressively reduced. The a-priori rate is not subject to restrictive assumptions, allowing for the possibility of a slow increase in the mini-batch size. On the other hand, the learning rate can vary non-monotonically throughout the iterations, as long as it is appropriately bounded. Convergence results for the proposed method are provided for both convex and non-convex objective functions. Moreover, the algorithm can be proved to enjoy a global linear rate of convergence on strongly convex functions. The low per-iteration cost, the limited memory requirements and the robustness with respect to the hyperparameter setting make the suggested approach well-suited for implementation within the deep learning framework, also on GPGPU-equipped architectures. Numerical results on training deep neural networks for multi-class image classification show a promising behaviour of the proposed scheme with respect to similar state-of-the-art competitors.
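One plausible instance of such a predefined schedule is a slow geometric growth of the mini-batch size, sketched below; the concrete rate and cap are illustrative rather than the paper's choices, and the learning-rate line search can follow the pattern shown earlier:

```python
import math

def mini_batch_size(k, n0=8, rate=1.05, n_max=50000):
    # Predefined growth: n_k = min(n_max, ceil(n0 * rate**k)), so the
    # variance of the stochastic gradients shrinks as iterations proceed.
    return min(n_max, math.ceil(n0 * rate ** k))

# e.g. mini_batch_size(0) == 8 and mini_batch_size(50) == 92;
# the size saturates at n_max for large k.
```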
We discuss a number of novel steplength selection schemes for proximal-based convex optimization algorithms. In particular, we consider the setting in which the Lipschitz constant of the gradient of the smooth part of the objective function is unknown. First, we generalize two optimization algorithms of Khobotov type and prove their convergence, also taking into account a possibly inaccurate computation of the proximal operator of the non-smooth part of the objective function. Second, we show convergence of an iterative algorithm with an Armijo-type steplength rule, and discuss its use with an approximate computation of the proximal operator. Numerical experiments show the efficiency of the methods in comparison with some existing schemes.
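For intuition, a textbook backtracking rule that dispenses with knowledge of the Lipschitz constant is sketched below; the Khobotov-type and Armijo-type rules studied in the paper differ in detail, and `prox_g` may in practice be computed only approximately:

```python
import numpy as np

def prox_grad_backtracking_step(f, grad_f, prox_g, x, alpha=1.0, rho=0.5):
    # One step of proximal gradient for min f(x) + g(x), f smooth:
    # shrink alpha until the quadratic upper bound on f is satisfied.
    g = grad_f(x)
    while True:
        y = prox_g(x - alpha * g, alpha)   # (possibly inexact) proximal step
        d = y - x
        if f(y) <= f(x) + g @ d + (d @ d) / (2 * alpha):
            return y, alpha
        alpha *= rho
```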