We provide an introduction to Gaussian process regression (GPR) machine-learning methods in computational materials science and chemistry. The focus of the present review is on the regression of atomistic properties: in particular, on the construction of interatomic potentials, or force fields, in the Gaussian Approximation Potential (GAP) framework; beyond this, we also discuss the fitting of arbitrary scalar, vectorial, and tensorial quantities. Methodological aspects of reference data generation, representation, and regression, as well as the question of how a data-driven model may be validated, are reviewed and critically discussed. A survey of applications to a variety of research questions in chemistry and materials science illustrates the rapid growth in the field. A vision is outlined for the development of the methodology in the years to come.
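As a minimal, framework-free illustration of the regression machinery underlying such methods (a zero-mean GP with a squared-exponential kernel on synthetic 1-D data, not GAP's actual descriptors or kernels), one might write:

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gpr_predict(X_train, y_train, X_test, noise=1e-6):
    """Posterior mean and marginal variance of a zero-mean GP regressor."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train)
    K_ss = rbf_kernel(X_test, X_test)
    alpha = np.linalg.solve(K, y_train)
    mean = K_s @ alpha
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)

# toy data: regress sin(x) from 8 samples, predict at x = pi/2
X = np.linspace(0, 2 * np.pi, 8)
y = np.sin(X)
mean, var = gpr_predict(X, y, np.array([np.pi / 2]))
```

The posterior variance returned alongside the mean is what makes GPR attractive for atomistic modelling: it provides a built-in uncertainty estimate for each prediction.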
Peripheral artery disease is an atherosclerotic disorder which, when present, portends poor patient outcomes. Low diagnosis rates perpetuate poor management, leading to limb loss and excess rates of cardiovascular morbidity and death. Machine learning algorithms and artificially intelligent systems have shown great promise in application to many areas in health care, such as accurately detecting disease, predicting patient outcomes, and automating image interpretation. Although the application of these technologies to peripheral artery disease is in its infancy, its promise is tremendous. In this review, we provide an introduction to important concepts in the fields of machine learning and artificial intelligence, detail the current state of how these technologies have been applied to peripheral artery disease, and discuss potential areas for future care enhancement with advanced analytics.
At the moment, there are a considerable number of different automated machine learning frameworks. They often use predefined pipelines and choose the best one among them. However, the search for optimal pipelines can be improved by using methods that generate pipelines step by step. This paper introduces an approach to generating ensemble pipelines using policy-based reinforcement learning. The approach consists of pipeline, environment, state, action, and reward representations, and was successfully integrated into an automated machine learning framework. The generated pipelines were tested against a baseline model on OpenML datasets, and the proposed approach demonstrated high efficiency, even surpassing the baseline metrics on some datasets. This research has the potential to enhance existing pipeline generation methods.
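A toy sketch of the policy-gradient idea behind step-by-step pipeline generation (not the paper's actual pipeline, state, action, or reward encoding, which the abstract does not specify): two pipeline slots with three candidate components each, a reward table standing in for cross-validated performance of the assembled pipeline, and a REINFORCE update with the table mean as a fixed baseline (a running average would be used in practice):

```python
import numpy as np

rng = np.random.default_rng(0)

# reward[i, j] stands in for the validation score of pipeline
# (preprocessor i, model j); in a real system this is measured, not tabled
reward = np.array([[0.6, 0.7, 0.5],
                   [0.8, 0.95, 0.6],
                   [0.5, 0.65, 0.55]])

logits = np.zeros((2, 3))  # independent softmax policy per pipeline step
lr = 0.5

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(400):
    probs = [softmax(l) for l in logits]
    a = [rng.choice(3, p=p) for p in probs]   # sample a pipeline step by step
    r = reward[a[0], a[1]]
    baseline = reward.mean()
    for step in range(2):                     # REINFORCE update for each step
        grad = -probs[step]
        grad[a[step]] += 1.0
        logits[step] += lr * (r - baseline) * grad

best = (logits[0].argmax(), logits[1].argmax())  # greedy pipeline after training
```

After training, the greedy policy concentrates on one of the high-reward pipelines; the same mechanism scales to longer pipelines and state-dependent policies.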
The review covers automatic segmentation of images by means of deep learning approaches in the area of medical imaging. Current developments in machine learning, particularly related to deep learning, are proving instrumental in the identification and quantification of patterns in medical images. The pivotal point of these advancements is the capability of deep learning approaches to obtain hierarchical feature representations directly from images, which eliminates the need for handcrafted features. Deep learning is rapidly becoming the state of the art for medical image processing and has resulted in performance improvements in diverse clinical applications. In this review, the basics of deep learning methods are discussed along with an overview of successful implementations involving image segmentation for different medical applications. Finally, open research issues are highlighted and the need for further improvements is pointed out.
A new paradigm for data-driven, model-agnostic new physics searches at colliders is emerging that aims to leverage recent breakthroughs in anomaly detection and machine learning. In order to develop and benchmark new anomaly detection methods within this framework, it is essential to have standard datasets. To this end, we have created the LHC Olympics 2020, a community challenge accompanied by a set of simulated collider events. Participants in these Olympics developed their methods using an R&D dataset and then tested them on black boxes: datasets with an unknown anomaly (or not). The methods made use of modern machine learning tools and were based on unsupervised learning (autoencoders, generative adversarial networks, normalizing flows), weakly supervised learning, and semi-supervised learning. This paper reviews the LHC Olympics 2020 challenge, including an overview of the competition, a description of the methods deployed in it, lessons learned from the experience, and implications for data analyses with future datasets as well as future colliders.
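A linear stand-in for the autoencoder-based anomaly detectors mentioned above illustrates the core idea: reconstruct events from a low-dimensional latent space and flag those with large reconstruction error. Here the "autoencoder" is just a PCA projection with tied encoder/decoder weights, and both the background events and the injected anomaly are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

# synthetic "background" events: samples near a 2-D subspace of a 5-D feature space
basis = rng.normal(size=(2, 5))
background = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 5))

# a linear "autoencoder": project onto the top-2 principal components
mu = background.mean(axis=0)
_, _, Vt = np.linalg.svd(background - mu, full_matrices=False)
W = Vt[:2]  # tied encoder/decoder weights

def reconstruction_error(x):
    z = (x - mu) @ W.T          # encode into the 2-D latent space
    x_hat = z @ W + mu          # decode back to feature space
    return np.sum((x - x_hat) ** 2, axis=-1)

# a synthetic "anomaly" lying off the background subspace reconstructs poorly
signal = mu + 3.0 * rng.normal(size=(1, 5))
```

Nonlinear autoencoders generalize this by learning a curved low-dimensional manifold instead of a flat subspace, but the anomaly score is the same reconstruction error.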
Feature selection (FS) is an important data processing technique in the field of machine learning. There have been various FS methods, but all assume that the cost associated with a feature is precise, which restricts their real applications. Focusing on the FS problem with fuzzy cost, a fuzzy multiobjective FS method with particle swarm optimization, called PSOMOFS, is studied in this article. The proposed method develops a fuzzy dominance relationship to compare the goodness of candidate particles and defines a fuzzy crowding distance measure to prune the elitist archive and determine the global leader of particles. Also, a tolerance coefficient is introduced into the proposed method to ensure that the obtained Pareto-optimal solutions satisfy decision makers' preferences. The developed method is applied to a series of UCI datasets and is compared with three fuzzy multiobjective evolutionary methods and three typical multiobjective FS methods. Experimental results show that the proposed method can achieve feature sets with superior performance in approximation, diversity, and feature cost.
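A heavily simplified, single-objective sketch of PSO-based feature selection conveys the underlying mechanics (the PSOMOFS method described above additionally uses fuzzy costs, a fuzzy dominance relation, a fuzzy crowding distance, and a Pareto archive, all omitted here; the relevance and cost vectors are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_particles, n_iter = 10, 20, 50
relevance = rng.uniform(0, 1, n_features)   # toy per-feature usefulness
cost = rng.uniform(0, 0.5, n_features)      # toy (crisp) per-feature cost

def fitness(mask):
    # reward the relevance of selected features, penalize their cost
    return relevance @ mask - cost @ mask

# binary PSO: continuous positions in [0, 1]; a feature is selected if pos > 0.5
pos = rng.uniform(size=(n_particles, n_features))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([fitness(p > 0.5) for p in pos])
gbest = pbest[pbest_val.argmax()].copy()

for _ in range(n_iter):
    r1, r2 = rng.uniform(size=(2, n_particles, n_features))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0, 1)
    vals = np.array([fitness(p > 0.5) for p in pos])
    improved = vals > pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmax()].copy()

selected = gbest > 0.5   # final feature subset chosen by the swarm
```

The multiobjective variant keeps an archive of non-dominated subsets instead of a single global best, so the swarm leader is drawn from the current Pareto front.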
Coffee bean sorting is still largely done manually: machines exist that separate beans by size, but determining seed quality remains a manual task performed by human workers. This makes coffee bean sorting a promising research target, since a tool that could directly determine coffee quality would make fieldwork easier, save time, reduce costs, and lighten the workers' load. This paper presents the implementation of a machine learning method to classify coffee bean quality. The dataset consists of 90 coffee beans in three classes, 30 per class. In the experiments, the highest accuracy obtained was 83%.
Graph machine learning (GML) is receiving growing interest within the pharmaceutical and biotechnology industries for its ability to model biomolecular structures and the functional relationships between them, and to integrate multi-omic datasets, amongst other data types. Herein, we present a multidisciplinary academic-industrial review of the topic within the context of drug discovery and development. After introducing key terms and modelling approaches, we move chronologically through the drug development pipeline to identify and summarize work on target identification, design of small molecules and biologics, and drug repurposing. Whilst the field is still emerging, key milestones, including repurposed drugs entering in vivo studies, suggest that GML will become a modelling framework of choice within biomedical machine learning.
The subject of this paper is the technology (the 'how') of constructing machine-learning interatomic potentials, rather than the science (the 'what' and 'why') of atomistic simulations using machine-learning potentials. Namely, we illustrate how to construct moment tensor potentials using active learning as implemented in the MLIP package, focusing on efficient ways to automatically sample configurations for the training set, on how expanding the training set changes the prediction error, on how to set up ab initio calculations in a cost-effective manner, etc. The MLIP package (short for Machine-Learning Interatomic Potentials) is available at https://mlip.skoltech.ru/download/.