Alzheimer's disease remains a field of research with many open questions. The complexity of the disease prevents early diagnosis before visible symptoms affecting the individual's cognitive capabilities occur. This research presents an in-depth analysis of a large data set encompassing medical, cognitive and lifestyle measurements from more than 12,000 individuals. Several hypotheses were established, and their validity was questioned in light of the obtained results. The importance of appropriate experimental design is strongly stressed in the research. Thus, a sequence of methods for handling missing data, redundancy, data imbalance, and correlation analysis was applied to preprocess the data set appropriately, and an XGBoost model was then trained and evaluated with special attention to hyperparameter tuning. The model was explained using the Shapley values produced by the SHAP method. XGBoost produced an F1-score of 0.84 and as such is considered highly competitive among those published in the literature. This achievement, however, was not the main contribution of this paper. The goal of this research was to examine the global and local interpretability of the intelligent model and to derive valuable conclusions about the established hypotheses. Those methods led to a single scheme that presents the positive or negative influence of the values of each feature whose importance was confirmed by means of Shapley values. This scheme might be considered an additional source of knowledge for physicians and other experts concerned with the exact diagnosis of early-stage Alzheimer's disease. The conclusions derived from the intelligent model's data-driven interpretability challenged all the established hypotheses. This research clearly showed the importance of an explainable machine learning approach that opens the black box and unveils the relationships among the features and the diagnoses.
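To make the notion of a Shapley value concrete, the following is a minimal sketch that computes exact Shapley values for a toy scoring function by enumerating every feature coalition. It is not the pipeline used in the study: the feature names (`age`, `memory_test`, `exercise`) and the scoring function `toy_score` are hypothetical, and the SHAP method estimates these values far more efficiently than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating all coalitions (feasible only for few features)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when it joins coalition s.
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_score(present):
    """Hypothetical model output given the set of features that are 'switched on'."""
    score = 0.0
    if "age" in present:
        score += 2.0
    if "memory_test" in present:
        score += 3.0
    if "age" in present and "memory_test" in present:
        score += 1.0  # interaction term, shared equally by the two features
    return score


features = ["age", "memory_test", "exercise"]
phi = shapley_values(toy_score, features)
for name, value in phi.items():
    print(f"{name}: {value:+.2f}")
```

The values sum to the difference between the score with all features and the score with none, which is the property that lets each Shapley value be read as a positive or negative contribution of one feature.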
Evidence-based dietary information represented as unstructured text is crucial information that needs to be accessible in order to help dietitians keep up with the new knowledge that arrives daily in newly published scientific reports. Different named-entity recognition (NER) methods have been introduced previously to extract useful information from the biomedical literature. They focus on, for example, extracting gene mentions, protein mentions, relationships between genes and proteins, chemical concepts, and relationships between drugs and diseases. In this paper, we present a novel NER method, called drNER, for knowledge extraction of evidence-based dietary information. To the best of our knowledge, this is the first attempt at extracting dietary concepts. DrNER is a rule-based NER method that consists of two phases. The first involves the detection and determination of the entity mentions, and the second involves the selection and extraction of the entities. We evaluate the method using text corpora from heterogeneous sources, including text from several scientifically validated web sites and text from scientific publications. The evaluation showed that drNER gives good results and can be used for knowledge extraction of evidence-based dietary recommendations.
The present study tested the combination of an established and validated food-choice research method (the 'fake food buffet') with a new food-matching technology to automate the data collection and analysis.
The methodology combines fake-food image recognition using deep learning with food matching and standardization based on natural language processing. The former is distinctive because it uses a single deep learning network to perform both the segmentation and the classification at the pixel level of the image. To assess its performance, measures based on the standard pixel accuracy and Intersection over Union were applied. Food matching first describes each of the recognized food items in the image and then matches the food items with their compositional data, considering both their food names and their descriptors.
The final accuracy of the deep learning model trained on fake-food images acquired by 124 study participants and providing fifty-five food classes was 92·18 %, while the food matching was performed with a classification accuracy of 93 %.
The present findings are a step towards automating dietary assessment and food-choice research. The methodology outperforms other approaches in pixel accuracy, and since it is the first automatic solution for recognizing the images of fake foods, the results could be used as a baseline for possible future studies. As the approach enables a semi-automatic description of recognized food items (e.g. with respect to FoodEx2), these can be linked to any food composition database that applies the same classification and description system.
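The two segmentation measures named above can be illustrated with a short sketch. It assumes label maps stored as nested lists of integer class ids, which is an illustrative choice only; the abstract does not specify the exact variants of pixel accuracy and Intersection over Union that were used, so the definitions below are the common textbook ones.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted class equals the ground-truth class."""
    total = 0
    correct = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            correct += int(p == t)
    return correct / total


def iou_per_class(pred, truth):
    """Intersection over Union for every class present in prediction or ground truth."""
    intersection = {}
    union = {}
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                intersection[p] = intersection.get(p, 0) + 1
                union[p] = union.get(p, 0) + 1
            else:
                # A mismatched pixel enlarges the union of both classes involved.
                union[p] = union.get(p, 0) + 1
                union[t] = union.get(t, 0) + 1
    return {c: intersection.get(c, 0) / union[c] for c in union}


# Hypothetical 2x3 label maps (class ids are arbitrary).
truth = [[0, 0, 1],
         [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 2]]

acc = pixel_accuracy(pred, truth)
ious = iou_per_class(pred, truth)
mean_iou = sum(ious.values()) / len(ious)
print(f"pixel accuracy: {acc:.3f}, mean IoU: {mean_iou:.3f}")
```

Pixel accuracy rewards getting large regions right, whereas the per-class IoU also penalizes spurious predictions of a class, which is why the two are usually reported together.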
As great amounts of food-related information are presented in the form of heterogeneous textual data, computer-based methods are useful for automatically extracting such information. One way to do this is to utilize Named-Entity Recognition (NER) methods, which are broadly used in computer science for information extraction. Despite the existence of numerous well-established NER methods in the biomedical domain, the domain of food science remains scarcely resourced. In this paper, we provide an overview and a comparison of named-entity recognition methods in the food domain, which can be used for automated extraction of food information from text. Four methods are discussed: FoodIE, NCBO (SNOMED CT), NCBO (OntoFood), and NCBO (FoodON). We compare them using a benchmark data set that consists of 1000 manually annotated recipes initially obtained from Allrecipes, which is the largest social network focused on food. After analysing the results from the evaluation, it is evident that FoodIE obtains very promising results compared to the other food named-entity recognition methods taken into consideration.
Recently, food science has been garnering a lot of attention. There are many open research questions on the interactions of food, as one of the main environmental factors, with other health-related entities such as diseases, treatments, and drugs. In the last 2 decades, a large amount of work has been done in natural language processing and machine learning to enable biomedical information extraction. However, machine learning in food science domains remains inadequately resourced, which brings to attention the problem of developing methods for food information extraction. There are only a few food semantic resources and a few rule-based methods for food information extraction, which often depend on some external resources. However, an annotated corpus with food entities along with their normalization was published in 2019 by using several food semantic resources.
In this study, we investigated how the recently published bidirectional encoder representations from transformers (BERT) model, which provides state-of-the-art results in information extraction, can be fine-tuned for food information extraction.
We introduce FoodNER, which is a collection of corpus-based food named-entity recognition methods. It consists of 15 different models obtained by fine-tuning 3 pretrained BERT models on 5 groups of semantic resources: food versus nonfood entity, 2 subsets of Hansard food semantic tags, FoodOn semantic tags, and Systematized Nomenclature of Medicine Clinical Terms food semantic tags.
All BERT models provided very promising results with 93.30% to 94.31% macro F1 scores in the task of distinguishing food versus nonfood entity, which represents the new state-of-the-art technology in food information extraction. Considering the tasks where semantic tags are predicted, all BERT models obtained very promising results once again, with their macro F1 scores ranging from 73.39% to 78.96%.
FoodNER can be used to extract and annotate food entities in 5 different tasks: food versus nonfood entities and distinguishing food entities on the level of food groups by using the closest Hansard semantic tags, the parent Hansard semantic tags, the FoodOn semantic tags, or the Systematized Nomenclature of Medicine Clinical Terms semantic tags.
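The macro F1 score used to report the results above is the unweighted mean of the per-class F1 scores. The following minimal sketch computes it for a food versus nonfood labelling; the label names `FOOD` and `O` and the token-level comparison are assumptions made for illustration and may differ from the evaluation protocol of the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical token labels: FOOD marks a food entity, O marks everything else.
y_true = ["FOOD", "FOOD", "O", "O", "O", "O"]
y_pred = ["FOOD", "O", "O", "O", "O", "FOOD"]

score = macro_f1(y_true, y_pred)
print(f"macro F1: {score:.3f}")
```

Because every class contributes equally regardless of how often it occurs, macro F1 is a stricter summary than accuracy when food entities are much rarer than nonfood tokens.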
In optimization, algorithm selection, which is the selection of the most suitable algorithm for a specific problem, is of great importance, as algorithm performance is heavily dependent on the problem being solved. However, when using machine learning for algorithm selection, the performance of the algorithm selection model depends on the data used to train and test the model, and existing optimization benchmarks only provide a limited amount of data. To help with this problem, artificial problem generation has been shown to be a useful tool for augmenting existing benchmark problems. In this paper, we are interested in the problem of knowledge transfer between the artificially generated and existing handmade benchmark problems in the domain of continuous numerical optimization. That is, can an algorithm selection model trained purely on artificially generated problems correctly provide algorithm recommendations for existing handmade problems? We show that such a model produces low-quality results, and we also provide explanations about how the algorithm selection model works and show the differences between the problem data sets in order to explain the model’s performance.
Knowledge about the interactions between dietary and biomedical factors is scattered throughout countless research articles in an unstructured form (e.g., text, images, etc.) and requires automatic structuring so that it can be provided to medical professionals in a suitable format. Various biomedical knowledge graphs exist; however, they require further extension with relations between food and biomedical entities. In this study, we evaluate the performance of three state-of-the-art relation-mining pipelines (FooDis, FoodChem and ChemDis), which extract relations between food, chemical and disease entities from textual data. We perform two case studies, in which relations were automatically extracted by the pipelines and validated by domain experts. The results show that the pipelines can extract relations with an average precision of around 70%, making new discoveries available to domain experts with reduced human effort, since the domain experts only need to evaluate the results instead of finding and reading all new scientific papers.
When performing a statistical analysis of the performance of single-objective optimization algorithms, researchers usually estimate it according to the obtained optimization results in the form of minimal/maximal values. Though this is a good indicator of the performance of the algorithm, it does not provide any information about the reasons behind that performance. One way to get additional information about the performance of the algorithms is to study their exploration and exploitation abilities. In this paper, we present an easy-to-use, step-by-step pipeline that can be used for performing exploration and exploitation analysis of single-objective optimization algorithms. The pipeline is based on a web-service-based e-Learning tool called DSCTool, which can be used for making statistical analyses not only with regard to the obtained solution values but also with regard to the distribution of the solutions in the search space. Its usage does not require any special statistical knowledge from the user. The knowledge gained from such an analysis can be used to better understand an algorithm’s performance when compared to other algorithms or while performing hyperparameter tuning.
• ISO-FOOD ontology – a new way of representing isotopic data for food research.
• Defines metadata needed for isotopic characterization.
• Describes a powerful technique for organizing and sharing stable isotope data across Food Science.
To link and harmonize different knowledge repositories with respect to isotopic data, we propose an ISO-FOOD ontology as a domain ontology for describing isotopic data within Food Science. The ISO-FOOD ontology consists of metadata and provenance data that need to be stored together with data elements in order to describe isotopic measurements with all the information required for future analysis. The new domain has been linked with existing ontologies, such as Units of Measurements Ontology, Food, Nutrient and the Bibliographic Ontology. To show how such an ontology can be used in practice, it was populated with 20 isotopic measurements of Slovenian food samples. Describing data in this way offers a powerful technique for organizing and sharing stable isotope data across Food Science.
The European Food Safety Authority has developed a standardized food classification and description system called FoodEx2. It uses facets to describe food properties and aspects from various perspectives, making it easier to compare food consumption data from different sources and perform more detailed data analyses. However, both food composition data and food consumption data, which need to be linked, are lacking in FoodEx2 because the process of classification and description has to be performed manually, a process that is laborious and requires good knowledge of both the system and food (composition, processing, marketing, etc.). In this paper, we introduce a semi-automatic system for classifying and describing foods according to FoodEx2, which consists of three parts. The first involves a machine learning approach and classifies foods into four FoodEx2 categories, with two for single foods: raw (r) and derivatives (d), and two for composite foods: simple (s) and aggregated (c). The second uses a natural language processing approach and probability theory to describe foods. The third combines the results from the first and second parts by defining post-processing rules in order to improve the result for the classification part. We tested the system using a set of food items (from Slovenia) manually coded according to FoodEx2. The new semi-automatic system obtained an accuracy of 89% for the classification part and 79% for the description part, or an overall result of 79% for the whole system.