The coordinated transcriptomic responses of mutualistic ectomycorrhizal (ECM) fungi and their hosts during the establishment of symbiosis are not well understood. This study characterizes the transcriptomic alterations of the ECM fungus Laccaria bicolor during different colonization stages on two hosts (Populus trichocarpa and Pseudotsuga menziesii) and compares these to the transcriptomic variations of P. trichocarpa across the same time points. A large number of L. bicolor genes (≥ 8,000) were significantly regulated at the transcriptional level in at least one stage of colonization. From our data, we identify 1,249 genes that we hypothesize constitute the 'core' gene regulon necessary for the mutualistic interaction between L. bicolor and its host plants. We further identify a group of 1,210 genes that are regulated in a host-specific manner. This variable regulon encodes a number of proteases and xenobiotic efflux transporters that we hypothesize act to counter chemical-based defenses simultaneously activated at the transcriptomic level in P. trichocarpa. The transcriptional response of the host plant P. trichocarpa consisted of differential waves of gene regulation related to signal perception and transduction, defense response, and the induction of nutrient transfer in P. trichocarpa tissues. This study therefore gives fresh insight into the shifting transcriptomic landscape in both the colonizing fungus and its host and the different strategies employed by both partners in orchestrating a mutualistic interaction.
In recent years, health data collected during the clinical care process have often been repurposed for secondary use through clinical data warehouses (CDWs), which interconnect disparate data from different sources. A large amount of information of high clinical value is stored in unstructured text format. Natural language processing (NLP), which implements algorithms that can operate on massive unstructured textual data, has the potential to structure the data and make clinical information more accessible.
The aim of this review was to provide an overview of studies applying NLP to textual data from CDWs. It focuses on identifying the (1) NLP tasks applied to data from CDWs and (2) NLP methods used to tackle these tasks.
This review was performed according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched for relevant articles in 3 bibliographic databases: PubMed, Google Scholar, and ACL Anthology. We reviewed the titles and abstracts and included articles according to the following inclusion criteria: (1) focus on NLP applied to textual data from CDWs, (2) articles published between 1995 and 2021, and (3) written in English.
We identified 1353 articles, of which 194 (14.34%) met the inclusion criteria. Among all NLP tasks identified in the included papers, information extraction from clinical text (112/194, 57.7%) and the identification of patients (51/194, 26.3%) were the most frequent. To address these tasks, symbolic methods were the most common NLP methods (124/232, 53.4%), showing that some tasks can be partially achieved with classical NLP techniques such as regular expressions or pattern matching that exploit specialized lexica (e.g., drug lists and terminologies). Machine learning (70/232, 30.2%) and deep learning (38/232, 16.4%) have been increasingly used in recent years, including the most recent approaches based on transformers. NLP methods were mostly applied to English-language data (153/194, 78.9%).
CDWs are central to the secondary use of clinical texts for research purposes. Although the use of NLP on data from CDWs is growing, there remain challenges in this field, especially with regard to languages other than English. Clinical NLP is an effective strategy for accessing, extracting, and transforming data from CDWs. Information retrieved with NLP can assist in clinical research and have an impact on clinical practice.
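The symbolic methods the review describes typically combine a specialized lexicon with pattern matching. The following is a minimal sketch of that idea, not a tool from any of the reviewed studies; the lexicon, pattern, and example note are invented for illustration.

```python
import re

# Hypothetical mini-lexicon; real systems exploit curated drug lists and terminologies.
DRUG_LEXICON = {"aspirin", "metformin", "warfarin"}

# Pattern for a drug mention followed by a dose expression, e.g. "aspirin 81 mg".
DOSE_PATTERN = re.compile(
    r"\b(?P<drug>\w+)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|g|mcg)\b",
    re.IGNORECASE,
)

def extract_drug_mentions(text):
    """Return (drug, dose, unit) triples for matches whose drug is in the lexicon."""
    hits = []
    for m in DOSE_PATTERN.finditer(text):
        if m.group("drug").lower() in DRUG_LEXICON:
            hits.append((m.group("drug").lower(), float(m.group("dose")), m.group("unit").lower()))
    return hits

print(extract_drug_mentions("Started aspirin 81 mg daily; ibuprofen 200 mg PRN."))
# → [('aspirin', 81.0, 'mg')]
```

The lexicon acts as a filter on the surface pattern: "ibuprofen 200 mg" matches the regex but is discarded because it is not in the (toy) drug list, which is how such systems trade recall for precision.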
The slow adoption of innovations is a key challenge that the European sheep sector faces for its sustainability. The future of the sector lies in the adoption of best practices, modern technologies, and innovations that can improve its resilience and mitigate its dependence on public support. In this study, the concept of technical efficiency was used to reveal the most efficient sheep meat farms and to identify the best practices and farm innovations that could potentially be adopted by other farms of similar production systems. Data Envelopment Analysis was applied to farm accounting data from 458 sheep meat farms of intensive, semi-intensive, and extensive systems from France, Spain, and the UK, and the structural and economic characteristics of the most efficient farms were analyzed. Through a survey conducted within the Innovation for Sustainable Sheep and Goat Production in Europe (iSAGE) Horizon 2020 project, these best farmers indicated the management and production practices and innovations that improve their economic performance and set them apart from their peers.
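The study's Data Envelopment Analysis operates on multi-input, multi-output accounting data; as a minimal illustration of the underlying idea, under constant returns to scale with a single input and a single output, input-oriented efficiency reduces to each farm's productivity ratio relative to the best observed ratio. The data below are invented, not from the study.

```python
def dea_efficiency_single(inputs, outputs):
    """Input-oriented technical efficiency under constant returns to scale
    for the one-input/one-output case: each unit's output/input ratio
    relative to the best ratio on the observed frontier."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

# Illustrative farm data (e.g., labour hours in, lamb output out) - hypothetical.
inputs = [2.0, 2.0, 4.0]
outputs = [4.0, 2.0, 4.0]
print(dea_efficiency_single(inputs, outputs))  # → [1.0, 0.5, 0.5]
```

The first farm defines the frontier; the others would need to halve their input use (holding output fixed) to become efficient, which is exactly the interpretation of an input-oriented efficiency score of 0.5. The full DEA model solves a linear program per farm to generalize this to many inputs and outputs.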
The Canadian Institutes of Health Research (CIHR) launched the "International Collaborative Research Strategy for Alzheimer's Disease" as a signature initiative, focusing on Alzheimer's Disease (AD) and related neurodegenerative disorders (NDDs). The Canadian Consortium on Neurodegeneration in Aging (CCNA) was subsequently established to coordinate and strengthen Canadian research on AD and NDDs. To facilitate this research, CCNA uses LORIS, a modular data management system that integrates acquisition, storage, curation, and dissemination across multiple modalities. Through an unprecedented national collaboration studying various groups of dementia-related diagnoses, CCNA aims to investigate and develop proactive treatment strategies to improve disease prognosis and the quality of life of those affected. However, this constitutes a unique technical undertaking, as heterogeneous data collected from sites across Canada must be uniformly organized, stored, and processed in a consistent manner. Currently, clinical, neuropsychological, imaging, genomic, and biospecimen data for 509 CCNA subjects have been uploaded to LORIS. In addition, data validation is handled through a number of quality control (QC) measures such as double data entry (DDE), conflict flagging and resolution, imaging protocol checks, and visual imaging quality validation. Site coordinators are also notified of incidental findings found in MRI reads or biosample analyses. Data are then disseminated to CCNA researchers via a web-based Data-Querying Tool (DQT). This paper will detail the wide array of capabilities handled by LORIS for CCNA, aiming to provide the necessary neuroinformatic infrastructure for this nationwide investigation of healthy and diseased aging.
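The double-data-entry QC measure mentioned above boils down to comparing two independent transcriptions of the same form and flagging disagreements for resolution. A minimal sketch of that comparison (not LORIS code; the field names are invented):

```python
def flag_dde_conflicts(entry1, entry2):
    """Compare two independent entries of the same form and flag fields
    whose values disagree, for later conflict resolution."""
    conflicts = {}
    for field in entry1.keys() | entry2.keys():
        v1, v2 = entry1.get(field), entry2.get(field)
        if v1 != v2:
            conflicts[field] = (v1, v2)
    return conflicts

print(flag_dde_conflicts({"age": 71, "sex": "F"}, {"age": 77, "sex": "F"}))
# → {'age': (71, 77)}
```

Fields present in only one entry also surface as conflicts (the missing side reads as None), which catches skipped questions as well as typos.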
Background
Often missing from or uncertain in a biomedical data warehouse (BDW), vital status after discharge is central to the value of a BDW in medical research. The French National Mortality Database (FNMD) offers open-source nominative records of every death. Matching large-scale BDW records with the FNMD combines multiple challenges: the absence of unique common identifiers between the 2 databases, names changing over a lifetime, clerical errors, and the exponential growth of the number of comparisons to compute.
Objective
We aimed to develop a new algorithm for matching BDW records to the FNMD and evaluated its performance.
Methods
We developed a deterministic algorithm based on advanced data cleaning and knowledge of the naming system and the Damerau-Levenshtein distance (DLD). The algorithm’s performance was independently assessed using BDW data of 3 university hospitals: Lille, Nantes, and Rennes. Specificity was evaluated with living patients on January 1, 2016 (ie, patients with at least 1 hospital encounter before and after this date). Sensitivity was evaluated with patients recorded as deceased between January 1, 2001, and December 31, 2020. The DLD-based algorithm was compared to a direct matching algorithm with minimal data cleaning as a reference.
Results
All centers combined, sensitivity was 11% higher for the DLD-based algorithm (93.3%, 95% CI 92.8-93.9) than for the direct algorithm (82.7%, 95% CI 81.8-83.6; P<.001). Sensitivity was superior for men at 2 centers (Nantes: 87%, 95% CI 85.1-89 vs 83.6%, 95% CI 81.4-85.8; P=.006; Rennes: 98.6%, 95% CI 98.1-99.2 vs 96%, 95% CI 94.9-97.1; P<.001) and for patients born in France at all centers (Nantes: 85.8%, 95% CI 84.3-87.3 vs 74.9%, 95% CI 72.8-77.0; P<.001). The DLD-based algorithm revealed significant differences in sensitivity among centers (Nantes, 85.3% vs Lille and Rennes, 97.3%, P<.001). Specificity was >98% in all subgroups. Our algorithm matched tens of millions of death records from BDWs, with parallel computing capabilities and low RAM requirements. We used the Inseehop open-source R script for this measurement.
Conclusions
Overall, sensitivity/recall was 11% higher using the DLD-based algorithm than using the direct algorithm. This shows the importance of advanced data cleaning and of knowledge of the naming system through DLD use. Statistically significant differences in sensitivity between groups were found and must be considered when performing an analysis to avoid differential biases. Our algorithm, originally conceived for linking a BDW with the FNMD, can be used to match any large-scale databases. While matching operations using names are considered sensitive computational operations, the Inseehop package released here is easy to run on premises, thereby facilitating compliance with local cybersecurity frameworks. The use of an advanced deterministic matching algorithm such as the DLD-based algorithm is an insightful example of combining open-source external data to improve the usage value of BDWs.
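The study's algorithm combines data cleaning and naming-system knowledge with the Damerau-Levenshtein distance; the distance itself is the standard, reproducible core. The sketch below implements the common optimal-string-alignment variant of DLD (edit distance with adjacent transpositions) and is not the authors' Inseehop code.

```python
def damerau_levenshtein(a, b):
    """Optimal string alignment variant of the Damerau-Levenshtein distance:
    minimum number of insertions, deletions, substitutions, and adjacent
    transpositions turning string a into string b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a's prefix
    for j in range(n + 1):
        d[0][j] = j  # insert all of b's prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

print(damerau_levenshtein("martin", "martni"))  # → 1 (one transposition)
```

Counting a swapped letter pair as one edit rather than two is what makes this distance forgiving of the clerical errors the abstract mentions, at the same threshold.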
This article presents a method of extracting bilingual lexica composed of single-word terms (SWTs) and multi-word terms (MWTs) from comparable corpora of a technical domain. First, this method extracts MWTs in each language, and then uses statistical methods to align single words and MWTs by exploiting the term contexts. After explaining the difficulties involved in aligning MWTs and specifying our approach, we show the adopted process for bilingual terminology extraction and the resources used in our experiments. Finally, we evaluate our approach and demonstrate its significance, particularly in relation to non-compositional MWT alignment.
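The non-compositionality difficulty the article addresses can be seen in the naive baseline it improves on: translating an MWT word-by-word through a single-word bilingual dictionary. A toy sketch (the dictionary is invented, not a resource from the article):

```python
# Hypothetical toy dictionary (French -> English); real resources are far larger.
SW_DICT = {"moteur": "engine", "de": "of", "recherche": "search"}

def compositional_translate(mwt, sw_dict):
    """Translate a multi-word term word-by-word; return None when any
    component is missing from the single-word dictionary."""
    out = []
    for word in mwt.split():
        if word not in sw_dict:
            return None
        out.append(sw_dict[word])
    return " ".join(out)

print(compositional_translate("moteur de recherche", SW_DICT))  # → engine of search
```

"moteur de recherche" actually means "search engine": the word-by-word output is wrong in both word choice and order, which is why non-compositional MWT alignment needs the context-based statistical methods the article evaluates.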
The use of sub‐2‐μm particle columns for fast, high-throughput metabolite identification (ID) applications was investigated. Three LC‐MS methods based on different sub‐2‐μm particle size columns using the same analytical 3 min gradient were developed (Methods A, B, and C). Method A comprised a 1.8 μm particle column coupled to an MS; Methods B and C utilized a 1.7 μm particle column (BEH, 50 × 2.1 mm id) and a 1.8 μm particle column, respectively, coupled to a Q‐TOF MS. The precision and the separation efficiency of the methods were compared with repeated standard injections (N = 10) of the reference compounds verapamil (VP), propranolol, and fluoxetine. Separation efficiency and MS/MS spectral quality were also evaluated for the separation and detection of VP and its two major metabolites, norverapamil (NVP) and O‐demethylverapamil (ODMVP), in human-liver microsomal incubates. Results show that 1.8 μm particle columns deliver separation of VP and its major metabolites, and spectral quality in the MSE mode of the Q‐TOF instrument, comparable to 1.7 μm particle columns. Additionally, the study confirmed that sub‐2‐μm particle size columns can be operated with standard analytical HPLC equipment but that performance is maximized by integrating the column into a UPLC method with reduced void volumes. All the methods are suitable for the determination of major metabolites for compounds with high metabolic turnover. High-throughput metabolite profile analysis in 384‐well plate format of up to 48 compounds in human-liver microsome incubates is discussed.
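The trade-off behind sub-2-μm columns follows from textbook chromatographic scaling, not from this study specifically: at constant linear velocity and column length, backpressure grows as 1/dp² (Darcy's law), while near the van Deemter optimum the plate count grows roughly as 1/dp. A rough arithmetic sketch:

```python
def relative_backpressure(dp_ref_um, dp_um):
    """Backpressure ratio at constant linear velocity and column length:
    Darcy's law gives pressure drop proportional to 1/dp^2."""
    return (dp_ref_um / dp_um) ** 2

def relative_plates(dp_ref_um, dp_um):
    """Approximate plate-count ratio at fixed column length: N proportional
    to 1/dp near the van Deemter optimum (minimum plate height H ~ dp)."""
    return dp_ref_um / dp_um

# Moving from a conventional 5 um column to a 1.7 um column:
print(round(relative_backpressure(5.0, 1.7), 1))  # → 8.7 (times the pressure)
print(round(relative_plates(5.0, 1.7), 1))        # → 2.9 (times the plates)
```

The roughly ninefold pressure penalty for a threefold efficiency gain is why the abstract notes that performance is maximized on UPLC hardware, even though standard HPLC operation is possible.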
Recent developments in computational terminology call for the design of multiple and complementary tools for the acquisition, structuring, and exploitation of terminological data. This paper proposes to bridge the gap between term acquisition and thesaurus construction by offering a framework for automatic structuring of multi-word candidate terms with the help of corpus-based links between single-word terms. First, we present a system for corpus-based acquisition of terminological relationships through discursive patterns. This system is built on previous work on automatic extraction of hyponymy links through shallow parsing. Second, we show how hypernym links between single-word terms can be extended to semantic links between multi-word terms through corpus-based extraction of semantic variants. The induced hierarchy is incomplete but provides an automatic generalization of single-word term relations to the multi-word terms that are pervasive in technical thesauri and corpora.
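The "discursive patterns" for hyponymy extraction mentioned above are in the spirit of Hearst-style lexico-syntactic patterns; the abstract's actual system uses shallow parsing, so the pure-regex sketch below is a simplification with an invented pattern and example sentence.

```python
import re

# One classic discursive pattern: "X such as Y1, Y2 and Y3" -> each Yi hyponym-of X.
SUCH_AS = re.compile(r"(\w+(?:\s\w+)?)\s+such as\s+([\w\s,]+?)(?:\.|$)")

def extract_hyponyms(sentence):
    """Return (hyponym, hypernym) pairs from a 'such as' enumeration."""
    pairs = []
    for m in SUCH_AS.finditer(sentence):
        hypernym = m.group(1)
        items = re.split(r",\s*|\s+and\s+", m.group(2))
        pairs.extend((item.strip(), hypernym) for item in items if item.strip())
    return pairs

print(extract_hyponyms("We study fruit trees such as apple, pear and cherry."))
```

Shallow parsing improves on this by delimiting the noun phrases properly (here the regex simply guesses at most two words for the hypernym), but the induced links are the same kind of corpus-based hyponymy relations the framework generalizes to multi-word terms.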
The adoption of best practices is crucial for the survival of dairy sheep farms that operate under extensive and/or semi-extensive systems. In this study, an efficiency analysis was implemented to reveal the best observed practices applied by the more efficient dairy sheep farms. Data Envelopment Analysis was used on data from 60 dairy sheep farms that rear the Manech, Basco-Béarnaise, and Lacaune breeds under semi-extensive systems in France. The main characteristics of the most efficient farms are presented, and a comparative economic analysis is applied between the fully efficient and less efficient farms, highlighting the optimal farm structure and determining the major cost drivers in sheep farming. The most efficient farmers provided information within the iSAGE Horizon 2020 project regarding the management practices that enhance their sustainability. The results show that there is room for improvement in semi-extensive dairy sheep farming. The most efficient farms rear smaller flocks than the less efficient farms and achieve higher milk yields. Fixed capital, labor, and feeding constitute the main cost drivers. Results show that farms should exploit economies of scale in the use of labor and infrastructure to reduce their cost per product, as well as take up practices and innovations related mainly to modern breeding and reproduction methods, efficient feeding practices, and digital technologies.
The main work in bilingual lexicon extraction from comparable corpora is based on the implicit hypothesis that corpora are balanced in terms of size. However, the historical context-based projection method is relatively insensitive to the size of each part of the comparable corpus. Within this context, we have carried out a study on the influence of unbalanced specialized comparable corpora on the quality of bilingual terminology extraction through different experiments. Moreover, we have introduced a strategy into the context-based projection method to re-estimate word co-occurrence observations. This is done by using smoothing or prediction techniques that boost the observations of word co-occurrences, which are mainly useful for the smallest part of an unbalanced comparable corpus. Our results show that the use of unbalanced specialized comparable corpora results in a significant improvement in the quality of extracted lexicons.
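The context-based projection method referred to above follows a standard recipe: build co-occurrence context vectors, translate the source vector's dimensions through a seed bilingual dictionary, and rank target words by similarity to the projected vector. A minimal sketch with an invented two-sentence corpus and toy seed dictionary:

```python
import math
from collections import Counter

def context_vector(word, corpus, window=2):
    """Co-occurrence counts of `word` within a +/- window of tokens."""
    vec = Counter()
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok == word:
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j != i:
                        vec[sent[j]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def project(vec, seed_dict):
    """Translate context-vector dimensions through a seed bilingual dictionary,
    summing counts when several source words share a translation."""
    proj = Counter()
    for k, c in vec.items():
        if k in seed_dict:
            proj[seed_dict[k]] += c
    return proj
```

Re-estimating (smoothing) the co-occurrence counts before projection, as the abstract proposes, would modify the output of `context_vector` for the smaller corpus side; the projection and ranking steps stay the same.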