Randomized controlled trials (RCTs) play a major role in biomedical research and practice. To inform this research, the demand for highly accurate retrieval of scientific articles on RCT research has grown in recent decades. However, correctly identifying all published RCTs in a given domain is a non-trivial task, which has motivated computer scientists to develop methods for identifying papers involving RCTs. Although existing studies have provided invaluable insights into how RCT tags can be predicted for biomedical research articles, they used datasets from different sources, of varying sizes and timeframes, so their models and findings cannot be compared across studies. In addition, as datasets and code are rarely shared, researchers who conduct RCT classification have to write code from scratch, reinventing the wheel. In this paper, we present Bat4RCT, a suite of data and an integrated method to serve as a strong baseline for RCT classification, which includes the use of BERT-based models in comparison with conventional machine learning techniques. To validate our approach, all models are applied to 500,000 paper records in MEDLINE. The BERT-based models showed consistently higher recall scores than conventional machine learning and CNN models while producing slightly better or similar precision scores. The best performance was achieved by the BioBERT model when trained on both title and abstract texts, with an F1 score of 90.85%. This infrastructure of dataset and code will provide a competitive baseline for evaluating and comparing new methods and will make future benchmarking more convenient. To the best of our knowledge, our study is the first to apply BERT-based language modeling techniques to RCT classification tasks and to share its dataset and code in order to promote reproducibility and improvement in text classification in biomedical research.
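The abstract above compares models by precision, recall, and F1 score. As a minimal sketch of how those three metrics relate, the following computes them from confusion counts for the positive (RCT) class. The function name and the example counts are hypothetical and are not taken from the Bat4RCT code or results.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for the positive class.

    tp: papers correctly tagged as RCT
    fp: non-RCT papers wrongly tagged as RCT
    fn: RCT papers the classifier missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=10)
    print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

Because F1 is a harmonic mean, a model with higher recall at similar precision, as reported for the BERT-based models, also scores a higher F1.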
Identifying causal sentences from nuclear incident reports is essential for advancing nuclear safety research and applications. Nonetheless, accurately locating and labeling causal sentences in text data is challenging and may benefit from automated techniques. In this paper, we introduce LERCause, a labeled dataset combined with labeling methods meant to serve as a foundation for the classification of causal sentences in the domain of nuclear safety. We applied three BERT models (BERT, BioBERT, and SciBERT) to 10,608 annotated sentences from the Licensee Event Report (LER) corpus to predict sentence labels (Causal vs. non-Causal). We also used a keyword-based heuristic strategy, three standard machine learning methods (Logistic Regression, Gradient Boosting, and Support Vector Machine), and a deep learning approach (Convolutional Neural Network; CNN) for comparison. We found that the BERT-centric models outperformed all other tested models on all evaluation metrics (accuracy, precision, recall, and F1 score). BioBERT achieved the highest overall F1 score, 94.49%, in ten-fold cross-validation. Our dataset and coding framework can provide a robust baseline for assessing and comparing new causal sentence extraction techniques. As far as we know, our research breaks new ground by leveraging BERT-centric models for causal sentence classification in the nuclear safety domain and by openly distributing labeled data and code to enable reproducibility in subsequent research.
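The abstract above mentions a keyword-based heuristic as one comparison baseline but does not describe it. The following is only a rough sketch of what such a baseline could look like: the cue list `CAUSAL_CUES` and the function `label_sentence` are assumptions made for illustration, not the LERCause implementation.

```python
import re

# Hypothetical cue phrases; the actual keyword list used for LERCause is not given here.
CAUSAL_CUES = ("because", "due to", "caused by", "resulted in", "as a result of", "led to")

# One case-insensitive pattern that matches any cue as a whole phrase.
_CUE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(cue) for cue in CAUSAL_CUES) + r")\b",
    re.IGNORECASE,
)


def label_sentence(sentence: str) -> str:
    """Label a sentence 'Causal' if it contains any cue phrase, else 'non-Causal'."""
    return "Causal" if _CUE_PATTERN.search(sentence) else "non-Causal"


if __name__ == "__main__":
    examples = [
        "The pump tripped due to a failed relay.",
        "The reactor was operating at full power.",
    ]
    for s in examples:
        print(label_sentence(s), "|", s)
```

A rule like this needs no training data, which makes it a natural floor against which to compare the learned models.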
• ANDez consolidates multiple ML techniques for disambiguation.
• Built using Python and popular ML libraries.
• Provides a unified platform to evaluate and refine ML methods.
• Assists scholars with limited ML expertise in bibliographic data analysis.
Author name disambiguation in bibliographic data is challenging because different authors can share the same name and a single author's name can appear in several variants. Various machine learning (ML) methods address this, but a unified framework for comparing them is lacking. This study introduces ANDez, an open-source tool that integrates top-performing ML techniques for author name disambiguation. Developed in Python using popular ML libraries, ANDez provides a transparent system that merges complex procedures from different ML approaches. This promotes the assessment, modification, and benchmarking of ML techniques in author name disambiguation. ANDez's user-friendly design also helps researchers analyze ambiguous bibliographic data without needing advanced ML coding expertise.
This paper examines how metapragmatic framings of multilingual competency and incompetency have become indexes of global and South Korean citizenship in the South Korean popular media. Drawing upon depictions of multilingualism in South Korean television deurama ('drama'), comedy skits, and popular music, it examines how the locus of modernity and cosmopolitanism is moving away from the U.S.-oriented overseas Korean (gyopo) and towards the figure of the elite transnational returnee (saldaon saram). It argues that as the transnational circulation of people, media, and ideologies accelerates in the age of globalization, intra-ethnic discourses of linguistic mockery will also intensify. Adapted from the source document.
This article examines how discourses of linguistic (in)competency regiment productions of citizenship in the South Korean popular media. Through an analysis of newspaper articles and television programs, we investigate how depictions of language competency become key resources for locating individuals within genealogies of kinship and chronotopic figures of personhood. In some cases, the speech of these celebrities associates them with imaginings of their backwards, low-class Korean kin, the Japanese colonial period, and American military presence, while in other cases, their language is associated with the 21st-century ideal of the modern, elite, globetrotting neoliberal subject. This analysis demonstrates how competence is read in relation to changing notions of citizenship in the new ‘multicultural’ Korea as these men are differentially positioned between multiple raced, classed, and gendered imaginings of Whiteness and Koreanness. More generally, we argue that understandings of linguistic competence are social productions, rather than reflections of language ability.
In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite this crucial role of forenames, their effect on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contributions of forenames in author name disambiguation using multiple labeled data sets under varying ratios and lengths of full forenames, reflecting real‐world scenarios in which an author is represented by forename variants (synonyms) and some authors share the same forenames (homonyms). The results show that increasing the ratio of full forenames substantially improves both heuristic and machine‐learning‐based disambiguation. Performance gains from algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratio of full forenames increases, however, these gains become marginal compared to those from string matching. Using only a small portion of each forename string does not much reduce the performance of either heuristic or algorithmic disambiguation compared to using full‐length strings. These findings provide practical suggestions, such as restoring initialized forenames to full‐string format via record linkage to improve disambiguation performance.
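To illustrate why initialized forenames hurt string-matching disambiguation, here is a minimal sketch of a prefix-compatibility rule: two forenames are treated as compatible when one is a prefix of the other. The rule and the function name `forenames_compatible` are assumptions chosen for illustration; the study's actual heuristics are not specified in this abstract.

```python
def forenames_compatible(a: str, b: str) -> bool:
    """Return True if one forename is a prefix of the other (case-insensitive).

    An initial such as "J." is compatible with any forename starting with "j",
    which is why initialized forenames merge distinct authors (homonyms).
    """
    a = a.strip().lower().rstrip(".")
    b = b.strip().lower().rstrip(".")
    if not a or not b:
        return False
    return a.startswith(b) or b.startswith(a)


if __name__ == "__main__":
    # An initial cannot separate two different authors who share it.
    print(forenames_compatible("J.", "Jinseok"))    # True
    print(forenames_compatible("J.", "James"))      # True
    # Full forenames can separate them.
    print(forenames_compatible("Jinseok", "James")) # False
    # A short leading portion of the forename already does so as well.
    print(forenames_compatible("Jin", "Jam"))       # False
```

The last call hints at the abstract's finding that a small leading portion of the forename can retain most of the discriminating power of the full string.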
Optical density (OD) measurement is the gold standard to estimate microbial cell density in aqueous systems. Recording microbial growth curves is essential to assess substrate utilization, gauge sensitivity to inhibitors or toxins, or determine the perfect sampling point. Manual sampling for cuvette-photometer-based measurements can cause disturbances and impact growth, especially for strictly anaerobic or thermophilic microbes. For slow-growing microbes, manual sampling can cause data gaps that complicate analysis. Online OD measurement systems provide a solution, but are often expensive and ill-suited for applications such as monitoring microbial growth in custom or larger anaerobic vessels. Furthermore, growth measurements of thermophilic cultures are limited by the heat sensitivity of complex electronics. Here, we present two simple, low-cost, self-assembled photometers: a "TubeOD" for online measurement of anaerobic and thermophilic cultures in Hungate tubes and a "ClampOD" that can be attached to virtually any transparent growth vessel. Both OD-meters can be calibrated in minutes. We detail the manufacturing and calibration procedure and demonstrate continuous acquisition of high-quality cell density data of a variety of microbes, including strict anaerobes, a thermophile, and gas-utilizing strains in various glassware. When calibrated and operated within their detection limits (ca. 0.3-90% of the photosensor voltage range), these self-built OD-meters can be used for continuous measurement of microbial growth in a variety of applications, thereby simplifying and enhancing everyday lab operations.
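The abstract above does not give the calibration procedure, so the following is only a sketch of the general idea, under two stated assumptions: that the photosensor voltage is proportional to transmitted light, and that the "voltage range" runs from 0 to a maximum `v_max`. The function names and numeric values are hypothetical.

```python
import math


def within_detection_limits(voltage: float, v_max: float,
                            low: float = 0.003, high: float = 0.90) -> bool:
    """Check that a reading lies within ca. 0.3-90% of the sensor voltage range.

    Assumes the range spans 0..v_max (an assumption, not stated in the abstract).
    """
    return low * v_max <= voltage <= high * v_max


def absorbance(voltage: float, blank_voltage: float) -> float:
    """Estimate absorbance as -log10(V / V_blank).

    Assumes voltage is proportional to transmitted light intensity, so that
    V / V_blank approximates the transmittance relative to sterile medium.
    """
    if voltage <= 0 or blank_voltage <= 0:
        raise ValueError("voltages must be positive")
    return -math.log10(voltage / blank_voltage)


if __name__ == "__main__":
    v_max = 5.0    # hypothetical sensor supply voltage
    blank = 4.0    # hypothetical reading through sterile medium
    reading = 0.4  # hypothetical reading through a grown culture
    if within_detection_limits(reading, v_max):
        print(f"absorbance estimate: {absorbance(reading, blank):.3f}")
```

In practice such raw estimates would still be mapped to reference OD values from a benchtop photometer, since scattering geometry differs between devices.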
Background: Conjunctival melanoma is a potentially lethal malignancy of the ocular surface. There have been no therapeutic advancements in the past several decades despite the increasing prevalence of the disease. Methods: The authors report the case of a 52-year-old Caucasian male with unresectable, recurrent conjunctival melanoma with a V600 BRAF mutation who was treated with systemic BRAF/MEK inhibition. Results: There was complete regression of local disease within the first 9 months. The patient remains without local recurrence or systemic metastasis at 1 year. Conclusion: This is the first reported case of conjunctival melanoma with complete response to BRAF/MEK inhibition. As long as targeted therapy remains an option, patients with conjunctival melanoma should undergo mutational profiling of their tumor.