This paper presents a deep learning approach to task duration prediction on a small dataset of emails. The task turns out to be a difficult NLP problem, since it requires timeline understanding. We propose a solution based on transfer learning with BERT-like transformer models, applied to a set of emails in Polish combined with emails translated into Polish from the Enron dataset. The influence of augmentation with the Polish analogue of WordNet, as well as with Word2Vec, on model quality is investigated.
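To make the transfer-learning setup above concrete, here is a minimal sketch of fine-tuning a Polish BERT-like encoder for duration regression. The checkpoint name, the regression framing, and the toy label are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: fine-tuning a Polish BERT-like encoder to predict task
# duration from email text. Model name and labels are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "allegro/herbert-base-cased"  # assumed Polish BERT variant
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=1, problem_type="regression"
)

emails = ["Proszę o przygotowanie raportu do piątku."]  # toy example
durations = torch.tensor([8.0])  # hypothetical duration label, in hours

batch = tokenizer(emails, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=durations)  # MSE loss for regression
out.loss.backward()  # an optimizer step would follow during fine-tuning
```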
This review emphasizes the critical need for accurate integration of solar energy into power grids. It meticulously examines the advancements in transformer models for solar forecasting, representing a confluence of renewable energy research and cutting-edge machine learning. It evaluates the effectiveness of various transformer architectures, including single, hybrid, and specialized models, across different forecasting horizons, from short to medium term. This review unveils substantial improvements in forecasting accuracy and computational efficiency, highlighting the models' proficiency in handling complex and diverse solar data. A key contribution is the emphasis on the crucial role of hyperparameters in refining model performance, balancing precision against computational demands. Importantly, the research also identifies critical challenges, such as the significant computational resources required and the need for expansive, high-quality datasets, which limit the broader application of these models. In response, this review advocates for future research directions focused on standardizing model configurations, venturing into longer-term forecasting, and fostering innovations to enhance computational economy. These proposed pathways aim to surmount current challenges, steering the domain towards more accurate, adaptable, and sustainable solar forecasting solutions that can contribute to achieving global renewable energy and climate objectives. This review not only maps the present landscape of transformer models in solar energy forecasting but also charts a trajectory for future advancements. It serves as a pivotal guide for researchers and practitioners, delineating the current advancements and future directions in navigating the complexities of solar data interpretation and forecasting, thereby significantly contributing to the development of reliable and efficient renewable energy systems.
• Addressing the critical need for accurate forecasting of solar irradiance and photovoltaic generation.
• Examining transformer advancements in solar forecasting, critical for integrating solar energy into power grids.
• Analyzing model capabilities in diverse environments, highlighting their role in forecasting accuracy and deployment issues.
• Identifying future research in model standardization and computational efficiency to improve solar forecasting.
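As a concrete illustration of the single-model transformers this review surveys, the sketch below encodes a sliding window of past irradiance readings and predicts a short-term horizon. All dimensions, the window length, and the architecture itself are illustrative assumptions, not a specific reviewed model.

```python
# Generic sketch of a single-transformer short-term solar forecaster:
# a window of past irradiance values in, a multi-step forecast out.
import torch
import torch.nn as nn

class SolarForecaster(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2, horizon=24):
        super().__init__()
        self.embed = nn.Linear(1, d_model)       # scalar reading -> d_model
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, horizon)  # next `horizon` steps

    def forward(self, x):                        # x: (batch, window, 1)
        h = self.encoder(self.embed(x))
        return self.head(h[:, -1])               # forecast from last position

model = SolarForecaster()
window = torch.randn(8, 96, 1)                   # e.g. 96 past 15-min readings
print(model(window).shape)                       # torch.Size([8, 24])
```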
Transformer-based NLP models are trained using hundreds of millions or even billions of parameters, limiting their applicability in computationally constrained environments. While the number of parameters generally correlates with performance, it is not clear whether the entire network is required for a downstream task. Motivated by recent work on pruning and distilling pre-trained models, we explore strategies to drop layers in pre-trained models and observe the effect of pruning on downstream GLUE tasks. We were able to prune BERT, RoBERTa, and XLNet models by up to 40% while maintaining up to 98% of their original performance. Additionally, we show that our pruned models are on par with those built using knowledge distillation, in terms of both size and performance. Our experiments yield interesting observations, such as: (i) the lower layers are the most critical for maintaining downstream task performance, (ii) some tasks, such as paraphrase detection and sentence similarity, are more robust to the dropping of layers, and (iii) models trained using different objective functions exhibit different learning patterns with respect to layer dropping.
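A minimal sketch of the layer-dropping strategy described above, assuming a Hugging Face BERT checkpoint: the top encoder layers are removed before fine-tuning, keeping the lower layers the paper found most critical. The exact number of layers kept here is illustrative.

```python
# Drop the top encoder layers of a pre-trained BERT before fine-tuning.
import torch.nn as nn
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")  # 12 encoder layers
keep = 7  # keep lower layers only; ~40% of layers dropped, as in the paper

model.encoder.layer = nn.ModuleList(model.encoder.layer[:keep])
model.config.num_hidden_layers = keep

print(len(model.encoder.layer))  # 7 -- fine-tune this smaller model as usual
```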
Over the past few years, researchers have focused on the identification of offensive language on social networks. In places where English is not the primary language, social media users tend to post and comment using a code-mixed form of text. This poses various challenges in identifying offensive texts, and when combined with the limited resources available for languages such as Tamil, the task becomes considerably more challenging. This study undertakes multiple experiments to detect potentially offensive texts in YouTube comments, made available through the HASOC-Offensive Language Identification track in Dravidian Code-Mix FIRE 2021 (https://competitions.codalab.org/competitions/31146). To detect the offensive texts, models based on traditional machine learning techniques, namely Bernoulli Naïve Bayes, Support Vector Machine, Logistic Regression, and K-Nearest Neighbor, were created. In addition, pre-trained multilingual transformer-based natural language processing models, such as mBERT, MuRIL (Base and Large), and XLM-RoBERTa (Base and Large), were also attempted. These models were applied both with full fine-tuning and with adapter transformers. In essence, adapters and fine-tuning accomplish the same goal, but adapters work by adding small layers to the main pre-trained model while keeping the pre-trained weights frozen. This study shows that transformer-based models outperform machine learning approaches. Furthermore, in low-resource languages such as Tamil, adapter-based techniques surpass fine-tuned models in terms of both time and efficiency.
Of all the adapter-based approaches, XLM-RoBERTa (Large) achieved the highest accuracy, at 88.5%. The study also demonstrates that, compared to fine-tuning the models, the adapter models require training far fewer parameters. In addition, the tests revealed that the proposed models performed notably well on a cross-domain dataset.
• Identifying the predictive features that distinguish offensive texts in Tamil.
• Measuring the efficacy of transformer models in classifying offensive texts in Tamil.
• Testing the cross-domain ability of the proposed models on misogynous texts.
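For readers unfamiliar with the adapter setup this study favours, the sketch below uses the AdapterHub `adapters` library to add small bottleneck layers to a frozen XLM-RoBERTa; only the adapters and the classification head are trained. The task name and label count are illustrative assumptions.

```python
# Adapter-based training: pre-trained weights stay frozen, only the small
# bottleneck adapters and the head receive gradients.
from adapters import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("xlm-roberta-base")
model.add_adapter("offense_ta")                        # new bottleneck adapters
model.add_classification_head("offense_ta", num_labels=2)
model.train_adapter("offense_ta")                      # freeze pre-trained weights

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} of {total:,}")        # far fewer than fine-tuning
```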
Clinical concept extraction using transformers. Yang, Xi; Bian, Jiang; Hogan, William R. … Journal of the American Medical Informatics Association (JAMIA), 12/2020, Volume 27, Issue 12.
The goal of this study is to explore transformer-based models (e.g., Bidirectional Encoder Representations from Transformers [BERT]) for clinical concept extraction and to develop an open-source package with pretrained clinical models to facilitate concept extraction and other downstream natural language processing (NLP) tasks in the medical domain.
We systematically explored 4 widely used transformer-based architectures, including BERT, RoBERTa, ALBERT, and ELECTRA, for extracting various types of clinical concepts using 3 public datasets from the 2010 and 2012 i2b2 challenges and the 2018 n2c2 challenge. We examined general transformer models pretrained using general English corpora as well as clinical transformer models pretrained using a clinical corpus, and compared them with a long short-term memory conditional random fields (LSTM-CRFs) model as a baseline. Furthermore, we integrated the 4 clinical transformer-based models into an open-source package.
The RoBERTa-MIMIC model achieved state-of-the-art performance on the 3 public clinical concept extraction datasets, with F1-scores of 0.8994, 0.8053, and 0.8907, respectively. Compared to the baseline LSTM-CRFs model, RoBERTa-MIMIC markedly improved the F1-score by approximately 4% and 6% on the 2010 and 2012 i2b2 datasets, respectively. This study demonstrated the efficiency of transformer-based models for clinical concept extraction. Our methods and systems can be applied to other clinical tasks. The clinical transformer package with 4 pretrained clinical models is publicly available at https://github.com/uf-hobi-informatics-lab/ClinicalTransformerNER. We believe this package will improve current practice on clinical concept extraction and other tasks in the medical domain.
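As a hedged sketch of this setup, the snippet below frames concept extraction as transformer token classification (NER) with Hugging Face; the checkpoint and BIO label set are stand-ins, while the authors' own pretrained clinical models ship with the ClinicalTransformerNER package linked above.

```python
# Clinical concept extraction as token classification. Labels and checkpoint
# are illustrative; the head is untrained until fine-tuned on i2b2/n2c2 data.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

labels = ["O", "B-problem", "I-problem", "B-treatment", "I-treatment"]
MODEL = "roberta-base"  # stand-in; the paper pretrains RoBERTa on MIMIC text

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL, num_labels=len(labels)
)

text = "Patient was started on metformin for type 2 diabetes."
batch = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    pred = model(**batch).logits.argmax(-1)[0]
print([labels[i] for i in pred.tolist()])  # meaningless until fine-tuned
```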
Low-dose computed tomography (LDCT) denoising is an important problem in CT research. Compared to normal-dose CT, LDCT images are subject to severe noise and artifacts. Recently, in many studies, vision transformers have shown superior feature representation ability over convolutional neural networks (CNNs). However, unlike CNNs, the potential of vision transformers for LDCT denoising has so far been little explored. Our paper aims to further explore the power of transformers for the LDCT denoising problem.
In this paper, we propose a Convolution-free Token2Token Dilated Vision Transformer (CTformer) for LDCT denoising. The CTformer uses a more powerful token rearrangement to encompass local contextual information and thus avoids convolution. It also dilates and shifts feature maps to capture longer-range interaction. We interpret the CTformer by statically inspecting patterns of its internal attention maps and dynamically tracing the hierarchical attention flow with an explanatory graph. Furthermore, an overlapped inference mechanism is employed to effectively eliminate the boundary artifacts that are common in encoder-decoder-based denoising models.
Experimental results on the Mayo dataset suggest that the CTformer outperforms state-of-the-art denoising methods with a low computational overhead.
The proposed model delivers excellent denoising performance on LDCT. Moreover, its low computational cost and interpretability make the CTformer promising for clinical applications.
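The overlapped inference idea is simple enough to sketch: denoise overlapping patches and average the overlapping predictions so that patch-boundary artifacts cancel. The patch size, stride, and identity "denoiser" below are placeholders, not the CTformer itself.

```python
# Overlapped patch inference: average predictions where patches overlap.
import torch

def overlapped_inference(img, denoise, patch=64, stride=32):
    H, W = img.shape
    out = torch.zeros_like(img)
    weight = torch.zeros_like(img)
    for top in range(0, H - patch + 1, stride):
        for left in range(0, W - patch + 1, stride):
            tile = img[top:top + patch, left:left + patch]
            out[top:top + patch, left:left + patch] += denoise(tile)
            weight[top:top + patch, left:left + patch] += 1
    return out / weight.clamp(min=1)  # average overlapping predictions

ct = torch.randn(512, 512)                     # toy LDCT slice
clean = overlapped_inference(ct, lambda t: t)  # identity stands in for CTformer
```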
This survey paper reviews Natural Language Processing models and their use in COVID-19 research in two main areas. Firstly, a range of transformer-based biomedical pretrained language models are evaluated using the BLURB benchmark. Secondly, models used in sentiment analysis surrounding COVID-19 vaccination are evaluated. We filtered literature curated from various repositories such as PubMed and Scopus and reviewed 27 papers. When evaluated using the BLURB benchmark, the novel T-BPLM BioLinkBERT gives groundbreaking results by incorporating document link knowledge and hyperlinking into its pretraining. Sentiment analysis of COVID-19 vaccination through various Twitter API tools has shown the public’s sentiment towards vaccination to be mostly positive. Finally, we outline some limitations and potential solutions to drive the research community to improve the models used for NLP tasks.
• This paper primarily reviews transformer-based NLP models for COVID-19 research.
• The performance of these models is compared using the BLURB benchmarking framework.
• The use of these models for sentiment analysis relating to vaccine hesitancy is reviewed.
• Open challenges relating to the optimisation of these ML models are discussed.
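As a minimal sketch of the vaccine-sentiment workflow the review covers, the snippet below classifies tweet-like texts with a pretrained transformer sentiment pipeline; the checkpoint is an assumed stand-in, since the reviewed studies use a variety of models and Twitter API tools.

```python
# Sentiment classification of tweet-like texts with a pretrained pipeline.
from transformers import pipeline

clf = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",  # assumed checkpoint
)
tweets = [
    "Got my second dose today, feeling great!",
    "Still unsure about the side effects of the vaccine.",
]
for t, r in zip(tweets, clf(tweets)):
    print(f"{r['label']:>8}  {r['score']:.2f}  {t}")
```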
Medical image segmentation is crucial for enhancing diagnostic accuracy through pixel labeling. State-of-the-art networks, despite their performance, have high computational demands, limiting real-time use on constrained devices. Lightweight networks face challenges in balancing detail processing with precision. Vision Transformer models, while promising, also raise computational concerns. This study presents a novel method that merges Vision Transformer strengths with a unique knowledge distillation technique. A pivotal element of our approach is Token Importance Ranking Distillation, which transfers top-k token importance rankings from a complex teacher model to a simplified student model, guided by a specialized ranking loss function. This method is essential for optimizing the student model to effectively emulate the teacher model’s ability to encapsulate vital semantic and spatial information. Additionally, we introduce an innovative approach to structural texture knowledge, utilizing a Contourlet Decomposition Module (CDM), which enriches the models with nuanced texture representation, crucial for extracting directional features and capturing intricate global and local contexts in medical imaging. Complementing this, we deploy a unique multi-stage distillation strategy, Space Channel Cascade Fusion (SCCF), to refine spatial and channel information concurrently, mitigating redundancy and enhancing the representational effectiveness of feature maps. Experimental results demonstrate the effectiveness of our approach in elevating the performance of student models while diminishing computational demands, thereby enabling efficient, real-time medical image segmentation on resource-constrained devices.
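Since the paper's exact ranking loss is not reproduced here, the following is a generic stand-in for top-k token-importance ranking distillation: the student's distribution over the teacher's top-k tokens is matched via KL divergence. The tensor shapes and value of k are illustrative.

```python
# Generic top-k token-importance distillation loss (stand-in, not the
# paper's exact formulation): match student to teacher on teacher's top-k.
import torch
import torch.nn.functional as F

def ranking_distill_loss(teacher_imp, student_imp, k=16):
    """teacher_imp, student_imp: (batch, n_tokens) importance scores."""
    topk = teacher_imp.topk(k, dim=-1).indices             # teacher's top-k tokens
    t = F.log_softmax(teacher_imp.gather(-1, topk), dim=-1)
    s = F.log_softmax(student_imp.gather(-1, topk), dim=-1)
    return F.kl_div(s, t, log_target=True, reduction="batchmean")

loss = ranking_distill_loss(torch.randn(4, 196), torch.randn(4, 196))
```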
The COVID-19 pandemic has resulted in a surge of fake news, creating public health risks. However, developing an effective way to detect such news is challenging, especially when published news mixes true and false information. Detecting COVID-19 fake news has become a critical task in the field of natural language processing (NLP). This paper explores the effectiveness of several machine learning algorithms and of fine-tuning pre-trained transformer-based models, including Bidirectional Encoder Representations from Transformers (BERT) and COVID-Twitter-BERT (CT-BERT), for COVID-19 fake news detection. We evaluate the performance of different downstream neural network structures, such as CNN and BiGRU layers, added on top of BERT and CT-BERT with frozen or unfrozen parameters. Our experiments on a real-world COVID-19 fake news dataset demonstrate that incorporating a BiGRU layer on top of the CT-BERT model achieves outstanding performance, with a state-of-the-art F1-score of 98%. These results have significant implications for mitigating the spread of COVID-19 misinformation and highlight the potential of advanced machine learning models for fake news detection.
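A hedged sketch of the best-performing configuration described above: a BiGRU layer on top of a BERT-family encoder for binary classification, with an option to freeze the encoder as in the paper's comparisons. Here bert-base-uncased stands in for the CT-BERT checkpoint purely for brevity.

```python
# BERT-family encoder + BiGRU head for binary fake-news classification.
import torch
import torch.nn as nn
from transformers import AutoModel

class BertBiGRU(nn.Module):
    def __init__(self, name="bert-base-uncased", hidden=128, freeze=False):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        if freeze:  # the paper compares frozen vs. unfrozen encoders
            for p in self.encoder.parameters():
                p.requires_grad = False
        self.bigru = nn.GRU(self.encoder.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)  # real vs. fake

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        _, g = self.bigru(h)                   # g: (2, batch, hidden)
        return self.head(torch.cat([g[0], g[1]], dim=-1))
```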