In maritime traffic surveillance, detecting illegal activities, such as illegal fishing or the transshipment of illicit products, is a crucial task of coastal administrations. In the open sea, one has to rely on Automatic Identification System (AIS) messages transmitted by on-board transponders and captured by surveillance satellites. However, dishonest vessels often intentionally shut down their AIS transponders to hide illegal activities. In the open sea, it is very challenging to differentiate intentional AIS shutdowns from missing reception due to protocol limitations, bad weather conditions, or unfavorable satellite positions. This paper presents a novel approach for the detection of abnormal missing AIS reception based on self-supervised deep learning techniques and transformer models. Using historical data, the trained model predicts whether a message should be received in the upcoming minute. The model then reports detected anomalies by comparing its prediction with what actually happens. Our method can process AIS messages in real time, handling more than 500 million AIS messages per month, corresponding to the trajectories of more than 60,000 ships. The method is evaluated on one year of real-world data from four Norwegian surveillance satellites. We validated our method against related research results by rediscovering previously detected intentional AIS shutdowns.
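The prediction-versus-observation scheme described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `predict_reception` is a stand-in for the trained transformer (here a naive frequency baseline), and the threshold and window values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    ship_id: str
    minute: int
    received: bool  # was an AIS message actually captured in this minute?

def predict_reception(history):
    """Stand-in for the trained model: probability that a message should
    be received in the upcoming minute, given past reception history."""
    if not history:
        return 0.5
    return sum(history) / len(history)  # naive frequency baseline

def detect_anomalies(obs_stream, threshold=0.9, window=60):
    """Flag minutes where the model confidently expected a message
    but none arrived (candidate intentional AIS shutdown)."""
    history, anomalies = [], []
    for obs in obs_stream:
        p = predict_reception(history[-window:])
        if p >= threshold and not obs.received:
            anomalies.append(obs.minute)
        history.append(1.0 if obs.received else 0.0)
    return anomalies
```

A vessel that transmits reliably and then goes silent is flagged from the first silent minute onward, until the predicted reception probability decays below the threshold.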
In recent years, neural models learned through self-supervised pretraining on large-scale multilingual text or speech data have exhibited promising results for underresourced languages, especially when a relatively large amount of data from related language(s) is available. While the technology has the potential to facilitate tasks carried out in language documentation projects, such as speech transcription, pretraining a multilingual model from scratch for every new language would be highly impractical. We investigate the possibility of adapting an existing multilingual wav2vec 2.0 model for a new language, focusing on actual fieldwork data from a critically endangered language: Ainu. Specifically, we (i) examine the feasibility of leveraging data from similar languages also during fine-tuning; (ii) verify whether the model’s performance can be improved by further pretraining on target language data. Our results show that continued pretraining is the most effective method of adapting a wav2vec 2.0 model to a new language and leads to a considerable reduction in error rates. Furthermore, we find that if a model pretrained on a related speech variety or on an unrelated language with similar phonological characteristics is available, multilingual fine-tuning using additional data from that language can have a positive impact on speech recognition performance when there is very little labeled data in the target language.
•Downstream performance of a multilingual speech representation model on a new, underresourced language can be improved through multilingual fine-tuning and additional pretraining.
•Continued pretraining on target language data leads to substantially lower error rates in automatic speech transcription.
•Multilingual fine-tuning with additional data from a related or similar language helps when labeled target language data is scarce.
Stakeholders in software projects use issue trackers like JIRA or Bugzilla to capture and manage issues, including requirements, feature requests, and bugs. To ease issue navigation and structure project knowledge, stakeholders manually connect issues via links of certain types that reflect different dependencies, such as Epic-, Block-, Duplicate-, or Relate-links. Based on a large dataset of 16 JIRA repositories, we study the commonalities and differences in linking practices and link types across the repositories. We then investigate how state-of-the-art machine learning models can predict common link types. We observed significant differences across the repositories and link types, depending on how they are used and by whom. Additionally, we observed several inconsistencies, e.g., in how Duplicate-links are used. We found that a transformer model trained on titles and descriptions of linked issues significantly outperforms other optimized models, achieving an encouraging average macro F1-score of 0.64 for predicting nine popular link types across all repositories (weighted F1-score of 0.73). For the specific Subtask- and Epic-links, the model achieves top F1-scores of 0.89 and 0.97, respectively. If we restrict the task to predicting the mere existence of links, the average macro F1-score goes up to 0.95. In general, shorter issue text, possibly indicating precise issues, seems to improve prediction accuracy, with a strong negative correlation of -0.73. We found that Relate-links often get confused with the other link types, which suggests that they are likely used as default links in unclear cases. Our findings, particularly on the quality and heterogeneity of issue link data, have implications for researching and applying issue link prediction in practice.
Social scientists have long been interested in understanding the extent to which the typicalities of an object in concepts relate to its valuations by social actors. Answering this question has proven to be challenging because precise measurement requires a feature-based description of objects. Yet, such descriptions are frequently unavailable. In this article, we introduce a method to measure typicality based on text data. Our approach involves training a deep-learning text classifier based on the BERT language representation and defining the typicality of an object in a concept in terms of the categorization probability produced by the trained classifier. Model training allows for the construction of a feature space adapted to the categorization task and of a mapping between feature combinations and typicality that gives more weight to feature dimensions that matter more for categorization. We validate the approach by comparing the BERT-based typicality measure of book descriptions in literary genres with average human typicality ratings. The obtained correlation is higher than 0.85. Comparisons with other typicality measures used in prior research show that our BERT-based measure better reflects human typicality judgments.
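The core definition here — typicality as the categorization probability produced by a trained classifier — can be sketched with a stub in place of the BERT model. The `classify` function and its genre labels are hypothetical placeholders, not the paper's classifier:

```python
import math

def softmax(logits):
    """Convert raw classifier logits into categorization probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def typicality(text, concept, classify):
    """Typicality of `text` in `concept` = probability that the trained
    classifier assigns that concept's label to the text."""
    labels, logits = classify(text)  # stub for the BERT-based classifier
    probs = softmax(logits)
    return probs[labels.index(concept)]
```

With a real fine-tuned classifier substituted for `classify`, the same function would score, e.g., a book description's typicality within a literary genre.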
Background
Semantic textual similarity (STS) is a natural language processing (NLP) task that involves assigning a similarity score to 2 snippets of text based on their meaning. This task is particularly difficult in the domain of clinical text, which often features specialized language and the frequent use of abbreviations.
Objective
We created an NLP system to predict similarity scores for sentence pairs as part of the Clinical Semantic Textual Similarity track in the 2019 n2c2/OHNLP Shared Task on Challenges in Natural Language Processing for Clinical Data. We subsequently sought to analyze the intermediary token vectors extracted from our models while processing a pair of clinical sentences to identify where and how representations of semantic similarity are built in transformer models.
Methods
Given a clinical sentence pair, we take the average predicted similarity score across several independently fine-tuned transformers. In our model analysis, we investigated the relationship between the final model’s loss and surface features of the sentence pairs, and assessed the decodability and representational similarity of the token vectors generated by each model.
Results
Our model achieved a correlation of 0.87 with the ground-truth similarity score, reaching 6th place out of 33 teams (with a first-place score of 0.90). In detailed qualitative and quantitative analyses of the model’s loss, we identified the system’s failure to correctly model semantic similarity when both sentence pairs contain details of medical prescriptions, as well as its general tendency to overpredict semantic similarity given significant token overlap. The token vector analysis revealed divergent representational strategies for predicting textual similarity between bidirectional encoder representations from transformers (BERT)–style models and XLNet. We also found that a large amount of information relevant to predicting STS can be captured using a combination of a classification token and the cosine distance between sentence-pair representations in the first layer of a transformer model that did not produce the best predictions on the test set.
Conclusions
We designed and trained a system that uses state-of-the-art NLP models to achieve very competitive results on a new clinical STS data set. As our approach uses no hand-crafted rules, it serves as a strong deep learning baseline for this task. Our key contribution is a detailed analysis of the model’s outputs and an investigation of the heuristic biases learned by transformer models. We suggest future improvements based on these findings. In our representational analysis we explore how different transformer models converge or diverge in their representation of semantic signals as the tokens of the sentences are augmented by successive layers. This analysis sheds light on how these “black box” models integrate semantic similarity information in intermediate layers, and points to new research directions in model distillation and sentence embedding extraction for applications in clinical NLP.
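The ensembling step from the Methods section, and the cosine comparison used in the token-vector analysis, reduce to a few lines. This is a schematic sketch: the model callables below are stubs, not the fine-tuned transformers from the paper.

```python
import math

def ensemble_similarity(sentence_pair, models):
    """Average the predicted STS score across independently fine-tuned
    models; each `m` maps a sentence pair to a similarity score."""
    scores = [m(sentence_pair) for m in models]
    return sum(scores) / len(scores)

def cosine_similarity(u, v):
    """Cosine similarity between two sentence representations, as used
    in the layer-wise token-vector analysis."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Averaging independent fine-tuning runs is a standard variance-reduction choice; the cosine score over first-layer representations is the signal the Results section reports as surprisingly informative.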
Automatic item generation (AIG) has the potential to greatly expand the number of items for educational assessments, while simultaneously allowing for a more construct-driven approach to item development. However, the traditional item modeling approach in AIG is limited in scope to content areas that are relatively easy to model (such as math problems), and depends on highly skilled content experts to create each model. In this paper we describe the interactive reading task, a transformer-based deep language modeling approach for creating reading comprehension assessments. This approach allows a fully automated process for the creation of source passages together with a wide range of comprehension questions about the passages. The format of the questions allows automatic scoring of responses with high fidelity (e.g., selected response questions). We present the results of a large-scale pilot of the interactive reading task, with hundreds of passages and thousands of questions. These passages were administered as part of the practice test of the Duolingo English Test. Human review of the materials and psychometric analyses of test taker results demonstrate the feasibility of this approach for the automatic creation of complex educational assessments.
In an age when open access to law enforcement files and judicial documents can erode individual privacy and confidentiality, miscreants can abuse this open access to personal information for blackmail, misinformation, and even social engineering. Yet, limiting access to law enforcement and court cases is a freedom-of-information violation. To address this tension, this collaborative action-research-based teaching case exemplifies how Italy’s Corte dei Conti (Court of Auditors) used artificial intelligence for the automated deidentification and anonymization of court documents in Italy’s public sector. This teaching case is aimed at undergraduate and graduate students learning about Artificial Intelligence (AI) and Large Language Model (LLM) (e.g., ChatGPT) evolution, development, and operations. The case will help students learn the origin and evolution of AI transformer models and architectures, and discusses the GiusBERTo operation and process, highlighting opportunities and challenges. GiusBERTo, Italy’s custom AI model, offers an innovative approach that anonymizes Italy’s judicial court documents without sacrificing context or losing information. The case ends with a series of questions, challenges, and a discussion of the potential of LLMs in data anonymization.
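The deidentification step can be sketched generically. In this illustration the entity detector (the role GiusBERTo plays in the case) is stubbed out as a list of spans, and the placeholder labels are hypothetical, not the Court's actual tag set:

```python
def deidentify(text, entity_spans):
    """Replace detected personal-data spans with placeholder tags.
    `entity_spans` is a list of (start, end, label) tuples, as would be
    produced by a token-classification model such as GiusBERTo (stubbed)."""
    out, last = [], 0
    for start, end, label in sorted(entity_spans):
        out.append(text[last:start])   # keep the surrounding context
        out.append(f"[{label}]")       # mask only the personal data
        last = end
    out.append(text[last:])
    return "".join(out)
```

Only the flagged spans are replaced, which is what preserves the document's legal context while removing identifying details.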
Accurate detection and classification of white blood cells, otherwise known as leukocytes, play a critical role in diagnosing and monitoring various illnesses. However, conventional methods, such as manual classification by trained professionals, fall short in terms of accuracy, efficiency, and potential bias. Moreover, applying deep learning techniques to detect and classify white blood cells using microscopic images is challenging owing to limited data, resolution noise, irregular shapes, and varying colors from different sources. This study presents a novel approach integrating object detection and classification for numerous white blood cell types. We designed a 2-way approach that uses two types of images: WBC and nucleus. YOLO (fast object detection) and ViT (powerful image representation capabilities) are effectively integrated to classify 16 classes. The proposed model demonstrates an exceptional 96.449% accuracy rate in classification.
•We proposed a 2-way approach to use two types of WBC and nucleus images.
•We presented a hybrid architecture that combines the strengths of YOLO and ViT.
•Our model attains an accuracy of 96.49% for 16 classes, including rare classes.
•Ablation analysis shows the value of combining object detection and ViT integration.
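A minimal sketch of the 2-way pipeline described above, assuming a detector that yields a bounding box plus the WBC and nucleus crops, and two classifiers returning class probabilities. All callables are stubs for the YOLO and ViT components, and averaging the two probability vectors is one plausible fusion choice, not necessarily the paper's:

```python
def fuse_scores(p_cell, p_nucleus):
    """Fuse the two classifiers' class probabilities by simple averaging
    (an illustrative choice for combining the 2-way predictions)."""
    return [(a + b) / 2 for a, b in zip(p_cell, p_nucleus)]

def detect_and_classify(image, detect, classify_cell, classify_nucleus):
    """Stage 1: a YOLO-style detector proposes (box, cell crop, nucleus crop).
    Stage 2: ViT-style classifiers score each crop; fused scores give the
    predicted class index per detected cell."""
    predictions = []
    for box, cell_crop, nucleus_crop in detect(image):
        probs = fuse_scores(classify_cell(cell_crop),
                            classify_nucleus(nucleus_crop))
        predictions.append((box, max(range(len(probs)), key=probs.__getitem__)))
    return predictions
```

In a real system, `detect` would wrap a trained YOLO model and the two classifiers would be ViT heads over the 16 WBC classes.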
Combining symbolic and subsymbolic methods has become a promising strategy as research tasks in AI grow increasingly complicated and require higher levels of understanding. Targeted Aspect-based Financial Sentiment Analysis (TABFSA) is an example of such complicated tasks, as it involves processes like information extraction, information specification, and domain adaptation. However, little is known about the design principles of such hybrid models leveraging external lexical knowledge. To fill this gap, we define anterior, parallel, and posterior knowledge integration and propose incorporating multiple lexical knowledge sources strategically into the fine-tuning process of pre-trained transformer models for TABFSA. Experiments on the FiQA Task 1 and SemEval 2017 Task 5 datasets show that the knowledge-enabled models systematically improve upon their plain deep learning counterparts, and some outperform previously reported state-of-the-art results in terms of aspect sentiment analysis error. Our ablation analysis shows that parallel knowledge integration is the most effective strategy and that domain-specific lexical knowledge is the more important source.
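Parallel knowledge integration, where lexical knowledge enters alongside the transformer representation rather than before or after fine-tuning, can be illustrated with a toy lexicon. The lexicon entries, feature choices, and variable names below are hypothetical, not from the paper:

```python
def lexicon_features(tokens, lexicon):
    """Sentiment features from an external lexicon: the fractions of
    positive- and negative-polarity tokens (a toy feature choice)."""
    pos = sum(1 for t in tokens if lexicon.get(t, 0) > 0)
    neg = sum(1 for t in tokens if lexicon.get(t, 0) < 0)
    n = max(len(tokens), 1)
    return [pos / n, neg / n]

def parallel_integration(cls_vec, tokens, lexicon):
    """Parallel integration: concatenate lexicon-derived features with the
    transformer's [CLS] representation, so both reach the classifier head
    side by side during fine-tuning."""
    return list(cls_vec) + lexicon_features(tokens, lexicon)
```

Anterior integration would instead inject the lexical signal into the input text before encoding, and posterior integration would adjust the model's output scores; this sketch shows only the parallel variant the ablation found most effective.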