Immunohistochemistry (IHC) is a widely used laboratory technique for cancer diagnosis in which specific antibodies selectively bind to target proteins in tissue samples, and the bound proteins are then made visible through chemical staining. Deep learning approaches could be used to quantify the tumor immune microenvironment (TIME) in digitized IHC histological slides. However, there is a lack of publicly available IHC datasets explicitly collected for in-depth TIME analysis.
In this paper, we create the Multiplex IHC Histopathological Image Classification (MIHIC) dataset from manual annotations by pathologists and make it publicly available for exploring deep learning models that quantify variables associated with the TIME in lung cancer. The MIHIC dataset comprises a total of 309,698 multiplex IHC-stained histological image patches covering seven distinct tissue types: Alveoli, Immune cells, Necrosis, Stroma, Tumor, Other and Background. Using the MIHIC dataset, we conduct a series of experiments with both convolutional neural networks (CNNs) and transformer models to benchmark the classification of IHC-stained histological images. Finally, we use the top-performing model to quantify lung cancer immune microenvironment variables on tissue microarray (TMA) cores, and these variables are subsequently used to predict patients' survival outcomes.
Experiments show that transformer models tend to perform slightly better than CNN models in histological image classification, although both types of models reach the same highest accuracy of 0.811 on the MIHIC test set. The automatically quantified TIME variables, which reflect the proportion of immune cells over stroma and of tumor over the tissue core, show prognostic value for the overall survival of lung cancer patients.
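To make the two TIME variables concrete, here is a minimal sketch that derives them from per-patch class predictions on one TMA core. It assumes each patch counts equally, that "immune cells over stroma" means the ratio of Immune cells patches to Stroma patches, and that the "tissue core" is every patch not labeled Background; the paper may define or weight these quantities differently. The function name `time_ratios` is hypothetical.

```python
from collections import Counter


def time_ratios(patch_labels):
    """Compute two illustrative TIME ratios from patch-level class labels.

    Assumptions (not taken from the paper): every patch has equal weight,
    and the tissue core consists of all patches not labeled "Background".
    Returns None for a ratio whose denominator is zero.
    """
    counts = Counter(patch_labels)  # missing classes count as 0
    immune = counts["Immune cells"]
    stroma = counts["Stroma"]
    tumor = counts["Tumor"]
    tissue = sum(n for label, n in counts.items() if label != "Background")
    return {
        "immune_over_stroma": immune / stroma if stroma else None,
        "tumor_over_tissue": tumor / tissue if tissue else None,
    }


core = ["Tumor"] * 4 + ["Stroma"] * 4 + ["Immune cells"] * 2 + ["Background"] * 10
print(time_ratios(core))  # {'immune_over_stroma': 0.5, 'tumor_over_tissue': 0.4}
```

Ratios like these could then serve as covariates in a survival model; how the authors feed them into survival prediction is not described here.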
To the best of our knowledge, MIHIC is the first publicly available lung cancer IHC histopathological dataset that includes images with 12 different IHC stains, meticulously annotated by multiple pathologists across 7 distinct categories. This dataset holds significant potential for researchers to explore novel techniques for quantifying the TIME and advancing our understanding of the interactions between the immune system and tumors.
Cardiac magnetic resonance imaging (MRI) segmentation is difficult: the images contain a lot of noise, the target areas are hard to distinguish from the background, and the shape of the right ventricle is irregular. Although convolution operations are good at extracting local features, U-shaped convolutional neural network (CNN) structures can hardly model long-distance dependencies between pixels and cannot achieve ideal segmentation results on cardiac MRI. To solve these problems, UConvTrans is proposed: a dual-flow U-shaped network that integrates global and local information. First, the network uses a CNN branch to extract local features and a Transformer branch to capture global representations, which retains local detailed features and suppresses the interference of noise and background features in cardiac MRI. Next, a bidirectional fusion module is proposed to fuse the features extracted by the CNN and the Transformer with each other, enhancing the feature expression capability and improving the segme
Gender bias in artificial intelligence (AI) has emerged as a pressing concern with profound implications for individuals’ lives. This paper presents a comprehensive survey that explores gender bias in Transformer models from a linguistic perspective. While the existence of gender bias in language models has been acknowledged in previous studies, there remains a lack of consensus on how to measure and evaluate this bias effectively. Our survey critically examines the existing literature on gender bias in Transformers, shedding light on the diverse methodologies and metrics employed to assess bias. Several limitations in current approaches to measuring gender bias in Transformers are identified, encompassing the utilization of incomplete or flawed metrics, inadequate dataset sizes, and a dearth of standardization in evaluation methods. Furthermore, our survey delves into the potential ramifications of gender bias in Transformers for downstream applications, including dialogue systems and machine translation. We underscore the importance of fostering equity and fairness in these systems by emphasizing the need for heightened awareness and accountability in developing and deploying language technologies. This paper serves as a comprehensive overview of gender bias in Transformer models, providing novel insights and offering valuable directions for future research in this critical domain.
The object of the study is the process of identifying the state of a computer network. The subject of the study is the methods for identifying the state of computer networks. The purpose of the paper is to improve the efficacy of intrusion detection in computer networks by developing a method based on transformer models. Results obtained. The work analyzes traditional machine learning algorithms and deep learning methods and considers the advantages of using transformer models. A method for detecting intrusions in computer networks is proposed. This method differs from known approaches in that it uses the Vision Transformer for Small-size Datasets (ViTSD) deep learning algorithm. The method incorporates procedures that reduce the correlation of the input data and transform the data into the specific format required by the model. The developed methods are implemented in Python using the Google Colab cloud service with Jupyter Notebook. Conclusions. Experiments confirmed the efficiency of the proposed method. Using the developed ViTSD-based method together with the data preprocessing procedure increases the model's accuracy to 98.7%. This makes it possible to recommend the method for practical use to improve the accuracy of identifying the state of a computer system.
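The abstract does not spell out the two preprocessing procedures, so the following is only a sketch of one common way to realize them, under stated assumptions: "reducing correlation" is taken to mean greedily dropping any feature whose absolute Pearson correlation with an already-kept feature exceeds a threshold, and the "specific format" is taken to mean zero-padding a tabular feature vector into a square grid that a vision transformer can split into patches. The function names and the 0.95 threshold are hypothetical, and ViTSD itself is not implemented here.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def drop_correlated(rows, threshold=0.95):
    """Keep a feature only if |r| <= threshold against every feature kept so far."""
    cols = list(zip(*rows))
    keep = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) <= threshold for k in keep):
            keep.append(j)
    return keep, [[row[j] for j in keep] for row in rows]


def to_square_grid(features):
    """Zero-pad a feature vector to the next perfect square and reshape it row-major."""
    side = math.ceil(math.sqrt(len(features)))
    padded = list(features) + [0.0] * (side * side - len(features))
    return [padded[i * side:(i + 1) * side] for i in range(side)]


records = [[1, 2, 5], [2, 4, 3], [3, 6, 8], [4, 8, 1]]  # toy flow features
kept, reduced = drop_correlated(records)
print(kept)                             # [0, 2]: column 1 duplicates column 0
print(to_square_grid([1, 2, 3, 4, 5]))  # 3x3 grid, last four cells padded with 0.0
```

The greedy filter is order-dependent (earlier columns win), which keeps it simple; a real pipeline would typically also normalize features before reshaping them.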
Most of the models proposed in the literature for abstractive summarization are suitable for English but not for other languages. Multilingual models were introduced to address that language constraint, but although their applicability is broader than that of monolingual models, their performance is typically lower, especially for minority languages like Catalan. In this paper, we present a monolingual model for abstractive summarization of textual content in Catalan. The model is a Transformer encoder-decoder that is pretrained and fine-tuned specifically for Catalan using a corpus of newspaper articles. In the pretraining phase, we introduced several self-supervised tasks to specialize the model on the summarization task and to increase the abstractivity of the generated summaries. To study the performance of our proposal in languages with more resources than Catalan, we replicate the model and the experiments for Spanish. The usual evaluation metrics, not only the widely used ROUGE measure but also more semantic ones such as BertScore, cannot correctly evaluate the abstractivity of the generated summaries. In this work, we also present a new metric, called content reordering, to evaluate one of the most common characteristics of abstractive summaries: the rearrangement of the original content. We carried out extensive experiments to compare the performance of the monolingual models proposed in this work with two of the most widely used multilingual models in text summarization, mBART and mT5. The experimental results support the quality of our monolingual models, especially considering that the multilingual models were pretrained with many more resources than ours. Likewise, the results show that the pretraining tasks helped increase the degree of abstractivity of the generated summaries.
To our knowledge, this is the first work that explores a monolingual approach for abstractive summarization both in Catalan and Spanish.
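The exact formula of the content reordering metric is not given in this abstract. As a hypothetical illustration of the underlying idea only, the sketch below aligns each summary sentence to its most similar source sentence by word-level Jaccard overlap and then reports the fraction of summary sentence pairs that appear in the opposite order in the source (a Kendall-tau-style discordance rate). Both the alignment rule and the score are our assumptions, not the authors' metric; `align_to_source` and `reordering_score` are illustrative names.

```python
from itertools import combinations


def _tokens(sentence):
    return set(sentence.lower().split())


def align_to_source(summary_sents, source_sents):
    """Map each summary sentence to the index of the source sentence with the highest Jaccard overlap."""
    positions = []
    for s in summary_sents:
        t = _tokens(s)
        best = max(
            range(len(source_sents)),
            key=lambda i: len(t & _tokens(source_sents[i]))
            / max(1, len(t | _tokens(source_sents[i]))),
        )
        positions.append(best)
    return positions


def reordering_score(source_positions):
    """Fraction of summary sentence pairs whose source order is reversed (0 = same order, 1 = fully reversed)."""
    pairs = list(combinations(source_positions, 2))  # pairs keep summary order
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a > b) / len(pairs)


source = ["the cat sat", "the dog ran", "a bird flew"]
summary = ["a bird flew away", "the cat sat down"]
pos = align_to_source(summary, source)  # [2, 0]
print(reordering_score(pos))            # 1.0: the summary reverses the source order
```

An extractive summary that copies sentences in their original order scores 0 under this proxy, which is the behavior a reordering measure should reward least.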
In the evolving landscape of microbiology and microbiome analysis, the integration of machine learning is crucial for understanding complex microbial interactions and for predicting and recognizing novel functionalities within extensive datasets. However, the effectiveness of these methods in microbiology is limited by the complex and heterogeneous nature of microbial data, further complicated by low signal-to-noise ratios, context dependency, and a significant shortage of appropriately labeled datasets. This study introduces the ProkBERT model family, a collection of large language models designed for genomic tasks. It provides a generalizable sequence representation for nucleotide sequences, learned from unlabeled genome data. This approach helps overcome the above-mentioned limitations in the field, thereby improving our understanding of microbial ecosystems and their impact on health and disease.
ProkBERT models are based on transfer learning and self-supervised methodologies, enabling them to use the abundant yet complex microbial data effectively. The introduction of the novel Local Context-Aware (LCA) tokenization technique marks a significant advancement, allowing ProkBERT to overcome the contextual limitations of traditional transformer models. This methodology not only retains rich local context but also demonstrates remarkable adaptability across various bioinformatics tasks.
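As we understand the ProkBERT description, LCA tokenization splits a nucleotide sequence into overlapping k-mers whose start positions advance by a fixed shift, so each token carries part of its neighbors' context. The sketch below assumes exactly that; the function names and the defaults `k=6`, `shift=1` are illustrative choices, and the tokenizer released in the project's GitHub repository should be treated as authoritative.

```python
def lca_tokenize(seq, k=6, shift=1):
    """Split a DNA sequence into overlapping k-mers (hypothetical LCA-style sketch).

    Start positions advance by `shift`; with shift < k consecutive tokens overlap.
    Trailing bases that do not fill a complete k-mer are dropped.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, shift)]


def to_ids(tokens):
    """Map k-mers to integer ids using a toy vocabulary built from the tokens themselves."""
    vocab = {kmer: idx for idx, kmer in enumerate(sorted(set(tokens)))}
    return [vocab[t] for t in tokens], vocab


print(lca_tokenize("acgtacgt", k=3, shift=1))  # ['ACG', 'CGT', 'GTA', 'TAC', 'ACG', 'CGT']
print(lca_tokenize("acgtacgt", k=3, shift=2))  # ['ACG', 'GTA', 'ACG']
```

A larger shift shortens the token sequence (letting a fixed-length model see a longer stretch of genome) at the cost of less overlap between neighboring tokens.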
In practical applications such as promoter prediction and phage identification, the ProkBERT models show superior performance. For promoter prediction tasks, the top-performing model achieved a Matthews Correlation Coefficient (MCC) of 0.74 for [...] and 0.62 in mixed-species contexts. In phage identification, ProkBERT models consistently outperformed established tools like VirSorter2 and DeepVirFinder, achieving an MCC of 0.85. These results underscore the models' high accuracy and generalizability in both supervised and unsupervised tasks.
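The MCC values quoted above come from the standard binary-classification formula, which uses all four cells of the confusion matrix and ranges from -1 to 1. A minimal implementation follows; returning 0.0 when a marginal count is zero is a common convention we assume here, not something stated in the abstract.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Returns 0.0 when the denominator is zero (an assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


print(mcc(50, 50, 0, 0))   # 1.0: perfect prediction
print(mcc(25, 25, 25, 25)) # 0.0: no better than chance
```

Unlike accuracy, MCC stays informative on imbalanced data such as phage versus bacterial fragments, which is why it is a natural choice for these tasks.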
The ProkBERT model family is a compact yet powerful tool in the field of microbiology and bioinformatics. Its capacity for rapid, accurate analyses and its adaptability across a spectrum of tasks mark a significant advancement in machine learning applications in microbiology. The models are available on GitHub (https://github.com/nbrg-ppcu/prokbert) and HuggingFace (https://huggingface.co/nerualbioinfo), providing an accessible tool for the community.
Reconciling Tap-Changing Transformer Models. Cano, Jose M.; Mojumdar, Md Rejwanur Rashid; Orcajo, Gonzalo Alonso
IEEE Transactions on Power Delivery, 12/2019, Volume 34, Issue 6
Journal Article; peer reviewed; open access
The model of the tap-changing transformer used in classic power system studies, including load flow analysis and state estimation, is still somewhat controversial. Two alternative formulations can be found in the literature, and both have been adopted by the most important software packages. This work demonstrates that these formulations lead to similar results near the principal tap but to significant discrepancies at extreme tap positions, with an impact that depends on the power factor of the power flowing through the transformer. Moreover, a general model that fully explains these differences is proposed. The new model makes it possible to adopt a third alternative that, without requiring more data than the traditional formulations, leads to considerably improved results.
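Why two formulations agree near the principal tap but diverge at extreme taps can be seen from textbook branch admittance matrices for an off-nominal transformer with a real tap ratio `a` (no phase shift, no magnetizing branch). Placing the series admittance `y` on the non-tapped side of an ideal `a:1` transformer gives one 2x2 matrix; placing it on the tapped side gives another, equal to the first with `y` scaled by `a**2`. We do not claim these are exactly the two formulations compared in the paper or used by any particular software package; the function names are hypothetical.

```python
def y_series_on_untapped_side(y, a):
    """Bus i -> ideal a:1 transformer -> series admittance y -> bus j."""
    return [[y / a**2, -y / a], [-y / a, y]]


def y_series_on_tapped_side(y, a):
    """Bus i -> series admittance y -> ideal a:1 transformer -> bus j."""
    return [[y, -a * y], [-a * y, a**2 * y]]


def max_abs_diff(m1, m2):
    return max(abs(p - q) for r1, r2 in zip(m1, m2) for p, q in zip(r1, r2))


y = 1 / complex(0.01, 0.10)  # per-unit series admittance of an illustrative transformer
for a in (1.0, 1.05, 1.10):
    diff = max_abs_diff(y_series_on_untapped_side(y, a), y_series_on_tapped_side(y, a))
    print(f"a = {a:.2f}: largest entry difference = {diff:.3f} p.u.")
```

At `a = 1` the matrices coincide, and the gap grows with the tap deviation because the effective series impedance differs by a factor of `a**2` between the two placements.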
Recent contributions have shown that two widely used formulations of the tap-changing transformer model are controversial, as they generate dissimilar results depending on the selected tap and operating point. In recent works, the authors proposed a new model and demonstrated its consistency in reconciling this debate. It introduces a parameter that stands for the ratio between the impedances of the nominal and tapped windings of the transformer. However, this parameter is neither provided in nor obtainable from standard datasheets, which compels users to rely on rough approximations. To overcome this problem, this work proposes an offline state-vector-augmented parameter estimation method capable of providing accurate estimates of transformer impedance ratios. It is demonstrated that using these estimates can effectively lead state estimators to better estimates of system states. This work also provides the derivatives of the different measurement functions with respect to the impedance ratios, which are essential for this or any other linearized state estimator. A multi-snapshot implementation is used to obtain a twofold advantage: increased measurement redundancy and improved accuracy of the estimated parameters. A detailed formulation of the implementation and several case studies are presented to demonstrate the validity of the proposal.
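The idea of state-vector augmentation with multiple snapshots can be illustrated on a deliberately tiny toy problem rather than real power-flow equations. We assume each snapshot `k` has an unknown state `x_k`, a direct measurement of `x_k`, and a measurement of `p * x_k` that depends on one unknown parameter `p` shared by all snapshots. The parameter is appended to the state vector, the Jacobian includes the derivative of each measurement with respect to `p`, and Gauss-Newton weighted least squares (with unit weights) solves for everything at once. With one snapshot the problem is exactly determined; each extra snapshot adds redundancy. All names and the measurement model are hypothetical.

```python
def solve_linear(a, b):
    """Solve a @ x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def estimate_shared_parameter(z_direct, z_scaled, p0=1.0, iters=50, tol=1e-12):
    """Gauss-Newton on the augmented state [x_1, ..., x_K, p] (toy measurement model)."""
    k = len(z_direct)
    theta = list(z_direct) + [p0]  # initial guess: states from direct measurements
    for _ in range(iters):
        xs, p = theta[:k], theta[k]
        jac, res = [], []
        for i in range(k):
            row = [0.0] * (k + 1)
            row[i] = 1.0                      # d(x_i) / d(x_i)
            jac.append(row)
            res.append(z_direct[i] - xs[i])
            row = [0.0] * (k + 1)
            row[i] = p                        # d(p * x_i) / d(x_i)
            row[k] = xs[i]                    # d(p * x_i) / d(p): the parameter derivative
            jac.append(row)
            res.append(z_scaled[i] - p * xs[i])
        n = k + 1
        gain = [[sum(r[u] * r[v] for r in jac) for v in range(n)] for u in range(n)]
        rhs = [sum(r[u] * e for r, e in zip(jac, res)) for u in range(n)]
        step = solve_linear(gain, rhs)
        theta = [t + d for t, d in zip(theta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return theta[:k], theta[k]


states, p_hat = estimate_shared_parameter([1.0, 0.8, 1.2], [0.9, 0.72, 1.08])
print(round(p_hat, 6))  # 0.9
```

With noisy measurements the extra snapshots average out errors in the shared parameter, which mirrors the "increased redundancy, improved accuracy" argument in the abstract.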
Insect monitoring has gained global public attention in recent years in the context of insect decline and biodiversity loss. Monitoring methods that can collect samples over a long period of time and independently of human influences are of particular importance. While these passive collection methods, e.g. suction traps, provide standardized and comparable data sets, the time required to analyze the large number of samples and trapped specimens is high. Another challenge is the high level of taxonomic expertise required for accurate specimen processing. These factors create a bottleneck in specimen processing. In this context, machine learning, image recognition and artificial intelligence have emerged as promising tools to address the shortcomings of manual identification and quantification in the analysis of such trap catches. Aphids are important agricultural pests that pose a significant risk to several important crops and cause high economic losses through feeding damage and the transmission of plant viruses. It has been shown that long-term monitoring of migrating aphids using suction traps can be used to make, adjust and improve predictions of their abundance, so that the risk of plant viruses spreading through aphids can be predicted more accurately. With the increasing demand for alternatives to conventional pesticide use in crop protection, the need for predictive models is growing, e.g. as a basis for resistance development and as a measure for resistance management. In this context, advancing climate change has a strong influence on the total abundance of migrating aphids as well as on the peak occurrences of aphids within a year. Using aphids as a model organism, we demonstrate the possibilities of systematic monitoring of insect pests and the potential of future technical developments, from the automated identification of individuals through to the use of case data for intelligent forecasting models.
Using aphids as an example, we show the potential for systematic monitoring of insect pests through technical developments in the automated identification of individuals from static images (i.e. advances in image recognition software). We discuss the potential applications with regard to the automatic processing of insect case data and the development of intelligent prediction models.
Text Algorithms in Economics. Ash, Elliott; Hansen, Stephen
Annual Review of Economics, 01/2023, Volume 15, Issue 1
Journal Article; peer reviewed; open access
This article provides an overview of the methods used for algorithmic text analysis in economics, with a focus on three key contributions. First, we introduce methods for representing documents as high-dimensional count vectors over vocabulary terms, for representing words as vectors, and for representing word sequences as embedding vectors. Second, we define four core empirical tasks that encompass most text-as-data research in economics and enumerate the various approaches that have been taken so far to accomplish these tasks. Finally, we flag limitations in the current literature, with a focus on the challenge of validating algorithmic output.
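The first representation mentioned, documents as high-dimensional count vectors over vocabulary terms, is the classic bag-of-words matrix. A minimal sketch follows, assuming plain lowercase whitespace tokenization with no stop-word removal, stemming, or n-grams (choices a real economics pipeline would usually make explicitly); the function names are illustrative.

```python
from collections import Counter


def build_vocab(docs):
    """Sorted list of distinct lowercase whitespace tokens across all documents."""
    return sorted({tok for d in docs for tok in d.lower().split()})


def count_vectors(docs, vocab):
    """Represent each document as a vector of term counts indexed by `vocab`."""
    index = {term: i for i, term in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0] * len(vocab)
        for tok, n in Counter(d.lower().split()).items():
            if tok in index:  # tokens outside the vocabulary are ignored
                v[index[tok]] = n
        vectors.append(v)
    return vectors


docs = ["rates rise", "rates fall rates"]
vocab = build_vocab(docs)          # ['fall', 'rates', 'rise']
print(count_vectors(docs, vocab))  # [[0, 1, 1], [1, 2, 0]]
```

With a realistic vocabulary these vectors have thousands of mostly zero entries, which is why they are called high-dimensional and are usually stored sparsely.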