The artificial intelligence revolution has been spurred forward by the availability of large-scale datasets. In contrast, the paucity of large-scale medical datasets hinders the application of machine learning in healthcare. The lack of publicly available multi-centric and diverse datasets mainly stems from confidentiality and privacy concerns around sharing medical data. To demonstrate a feasible path forward in medical imaging, we conduct a case study of applying a differentially private federated learning framework to the analysis of histopathology images, the largest and perhaps most complex medical images. We study the effects of IID and non-IID distributions along with the number of healthcare providers, i.e., hospitals and clinics, and the individual dataset sizes, using The Cancer Genome Atlas (TCGA) dataset, a public repository, to simulate a distributed environment. We empirically compare the performance of private, distributed training to conventional training and demonstrate that distributed training can achieve similar performance with strong privacy guarantees. We also study the effect of different source domains for histopathology images by evaluating the performance using external validation. Our work indicates that differentially private federated learning is a viable and reliable framework for the collaborative development of machine learning models in medical image analysis.
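The general mechanism behind differentially private federated averaging can be sketched in a few lines: each client's update is clipped to a fixed L2 norm before aggregation, and calibrated Gaussian noise is added to the average. This is a minimal illustration of the technique, not the authors' implementation; all function names and parameters are hypothetical.

```python
import math
import random

def clip_update(update, clip_norm):
    """Scale a client update so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, clip_norm / max(norm, 1e-12))
    return [u * scale for u in update]

def dp_federated_average(client_updates, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    """Average clipped client updates and add Gaussian noise (DP-FedAvg style).

    Clipping bounds each client's influence; the noise scale is tied to that
    bound, which is what yields a differential privacy guarantee.
    """
    rng = random.Random(seed)
    n = len(client_updates)
    clipped = [clip_update(u, clip_norm) for u in client_updates]
    dim = len(clipped[0])
    sigma = noise_multiplier * clip_norm / n
    return [
        sum(c[i] for c in clipped) / n + rng.gauss(0.0, sigma)
        for i in range(dim)
    ]
```

In a simulated multi-hospital setting like the one described above, each "client update" would be the model delta computed locally on one provider's data; only the clipped, noised aggregate leaves the site.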
Artificial intelligence powered by deep neural networks has reached a level of complexity where it can be difficult or impossible to express how a model makes its decisions. This black-box problem is especially concerning when the model makes decisions with consequences for human well-being. In response, an emerging field called explainable artificial intelligence (XAI) aims to increase the interpretability, fairness, and transparency of machine learning. In this paper, we describe how cognitive psychologists can make contributions to XAI. The human mind is also a black box, and cognitive psychologists have over 150 years of experience modeling it through experimentation. We ought to translate the methods and rigor of cognitive psychology to the study of artificial black boxes in the service of explainability. We provide a review of XAI for psychologists, arguing that current methods possess a blind spot that can be complemented by the experimental cognitive tradition. We also provide a framework for research in XAI, highlight exemplary cases of experimentation within XAI inspired by psychological science, and provide a tutorial on experimenting with machines. We end by noting the advantages of an experimental approach and invite other psychologists to conduct research in this exciting new field.
Reasoning about graphs evolving over time is a challenging concept in many domains, such as bioinformatics, physics, and social networks. We consider a common case in which edges can be short-term interactions (e.g., messaging) or long-term structural connections (e.g., friendship). In practice, long-term edges are often specified by humans. Human-specified edges can be both expensive to produce and suboptimal for the downstream task. To alleviate these issues, we propose a model based on temporal point processes and variational autoencoders that learns to infer temporal attention between nodes by observing node communication. As temporal attention drives between-node feature propagation, using the dynamics of node interactions to learn this key component provides more flexibility while simultaneously avoiding issues associated with human-specified edges. We also propose a bilinear transformation layer for pairs of node features instead of concatenation, typically used in prior work, and demonstrate its superior performance in all cases. In experiments on two datasets in the dynamic link prediction task, our model often outperforms the baseline model that requires a human-specified graph. Moreover, our learned attention is semantically interpretable and infers connections similar to actual graphs.
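The distinction between the bilinear pairwise transformation and the concatenation baseline can be shown with a minimal sketch. A bilinear layer scores multiplicative interactions between the two node feature vectors, which a linear map over their concatenation cannot express. Shapes and names here are illustrative, not the paper's implementation.

```python
def bilinear(x, y, W):
    """Bilinear features: out[k] = x^T W[k] y for each output slice W[k].

    Every output mixes products x[i] * y[j], i.e., genuine pairwise
    interactions between the two node feature vectors.
    """
    return [
        sum(x[i] * Wk[i][j] * y[j]
            for i in range(len(x)) for j in range(len(y)))
        for Wk in W
    ]

def concat_linear(x, y, A):
    """Concatenation baseline: a linear map applied to [x; y].

    Each output is additive in x and y separately; no x[i] * y[j] terms.
    """
    xy = list(x) + list(y)
    return [sum(a * v for a, v in zip(row, xy)) for row in A]
```

With an identity-like weight slice, the bilinear form reduces to a dot product between the two feature vectors, which makes the multiplicative coupling explicit.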
In this paper we introduce a Transformer-based approach to video object segmentation (VOS). To address compounding error and scalability issues of prior work, we propose a scalable, end-to-end method for VOS called Sparse Spatiotemporal Transformers (SST). SST extracts per-pixel representations for each object in a video using sparse attention over spatiotemporal features. Our attention-based formulation for VOS allows a model to learn to attend over a history of multiple frames and provides suitable inductive bias for performing correspondence-like computations necessary for solving motion segmentation. We demonstrate the effectiveness of attention-based over recurrent networks in the spatiotemporal domain. Our method achieves competitive results on YouTube-VOS and DAVIS 2017 with improved scalability and robustness to occlusions compared with the state of the art. Code is available at https://github.com/dukebw/SSTVOS.
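The core operation, attending sparsely over a history of features, can be illustrated with a toy masked scaled dot-product attention. The real SST operates over spatiotemporal feature maps with learned sparsity patterns; this sketch only shows how a boolean mask restricts which history entries a query attends to, and all names are illustrative.

```python
import math

def sparse_attention(query, keys, values, mask):
    """Scaled dot-product attention where mask[i] selects attendable keys.

    Masked positions get a score of -inf, so their softmax weight is exactly
    zero and they contribute nothing to the output.
    """
    d = len(query)
    scores = []
    for k, keep in zip(keys, mask):
        if keep:
            scores.append(sum(q * ki for q, ki in zip(query, k)) / math.sqrt(d))
        else:
            scores.append(float("-inf"))
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of value vectors.
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]
```

Sparsity is what makes attending over many past frames tractable: the cost scales with the number of unmasked positions rather than the full history.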
In applying machine learning to remote sensing problems, it is often the case that multiple training data sources, known as domains, are available for the same task. It is sample-inefficient to train separate models per domain, which motivates learning a single model from multiple sources. Consider, for example, the local climate zone (LCZ) classification problem, which aims to produce per-pixel classifications of surface structure from remotely sensed images of urban and rural environments. These classification maps need to be generated for different cities at different times. To do this efficiently, available training data from different sources (i.e., cities) must be adapted for the task at hand. However, multisource domain adaptation (MDA) is a challenging problem, particularly when there are significant changes in the data distribution among these sources. In this article, we propose a scalable yet simple adaptive MDA (AMDA) framework to address this problem. AMDA is also capable of dealing with imbalanced data distributions among the sources more effectively than existing baselines. We also extend two techniques originally proposed for domain expansion (DE) to the task of DA. AMDA and the extended DE techniques are implemented and evaluated on the LCZ classification problem. Despite its simplicity, AMDA is able to achieve more than 12% improvement over the baseline.
Ecological camera traps are increasingly used by wildlife biologists to unobtrusively monitor an ecosystem's animal population. However, manual inspection of the images produced is expensive, laborious, and time‐consuming. The success of deep learning systems using camera trap images has been previously explored in preliminary stages. These studies, however, are lacking in their practicality. They are primarily focused on extremely large datasets, often millions of images, and there is little to no focus on performance when tasked with species identification in new locations not seen during training. Our goal was to test the capabilities of deep learning systems trained on camera trap images using modestly sized training data, compare performance when considering unseen background locations, and quantify how lower‐bound performance scales with data volume to provide guidelines relating data requirements to performance expectations. We use a dataset provided by Parks Canada containing 47,279 images collected from 36 unique geographic locations across multiple environments. Images represent 55 animal species and human activity with high class imbalance. We trained, tested, and compared the capabilities of six deep learning computer vision networks using transfer learning and image augmentation: DenseNet201, Inception‐ResNet‐V2, InceptionV3, NASNetMobile, MobileNetV2, and Xception. We compare overall performance on "trained" locations, where DenseNet201 performed best with 95.6% top‐1 accuracy, showing promise for deep learning methods for smaller scale research efforts. Using trained locations, classifications with <500 images had low and highly variable recall of 0.750 ± 0.329, while classifications with over 1,000 images had a high and stable recall of 0.971 ± 0.0137. Models tasked with classifying species from untrained locations were less accurate, with DenseNet201 performing best with 68.7% top‐1 accuracy.
Finally, we provide an open repository where ecologists can insert their image data to train and test custom species detection models for their desired ecological domain.
Here, we present the capabilities of deep learning computer vision systems for species identification from camera trap images. We compare performance when background locations are included in the training data versus when they are excluded. In addition, we provide guidelines on the number of images ecologists require when considering their ecosystem of interest.
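Per-species recall figures like those reported above (e.g., 0.971 ± 0.0137 for species with over 1,000 training images) come from computing recall separately for each class. A minimal sketch of that computation, with hypothetical species labels:

```python
from collections import defaultdict

def per_class_recall(y_true, y_pred):
    """Recall for each species: correct predictions / true examples of it.

    Per-class recall exposes weak classes that an overall top-1 accuracy
    figure hides, which matters under high class imbalance.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if p == t:
            hits[t] += 1
    return {label: hits[label] / totals[label] for label in totals}
```

Averaging these values and reporting their standard deviation yields summaries of the mean ± s.d. form used in the abstract.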
The ability of a researcher to re‐identify (re‐ID) an individual animal upon re‐encounter is fundamental for addressing a broad range of questions in the study of ecosystem function, community and population dynamics and behavioural ecology. Tagging animals during mark and recapture studies is the most common method for reliable animal re‐ID; however, camera traps are a desirable alternative, requiring less labour, far less intrusion, and allowing prolonged and continuous monitoring of an environment. Despite these advantages, analyses of camera trap and video data for re‐ID by humans are criticized for biases related to human judgement and inconsistencies between analyses.
In this review, we describe a brief history of camera traps for re‐ID, present a collection of computer vision feature engineering methodologies previously used for animal re‐ID, provide an introduction to the underlying mechanisms of deep learning relevant to animal re‐ID, highlight the success of deep learning methods for human re‐ID, describe the few ecological studies currently utilizing deep learning for camera trap analyses, and present our predictions for near‐future methodologies based on the rapid development of deep learning methods.
For decades, ecologists with expertise in computer vision have successfully utilized feature engineering to extract meaningful features from camera trap images to improve the statistical rigor of individual comparisons and remove human bias from their camera trap analyses. Recent years have witnessed the emergence of deep learning systems which have demonstrated the re‐ID of humans from image and video data with near‐perfect accuracy. Despite this success, ecologists have yet to utilize these approaches for animal re‐ID.
By utilizing novel deep learning methods for object detection and similarity comparisons, ecologists can extract animals from image/video data and train deep learning classifiers to re‐ID individual animals beyond the capabilities of a human observer. This methodology will allow ecologists with camera/video trap data to re‐identify individuals that exit and re‐enter the camera frame. Our expectation is that this is just the beginning of a major trend that could stand to revolutionize the analysis of camera trap data and, ultimately, our approach to animal ecology.
We tackle the challenge of learning a distribution over complex, realistic, indoor scenes. In this paper, we introduce Generative Scene Networks (GSN), which learns to decompose scenes into a collection of many local radiance fields that can be rendered from a free moving camera. Our model can be used as a prior to generate new scenes, or to complete a scene given only sparse 2D observations. Recent work has shown that generative models of radiance fields can capture properties such as multi-view consistency and view-dependent lighting. However, these models are specialized for constrained viewing of single objects, such as cars or faces. Due to the size and complexity of realistic indoor environments, existing models lack the representational capacity to adequately capture them. Our decomposition scheme scales to larger and more complex scenes while preserving details and diversity, and the learned prior enables high-quality rendering from viewpoints that are significantly different from observed viewpoints. When compared to existing models, GSN produces quantitatively higher-quality scene renderings across several different scene datasets.
DNA-based identification is vital for classifying biological specimens, yet methods to quantify the uncertainty of sequence-based taxonomic assignments are scarce. Challenges arise from noisy reference databases, including mislabelled entries and missing taxa. PROTAX addresses these issues with a probabilistic approach to taxonomic classification, advancing on methods that rely solely on sequence similarity. It provides calibrated probabilistic assignments to a partially populated taxonomic hierarchy, accounting for taxa that lack references and incorrect taxonomic annotation. While effective on smaller scales, global application of PROTAX necessitates substantially larger reference libraries, a goal previously hindered by computational barriers. We introduce PROTAX-GPU, a scalable algorithm capable of leveraging the global Barcode of Life Data System (>14 million specimens) as a reference database. Using graphics processing units (GPU) to accelerate similarity and nearest-neighbour operations and the JAX library for Python integration, we achieve over a 1000× speedup compared with the central processing unit (CPU)-based implementation without compromising PROTAX's key benefits. PROTAX-GPU marks a significant stride towards real-time DNA barcoding, enabling quicker and more efficient species identification in environmental assessments. This capability opens up new avenues for real-time monitoring and analysis of biodiversity, advancing our ability to understand and respond to ecological dynamics. This article is part of the theme issue 'Towards a toolkit for global insect biodiversity monitoring'.
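The similarity and nearest-neighbour step that PROTAX-GPU accelerates on GPU can be sketched on CPU. The k-mer Jaccard similarity below is a stand-in for whatever sequence similarity the actual pipeline uses, and all names are illustrative; the point is only the shape of the operation: score a query against every reference, then keep the top k.

```python
import heapq

def kmer_set(seq, k=4):
    """All length-k substrings of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two k-mer sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def top_k_neighbours(query, references, k=2):
    """Indices of the k reference sequences most similar to the query.

    On GPU this becomes one batched similarity kernel plus a top-k
    reduction, which is where the reported speedup comes from.
    """
    q = kmer_set(query)
    sims = [(jaccard(q, kmer_set(r)), i) for i, r in enumerate(references)]
    return [i for _, i in heapq.nlargest(k, sims)]
```

At the scale of >14 million reference specimens, the per-reference loop here is exactly the bottleneck that a GPU-parallel implementation removes.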