Transformers in Vision: A Survey
Khan, Salman; Naseer, Muzammal; Hayat, Munawar ...
ACM Computing Surveys, 01/2022, Volume 54, Issue 10s
Journal Article
Peer-reviewed
Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long dependencies between input sequence elements and support parallel processing of sequences, in contrast to recurrent networks such as long short-term memory (LSTM). Different from convolutional networks, Transformers require minimal inductive biases for their design and are naturally suited as set functions. Furthermore, the straightforward design of Transformers allows processing multiple modalities (e.g., images, videos, text, and speech) using similar processing blocks and demonstrates excellent scalability to very large capacity networks and huge datasets. These strengths have led to exciting progress on a number of vision tasks using Transformer networks. This survey aims to provide a comprehensive overview of Transformer models in the computer vision discipline. We start with an introduction to fundamental concepts behind the success of Transformers, i.e., self-attention, large-scale pre-training, and bidirectional feature encoding. We then cover extensive applications of Transformers in vision including popular recognition tasks (e.g., image classification, object detection, action recognition, and segmentation), generative modeling, multi-modal tasks (e.g., visual question answering, visual reasoning, and visual grounding), video processing (e.g., activity recognition, video forecasting), low-level vision (e.g., image super-resolution, image enhancement, and colorization), and three-dimensional analysis (e.g., point cloud classification and segmentation). We compare the respective advantages and limitations of popular techniques both in terms of architectural design and their experimental value. Finally, we provide an analysis of open research directions and possible future work.
We hope this effort will ignite further interest in the community to solve current challenges toward the application of transformer models in computer vision.
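The self-attention the abstract names as the core of the Transformer can be sketched in a few lines. This is a minimal, dependency-free illustration of scaled dot-product self-attention, not the survey's notation; the weight matrices and helper names are assumptions for the example.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of d-dim vectors.
    Every output position attends to every input position in one step, which
    is how Transformers model long dependencies without recurrence."""
    matvec = lambda W, x: [sum(x[i] * W[i][j] for i in range(len(x)))
                           for j in range(len(W[0]))]
    Q = [matvec(Wq, x) for x in X]
    K = [matvec(Wk, x) for x in X]
    V = [matvec(Wv, x) for x in X]
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)  # attention distribution over positions
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

Because each position computes its attention weights independently, the loop over queries parallelizes trivially, unlike an LSTM's sequential state updates.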
We propose distributed deep neural networks (DDNNs) over distributed computing hierarchies, consisting of the cloud, the edge (fog), and end devices. While being able to accommodate inference of a deep neural network (DNN) in the cloud, a DDNN also allows fast and localized inference using shallow portions of the neural network at the edge and end devices. When supported by a scalable distributed computing hierarchy, a DDNN can scale up in neural network size and scale out in geographical span. Due to its distributed nature, DDNNs enhance sensor fusion, system fault tolerance, and data privacy for DNN applications. In implementing a DDNN, we map sections of a DNN onto a distributed computing hierarchy. By jointly training these sections, we minimize communication and resource usage for devices and maximize the usefulness of extracted features utilized in the cloud. The resulting system has built-in support for automatic sensor fusion and fault tolerance. As a proof of concept, we show that a DDNN can exploit the geographical diversity of sensors to improve object recognition accuracy and reduce communication cost. In our experiment, compared with the traditional method of offloading raw sensor data to be processed in the cloud, a DDNN locally processes most sensor data on end devices while achieving high accuracy, and is able to reduce the communication cost by a factor of more than 20.
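The "fast and localized inference using shallow portions of the network" can be sketched as a confidence-gated early exit: the end device answers when its shallow classifier is sure, and otherwise forwards intermediate features to the cloud. The threshold, model interfaces, and entropy gate below are illustrative assumptions, not the paper's exact mechanism.

```python
import math

def normalized_entropy(probs):
    # 0.0 = fully confident prediction, 1.0 = uniform (maximally unsure).
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def ddnn_infer(x, local_model, cloud_model, threshold=0.5):
    """Hypothetical DDNN-style inference. `local_model` maps an input to
    (class probabilities, intermediate features); if the local prediction is
    confident enough we exit on-device, otherwise only the compact features
    (not the raw sensor data) are sent to `cloud_model`."""
    probs, features = local_model(x)
    if normalized_entropy(probs) < threshold:
        return probs, "local"      # fast, on-device exit
    return cloud_model(features), "cloud"
```

Transmitting features instead of raw data is also what keeps the communication cost low in the offloaded case.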
Time-series forecasting with deep learning: a survey
Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences,
04/2021
Journal Article
A key element in solving real-life data science problems is selecting the types of models to use. Tree ensemble models (such as XGBoost) are usually recommended for classification and regression problems with tabular data. However, several deep learning models for tabular data have recently been proposed, claiming to outperform XGBoost for some use cases. This paper explores whether these deep models should be a recommended option for tabular data by rigorously comparing the new deep models to XGBoost on various datasets. In addition to systematically comparing their performance, we consider the tuning and computation they require. Our study shows that XGBoost outperforms these deep models across the datasets, including the datasets used in the papers that proposed the deep models. We also demonstrate that XGBoost requires much less tuning. On the positive side, we show that an ensemble of deep models and XGBoost performs better on these datasets than XGBoost alone.
•Deep neural networks are not good for all types of tabular data.
•XGBoost outperforms deep models on tabular data.
•It is harder to optimize deep neural networks than XGBoost.
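The paper's positive finding, that an ensemble of deep models and XGBoost beats XGBoost alone, amounts to blending the per-model class probabilities. A minimal sketch, with callables standing in for the trained XGBoost booster and deep network (the function signature and equal weights are assumptions):

```python
def ensemble_predict(models, weights, x):
    """Weighted average of per-model class-probability vectors.
    `models` are callables x -> list of class probabilities, stand-ins
    here for e.g. a trained XGBoost model and a trained deep network."""
    total = sum(weights)
    n_classes = len(models[0](x))
    probs = [0.0] * n_classes
    for model, w in zip(models, weights):
        for i, p in enumerate(model(x)):
            probs[i] += w * p / total
    return probs
```

In practice the weights would be tuned on a validation set; averaging works because the deep models' errors are only partly correlated with XGBoost's.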
Recent developments in many-body potential energy representation via deep learning have brought new hopes to addressing the accuracy-versus-efficiency dilemma in molecular simulations. Here we describe DeePMD-kit, a package written in Python/C++ that has been designed to minimize the effort required to build deep learning based representations of potential energy and force fields and to perform molecular dynamics. Potential applications of DeePMD-kit span from finite molecules to extended systems and from metallic systems to chemically bonded systems. DeePMD-kit is interfaced with TensorFlow, one of the most popular deep learning frameworks, making the training process highly automatic and efficient. At the other end, DeePMD-kit is interfaced with high-performance classical molecular dynamics and quantum (path-integral) molecular dynamics packages, i.e., LAMMPS and i-PI, respectively. Thus, upon training, the potential energy and force field models can be used to perform efficient molecular simulations for different purposes. As an example of the many potential applications of the package, we use DeePMD-kit to learn the interatomic potential energy and forces of a water model using data obtained from density functional theory. We demonstrate that the resulting molecular dynamics model accurately reproduces the structural information contained in the original model.
Program Title: DeePMD-kit
Program Files doi:http://dx.doi.org/10.17632/hvfh9yvncf.1
Licensing provisions: LGPL
Programming language: Python/C++
Nature of problem: Modeling the many-body atomic interactions by deep neural network models. Running molecular dynamics simulations with the models.
Solution method: The Deep Potential for Molecular Dynamics (DeePMD) method is implemented based on the deep learning framework TensorFlow. Support for using a DeePMD model in LAMMPS and i-PI, for classical and quantum (path-integral) molecular dynamics respectively, is provided.
Additional comments including Restrictions and Unusual features: The code defines a data protocol such that the energy, force, and virial calculated by different third-party molecular simulation packages can be easily processed and used as model training data.
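The data protocol mentioned above boils down to per-frame records with fixed shapes: one scalar energy, a flattened 3N-component force array for N atoms, and a 9-component (3x3) virial. A hypothetical validator makes the shapes concrete; the field names here are illustrative and not the package's exact on-disk format.

```python
def validate_frame(frame, n_atoms):
    """Sanity-check one training frame against the shapes implied by the
    energy/force/virial protocol: one energy per frame, a 3-vector force
    per atom (flattened to length 3*N), and a 9-component virial tensor."""
    assert isinstance(frame["energy"], float), "one scalar energy per frame"
    assert len(frame["force"]) == 3 * n_atoms, "3 force components per atom"
    assert len(frame["virial"]) == 9, "flattened 3x3 virial tensor"
    return True
```

A uniform shape contract like this is what lets energies and forces exported from different third-party simulation packages be pooled into one training set.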
Nonlinear methods such as Deep Neural Networks (DNNs) are the gold standard for various challenging machine learning problems such as image recognition. Although these methods perform impressively well, they have a significant disadvantage: their lack of transparency limits the interpretability of the solution and thus the scope of application in practice. DNNs in particular act as black boxes due to their multilayer nonlinear structure. In this paper we introduce a novel methodology for interpreting generic multilayer neural networks by decomposing the network classification decision into contributions of its input elements. Although our focus is on image classification, the method is applicable to a broad set of input data, learning tasks, and network architectures. Our method, called deep Taylor decomposition, efficiently utilizes the structure of the network by backpropagating the explanations from the output to the input layer. We evaluate the proposed method empirically on the MNIST and ILSVRC data sets.
•A novel method to explain nonlinear classification decisions in terms of input variables is introduced.
•The method is based on Taylor expansions and decomposes the output of a deep neural network in terms of input variables.
•The resulting deep Taylor decomposition can be applied directly to existing neural networks without retraining.
•The method is tested on two large-scale neural networks for image classification: BVLC CaffeNet and GoogleNet.
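The backpropagation of explanations can be illustrated for a single ReLU layer with what is commonly called the z+ rule: each output neuron's relevance is redistributed to its inputs in proportion to their positive contributions. This is a sketch of one such step under simplifying assumptions (positive inputs, no bias handling), not the paper's full multi-layer procedure.

```python
def ztaylor_redistribute(x, W, relevance_out):
    """One deep-Taylor-style redistribution step for a ReLU layer.
    Output relevance R_j is split over inputs i in proportion to the
    positive contributions z_ij = x_i * max(W[i][j], 0), so total
    relevance is conserved for every active output neuron."""
    n_in, n_out = len(W), len(W[0])
    R_in = [0.0] * n_in
    for j in range(n_out):
        z = [x[i] * max(W[i][j], 0.0) for i in range(n_in)]
        denom = sum(z)
        if denom > 0:
            for i in range(n_in):
                R_in[i] += z[i] / denom * relevance_out[j]
    return R_in
```

Applying this layer by layer from the output back to the pixels yields a heatmap whose values sum to the network's output score, which is the conservation property the method is built around.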
Deepfakes: Trick or treat?
Kietzmann, Jan; Lee, Linda W.; McCarthy, Ian P. ...
Business Horizons, 03/2020, Volume 63, Issue 2
Journal Article
Peer-reviewed
Open access
Although manipulations of visual and auditory media are as old as media themselves, the recent entrance of deepfakes has marked a turning point in the creation of fake content. Powered by the latest technological advances in artificial intelligence and machine learning, deepfakes offer automated procedures to create fake content that is harder and harder for human observers to detect. The possibilities to deceive are endless, including manipulated pictures, videos, and audio, and organizations must be prepared as this will undoubtedly have a large societal impact. In this article, we provide a working definition of deepfakes together with an overview of their underlying technology. We classify different deepfake types and identify risks and opportunities to help organizations think about the future of deepfakes. Finally, we propose the R.E.A.L. framework to manage deepfake risks: Record original content to ensure deniability, Expose deepfakes early, Advocate for legal protection, and Leverage trust to counter credulity. Following these principles, we hope that our society can be more prepared to counter deepfake tricks as we appreciate deepfake treats.
Attention mechanisms can extract salient features in images, which has been proven to be effective for person re-identification. However, focusing on the saliency of an image is not enough. On the one hand, the salient features extracted by the model are not necessarily the features needed, e.g., a similar background may also be mistaken for a salient feature; on the other hand, varied salient features are often more conducive to improving the performance of the model. Based on this, in this paper, a model with joint weak-saliency and attention-aware learning is proposed, which can obtain more complete global features by weakening saliency features. The model then obtains diversified saliency features via attention diversity to improve its performance. Experiments on commonly used datasets prove the effectiveness of the proposed method.
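"Attention diversity" is typically enforced with a regularizer that penalizes overlap between attention maps, so different heads settle on different salient regions. The following penalty is a hypothetical illustration in that spirit (the normalization and pairwise dot-product form are assumptions, not the paper's exact loss):

```python
def diversity_penalty(attn_maps):
    """Sum of pairwise dot products between L1-normalized attention maps.
    Identical maps score high; maps attending to disjoint regions score 0,
    so minimizing this term pushes the maps apart."""
    def normalize(m):
        s = sum(m) or 1.0
        return [v / s for v in m]
    maps = [normalize(m) for m in attn_maps]
    penalty = 0.0
    for i in range(len(maps)):
        for j in range(i + 1, len(maps)):
            penalty += sum(a * b for a, b in zip(maps[i], maps[j]))
    return penalty
```

Added to the re-identification loss with a small weight, a term like this trades a little per-map saliency for coverage of more body regions.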
Deep learning (DL) defines a new data-driven programming paradigm where the internal system logic is largely shaped by the training data. The standard way of evaluating DL models is to examine their performance on a test dataset. The quality of the test dataset is of great importance for gaining confidence in the trained models. Using an inadequate test dataset, DL models that have achieved high test accuracy may still lack generality and robustness. In traditional software testing, mutation testing is a well-established technique for the quality evaluation of test suites, which analyzes to what extent a test suite detects injected faults. However, due to the fundamental difference between traditional software and deep learning-based software, traditional mutation testing techniques cannot be directly applied to DL systems. In this paper, we propose a mutation testing framework specialized for DL systems to measure the quality of test data. To do this, sharing the same spirit as mutation testing in traditional software, we first define a set of source-level mutation operators to inject faults into the source of DL (i.e., training data and training programs). Then we design a set of model-level mutation operators that directly inject faults into DL models without a training process. Eventually, the quality of test data can be evaluated by analyzing to what extent the injected faults are detected. The usefulness of the proposed mutation testing techniques is demonstrated on two public datasets, namely MNIST and CIFAR-10, with three DL models.
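A source-level mutation operator and the resulting score can be sketched concretely. Below is one plausible operator, flipping a fraction of training labels to a wrong class, together with the usual mutation-score ratio; the paper defines a richer operator set (data shuffling, layer removal, weight perturbation, etc.), so treat this as an assumed minimal instance.

```python
import random

def mutate_labels(labels, n_classes, ratio, seed=0):
    """Source-level 'label error' mutation operator: flip a fraction
    `ratio` of training labels to a different, randomly chosen class.
    Retraining on the mutated data yields one mutant model."""
    rng = random.Random(seed)
    mutated = list(labels)
    k = int(len(labels) * ratio)
    for idx in rng.sample(range(len(labels)), k):
        wrong = [c for c in range(n_classes) if c != labels[idx]]
        mutated[idx] = rng.choice(wrong)
    return mutated

def mutation_score(killed, total_mutants):
    # Fraction of injected faults the test data manages to detect:
    # a mutant is "killed" when some test input exposes its changed behavior.
    return killed / total_mutants
```

A test set that kills few mutants is giving a false sense of security, regardless of the clean-model accuracy it reports.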
This paper provides an entry point to the problem of interpreting a deep neural network model and explaining its predictions. It is based on a tutorial given at ICASSP 2017. As a tutorial paper, the set of methods covered here is not exhaustive, but sufficiently representative to discuss a number of questions in interpretability, technical challenges, and possible applications. The second part of the tutorial focuses on the recently proposed layer-wise relevance propagation (LRP) technique, for which we provide theory, recommendations, and tricks to make the most efficient use of it on real data.