Temporal Segment Networks for Action Recognition in Videos Wang, Limin; Xiong, Yuanjun; Wang, Zhe ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, November 2019, Volume 41, Issue 11
Journal Article
Peer reviewed
Open access
We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structure with a new segment-based sampling and aggregation scheme. This unique design enables the TSN framework to efficiently learn action models by using the whole video. The learned models could be easily deployed for action recognition in both trimmed and untrimmed videos with simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for the implementation of the TSN framework given limited training samples. Our approach obtains state-of-the-art performance on five challenging action recognition benchmarks: HMDB51 (71.0 percent), UCF101 (94.9 percent), THUMOS14 (80.1 percent), ActivityNet v1.2 (89.6 percent), and Kinetics400 (75.7 percent). In addition, using the proposed RGB difference as a simple motion representation, our method can still achieve competitive accuracy on UCF101 (91.0 percent) while running at 340 FPS. Furthermore, based on the proposed TSN framework, we won the video classification track at the ActivityNet challenge 2016 among 24 teams.
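The segment-based sampling and average-pooling consensus the abstract describes can be sketched roughly as follows. This is an illustrative simplification, not the paper's implementation; function names and the choice of a single sampled frame per segment are assumptions for the sketch.

```python
import numpy as np

def sample_segments(num_frames, num_segments, rng=None):
    """Divide a video into equal-duration segments and sample one frame
    index from each segment (the segment-based sampling idea)."""
    if rng is None:
        rng = np.random.default_rng(0)
    boundaries = np.linspace(0, num_frames, num_segments + 1, dtype=int)
    return [int(rng.integers(lo, hi))
            for lo, hi in zip(boundaries[:-1], boundaries[1:])]

def video_level_prediction(snippet_scores):
    """Aggregate per-snippet class scores into a video-level prediction
    by simple average pooling (segmental consensus)."""
    return np.mean(snippet_scores, axis=0)
```

Because each snippet comes from a different segment, the consensus sees the whole video span at a fixed, small computational cost per video.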
The need for interpretable and accountable intelligent systems grows along with the prevalence of artificial intelligence (AI) applications used in everyday life. Explainable AI (XAI) systems are intended to self-explain the reasoning behind system decisions and predictions. Researchers from different disciplines work together to define, design, and evaluate explainable systems. However, scholars from different disciplines focus on different objectives and fairly independent topics of XAI research, which poses challenges for identifying appropriate design and evaluation methodology and consolidating knowledge across efforts. To this end, this article presents a survey and framework intended to share knowledge and experiences of XAI design and evaluation methods across multiple disciplines. Aiming to support diverse design goals and evaluation methods in XAI research, after a thorough review of XAI-related papers in the fields of machine learning, visualization, and human-computer interaction, we present a categorization of XAI design goals and evaluation methods. Our categorization presents the mapping between design goals for different XAI user groups and their evaluation methods. From our findings, we develop a framework with step-by-step design guidelines paired with evaluation methods to close the iterative design and evaluation cycles in multidisciplinary XAI teams. Further, we provide summarized ready-to-use tables of evaluation methods and recommendations for different goals in XAI research.
Multimodal Distributional Semantics Bruni, E.; Tran, N. K.; Baroni, M.
The Journal of artificial intelligence research,
01/2014, Volume 49
Journal Article
Peer reviewed
Open access
Distributional semantic models derive computational representations of word meaning from the patterns of co-occurrence of words in text. Such models have been a success story of computational linguistics, being able to provide reliable estimates of semantic relatedness for the many semantic tasks requiring them. However, distributional models extract meaning information exclusively from text, which is an extremely impoverished basis compared to the rich perceptual sources that ground human semantic knowledge. We address the lack of perceptual grounding of distributional models by exploiting computer vision techniques that automatically identify discrete visual words in images, so that the distributional representation of a word can be extended to also encompass its co-occurrence with the visual words of images it is associated with. We propose a flexible architecture to integrate text- and image-based distributional information, and we show in a set of empirical tests that our integrated model is superior to the purely text-based approach, and it provides somewhat complementary semantic information with respect to the latter.
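One simple way to integrate text- and image-based distributional vectors, in the spirit of the architecture the abstract mentions, is weighted concatenation of the normalized per-modality vectors. The paper explores several fusion schemes; this sketch shows only the concatenation idea, and all names and the mixing weight are illustrative assumptions.

```python
import numpy as np

def fuse_modalities(text_vec, image_vec, alpha=0.5):
    """Combine a text-based co-occurrence vector and an image-based
    visual-word co-occurrence vector: L2-normalize each modality so
    neither dominates, then concatenate with mixing weight alpha."""
    t = np.asarray(text_vec, dtype=float)
    v = np.asarray(image_vec, dtype=float)
    t = t / (np.linalg.norm(t) or 1.0)   # guard against zero vectors
    v = v / (np.linalg.norm(v) or 1.0)
    return np.concatenate([alpha * t, (1.0 - alpha) * v])
```

Semantic relatedness between two words can then be estimated with cosine similarity over the fused vectors, just as with purely text-based representations.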
Knowledge Distillation: A Survey Gou, Jianping; Yu, Baosheng; Maybank, Stephen J. ...
International journal of computer vision,
06/2021, Volume 129, Issue 6
Journal Article
Peer reviewed
Open access
In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model. It has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher–student architecture, distillation algorithms, performance comparison and applications. Furthermore, challenges in knowledge distillation are briefly reviewed and comments on future research are discussed and forwarded.
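The core teacher-to-student transfer surveyed above can be illustrated with the classic response-based distillation objective: a KL divergence between temperature-softened teacher and student output distributions. This is a minimal sketch of that one term (Hinton et al.'s formulation); in practice it is combined with a hard-label cross-entropy loss, and the function names here are illustrative.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())              # subtract max for stability
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) over temperature-softened outputs,
    scaled by T^2 to keep gradient magnitudes comparable across T."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))) * T * T)
```

The softened targets carry "dark knowledge" about inter-class similarities that one-hot labels discard, which is why the student can outperform one trained on hard labels alone.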
This paper tackles the challenge of automatically assessing physical rehabilitation exercises for patients who perform the exercises without clinician supervision. The objective is to provide a quality score to ensure correct performance and achieve desired results. To achieve this goal, a new graph-based model, the Dense Spatio-Temporal Graph Conv-GRU Network with Transformer, is introduced. This model combines a modified version of STGCN and transformer architectures for efficient handling of spatio-temporal data. The key idea is to treat skeleton data as a graph that respects its non-linear structure and to detect the joints that play the main role in each rehabilitation exercise. Dense connections and GRU mechanisms are used to rapidly process large 3D skeleton inputs and effectively model temporal dynamics. The transformer encoder’s attention mechanism focuses on relevant parts of the input sequence, making it useful for evaluating rehabilitation exercises. The evaluation of our proposed approach on the KIMORE and UI-PRMD datasets highlighted its potential, surpassing state-of-the-art methods in terms of accuracy and computational time. This resulted in faster and more accurate learning and assessment of rehabilitation exercises. Additionally, our model provides valuable feedback through qualitative illustrations, effectively highlighting the significance of joints in specific exercises.
We are living through a revolutionary moment in AI history. Users from diverse walks of life are adopting and using AI systems for their everyday use cases at a pace that has never been seen before. However, with this proliferation, there is also a growing recognition that many of the central open problems within AI are connected to how the user interacts with these systems. To name two prominent examples, consider the problems of explainability and value alignment. Each problem has received considerable attention within the wider AI community, and much promising progress has been made in addressing each of these individual problems. However, each of these problems tends to be studied in isolation, using very different theoretical frameworks, while a closer look at each easily reveals striking similarities between the two problems. In this article, I wish to discuss the framework of human-aware AI (HAAI) that aims to provide a unified formal framework to understand and evaluate human-AI interaction. We will see how this framework can be used to both understand explainability and value alignment and how the framework also lays out potential novel avenues to address these problems.
This paper explores how AI policy documents mediate the stabilization of socio-technical assemblages. It does so by developing the theory-methods package of ‘discursive infrastructuring’ and applying it to the U.K.’s National AI Strategy. By centering the conceptual slipperiness of emerging technologies such as AI, this framework sheds light on how policy documents work to stabilize emerging socio-technical assemblages comprising specific actors, ideologies, flows of capital, and relationships of power. In the context of the National AI Strategy, discursive infrastructuring reveals how the document stabilises: AI as an autonomous and inevitable force; a technical/social dualism which privileges the technical over the social in driving innovation; the ‘heroic engineer’ as an individual, masculine and rational archetype; and the U.K. as a dominant and modernising player on AI’s global stage. This assemblage does not only exist in the document’s words; it is translated into practice through the funding of institutions, the centring of technical pedagogies of AI, and the opening of visa routes for ‘globally mobile individuals’. The application of ‘discursive infrastructuring’ to the National AI Strategy thus elucidates the constitutive role of policy discourse in stabilising politically situated material-semiotic conceptions of AI.