Crowd behaviour analysis is an emerging research area. Due to its novelty, a proper taxonomy to organise its different sub-tasks is still missing. This paper proposes a taxonomic organisation of existing works following a pipeline, where sub-problems in later stages benefit from the results of earlier ones. Models that employ Deep Learning to solve crowd anomaly detection, one of the proposed stages, are reviewed in depth, and the few works that address emotional aspects of crowds are outlined. The importance of bringing emotional aspects into the study of crowd behaviour is highlighted, together with the necessity of producing real-world, challenging datasets in order to improve current solutions. Opportunities for fusing these models into already functioning video analytics systems are proposed.
•Proposal of a hierarchical taxonomy for crowd behaviour analysis sub-tasks.
•Review and numerical comparison of Deep Learning models for crowd anomaly detection.
•Discussion of current limitations in datasets and the importance of going beyond them.
•Discussion of the importance of using emotional aspects in crowd behaviour analysis.
•Proposals for fusing crowd analysis models into existing video analytics solutions.
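One common Deep Learning formulation of crowd anomaly detection, of the kind covered by such reviews, scores each frame by reconstruction error: a model trained only on normal footage reconstructs anomalous frames poorly. A minimal sketch, where a clipping function stands in for a trained autoencoder (an illustrative placeholder, not any specific reviewed model):

```python
import numpy as np

def anomaly_scores(frames, reconstruct, threshold):
    """Score each frame by reconstruction error and flag frames whose
    score exceeds the threshold."""
    scores = [float(np.mean((f - reconstruct(f)) ** 2)) for f in frames]
    flags = [s > threshold for s in scores]
    return scores, flags

# Toy stand-in for a trained autoencoder: it can only reproduce the
# low-intensity "normal" frames it was (hypothetically) trained on.
reconstruct = lambda f: np.clip(f, 0.0, 0.2)

rng = np.random.default_rng(0)
normal = [rng.uniform(0.0, 0.2, (8, 8)) for _ in range(4)]
anomalous = [rng.uniform(0.8, 1.0, (8, 8))]
scores, flags = anomaly_scores(normal + anomalous, reconstruct, threshold=0.1)
```

In practice the threshold is calibrated on held-out normal footage rather than fixed by hand.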
Full text
Available for:
GEOZS, IJS, IMTLJ, KILJ, KISLJ, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, UILJ, UL, UM, UPCLJ, UPUK, ZAGLJ, ZRSKP
•Study the performance of promising CNNs in the classification of coral texture images.
•Analyze different types of transfer learning.
•Analyze the effect of data augmentation on the performance of the coral classification model.
•Experimental results outperform state-of-the-art methods that need human intervention.
•Generalize the best approach to other coral texture datasets.
The recognition of coral species based on underwater texture images poses a significant difficulty for machine learning algorithms, due to three challenges embedded in the nature of this data: (1) datasets do not include information about the global structure of the coral; (2) several species of coral have very similar characteristics; and (3) defining the spatial borders between classes is difficult, as many corals tend to appear together in groups. For these reasons, the classification of coral species has always required the aid of a domain expert. The objective of this paper is to develop an accurate classification model for coral texture images. Current datasets contain a large number of imbalanced classes, while the images are subject to inter-class variation. We have focused on the current small datasets and analyzed (1) several Convolutional Neural Network (CNN) architectures, (2) data augmentation techniques and (3) transfer learning approaches. We have achieved state-of-the-art accuracies using different variations of ResNet on the two small coral texture datasets, EILAT and RSMAS.
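Since coral texture patches have no canonical orientation, label-preserving geometric transforms are a natural augmentation family to analyze. A minimal sketch of such augmentation (illustrative only; the paper's exact augmentation pipeline may differ):

```python
import numpy as np

def augment(image):
    """Return the image plus simple geometric variants: horizontal and
    vertical flips and the three 90-degree rotations."""
    variants = [image, np.fliplr(image), np.flipud(image)]
    variants += [np.rot90(image, k) for k in (1, 2, 3)]
    return variants

patch = np.arange(16).reshape(4, 4)
augmented = augment(patch)  # 6 views of the same labeled patch
```

Each variant keeps the original label, multiplying the effective size of a small training set without new annotation effort.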
•Deep Neural Networks (DNNs) trained on very small datasets suffer from overfitting.
•This problem is greater when different classes share too many visual features.
•Class-inherent transformation generators improve the generalization capacity of DNNs.
•Our approach is complementary to transfer learning and data augmentation.
It is widely known that very small datasets produce overfitting in Deep Neural Networks (DNNs), i.e., the network becomes highly biased to the data it has been trained on. This issue is often alleviated using transfer learning, regularization techniques and/or data augmentation. This work presents a new approach, independent of but complementary to the previously mentioned techniques, for improving the generalization of DNNs on very small datasets in which the involved classes share many visual features. The proposed model, called FuCiTNet (Fusion Class inherent Transformations Network), inspired by GANs, creates as many generators as there are classes in the problem. Each generator, k, learns the transformations that bring the input image into the k-class domain. We introduce a classification loss in the generators to drive the learning of the specific k-class transformations. Our experiments demonstrate that the proposed transformations improve the generalization of the classification model on three diverse datasets.
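The key idea of the classification loss in the generators can be sketched as a combined objective: a fidelity term on the generator output plus a cross-entropy term that rewards generator k when the classifier assigns its output to class k. A hedged sketch (the weighting and fidelity term here are assumptions, not FuCiTNet's exact formulation):

```python
import numpy as np

def generator_loss(fidelity_term, transformed_logits, k, lam=1.0):
    """Loss for generator k: a fidelity term (e.g. an MSE between input
    and transformed image) plus a cross-entropy term that pushes the
    classifier's prediction on the transformed image toward class k."""
    logits = np.asarray(transformed_logits, dtype=float)
    log_probs = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    cross_entropy = -log_probs[k]
    return fidelity_term + lam * cross_entropy

# The classification term is small when the transformed image already
# lands in class k, and large when it lands elsewhere.
loss_aligned = generator_loss(0.0, [5.0, 0.0], k=0)
loss_misaligned = generator_loss(0.0, [0.0, 5.0], k=0)
```

Gradients from this loss flow back into generator k, so each generator specializes in transformations that make its class more separable.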
The latest Deep Learning (DL) models for detection and classification have achieved an unprecedented performance over classical machine learning algorithms. However, DL models are black-box methods that are hard to debug, interpret, and certify. DL alone cannot provide explanations that can be validated by a non-technical audience such as end-users or domain experts. In contrast, symbolic AI systems that convert concepts into rules or symbols – such as knowledge graphs – are easier to explain. However, they present lower generalization and scaling capabilities. A very important challenge is to fuse DL representations with expert knowledge. One way to address this challenge, as well as the performance-explainability trade-off, is to leverage the best of both streams without obviating domain expert knowledge. In this paper, we tackle this problem by considering symbolic knowledge expressed in the form of a domain expert knowledge graph. We present the eXplainable Neural-symbolic learning (X-NeSyL) methodology, designed to learn both symbolic and deep representations, together with an explainability metric to assess the level of alignment of machine and human expert explanations. The ultimate objective is to fuse DL representations with expert domain knowledge during the learning process so that it serves as a sound basis for explainability. In particular, the X-NeSyL methodology involves the concrete use of two notions of explanation, at inference and training time respectively: (1) EXPLANet: Expert-aligned eXplainable Part-based cLAssifier NETwork Architecture, a compositional convolutional neural network that makes use of symbolic representations, and (2) SHAP-Backprop, an explainable AI-informed training procedure that corrects and guides the DL process to align with such symbolic representations in the form of knowledge graphs.
We showcase the X-NeSyL methodology using the MonuMAI dataset for monument facade image classification, and demonstrate that with our approach it is possible to improve explainability and performance at the same time.
•The eXplainable Neural-symbolic Learning methodology fuses deep learning and symbolic representations.
•EXPLANet’s compositional part-based object detection and classification outperforms regular classification.
•SHAP-Backprop aligns model output with expert knowledge in a knowledge graph.
•SHAP Graph Edit Distance quantifies the alignment between a knowledge graph and neural representations.
•X-NeSyL shows it is possible to improve both explainability and performance.
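The core intuition behind a SHAP-informed correction can be sketched as follows: compare the attribution each detected part receives against the parts the knowledge graph links to the predicted class, and penalize misattributions. This is one possible reading of the idea, not the paper's exact formula; the part names here are hypothetical:

```python
def misattribution_weight(shap_values, expected_parts):
    """Turn disagreement between SHAP attributions and an expert
    knowledge graph into a training-loss weight (>= 1).

    shap_values: dict mapping part name -> SHAP attribution toward
                 the predicted class
    expected_parts: set of parts the knowledge graph associates with
                    that class
    """
    penalty = 0.0
    for part, value in shap_values.items():
        if part in expected_parts and value < 0:
            penalty += -value   # an expected part argued against the class
        elif part not in expected_parts and value > 0:
            penalty += value    # an unexpected part argued for the class
    return 1.0 + penalty

# Hypothetical example: "gargoyle" is not linked to the predicted style
# in the knowledge graph, yet received positive attribution.
w = misattribution_weight({"arch": 0.4, "column": -0.2, "gargoyle": 0.3},
                          expected_parts={"arch", "column"})
```

Upweighting such examples during training nudges the model's attributions toward the expert's part-to-class relations.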
•A labeled database for cold steel weapon detection.
•Selection of the best model for cold steel weapon detection.
•A new brightness-guided preprocessing procedure, called Darkening and Contrast at Learning and Test (DaCoLT).
•A real-time cold steel weapon detection system for surveillance videos.
The automatic detection of cold steel weapons handled by one or multiple persons in surveillance videos can help reduce crimes. However, the detection of these metallic objects in videos faces an important problem: their surface reflectance under medium to high illumination conditions blurs their shapes in the image and hence severely hampers their detection. The objective of this work is two-fold: (i) to develop an automatic cold steel weapon detection model for video surveillance using Convolutional Neural Networks (CNNs), and (ii) to strengthen its robustness to light conditions by proposing a brightness-guided preprocessing procedure called DaCoLT (Darkening and Contrast at Learning and Test stages). The obtained detection model provides excellent results as a cold steel weapon detector and as an automatic alarm system in video surveillance.
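The learning-stage half of a darkening procedure like this can be sketched as brightness augmentation: train on progressively darkened copies so that glare-prone bright frames can be darkened at test time without leaving the training distribution. The darkening factors below are assumptions for illustration, not DaCoLT's published parameters:

```python
import numpy as np

def darken(image, factor):
    """Multiplicatively darken an image with pixel values in [0, 1]."""
    return np.clip(image * factor, 0.0, 1.0)

def train_variants(image, factors=(1.0, 0.8, 0.6, 0.4)):
    """Learning-stage augmentation: one copy of the training image per
    darkening factor, so the detector also sees low-brightness versions
    of reflective blades."""
    return [darken(image, f) for f in factors]

bright = np.full((4, 4), 0.9)      # stand-in for an over-exposed frame
variants = train_variants(bright)  # progressively darker copies
```

At test time the same darkening (plus a contrast adjustment) would be applied to overly bright frames before detection.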
Applying CNN-based object detection models to the task of weapon detection in video surveillance still produces a high number of false negatives. In this context, most existing works focus on one type of weapon, mainly firearms, and improve the detection using different pre- and post-processing strategies. One interesting approach that has not been explored in depth yet is the exploitation of human pose information for improving weapon detection. This paper proposes a top-down methodology that first determines the hand regions guided by the human pose estimation and then analyzes those regions using a weapon detection model. For an optimal localization of each hand region, we defined a new factor, called the Adaptive pose factor, that takes into account the distance of the body from the camera. Our experiments show that this top-down Weapon Detection over Pose Estimation (WeDePE) methodology is more robust than the alternative bottom-up approach and state-of-the-art detection models in both indoor and outdoor video-surveillance scenarios.
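The hand-region step can be sketched as a crop around the wrist keypoint whose size adapts to the person's apparent scale. Here the apparent body size (e.g. shoulder-hip pixel distance) serves as the distance proxy; this is an illustrative stand-in, not the paper's exact Adaptive pose factor:

```python
def hand_region(wrist_xy, body_size, base=1.2):
    """Square crop around a wrist keypoint whose side scales with the
    apparent body size in pixels, a proxy for the person's distance to
    the camera: close people get large crops, distant people small ones."""
    half = base * body_size / 2.0
    x, y = wrist_xy
    return (x - half, y - half, x + half, y + half)

near = hand_region((100.0, 200.0), body_size=80.0)  # person close to camera
far = hand_region((100.0, 200.0), body_size=20.0)   # person far away
```

Each crop is then passed to the weapon detector, so the detector only analyzes regions where a handled weapon can plausibly appear.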
The availability of high-resolution satellite images has accelerated the creation of new datasets designed to tackle broader remote sensing (RS) problems. Although popular tasks like scene classification have received significant attention, the recent release of the Land-1.0 RS dataset (https://doi.org/10.5281/zenodo.7858952) marks the initiation of endeavors to estimate land-use and land-cover (LULC) fraction values per RGB satellite image. This challenging problem involves estimating LULC composition, i.e., the proportion of different LULC classes in satellite imagery, with major applications in environmental monitoring, agricultural/urban planning, and climate change studies. Currently, supervised deep learning models, the state-of-the-art in image classification, require large volumes of labeled training data to provide good generalization. To face the challenges posed by the scarcity of labeled RS data, self-supervised learning (SSL) models have recently emerged, learning directly from unlabeled data by leveraging the underlying structure. This is the first paper to investigate the performance of SSL in LULC fraction estimation on RGB satellite patches using in-domain knowledge. We also performed a complementary analysis on LULC scene classification. Specifically, we pretrained Barlow Twins, MoCov2, SimCLR, and SimSiam SSL models with ResNet-18 using the Sentinel2GlobalLULC small RS dataset (https://doi.org/10.5281/zenodo.6941662) and then performed transfer learning to downstream tasks on Land-1.0. Our experiments demonstrate that SSL achieves competitive or slightly better results when trained on a smaller high-quality in-domain dataset of 194,877 samples compared to the supervised model trained on ImageNet-1k with 1,281,167 samples. This outcome highlights the effectiveness of SSL using in-distribution datasets, demonstrating efficient learning with fewer but more relevant data.
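Unlike scene classification, fraction estimation targets a distribution over classes per patch. One simple way to frame the downstream head is distribution matching: predicted fractions as a softmax over class logits, compared against the labelled fractions. This is a generic formulation for illustration; the paper's head and loss may differ:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def fraction_loss(logits, target_fractions):
    """LULC fraction estimation as distribution matching: predicted
    class fractions (softmax over logits) vs. labelled per-patch
    fractions, compared with a mean squared error."""
    pred = softmax(logits)
    return float(np.mean((pred - np.asarray(target_fractions)) ** 2))

target = [0.7, 0.15, 0.15]            # e.g. 70% forest, 15% water, 15% urban
good = fraction_loss([2.0, 0.0, 0.0], target)  # mass on the dominant class
bad = fraction_loss([0.0, 0.0, 2.0], target)   # mass on the wrong class
```

The softmax guarantees the predicted fractions are non-negative and sum to one, matching the structure of the labels.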
Much has been said about the fusion of bio-inspired optimization algorithms and Deep Learning models for several purposes: from the discovery of network topologies and hyperparametric configurations with improved performance for a given task, to the optimization of the model’s parameters as a replacement for gradient-based solvers. Indeed, the literature is rich in proposals showcasing the application of assorted nature-inspired approaches to these tasks. In this work we comprehensively review and critically examine contributions made so far along three axes, each addressing a fundamental question in this research avenue: (a) optimization and taxonomy (Why?), including a historical perspective, definitions of optimization problems in Deep Learning, and a taxonomy associated with an in-depth analysis of the literature; (b) critical methodological analysis (How?), which, together with two case studies, allows us to distill learned lessons and recommendations for good practices following the analysis of the literature; and (c) challenges and new directions of research (What can be done, and what for?). In summary, these three axes – optimization and taxonomy, critical analysis, and challenges – outline a complete vision of a merger of two technologies drawing up an exciting future for this area of fusion research.
•We thoroughly examine the fusion between Deep Learning and bio-inspired optimization.
•Definitions and a taxonomy of Deep Learning optimization problems are provided.
•We perform a critical methodological analysis of contributions made so far.
•Learned lessons and recommendations are drawn from our analysis and two case studies.
•Challenges and research directions are given in this fusion of technologies.
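The hyperparameter-discovery use of bio-inspired optimization can be sketched with a minimal elitist evolutionary loop: keep the fittest half of a population of configurations and refill it with mutated survivors. This is a generic sketch, not any specific algorithm reviewed in the survey, and the learning-rate fitness is a toy stand-in for a validation metric:

```python
import random

def evolve(fitness, sample, mutate, pop_size=8, generations=5, seed=0):
    """Minimal elitist evolutionary search: lower fitness is better;
    survivors are kept unchanged, so the best candidate never worsens."""
    rng = random.Random(seed)
    population = [sample(rng) for _ in range(pop_size)]
    for _ in range(generations):
        survivors = sorted(population, key=fitness)[: pop_size // 2]
        children = [mutate(rng.choice(survivors), rng) for _ in survivors]
        population = survivors + children
    return min(population, key=fitness)

# Toy problem: search for a learning rate close to 1e-2.
best = evolve(
    fitness=lambda lr: abs(lr - 1e-2),
    sample=lambda rng: 10 ** rng.uniform(-5, 0),               # log-uniform init
    mutate=lambda lr, rng: min(1.0, max(1e-6,
                               lr * 10 ** rng.uniform(-0.3, 0.3))),
)
```

Replacing the toy fitness with a trained-and-validated model score turns this loop into the topology/hyperparameter discovery setting the survey reviews; using it on the weights themselves gives the gradient-free training setting.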
•This paper proposes a novel binocular image approach that makes the detection model focus on the area of interest.
•We built a low-cost symmetric dual camera system to compute the disparity map and exploit that information to improve the selection of candidate regions in the input frames.
•The proposed approach reduces the number of false positives in the test videos by 49.47%.
Object detection models have seen important improvements in recent years. The state-of-the-art detectors are end-to-end Convolutional Neural Network based models that reach good mean average precision, around 73%, on benchmarks of high-quality images. However, these models still produce a large number of false positives in low-quality videos such as surveillance videos. This paper proposes a novel image fusion approach to make the detection model focus on the area of interest, where the action is more likely to happen in the scene. We propose building a low-cost symmetric dual camera system to compute the disparity map and exploit this information to improve the selection of candidate regions from the input frames. Our results show that the proposed approach not only reduces the number of false positives but also improves the overall performance of the detection model, which makes it appropriate for object detection in surveillance videos.
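The pruning step can be sketched as a disparity test on candidate regions: in a rectified stereo pair, disparity is inversely related to depth, so candidates with low mean disparity lie in the far background and can be discarded. A simplified sketch with a synthetic disparity map (the actual system computes the map from the dual camera feed):

```python
import numpy as np

def filter_candidates(boxes, disparity, min_disparity):
    """Keep only the candidate boxes (x0, y0, x1, y1) whose mean
    disparity is high enough, i.e. that lie in the near field where
    the action of interest happens."""
    kept = []
    for (x0, y0, x1, y1) in boxes:
        region = disparity[y0:y1, x0:x1]
        if region.size and float(region.mean()) >= min_disparity:
            kept.append((x0, y0, x1, y1))
    return kept

# Synthetic disparity map: left half near (high disparity), right half far.
disp = np.zeros((10, 10))
disp[:, :5] = 30.0
kept = filter_candidates([(0, 0, 4, 4), (6, 0, 10, 4)], disp,
                         min_disparity=10.0)
```

Discarding far-field candidates before detection is what removes background clutter that would otherwise become false positives.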
An important part of art history can be discovered through the visual information in monument facades. However, the analysis of this visual information, i.e., morphology and architectural elements, requires high expert knowledge. An automatic system for identifying the architectural style or detecting the architectural elements of a monument based on one image will certainly help improve our knowledge of art and history. Building such a tool is challenging because some styles share architectural elements, some monuments are in a bad state of conservation, and the images themselves contain noise. The aim of this paper is to introduce the MonuMAI (Monument with Mathematics and Artificial Intelligence) framework. In particular, (i) we designed the MonuMAI dataset, rich in expert knowledge considering the proposed architectural styles taxonomy and key elements relationship, which allows addressing several tasks, e.g., monument style classification and architectural elements detection; (ii) we developed the MonuMAI deep learning pipeline, based on the lightweight MonuNet architecture for monument style classification and the MonuMAI Key Elements Detection (MonuMAI-KED) model; and (iii) we built the citizen-science-based MonuMAI mobile app, which uses the proposed MonuMAI deep learning pipeline trained on the MonuMAI dataset to perform under real-life conditions. Our experiments show that both the MonuNet architecture and the detection model achieve very good results under real-life conditions.
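The detect-then-classify structure of such a pipeline can be sketched as element detection followed by a style vote over the detected elements. The element-to-style table below is hypothetical, for illustration only; it is not MonuMAI's actual taxonomy, and MonuNet is a learned classifier rather than a vote:

```python
from collections import Counter

# Hypothetical evidence table mapping detected key elements to styles.
STYLE_EVIDENCE = {
    "horseshoe arch": "hispanic-muslim",
    "pointed arch": "gothic",
    "rounded arch": "renaissance",
    "broken pediment": "baroque",
}

def classify_style(detected_elements):
    """Aggregate detected architectural elements into a majority style
    vote; returns None when no known element was detected."""
    votes = Counter(STYLE_EVIDENCE[e] for e in detected_elements
                    if e in STYLE_EVIDENCE)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

style = classify_style(["pointed arch", "pointed arch", "rounded arch"])
```

Keeping detection and classification as separate stages is also what makes the output inspectable: the detected elements themselves explain the style decision.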