While color information is known to provide rich discriminative clues for visual inference, most modern visual trackers limit themselves to the grayscale realm. Despite recent efforts to integrate color in tracking, there is a lack of comprehensive understanding of the role color information can play. In this paper, we attack this problem by conducting a systematic study from both the algorithm and benchmark perspectives. On the algorithm side, we comprehensively encode 10 chromatic models into 16 carefully selected state-of-the-art visual trackers. On the benchmark side, we compile a large set of 128 color sequences with ground truth and challenge factor annotations (e.g., occlusion). A thorough evaluation is conducted by running all the color-encoded trackers, together with two recently proposed color trackers. A further validation is conducted on an RGBD tracking benchmark. The results clearly show the benefit of encoding color information for tracking. We also perform a detailed analysis of several issues, including the behavior of various combinations of color model and visual tracker, the degree of difficulty of each sequence for tracking, and how different challenge factors affect tracking performance. We expect this study to provide guidance, motivation, and a benchmark for future work on encoding color in visual tracking.
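As an illustration of what "encoding a chromatic model" can mean in practice, the sketch below maps a single RGB pixel into a few common color models before it would reach a tracker's feature extractor. The model names and formulas here are generic textbook choices, not the ten models evaluated in the paper.

```python
import colorsys

def encode_color(r, g, b, model="gray"):
    """Encode an RGB pixel (channels in [0, 1]) into a chromatic model.

    Illustrative sketch only: the model set is a generic assumption,
    not the chromatic models studied in the paper.
    """
    if model == "gray":                      # luminance-only baseline
        return (0.299 * r + 0.587 * g + 0.114 * b,)
    if model == "rgb":                       # raw channels, unchanged
        return (r, g, b)
    if model == "normalized-rgb":            # intensity-invariant chromaticity
        s = (r + g + b) or 1.0               # guard against a black pixel
        return (r / s, g / s, b / s)
    if model == "hsv":                       # hue/saturation/value
        return colorsys.rgb_to_hsv(r, g, b)
    raise ValueError(f"unknown color model: {model}")
```

A tracker built on grayscale features would consume only the first model; the study's point is that the richer encodings carry extra discriminative information at little cost.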
Among the billions of faces shaped by thousands of different cultures and ethnicities, one thing remains universal: the way emotions are expressed. To take the next step in human-machine interaction, a machine (e.g., a humanoid robot) must be able to classify facial emotions. Allowing systems to recognize micro-expressions gives the machine a deeper view into a person's true feelings, so that human emotion can be taken into account when making optimal decisions. For instance, such machines will be able to detect dangerous situations, alert caregivers to challenges, and provide appropriate responses. Micro-expressions are involuntary and transient facial expressions capable of revealing genuine emotions. We propose a new hybrid neural network (NN) model capable of micro-expression recognition in real-time applications. Several NN models are first compared in this study. Then, a hybrid NN model is created by combining a convolutional neural network (CNN), a recurrent neural network (RNN, e.g., a long short-term memory (LSTM) network), and a vision transformer. The CNN extracts spatial features (within a neighborhood of an image), whereas the LSTM summarizes temporal features. In addition, a transformer with an attention mechanism can capture sparse spatial relations residing in an image or between frames in a video clip. The inputs of the model are short facial videos, while the outputs are the micro-expressions recognized from the videos. The NN models are trained and tested on publicly available facial micro-expression datasets to recognize different micro-expressions (e.g., happiness, fear, anger, surprise, disgust, sadness). Score fusion and improvement metrics are also presented in our experiments. The results of our proposed models are compared with those of literature-reported methods tested on the same datasets. The proposed hybrid model performs best, and score fusion can dramatically increase recognition performance.
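The score fusion mentioned above can be sketched as a weighted average of the per-class scores emitted by each sub-network (e.g., the CNN and LSTM branches). The equal weights and emotion labels below are illustrative assumptions, not the fusion rule or label set used in the study.

```python
def fuse_scores(score_lists, weights=None):
    """Late (score-level) fusion: weighted average of per-model class scores.

    score_lists: one score vector per model, each vector covering the same
    ordered set of emotion classes. Defaults to equal weights (an assumption).
    """
    n = len(score_lists)
    weights = weights or [1.0 / n] * n
    fused = [0.0] * len(score_lists[0])
    for w, scores in zip(weights, score_lists):
        for i, s in enumerate(scores):
            fused[i] += w * s
    return fused

def predict(fused, labels):
    """Return the label of the highest fused score."""
    return max(zip(fused, labels))[1]
```

Intuitively, fusion helps when the branches make uncorrelated errors: a class that only one branch scores highly is outvoted by the others.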
While Internet of Things (IoT) technology has been widely recognized as an essential part of Smart Cities, it also brings new challenges in terms of privacy and security. Access control (AC) is among the top security concerns and is critical to protecting resources and information on IoT devices. Traditional access control approaches, such as Access Control Lists (ACL), Role-Based Access Control (RBAC), and Attribute-Based Access Control (ABAC), cannot provide a scalable, manageable, and efficient mechanism that meets the requirements of IoT systems. Another weakness of today's AC is the centralized authorization server, which can become a performance bottleneck or a single point of failure. Inspired by smart contracts on top of a blockchain protocol, this paper proposes BlendCAC, a decentralized, federated capability-based AC mechanism that enables effective protection of devices, services, and information in large-scale IoT systems. A federated capability-based delegation model (FCDM) is introduced to support hierarchical and multi-hop delegation, and the mechanism for delegation authorization and revocation is explored. A robust identity-based capability token management strategy is proposed that takes advantage of the smart contract for registration, propagation, and revocation of access authorization. A proof-of-concept prototype has been implemented on both resource-constrained devices (i.e., Raspberry Pi nodes) and more powerful computing devices (i.e., laptops) and tested on a local private blockchain network. The experimental results demonstrate the feasibility of BlendCAC in offering a decentralized, scalable, lightweight, and fine-grained AC solution for IoT systems.
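The lifecycle of an identity-based capability token (register, check, revoke) can be sketched as below. In BlendCAC this registry lives in a smart contract on the blockchain; here a plain dictionary stands in for it, and the field names and API are our illustrative assumptions, not the paper's interface.

```python
import time

# Hypothetical in-memory stand-in for the on-chain token registry.
tokens = {}

def register(subject, resource, actions, ttl):
    """Issue a capability token granting `subject` a set of actions on
    `resource` for `ttl` seconds (registration, in smart-contract terms)."""
    tokens[subject] = {"resource": resource,
                       "actions": set(actions),
                       "expires": time.time() + ttl,
                       "revoked": False}

def revoke(subject):
    """Mark the subject's token as revoked (revocation)."""
    if subject in tokens:
        tokens[subject]["revoked"] = True

def authorize(subject, resource, action):
    """Fine-grained access check against the subject's capability token."""
    t = tokens.get(subject)
    return (t is not None and not t["revoked"]
            and t["resource"] == resource
            and action in t["actions"]
            and time.time() < t["expires"])
```

Because the decision reduces to a lookup against the subject's own token, no central authorization server sits on the request path, which is the decentralization argument made above.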
Recent advances in wireless technologies have led to several autonomous deployments of wireless networks. As nodes across distributed networks must co-exist, it is important that all transmitters and receivers are aware of their radio frequency (RF) surroundings so that they can adapt their transmission and reception parameters to best suit their needs. To this end, machine learning techniques have become popular, as they can learn, analyze, and predict the RF signals and associated parameters that characterize the RF environment. However, in the presence of adversaries, malicious activities such as jamming and spoofing are inevitable, rendering most machine learning techniques ineffective in such environments. In this paper, we propose the Radio Frequency Adversarial Learning (RFAL) framework for building a robust system to identify rogue RF transmitters by designing and implementing a generative adversarial network (GAN). We exploit transmitter-specific "signatures", such as the in-phase (I) and quadrature (Q) imbalance (i.e., the I/Q imbalance) present in all transmitters, by learning feature representations with a deep neural network that takes the I/Q data of received signals as input. After detection and elimination of the adversarial transmitters, RFAL further uses this learned feature embedding as a "fingerprint" for categorizing the trusted transmitters. More specifically, we implement a generative model that learns the sample space of the I/Q values of known transmitters and uses the learned representation to generate signals that imitate the transmissions of these transmitters. We program 8 universal software radio peripheral (USRP) software defined radios (SDRs) as trusted transmitters and collect "over-the-air" raw I/Q data from them using a Realtek software defined radio (RTL-SDR) in a laboratory setting.
We also implement a discriminator model, trained in the GAN framework using data from the generator, that distinguishes between the trusted transmitters and counterfeit ones with 99.9% accuracy. Finally, after elimination of the adversarial transmitters, the trusted transmitters are classified using a convolutional neural network (CNN), a fully connected deep neural network (DNN), and a recurrent neural network (RNN), demonstrating an end-to-end robust transmitter identification system built with RFAL. Experimental results reveal that the CNN, DNN, and RNN are able to correctly distinguish between the 8 trusted transmitters with 81.6%, 94.6%, and 97% accuracy, respectively. We also show that, for all three types of neural networks, better "trusted transmission" classification accuracy is achieved when the data come from two different types of transmitters (different manufacturers) than when transmitters of the same type (same manufacturer) are used.
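To make the transmitter "signature" idea concrete, the sketch below computes closed-form gain and phase imbalance statistics from raw I/Q pairs. RFAL itself learns such features with a deep network rather than using hand-crafted estimates, so this function and its formulas are purely illustrative assumptions.

```python
import math

def iq_imbalance_features(iq_samples):
    """Estimate simple gain and phase imbalance statistics from (I, Q) pairs.

    An ideal transmitter yields gain 1.0 and phase correlation 0.0; hardware
    imperfections shift both, giving each radio a distinguishable signature.
    """
    i_vals = [i for i, _ in iq_samples]
    q_vals = [q for _, q in iq_samples]
    rms = lambda xs: math.sqrt(sum(x * x for x in xs) / len(xs))
    # Gain imbalance: ratio of RMS amplitudes of the two rails.
    gain = rms(i_vals) / rms(q_vals)
    # Normalized I-Q cross-correlation; nonzero hints at phase imbalance.
    phase = (sum(i * q for i, q in iq_samples)
             / (len(iq_samples) * rms(i_vals) * rms(q_vals)))
    return gain, phase
```

A classifier fed vectors like these (or, as in RFAL, learned embeddings of the raw I/Q stream) can then separate radios that nominally transmit identical waveforms.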
Recently, sparse representation has been applied to visual tracking to find the target with the minimum reconstruction error from a target template subspace. Though effective, these L1 trackers incur high computational costs due to the numerous ℓ1 minimizations required. In addition, the inherent occlusion insensitivity of ℓ1 minimization has not been fully characterized. In this paper, we propose an efficient L1 tracker, named the bounded particle resampling (BPR)-L1 tracker, with a minimum error bound and occlusion detection. First, the minimum error bound is calculated from a linear least-squares equation and serves as a guide for particle resampling in a particle filter (PF) framework. Most of the insignificant samples are removed by a two-step test before solving the computationally expensive ℓ1 minimization. The first step, named τ testing, compares each sample's observation likelihood to an ordered set of thresholds to remove insignificant samples without loss of resampling precision. The second step, named max testing, identifies the largest sample probability relative to the target to further remove insignificant samples without altering the tracking result of the current frame. Though it sacrifices minimal precision during resampling, max testing achieves a significant speed-up on top of τ testing. The BPR-L1 technique can also benefit other trackers that have minimum error bounds in a PF framework, especially trackers based on sparse representations. After the error-bound calculation, BPR-L1 performs occlusion detection by investigating the trivial coefficients of the ℓ1 minimization. These coefficients, by design, contain rich information about image corruptions, including occlusion. Detected occlusions are then used to enhance template updating.
For evaluation, we conduct experiments on three video applications: biometrics (head movement, a hand holding an object, singers on stage), pedestrians (urban travel, hallway monitoring), and cars in traffic (wide-area motion imagery, ground-mounted perspectives). The proposed BPR-L1 method demonstrates excellent performance compared with nine state-of-the-art trackers on eleven challenging benchmark sequences.
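The first-stage pruning can be sketched as follows: samples are ranked by a cheap observation likelihood, and because the thresholds are ordered, the loop can stop at the first failing sample, since every lower-ranked sample must fail too. The thresholds here are supplied directly rather than derived from the minimum error bound, so this is a simplified stand-in for the paper's τ testing, not its actual derivation.

```python
def tau_testing(samples, likelihoods, thresholds):
    """Prune particle-filter samples before the expensive l1 solve.

    samples/likelihoods are parallel lists; thresholds is an ordered
    (non-increasing) list, one per rank. Samples surviving the test are
    returned in decreasing order of likelihood.
    """
    order = sorted(range(len(samples)), key=lambda k: -likelihoods[k])
    kept = []
    for rank, k in enumerate(order):
        if rank < len(thresholds) and likelihoods[k] < thresholds[rank]:
            break  # ordered thresholds: every remaining sample fails too
        kept.append(samples[k])
    return kept
```

Only the surviving samples would then be passed to the costly sparse-reconstruction step, which is where the reported speed-up comes from.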
We describe two advanced video analysis techniques: video indexed by voice annotations (VIVA) and multi-media indexing and explorer (MINER). VIVA utilizes analyst call-outs (ACOs) in the form of chat messages (voice-to-text) to associate labels with video target tracks, to designate spatial-temporal activity boundaries, and to augment video tracking in challenging scenarios. Challenging scenarios include low-resolution sensors, moving targets, and target trajectories obscured by natural and man-made clutter. MINER includes: (1) a fusion of graphical track and text data using probabilistic methods; (2) an activity pattern learning framework to support querying an index of activities of interest (AOIs) and targets of interest (TOIs) by movement type and geolocation; and (3) a user interface to support streaming multi-intelligence data processing. We also present an activity pattern learning framework that uses the multi-source associated data as training to index a large archive of full-motion videos (FMV). VIVA and MINER are demonstrated on wide aerial/overhead imagery over common data sets, affording an improvement in tracking over video data alone and leading to 84% detection with modest misdetection/false alarm results given the complexity of the scenario. The novel use of ACOs and chat messages in video tracking paves the way for user interaction, correction, and preparation of situation awareness reports.
In the Information Age, the widespread usage of black-box algorithms makes it difficult to understand how data are used. The practice of sensor fusion is widespread, as there are many tools to further improve the robustness and performance of a model. In this study, we demonstrate the use of a long short-term memory canonical correlation analysis (LSTM-CCA) model for the fusion of passive RF (P-RF) and electro-optical (EO) data in order to gain insights into how the P-RF data are utilized. The P-RF data are constructed from in-phase and quadrature (I/Q) component data processed via histograms and are combined with EO data enhanced via dense optical flow (DOF). The preprocessed data are then used to train the LSTM-CCA model for object detection and tracking. To determine the impact of the different data inputs, a greedy algorithm (explainX.ai) is implemented to determine the weight and impact of the canonical variates provided to the fusion model on a scenario-by-scenario basis. This research introduces an explainable LSTM-CCA framework for P-RF and EO sensor fusion, providing novel insights into the sensor fusion process that can assist in the detection and differentiation of targets and help decision-makers determine the weights for each input.
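The histogram preprocessing of the P-RF stream might look like the sketch below, which bins raw I (or Q) samples into a normalized feature vector before fusion. The bin count and value range are illustrative assumptions, not the study's settings.

```python
def iq_histogram(values, bins=8, lo=-1.0, hi=1.0):
    """Bin raw I (or Q) samples into a normalized histogram feature vector.

    Illustrative sketch of the P-RF preprocessing step: out-of-range samples
    are clamped into the edge bins, and counts are normalized so the vector
    sums to 1 regardless of capture length.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = int((v - lo) / width)
        counts[min(max(k, 0), bins - 1)] += 1  # clamp out-of-range samples
    total = len(values) or 1
    return [c / total for c in counts]
```

Fixed-length vectors like this are what make it possible to feed variable-length RF captures into the LSTM-CCA model alongside the EO features.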
Situational awareness (SA) is defined as the perception of entities in the environment, comprehension of their meaning, and projection of their status into the near future. From an Air Force perspective, SA refers to the capability to comprehend and project the current and future disposition of red and blue aircraft and surface threats within an airspace. In this article, we propose a model for SA and dynamic decision-making that incorporates artificial intelligence and dynamic data-driven application systems to adapt measurements and resources in accordance with changing situations. We discuss the measurement of SA and the challenges associated with its quantification. We then elaborate on a range of techniques and technologies that help improve SA, from different modes of intelligence gathering to artificial intelligence to automated vision systems. We then present different application domains of SA, including the battlefield, gray-zone warfare, military and air bases, homeland security and defense, and critical infrastructure. Finally, we conclude the article with insights into the future of SA.