In this study, we present a metainformation extraction pipeline and a Proof of Concept (PoC) implementation of a recognizer capable of classifying video titles, series titles, and video genres from encrypted video streams. We show in a promising evaluation, using the Netflix and SVT Play catalogues as examples, that our PoC is capable of learning abstract data from the packet bursts visible in DASH encrypted streams (such as whether a video fingerprint corresponds to a Drama or a Romance movie). This is, to the best of our knowledge, the first demonstration of successful extraction of video metainformation from coarse-grained encrypted network packet traces. While advocating updates to the DASH protocol in order to preserve viewers' privacy, our results also pave the way for future computer forensics systems capable of successful content classification over encrypted channels.
Adversarial attacks on machine learning models have proven to be a major contributor to the lack of actual deployment and adoption of ML in many practical use-cases. They have been found to be equally effective against video streams as against images and text. This finds particular relevance in the domain of autonomous driving tasks, which make use of object recognition neural architectures. In this paper, we take a step back from the cat-and-mouse chase of novel attacks and ad-hoc defenses and try to explain adversarial attacks from the perspective of the geometry of the high-dimensional spaces that the models operate in. Additionally, we make use of our idea of relating adversarial attacks to dimensionality to propose a counter-measure that uses dimension reduction. We have tested our proposition on state-of-the-art object detection and classification models on video streams, including Faster-RCNN and YOLO, and corresponding adversarial attacks on these models. After optimally tuning the hyper-parameter that controls variability preservation during dimension reduction via simple Singular Value Decomposition (SVD), we show that the performance of the robust version of the object detector models is within 2-3% of that on the clean samples, despite the presence of adversarial perturbation.
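The SVD-based variability-preservation step described in this abstract can be sketched as follows. This is a minimal illustration operating on a single 2-D image channel; the function name and the `energy` parameter are ours, not the paper's, and the paper's actual pipeline around the detector is not reproduced here.

```python
import numpy as np

def svd_project(channel, energy=0.95):
    """Reconstruct a 2-D image channel from only the leading singular
    components that retain `energy` fraction of total spectral energy,
    discarding the small-magnitude components where adversarial
    perturbations tend to live (a sketch, not the paper's exact code)."""
    U, s, Vt = np.linalg.svd(channel, full_matrices=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(cum, energy)) + 1  # smallest rank reaching `energy`
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Tuning `energy` trades off perturbation removal against loss of legitimate image detail, which is the hyper-parameter trade-off the abstract refers to.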
Face masks are necessary during the worldwide pandemic to prevent the transmission of infectious diseases. This research proposes a deep learning-based system for detecting face masks in live video feeds in real-time. The system's goal is to automatically determine whether people in a video broadcast are covering their faces with masks. To do this, a deep convolutional neural network architecture is used and is trained on a large dataset of annotated photos that include both masked and unmasked faces. Existing systems struggle to handle large-scale deployment and real-time processing of video streams for face mask detection. The network was built to learn features that reliably identify the presence or absence of masks on faces. The transfer learning method is employed for fine-tuning in order to enhance the network's ability to generalize. Further, this study employs a powerful detection pipeline that uses hardware acceleration and parallel processing approaches to deal with the real-time nature of video streams. Since the pipeline processes video frames in real-time, it can be used in places where fast detection is critical, such as hospitals, airports, and other public buildings. To measure the proposed system's efficacy, extensive trials are run over various video datasets, each with its unique combination of circumstances and camera angles. The findings show that the proposed convolutional neural network method is effective on both Dataset-1 and Dataset-2, achieving real-time processing speed and good accuracy. The solution reaches 95% to 99% accuracy and is more efficient than current techniques.
Edge computing (EC) is a promising paradigm for serving latency-sensitive video applications. However, massive compressed video transmission and analysis require considerable bandwidth and computing resources, posing enormous challenges for current multimedia frameworks. Novel multi-stream frameworks that incorporate feature streams are more practical. The reason is that feature streams containing compact video frame feature data have a lower bitrate and better serve machine vision tasks. Nevertheless, feature extraction by devices increases the latency and energy consumption of local computing. Therefore, how to offload suitable streams according to video task requirements and system resources is a challenging issue. This paper studies EC-based multi-stream adaptive offloading. We model the multi-stream offloading and computation problem to maximize system utility by jointly optimizing offloading decisions, computation resource allocation, and video frame sampling rates. Frame sampling rates, processing latency, and energy consumption are considered in system utility modeling. The formulated optimization problem is a mixed-integer programming (MIP) problem. We propose an efficient algorithm to address this MIP problem. The proposed algorithm relies on the Hungarian algorithm and improved greedy Markov approximation. The simulation results validate our proposed algorithm's superior performance.
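The assignment step at the heart of such offloading decisions can be illustrated in miniature. The paper uses the Hungarian algorithm for this step; the sketch below instead enumerates all assignments for clarity, which is only feasible for tiny instances. The utility values are hypothetical, and the real system utility additionally folds in frame sampling rates, latency, and energy.

```python
from itertools import permutations

def best_offload(utility):
    """Assign n streams to n edge servers to maximize total utility.
    `utility[i][j]` is the (hypothetical) utility of offloading stream i
    to server j. Brute-force stand-in for the Hungarian algorithm,
    which solves the same assignment problem in polynomial time."""
    n = len(utility)
    best_total, best_assignment = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(utility[i][perm[i]] for i in range(n))
        if total > best_total:
            best_total, best_assignment = total, perm
    return best_total, best_assignment
```

For example, with `utility = [[3, 1], [2, 4]]`, keeping stream 0 on server 0 and stream 1 on server 1 yields the maximum total utility of 7.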
Human action detection and recognition has become a significant topic in computer vision research over the last two decades. Intelligent techniques (such as machine learning and deep learning) have grown in popularity as a result of technological advancements in visual data. Due to the exponential growth of video data generated by surveillance cameras, intelligent systems are in great demand for detecting specific human activities. In this work, we critically examine the state-of-the-art (SOA) methodologies for human activity detection (HAD) approaches. The study illustrates a generic classification for HAD methods, and a paradigm shift is observed from traditional to modern deep learning-based techniques. We also provide an analysis of publicly available datasets for the classification of human activities. The paper outlines various open research issues as well as future directions for HAD methods. Due to problems such as a dynamic and complex background, camera motion, occlusion, and bad weather, it is evident that detecting human behaviors in surveillance videos is a challenging task. Convolutional neural networks (CNNs) are used in the bulk of HAD techniques, prompting researchers to look at sequence learning models like Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs). Furthermore, our analysis indicates that only a few research articles on the detection of anomalous behavior have been published, with the majority of the work focusing on human action detection. Finally, existing HAD deep learning models could be improved by including the notion of transfer learning, which saves training time and enhances accuracy.
Traditional fingerprint acquisition is limited to single-image capture and processing. With the advent of faster capture hardware, faster processors, and advances in video compression standards, newer systems can capture and exploit video signals for tasks that are difficult using a single image. We propose the use of fingerprint video sequences to investigate detecting two aspects of the dynamic behavior of fingerprints. Specifically, we are interested in the detection of distortion of fingerprint impressions due to excessive force and the detection of the positioning of fingers during image capture. These issues often lead to difficulties in establishing a precise match between acquired images. The proposed techniques investigate dynamic characteristics of fingerprints across video sequence frames. A significant advantage of our approach for distortion analysis is that it works directly on MPEG-1 and MPEG-2 encoded fingerprint video bitstreams. The proposed methods have been tested on the NIST-24 live-scan fingerprint video database and the results are promising. We also describe a new concept called "resultant biometrics": a new type of biometric formed when a subject adds a physical component (e.g., force, torque, linear motion, rotation) and/or a temporal characteristic to an existing biometric. A resultant biometric is desirable because a compromised biometric can easily be replaced, and it is harder to reproduce with spoofed body parts.
Human detection in video streams is an important task in many applications including video surveillance. Surprisingly, only a few papers have been devoted to this topic. This paper presents a new approach to detect humans in video streams. Our approach is based on the temporal information present in videos. A background subtraction algorithm is first used to segment the silhouettes of the users and the moving objects. Then a two-step classification process determines, for each connected component, whether it corresponds to the silhouette of a human or not. During the first step, probabilistic information is computed for each pixel independently. The information from a subset of pixels is then gathered to predict the class of the observed silhouette. This paper presents the principles and some results obtained on real silhouettes. It is shown that our approach is efficient for the detection of humans in video streams.
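The background subtraction stage that precedes the classification can be sketched as follows. This is a common running-average baseline, not necessarily the exact algorithm used in the paper; the `alpha` and `threshold` values are illustrative assumptions.

```python
import numpy as np

def subtract_background(frames, alpha=0.05, threshold=30):
    """Running-average background subtraction over grayscale frames.
    Returns, for each frame after the first, a boolean foreground mask
    marking pixels that deviate from the slowly updated background model.
    A generic baseline sketch, not the paper's exact method."""
    background = frames[0].astype(float)
    masks = []
    for frame in frames[1:]:
        diff = np.abs(frame.astype(float) - background)
        masks.append(diff > threshold)
        # Slowly blend the current frame into the background model.
        background = (1 - alpha) * background + alpha * frame
    return masks
```

Connected components would then be extracted from each mask and passed to the per-pixel probabilistic classification described in the abstract.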
Modelling complex events in unstructured data like videos requires not only detecting objects but also the spatiotemporal relationships among objects. Complex Event Processing (CEP) systems discretize continuous streams into fixed batches using windows and apply operators over these batches to detect patterns in real-time. To this end, we apply CEP techniques over video streams to identify spatiotemporal patterns by capturing window state. This work introduces a novel problem where an input video stream is converted to a stream of graphs which are aggregated to a single graph over a given state. Incoming video frames are converted to a timestamped Video Event Knowledge Graph (VEKG) that maps objects to nodes and captures spatiotemporal relationships among object nodes. Objects coexist across multiple frames, which leads to the creation of redundant nodes and edges at different time instances and results in high memory usage. There is a need for an expressive and storage-efficient graph model which can summarize graph streams in a single view. We propose the Event Aggregated Graph (EAG), a summarized graph representation of VEKG streams over a given state. An EAG captures different spatiotemporal relationships among objects using an Event Adjacency Matrix without replicating the nodes and edges across time instances. This enables the CEP system to process multiple continuous queries and perform frequent spatiotemporal pattern matching computations over a single summarized graph. Initial experiments show that the EAG takes 68.35% and 28.9% less space compared to the baseline and a state-of-the-art graph summarization method, respectively, and 5X less search time for pattern detection compared to the VEKG stream.
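The core aggregation idea, recording time-stamped relations on shared edges rather than replicating object nodes per frame, can be sketched in a few lines. The triple format and function below are our own simplification for illustration, not the paper's EAG data structures.

```python
from collections import defaultdict

def aggregate_frames(frame_graphs):
    """Aggregate per-frame object graphs into one summary graph.
    Each frame graph is a list of (obj_a, relation, obj_b) triples
    (a hypothetical input format). Each object appears once as a node;
    each edge accumulates (relation, frame_index) pairs over time,
    mirroring the idea of an event adjacency structure."""
    nodes = set()
    edges = defaultdict(set)  # (obj_a, obj_b) -> {(relation, t), ...}
    for t, triples in enumerate(frame_graphs):
        for a, rel, b in triples:
            nodes.update((a, b))
            edges[(a, b)].add((rel, t))
    return nodes, dict(edges)
```

Because repeated objects collapse to single nodes, pattern queries scan one summary graph instead of one graph per frame, which is where the reported space and search-time savings come from.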
This paper deals with the organization of reliable and effective (from a CPU usage point of view) transmission of video streams from video cameras on a mobile robot to a control center, along with their handling and display to the operator. The hardware and software used are described. Adaptive control is implemented using an advanced software sequencer controller.
In some scenarios, we need to combine multiple video streams and pictures into a single video. In this paper, we develop an application to synthesize two channels of video streams and two channels of pictures. Our application can be used for synthetic display of multiple network video streams and for offline storage of synthesized video. As single-threaded decoding of multiple video streams takes a lot of time, we propose the use of a multithreaded producer-consumer pattern. It makes full use of the advantages of multi-core CPUs and improves processing efficiency. In our experiments, we compare video synthesis using a single thread with a multithreaded implementation; multithreading is much faster than single threading.
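The producer-consumer pattern mentioned above can be sketched as follows: one producer thread per stream decodes frames into a shared queue, and a consumer drains the queue to composite the output. This is a minimal illustration with placeholder frame data, not the paper's implementation; real decoders would replace the `decode` body.

```python
import queue
import threading

def decode(stream_id, frames, out_q):
    # Producer: stands in for a per-stream decoder thread.
    for f in frames:
        out_q.put((stream_id, f))
    out_q.put((stream_id, None))  # end-of-stream marker

def compose(out_q, n_streams):
    # Consumer: collects decoded frames from all streams until every
    # producer has signalled end-of-stream, then returns them for
    # compositing into the output video.
    done, composed = 0, []
    while done < n_streams:
        stream_id, frame = out_q.get()
        if frame is None:
            done += 1
        else:
            composed.append((stream_id, frame))
    return composed
```

With one decoder thread per stream, independent streams decode in parallel on a multi-core CPU while the consumer only pays the cost of merging, which is the source of the speedup the abstract reports.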