Many existing methods for frame deletion detection attempt to detect abnormal periodical artifacts in the video stream; however, for a number of reasons, these periodical artifacts cannot always be reliably detected. In this paper, we propose a new method for frame deletion detection. Rather than detecting abnormal periodical artifacts, we devise two features that measure the magnitude of variation in the prediction residual and in the number of intra macroblocks. Based on these features, we propose a fused index to capture abnormal abrupt changes in video streams. We create a dataset consisting of 6 subsets and test the detection capability of our method at both the video level and the GOP (Group of Pictures) level. The experimental results show that the proposed method performs stably under various configurations.
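To make the fused-index idea concrete, the following is a minimal sketch assuming the per-frame prediction-residual magnitudes and intra-macroblock counts have already been extracted from the bitstream; the normalization and thresholding shown here are illustrative, not the authors' exact formulation.

```python
# Hedged sketch: a simple fused index over two per-frame features
# (prediction-residual magnitude and intra-macroblock count). The feature
# extraction and the thresholding rule are assumptions, not the paper's
# exact method.
import numpy as np

def fused_index(residual_mag, intra_mb_count, eps=1e-8):
    """Combine two feature sequences into a single abrupt-change score."""
    r = np.asarray(residual_mag, dtype=float)
    m = np.asarray(intra_mb_count, dtype=float)
    # Normalize each feature by its median absolute deviation so the two
    # scores are comparable before fusion.
    r_score = np.abs(r - np.median(r)) / (np.median(np.abs(r - np.median(r))) + eps)
    m_score = np.abs(m - np.median(m)) / (np.median(np.abs(m - np.median(m))) + eps)
    return r_score * m_score  # large only when both features jump together

def flag_abrupt_changes(residual_mag, intra_mb_count, threshold=10.0):
    """Return frame/GOP indices whose fused index exceeds a fixed threshold."""
    score = fused_index(residual_mag, intra_mb_count)
    return np.flatnonzero(score > threshold)
```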
With the increasing availability and use of Internet of Things (IoT) devices such as sensors and video cameras, large amounts of streaming data are now being produced at high velocity. Applications that require low-latency responses, such as video surveillance, augmented reality, and autonomous vehicles, demand swift and efficient analysis of this data. Existing approaches employ cloud infrastructure to store and perform machine learning-based analytics on this data. This centralized approach has limited ability to support real-time analysis of large-scale streaming data due to network bandwidth and latency constraints between the data source and the cloud. We propose RealEdgeStream (RES), an edge-enhanced stream analytics system for large-scale, high-performance data analytics. The proposed approach addresses the problem of video stream analytics through (i) a filtration phase and (ii) an identification phase. The filtration phase reduces the amount of data by filtering out low-value stream objects using configurable rules. The identification phase uses deep learning inference to perform analytics on the streams of interest. The phases consist of stages that are mapped onto available in-transit and cloud resources using a placement algorithm to satisfy the Quality of Service (QoS) constraints identified by a user. We demonstrate that for 10K-element data streams with frame rates of 15-100 per second, job completion in the proposed system takes 49 percent less time and saves 99 percent bandwidth compared to a centralized, cloud-only approach.
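A minimal sketch of the filtration idea described above, assuming frames arrive as NumPy arrays; the rule (a motion-based filter) and its threshold are illustrative placeholders, not the RES system's actual rule format.

```python
# Hedged sketch: configurable rules drop low-value stream objects (e.g.,
# near-static frames) before the identification phase runs deep-learning
# inference. Rule names and thresholds are illustrative.
import numpy as np

def make_motion_rule(min_mean_abs_diff=4.0):
    """Rule: keep a frame only if it differs enough from the previous one."""
    state = {"prev": None}
    def rule(frame):
        prev, state["prev"] = state["prev"], frame
        if prev is None:
            return True
        diff = np.mean(np.abs(frame.astype(int) - prev.astype(int)))
        return float(diff) >= min_mean_abs_diff
    return rule

def filtration(frames, rules):
    """Yield only frames that satisfy every configured rule."""
    for frame in frames:
        if all(rule(frame) for rule in rules):
            yield frame  # forwarded to the identification (inference) phase
```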
Video streaming service delivery is a challenging task for mobile network operators. Knowing which services clients are using could help ensure a specific quality of service and manage the users' experience. Additionally, mobile network operators could apply throttling, traffic prioritization, or differentiated pricing. However, due to the growth of encrypted Internet traffic, it has become difficult for network operators to recognize the type of service used by their clients. In this article, we propose and evaluate a method for recognizing video streams based solely on the shape of the bitstream on a cellular network communication channel. To classify bitstreams, we used a convolutional neural network trained on a dataset of download and upload bitstreams collected by the authors. We demonstrate that our proposed method achieves an accuracy of over 90% in recognizing video streams from real-world mobile network traffic data.
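A plausible sketch of such a classifier, assuming the bitstream shape is represented as per-interval byte counts for the download and upload directions (two input channels); the architecture is illustrative and not the network described in the article.

```python
# Hedged sketch: a small 1-D CNN over per-interval byte counts for the
# download and upload directions. Layer sizes are illustrative only.
import torch
import torch.nn as nn

class BitstreamCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.AdaptiveAvgPool1d(1),   # collapse the time axis
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x):              # x: (batch, 2, seq_len) byte counts per bin
        return self.classifier(self.features(x).squeeze(-1))

# Usage: BitstreamCNN()(torch.randn(8, 2, 256)) -> (8, n_classes) logits
```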
Detecting actions in videos has been widely applied in on-device applications, such as cars and robots. Practical on-device videos are usually untrimmed, containing both action and background. It is desirable for a model to both recognize the class of an action and localize the temporal position where the action happens. Such a task is called temporal action localization (TAL), and TAL models are typically trained on the cloud, where multiple untrimmed videos are collected and labeled. It is desirable for a TAL model to continuously and locally learn from new data, which can directly improve action detection precision while protecting customers' privacy. However, directly training a TAL model on the device is nontrivial. To train a TAL model that can precisely recognize and localize each action, a tremendous number of video samples with temporal annotations is required, yet annotating videos frame by frame is exorbitantly time-consuming and expensive. Although weakly supervised temporal action localization (W-TAL) has been proposed to learn from untrimmed videos with only video-level labels, such an approach is also not suitable for on-device learning scenarios. In practical on-device learning applications, data are collected as streams. For example, the camera on the device keeps collecting video frames for hours or days, and actions of nearly all classes are included in a single long video stream. Dividing such a long video stream into multiple video segments requires substantial human effort, which hinders the application of TAL to realistic on-device learning scenarios. To enable W-TAL models to learn from a long, untrimmed streaming video, we propose an efficient video learning approach that can directly adapt to new environments. We first propose a self-adaptive video dividing approach with a contrast score-based segment merging approach to convert the video stream into multiple segments. Then, we explore different sampling strategies on the TAL tasks to request as few labels as possible. To the best of our knowledge, this is the first attempt to learn directly from an on-device, long video stream. Experimental results on the THUMOS'14 dataset show that the performance of our approach is comparable to the current W-TAL state-of-the-art (SOTA) work without any laborious manual video splitting.
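As a rough illustration of contrast score-based segment merging, the sketch below merges adjacent segments whose mean features are too similar; the contrast definition and threshold are assumptions, not the paper's exact procedure.

```python
# Hedged sketch: adjacent segments of a long stream are merged when the
# contrast between their mean features is low. The contrast measure and
# threshold are illustrative assumptions.
import numpy as np

def contrast(a, b, eps=1e-8):
    """Cosine-distance-style contrast between two segments' mean features."""
    ma, mb = a.mean(axis=0), b.mean(axis=0)
    cos = float(ma @ mb / (np.linalg.norm(ma) * np.linalg.norm(mb) + eps))
    return 1.0 - cos

def merge_segments(features, boundaries, threshold=0.1):
    """features: (T, D) per-frame features; boundaries: sorted split indices (list)."""
    segs = [features[s:e] for s, e in zip([0] + boundaries, boundaries + [len(features)])]
    merged = [segs[0]]
    for seg in segs[1:]:
        if contrast(merged[-1], seg) < threshold:
            merged[-1] = np.vstack([merged[-1], seg])  # low contrast: keep together
        else:
            merged.append(seg)
    return merged
```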
Most of the video content on the Internet today is distributed through online streaming platforms. To ensure user privacy, data transmissions are often encrypted using cryptographic protocols. In previous research, we experimentally validated the idea that the amount of transmitted data belonging to a particular video stream is not constant over time but changes periodically, forming a specific fingerprint. Once the fingerprint of a specific video stream is known, that video stream can subsequently be identified. Over several months of intensive work, our team has created a large dataset containing a large number of video streams that were captured by network traffic probes during playback by end users. The video streams were deliberately chosen to fall thematically into pre-selected categories. We selected two primary streaming platforms, PeerTube and YouTube. The first platform was chosen because it allows any streaming parameter to be modified, while the second was chosen because it is used by many people worldwide. Our dataset can be used to create and train machine learning models or heuristic algorithms that identify encrypted video streams by their content or type category, or even individually.
In recent years, crowdsourced livecast has seen remarkable progress due to its interactivity and real-time nature, playing an essential role in multimedia applications in the post-epidemic era. Given the delay sensitivity, large viewing volumes, and heterogeneous viewing patterns, traditional video streaming methods fail to provide optimized quality of experience (QoE) for viewers at minimum system cost over an edge-assisted service architecture. The emerging technology of mobile edge computing (MEC) offers a promising new perspective for reducing user latency and enhancing the quality of dispatched videos. In this paper, we propose Proffler, an integrated framework that addresses this problem through effective stream caching at the network edge server. We first examine the underlying correlations in viewing patterns across different regions and propose a novel transformer-based algorithm, Chili-TF, that achieves accurate viewer request prediction, even for regions with insufficient data. We then design a scalable algorithm, U2VR, that achieves near-optimal video stream allocation as well as viewer scheduling. Extensive real-data-driven experiments further confirm that Proffler achieves improvements of 20%-55% in average QoE compared to state-of-the-art solutions.
•We propose an approach to compensate for the visual defects caused by metamorphopsia.
•Our approach enables interactive measurement of distortion in the user's visual field.
•We compensate the warped visual field through real-time processing of video streams.
•We conducted an experiment on 17 patients affected by metamorphopsia.
•The results show the proposed system is able to reduce visual field distortion.
Advances in Augmented Reality technologies and, particularly, the availability of video see-through enabled head mounted displays (HMDs) are enabling new strategies to help individuals with visual impairments in daily life. In this work, an approach is proposed to compensate for a serious visual impairment known as metamorphopsia, a vision disorder characterized by deformed images. The goal is to provide patients with a digitally restored visual field through real-time processing of the video see-through streams captured from the HMD. To this end, we present two contributions: an interactive discrete modeling of the patient's eye-specific vision distortion, and its compensation by means of real-time counter-distortion of incoming frames. Our approach maps each video stream acquired by the stereoscopic video see-through cameras aboard the headset onto a 2D polygonal mesh, which is counter-warped by moving its vertices according to the previously built distortion model and then displayed, restored, on the HMD's screen. First user evaluations report promising results along with usability issues related to HMD technology.
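A minimal sketch of the counter-warping step, assuming the interactively measured distortion model is available as a coarse grid of per-vertex pixel displacements; cv2.remap stands in here for the mesh-based GPU warp used in the actual system.

```python
# Hedged sketch of per-eye counter-warping: a coarse grid of vertex
# displacements (the measured distortion model) is upsampled to a dense
# pixel map and applied to each incoming video see-through frame.
import cv2
import numpy as np

def build_maps(dx_grid, dy_grid, width, height):
    """Upsample coarse per-vertex displacements (in pixels) to dense remap tables."""
    dx = cv2.resize(dx_grid.astype(np.float32), (width, height), interpolation=cv2.INTER_CUBIC)
    dy = cv2.resize(dy_grid.astype(np.float32), (width, height), interpolation=cv2.INTER_CUBIC)
    xs, ys = np.meshgrid(np.arange(width, dtype=np.float32),
                         np.arange(height, dtype=np.float32))
    # Counter-distortion: each output pixel samples from its displaced source.
    return xs + dx, ys + dy

def counter_warp(frame, map_x, map_y):
    """Apply the counter-distortion to one incoming camera frame."""
    return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```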
Forensic examination of digital audio, video, and images frequently requires transforming multimedia data from one format to another. This transformative activity may change the administrative elements of the file while leaving the multimedia streams unchanged and intact. However, the forensic science community lacks a method for accurately determining whether the multimedia streams changed or remained the same during such transformations. This paper illustrates the practical use of multimedia stream hashing as a forensic method for verifying multimedia content. A universal stream hashing tool decodes the multimedia stream data at rest in a file container and then calculates the data stream hash using reference video, audio, and image codecs. This paper illustrates that the multimedia stream hashing method can accurately confirm the integrity of digital images, videos, and audio following transmission, transcoding, or re-containerization. Our findings confirmed that stream hashing can accurately detect changes in multimedia streams during transcoding. Furthermore, the stream hashing method can also accurately detect matching multimedia streams. In addition, this paper verified the forensic use of the multimedia stream hash method while establishing the error rate for its use. The hash algorithms used in stream hashing have a zero false negative rate by design, and the false positive rate (error rate) is extremely low and depends on the hashing algorithm. Finally, we recommend that the forensic science community adopt the multimedia stream hashing method as an initial testing method. The method can verify a multimedia stream's conversion (transcoding) from one codec to another using FFmpeg.
Highlights
Re‐encoding of multimedia streams must be transparent for forensic examinations.
Multimedia stream hashing is a stream integrity verification method used in forensic science.
Multimedia stream hashing has an extremely low potential for a false positive (error rate).
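As an illustration of per-stream hashing with FFmpeg, the sketch below wraps a plausible invocation of the streamhash muxer available in recent FFmpeg builds; the exact flags, algorithm names, and output parsing of the tool described in the paper may differ.

```python
# Hedged sketch of per-stream hashing via FFmpeg's streamhash muxer: each
# audio/video stream is decoded and hashed independently, so identical
# content should yield identical digests across containers. The flags shown
# are a plausible invocation, not necessarily the paper's tool.
import subprocess

def stream_hashes(path, algorithm="sha256"):
    """Return FFmpeg's per-stream hash report for every stream in `path`."""
    cmd = [
        "ffmpeg", "-nostdin", "-v", "error",
        "-i", path,
        "-map", "0",                        # include every stream in the container
        "-f", "streamhash", "-hash", algorithm,
        "-",                                # write the hash report to stdout
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return out.strip().splitlines()         # e.g. ["0,v,SHA256=...", "1,a,SHA256=..."]

# Usage: compare stream_hashes("original.mov") with stream_hashes("recontainered.mp4")
```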
Video analytics plays a crucial role in the development of smart estates and cities. Applications such as garbage dumping detection, lift monitoring, and safety surveillance rely on video analytics and require fast response times. Traditional cloud-based systems are ill-suited for these applications due to their limitations in handling large volumes of video data with low latency. In contrast, the IoT-Edge-Cloud paradigm is better suited for such applications, but it presents challenges such as system heterogeneity and resource allocation and orchestration. There is a need for an efficient platform for distributed video stream processing in which resource orchestration and scalability are tailored to smart estate applications. In this paper, we present ViEdge, an edge-based platform for video analytics applications in smart estates. It is highly adaptable and scalable, making it well suited to various deployments in such environments. Our implementation of ViEdge uses Kubernetes (K8s) for resource management and orchestration, and Apache Storm for distributed video stream processing. To study ViEdge's customization capabilities, we evaluated its performance on a heterogeneous edge testbed. We observed increased latency in Apache Storm when integrated with Kubernetes, affecting overall application performance. However, by developing a heuristic-based scheduler, we demonstrate that ViEdge effectively reduces end-to-end latency and increases frame processing rates.
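A simplified sketch in the spirit of a heuristic placement scheduler, greedily assigning processing stages to the node that minimizes estimated latency under a CPU-capacity constraint; the node and stage attributes are illustrative, not ViEdge's actual scheduling model.

```python
# Hedged sketch of a greedy heuristic placement: each stage goes to the node
# that currently minimizes estimated latency while respecting CPU capacity.
# Attributes are illustrative; the real scheduler integrates with Kubernetes
# and Apache Storm.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    cpu_free: float          # available CPU cores
    link_latency_ms: float   # latency from the video source to this node
    assigned: list = field(default_factory=list)

def place_stages(stages, nodes):
    """stages: list of (name, cpu_demand, per_core_ms). Returns stage -> node name."""
    placement = {}
    for name, cpu_demand, per_core_ms in stages:
        candidates = [n for n in nodes if n.cpu_free >= cpu_demand]
        if not candidates:
            raise RuntimeError(f"no node can host stage {name}")
        # Heuristic cost: network latency plus estimated processing time.
        best = min(candidates, key=lambda n: n.link_latency_ms + per_core_ms / cpu_demand)
        best.cpu_free -= cpu_demand
        best.assigned.append(name)
        placement[name] = best.name
    return placement
```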
Video streams can come from various sources, such as surveillance cameras, live events, drones, and video-sharing platforms. Video stream mining is challenging due to the extensive resources needed to analyze and extract useful information from continuous video data streams; overwhelmed resources can cause the system to stall. One way to meet the requirement is to provision larger resources, which increases cost. This research develops a data stream mining framework called Resource-Aware Video Streaming (RAViS), which adapts to the limited resources of a Raspberry Pi to run an object detection system based on the YOLO algorithm. We validate the framework by capturing streamed video to simulate data streams. The video frames are processed using a deep-learning model to recognize the presence of a person(s) in a room. The RAViS framework adapts the object detection system to the availability of Raspberry Pi resources, such as CPU, RAM, and internal storage. The adaptation aims to increase the availability of resources for performing object detection on the streamed video. The experimental results indicate that the RAViS framework can adapt the detection system to resource availability while maintaining accuracy.
•A framework can ensure the availability of a computer with limited resources for running an object detection system using deep learning algorithms.
•The framework constantly monitors the computer's memory, CPU, and storage, and provides feedback to the object detection system for adjusting its parameters to optimize resource utilization.
•This approach enables the object detection system to operate continuously with the required resources, thus ensuring its accuracy and effectiveness.
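A minimal sketch of such an adaptation loop, using psutil to poll CPU and memory and adjusting a frame-skip parameter; the thresholds and the detect callable are illustrative placeholders rather than RAViS internals.

```python
# Hedged sketch of the adaptation loop: system load is polled and the object
# detector's frame-skip is adjusted so detection keeps running within the
# Raspberry Pi's resources. Thresholds and `detect` are placeholders.
import psutil

def adapt_frame_skip(frame_skip, cpu_hi=85.0, cpu_lo=50.0, mem_hi=80.0):
    """Raise frame skipping under pressure, relax it when resources recover."""
    cpu = psutil.cpu_percent(interval=0.5)
    mem = psutil.virtual_memory().percent
    if cpu > cpu_hi or mem > mem_hi:
        return min(frame_skip + 1, 10)     # process fewer frames per second
    if cpu < cpu_lo and frame_skip > 0:
        return frame_skip - 1              # resources recovered: process more frames
    return frame_skip

def run(frames, detect):
    """Run detection over a frame stream, adapting the skip rate as we go."""
    skip, results = 0, []
    for i, frame in enumerate(frames):
        if skip and i % (skip + 1):
            continue                       # drop this frame to stay within budget
        results.append(detect(frame))      # e.g., a YOLO inference call
        skip = adapt_frame_skip(skip)
    return results
```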