In the realm of real-time communications, WebRTC-based multimedia applications are increasingly prevalent, as they can be smoothly integrated within Web browsing sessions. The browsing experience is thus significantly improved with respect to scenarios where browser add-ons and/or plug-ins are used; still, the end user's Quality of Experience (QoE) in WebRTC sessions may be affected by network impairments, such as delays and losses. Given the variability of user perceptions across communication scenarios, understanding and enhancing the resulting service quality is a complex endeavor. To address this, we present a dataset that provides a comprehensive perspective on the conversational quality of a two-party WebRTC-based audiovisual telemeeting service. The dataset was gathered through subjective evaluations involving 20 subjects across 15 test conditions (TCs). A specialized system was developed to induce controlled network disruptions, such as delay, jitter, and packet loss, which adversely affected communication between the parties. This methodology offered insight into user perceptions under various network impairments. The dataset encompasses a blend of objective and subjective data, including Absolute Category Rating (ACR) subjective scores, WebRTC-internals parameters, facial-expression features, and speech features. Consequently, it serves as a substantial contribution to the improvement of WebRTC-based video call systems, offering practical, real-world data that can drive the development of more robust and efficient multimedia communication systems, thereby enhancing the user experience.
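Per-condition ACR scores such as these are conventionally aggregated into a Mean Opinion Score (MOS) with a confidence interval; a minimal sketch of that aggregation (the ratings and function name are hypothetical, not taken from the dataset itself):

```python
import math
import statistics

def mos_with_ci(ratings, t_value=2.093):
    """Mean Opinion Score and 95% confidence interval for one test
    condition, from per-subject ACR ratings on the 1-5 scale.
    t_value defaults to Student's t for 19 degrees of freedom
    (20 subjects, as in the dataset described above)."""
    mos = statistics.mean(ratings)
    sd = statistics.stdev(ratings)
    ci = t_value * sd / math.sqrt(len(ratings))
    return mos, ci

# Hypothetical ratings from 20 subjects for one impaired condition.
ratings = [4, 4, 3, 5, 4, 3, 4, 4, 5, 3, 4, 4, 3, 4, 5, 4, 3, 4, 4, 4]
mos, ci = mos_with_ci(ratings)
```

Reporting MOS together with its confidence interval is what makes per-condition comparisons across the 15 TCs statistically meaningful.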
At present, multi-party WebRTC videoconferencing between peers with heterogeneous network resources and terminals is enabled over the best-effort Internet using a central selective forwarding unit (SFU), where each peer sends a scalable encoded video stream to the SFU. This connection model avoids the upload bandwidth bottleneck associated with mesh connections; however, it increases peer delay and overall network load (resource consumption), in addition to requiring investment in servers, since all video traffic must pass through SFU servers. To this effect, we propose a new multi-party WebRTC service model over future 5G networks, where a video service provider (VSP) collaborates with a network service provider (NSP) to offer an NSP-managed service that streams scalable video layers using software-defined networking (SDN)-assisted Internet Protocol (IP) multicasting between peers over NSP infrastructure. In the proposed service model, each peer sends a scalable coded video stream upstream, which is selectively duplicated and forwarded as layer streams at SDN switches in the network, instead of at a central SFU, in a multi-party WebRTC session managed by multicast trees maintained by the SDN controller. Experimental results show that the proposed SDN-assisted IP multicast service architecture is more efficient than the SFU model in terms of end-to-end service delay and overall network resource consumption, while avoiding the peer upload bandwidth bottleneck and distributing traffic more evenly across the network. The proposed architecture enables efficient provisioning of premium managed WebRTC services over bandwidth-reserved SDN slices, providing a videoconferencing experience with guaranteed video quality over 5G networks.
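The resource-consumption argument can be illustrated with a toy accounting model: with an SFU, every downstream copy of a stream traverses the links between the server and each receiver, whereas SDN-assisted multicast uses each tree link once per source regardless of how many receivers sit downstream. A deliberately simplified sketch (unit-rate streams, hop counts as cost; all numbers and names are hypothetical, not from the paper's experiments):

```python
def sfu_link_cost(n_peers, hops_to_sfu=2):
    """Total link-hops when each of n peers uploads one stream to a
    central SFU, which unicasts it back to the other n-1 peers."""
    uploads = n_peers * hops_to_sfu
    downloads = n_peers * (n_peers - 1) * hops_to_sfu
    return uploads + downloads

def multicast_link_cost(n_peers, tree_edges_per_source):
    """Total link-hops when each source's stream follows a multicast
    tree; a link carries the stream once, duplication happens only at
    branching SDN switches."""
    return n_peers * tree_edges_per_source

# Hypothetical 4-party session: 2 hops between each peer and the SFU,
# versus a multicast tree spanning 5 links per source.
sfu = sfu_link_cost(4)             # 4*2 uploads + 4*3*2 downloads = 32
mcast = multicast_link_cost(4, 5)  # 4 sources * 5 tree links = 20
```

The gap widens with the number of peers, since SFU download cost grows quadratically while the multicast tree grows roughly with network size.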
A Mixed Reality (MR) video fusion system fuses video imagery with 3D scenes. It makes the scene much more realistic and helps users understand the video contents and the temporal-spatial correlation between them, thus reducing the user's cognitive load. Nowadays, MR video fusion is used in various applications. However, video fusion systems require powerful client machines, because video streaming delivery, stitching, and rendering are computation-intensive. Moreover, high bandwidth usage is another critical factor that affects the scalability of video fusion systems. The framework proposed in this paper overcomes this client limitation by utilizing remote rendering. Furthermore, the framework is browser-based, so users can try the MR video fusion system on a laptop or even a tablet; no extra plug-ins or application programs need to be installed. Several experiments on diverse metrics demonstrate the effectiveness of the proposed framework.
Since the pandemic began, institutions have been implementing work- and study-from-home, mostly using VoIP applications for online meetings. This study seeks to discover which metrics are most preferred for measuring the quality of VoIP applications, by reviewing 38 relevant studies. We found that the most preferred Quality of Experience (QoE) metrics are Mean Opinion Score (MOS), Perceptual Evaluation of Speech Quality (PESQ), and Peak Signal-to-Noise Ratio (PSNR), while the most preferred Quality of Service (QoS) metrics are jitter, bandwidth, packet loss, and throughput. We also found that few recent studies evaluate SIP quality.
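Of the QoE metrics named above, PSNR is the simplest to state precisely: it is the peak signal power relative to the mean squared error between a reference and a degraded frame. A minimal sketch (the flat-list frame representation is a simplification for illustration):

```python
import math

def psnr(reference, degraded, max_val=255):
    """Peak Signal-to-Noise Ratio in dB between two equally sized
    frames, given as flat lists of 8-bit pixel values."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return float("inf")  # identical frames: no noise at all
    return 10 * math.log10(max_val ** 2 / mse)

ref = [100, 120, 130, 140]
deg = [101, 119, 131, 139]  # every pixel off by 1, so MSE = 1
quality_db = psnr(ref, deg)
```

MOS and PESQ, by contrast, are perceptual: MOS is aggregated from human ratings and PESQ is defined by ITU-T P.862, so neither reduces to a one-line formula like this.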
In this paper, we consider a system for WebRTC live streaming of point cloud data captured in real time. In particular, we consider various serialization formats that must be applied to the data before transmission. We also propose a new serialization format and show its effectiveness by comparing it with conventional formats.
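The abstract does not specify the formats compared, but the trade-off it alludes to can be sketched with two common choices for XYZ+RGB points: a human-readable text encoding versus a packed binary one (function names and the 15-byte layout are illustrative assumptions, not the paper's proposed format):

```python
import struct

def serialize_ascii(points):
    """One 'x y z r g b' line per point (text, human-readable)."""
    return "\n".join(
        "{:.3f} {:.3f} {:.3f} {} {} {}".format(*p) for p in points
    ).encode()

def serialize_binary(points):
    """Packed little-endian: three float32 coordinates followed by
    three uint8 colour channels, i.e. 15 bytes per point."""
    return b"".join(struct.pack("<fffBBB", *p) for p in points)

points = [(1.0, 2.0, 3.0, 255, 0, 0), (-0.5, 0.25, 1.5, 0, 255, 0)]
text_blob = serialize_ascii(points)
bin_blob = serialize_binary(points)
```

For live WebRTC delivery, the serialization cost per frame and the resulting payload size both matter, which is why the choice of format is evaluated rather than fixed in advance.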
The exponential growth in data generated by satellites, radars, sensors, and analysis and reanalysis from model outputs for the hydrological domain requires efficient real-time data management and distribution mechanisms. This paper introduces HydroRTC, a web-based data transfer and communication library designed to accelerate large-scale data sharing and analysis. Leveraging next-generation web technologies such as WebSockets, WebRTC, and Node.js, the library enables seamless peer-to-peer sharing, smart data transmission, and large dataset streaming. Three primary scenarios are presented as use cases, demonstrating the potential of HydroRTC: server-to-peer with intelligent data scheduling and large data streaming, peer-to-peer data sharing, and peer-to-server data exchange. HydroRTC offers a promising solution for collaborative infrastructures in the hydrological and environmental domain, allowing real-time and high-throughput data sharing and transfer to enhance research efficiency and collaboration capabilities.
• HydroRTC accelerates large-scale data sharing with next-gen web technologies.
• Three primary scenarios: server-to-peer, peer-to-peer, peer-to-server data exchange.
• Promising solution for collaborative infrastructures in hydrological and environmental domains.
• Exponential growth in data necessitates efficient real-time management mechanisms.
• Leverages WebSockets, WebRTC, Node.js for seamless peer-to-peer sharing.
WebRTC has quickly become popular as a video conferencing platform, partly because many browsers support it. WebRTC utilizes the Google Congestion Control (GCC) algorithm to provide congestion control for real-time communications over UDP. The performance during a WebRTC call may be influenced by several factors, including the underlying WebRTC implementation, the device and network characteristics, and the network topology. In this paper, we perform a thorough performance evaluation of WebRTC both under emulated synthetic network conditions and in real wired and wireless networks. Our evaluation shows that WebRTC streams have a slightly higher priority than TCP flows when competing with cross traffic. In general, while WebRTC performed as expected in several of the considered scenarios, we observed important cases where there is room for improvement. These include the wireless domain and the newly added support for the video codecs VP9 and H.264, which does not perform as expected.
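GCC's delay-based component adjusts the send rate from the trend of one-way queuing delay rather than from loss alone. A deliberately simplified sketch of that idea (the thresholds and gains are hypothetical; this is not the actual GCC overuse detector or its state machine):

```python
def adapt_rate(rate_kbps, delay_gradient_ms, threshold_ms=1.0,
               increase=1.05, decrease=0.85):
    """Toy delay-gradient controller, loosely inspired by GCC: back off
    multiplicatively when the inter-packet delay gradient signals a
    growing queue, hold when the queue is draining, otherwise probe
    upward for spare bandwidth."""
    if delay_gradient_ms > threshold_ms:    # overuse: queue building up
        return rate_kbps * decrease
    if delay_gradient_ms < -threshold_ms:   # underuse: queue draining, hold
        return rate_kbps
    return rate_kbps * increase             # normal: additively probe

rate = 1000.0
for gradient in [0.2, 0.1, 2.5, 3.0, 0.0]:  # measured gradients (ms)
    rate = adapt_rate(rate, gradient)
```

Reacting to delay before loss occurs is what lets real-time flows keep queues, and hence latency, short; it is also why such flows can interact unevenly with loss-based TCP cross traffic, as the evaluation above observes.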
Cost-effective load testing of WebRTC applications. Gortázar, Francisco; Gallego, Micael; Maes-Bermejo, Michel. The Journal of Systems and Software, vol. 193, November 2022. Journal article, peer-reviewed, open access.
Context: Video conference applications and systems implementing the WebRTC W3C standard are becoming more popular and in greater demand year after year, and load testing them is of paramount importance to ensure they can cope with demand. However, this is an expensive activity, usually involving browsers to emulate users.
Objective: To propose browser-less alternative strategies for load testing WebRTC services, and to study the performance and costs of those strategies compared with traditional ones.
Method: (a) Exploring the limits of existing and novel strategies for load testing WebRTC services from a single machine. (b) Comparing the common strategy of using browsers with the best of our proposed strategies in terms of cost in a load testing scenario.
Results: We observed that, using identical machines, our proposed strategies are able to emulate more users than traditional strategies. We also found a huge reduction in expenditure for load testing, as our strategy represents a saving of 96% with respect to the usual browser-based strategies. We also found almost no differences between the traditional strategies considered.
Conclusions: We provide details on the scalability of different load testing strategies in terms of users emulated, as well as CPU and memory used. We could reduce the expenditure of load tests of WebRTC applications.
The usage of live streaming services has led to a substantial increase in live video traffic. However, users' perceived quality of experience is frequently limited by variations in the streamers' upstream bandwidth. To address this issue, several adaptive bitrate (ABR) algorithms have been developed to mitigate bandwidth variations. Nevertheless, the ability of users to enjoy high-quality live streams remains limited. While neural-enhanced approaches, such as super-resolution, offer significant quality improvements, frame-oriented super-resolution incurs excessive inference delay that violates the real-time requirement of live streaming. In response, we propose ViChaser, which examines block-oriented super-resolution for live streaming. ViChaser performs neural super-resolution on potential blocks of interest at the media server, corresponding to the user's viewpoint, and uses online learning to adapt to the dynamic content of the video. Additionally, ViChaser utilizes the Lyapunov framework to efficiently allocate uplink bandwidth between the original low-quality live video and high-quality labels. The experimental results demonstrate that ViChaser achieves 1.2-1.5 dB higher video quality in Peak Signal-to-Noise Ratio (PSNR) than WebRTC and increases processing speed by 11-16 fps relative to LiveNAS.
A robust and language-agnostic Voice Activity Detection (VAD) system is crucial for Digital Entertainment Content (DEC); primary examples of DEC include movies and TV series. VAD systems are used in DEC creation for, among other things, augmenting subtitle creation, subtitle drift detection and correction, and audio diarisation. The majority of previous work on VAD focuses on scenarios that: (a) have minimal background noise, and (b) deliver the audio content in English. However, movies and TV shows can: (a) contain substantial amounts of non-voice background signal (e.g. musical score and environmental sounds), and (b) be released worldwide in a variety of languages. This makes most standard VAD approaches not readily applicable to DEC-related applications. Furthermore, no comprehensive analysis exists of Deep Neural Network (DNN) performance on the task of VAD applied to DEC. In this work, we present a thorough survey of DNN-based VADs on DEC data in terms of their accuracy, Area Under Curve (AUC), noise sensitivity, and language-agnostic behaviour. For our analysis we use 1100 proprietary DEC videos spanning 450 h of content in 9 languages and 5+ genres, making our study the largest of its kind ever published. The key findings of our analysis are: (a) even high-quality timed-text or subtitle files (subtitles and timed-text are used interchangeably in this manuscript) contain significant levels of label noise (up to 15%); despite high label noise, deep networks are robust and are able to retain high AUCs (∼0.94). (b) Using a larger labelled dataset can substantially increase a neural VAD model's True Positive Rate (TPR), with up to 1.3% and 18% relative improvement over the current state-of-the-art methods in Hebbar et al. (2019) and Chaudhuri et al. (2018), respectively. This effect is more pronounced in noisy environments such as music and environmental sounds.
This insight is particularly instructive when prioritizing domain-specific labelled data acquisition versus exploring model structure and complexity. (c) Currently available sequence-based neural models show similar levels of competence in terms of their language-agnostic behaviour for VAD at high Signal-to-Noise Ratios (SNRs) and for clean speech, (d) deep models exhibit varied performance across different SNRs, with CLDNN (Zazo et al., 2016) being the most robust, and (e) models with a comparatively larger number of parameters (∼2 M) are less robust to input noise than models with a smaller number of parameters (∼0.5 M).
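The survey above benchmarks DNN-based VADs; the classic non-neural baseline they are typically compared against is short-time-energy VAD, which the high-noise DEC setting defeats. A minimal sketch of that baseline (frame length and threshold are illustrative assumptions):

```python
import math

def energy_vad(samples, frame_len=160, threshold=0.01):
    """Short-time-energy VAD baseline: flag a frame as voiced when its
    mean squared amplitude exceeds a fixed threshold. With loud musical
    scores or environmental sounds, energy alone misfires, which is the
    gap the surveyed DNN models address."""
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        decisions.append(energy > threshold)
    return decisions

# Synthetic 8 kHz signal: one silent frame, then one frame of a 440 Hz tone.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(160)]
flags = energy_vad(silence + tone)
```

On clean input like this the baseline is perfect, which is precisely why its failure under music and environmental noise motivates the deep, language-agnostic models surveyed above.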