A noiseprint is a camera-related artifact that can be extracted from an image to serve as a powerful tool for several forensic tasks. The noiseprint is built with a deep learning data-driven approach that is trained to produce unique noise residuals with clear traces of camera-related artifacts. This data-driven approach results in a complex relationship between the noiseprint and the input image, making it challenging to attack. This article proposes a novel neural noiseprint transfer framework for noiseprint-based counter forensics. Given an authentic image and a forged image, the proposed framework synthesizes a new image that is visually indistinguishable from the forged image but whose noiseprint is very close to the noiseprint of the authentic one, making the forged image appear authentic and thus rendering noiseprint-based forensics ineffective. Based on deep content and noiseprint representations of the forged and authentic images, we implement the proposed framework in two different approaches. The first is an optimization-based approach that synthesizes the generated image by minimizing the difference between its content representation and the content representation of the forged image while, at the same time, minimizing the difference between its noiseprint representation and that of the authentic one. The second is a noiseprint injection-based approach, which first trains a novel neural noiseprint-injector network that can inject the noiseprint of one image into another. The trained noiseprint injector is then used to inject the noiseprint from the authentic image into the forged one to produce the generated image. The proposed approaches are generic and do not require training for specific images or camera models. Both approaches are evaluated on several datasets against two common forensic tasks: forgery localization and camera source identification. In both tasks, the proposed approaches significantly reduce several forensic accuracy scores compared with two noiseprint-based forensics methods while producing high-fidelity images. On the DSO-1 dataset, the forensic accuracy scores are reduced by an average of 75%, while the produced images have an average PSNR of 31.5 dB and SSIM of 0.9. The source code of the proposed approaches is available on GitHub (https://github.com/ahmed-elliethy/nnt).
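As a rough illustration of the optimization-based approach, the minimal sketch below iteratively updates a generated image to minimize a weighted sum of a content loss against the forged image and a noiseprint loss against the authentic image. It assumes placeholder feature extractors `content_net` and `noiseprint_net` standing in for the deep content and noiseprint representations; the loss weights, optimizer, and step count are illustrative choices, not the paper's settings.

```python
# Minimal sketch of optimization-based noiseprint transfer (assumed setup).
import torch
import torch.nn.functional as F

def transfer_noiseprint(forged, authentic, content_net, noiseprint_net,
                        alpha=1.0, beta=10.0, steps=500, lr=0.01):
    """Synthesize an image whose content matches `forged` but whose
    noiseprint matches `authentic` (both [1, C, H, W] tensors)."""
    generated = forged.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([generated], lr=lr)

    with torch.no_grad():
        target_content = content_net(forged)       # content to preserve
        target_noise = noiseprint_net(authentic)   # noiseprint to imitate

    for _ in range(steps):
        optimizer.zero_grad()
        loss = (alpha * F.mse_loss(content_net(generated), target_content)
                + beta * F.mse_loss(noiseprint_net(generated), target_noise))
        loss.backward()
        optimizer.step()
    return generated.detach()
```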
Model predictive control (MPC) is an optimal control method that predicts the future states of the system being controlled and estimates the optimal control inputs that drive the predicted states to the required reference. The computations of the MPC are performed at pre-determined sample instances over a finite time horizon. The number of sample instances and the horizon length determine the performance of the MPC and its computational cost. A long horizon with a large sample count allows the MPC to better estimate the inputs when the states change rapidly over time, which results in better performance but at the expense of high computational cost. However, this long horizon is not always necessary, especially for slowly varying states. In this case, a short horizon with a smaller sample count is preferable, as the same MPC performance can be obtained at a fraction of the computational cost. In this paper, we propose an adaptive regression-based MPC that predicts the minimum sufficient horizon length and sample count from several features extracted from the time-varying changes of the states. The proposed technique builds a synthetic dataset using the system model and uses the dataset to train a support vector regressor that performs the prediction. The proposed technique is experimentally compared with several state-of-the-art techniques on both linear and non-linear models. It achieves a superior reduction in computational time of about 35–65% compared with the other techniques, without introducing a noticeable loss in performance.
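The regression step might look like the sketch below, which trains a support vector regressor to map features of recent state changes to a horizon length. The feature set (simple statistics of state differences), the synthetic random-walk trajectories, and the placeholder labels are all illustrative assumptions; the paper's dataset is built from the actual system model.

```python
# Minimal sketch of the SVR horizon predictor (illustrative features/labels).
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def state_features(state_traj):
    """Summarize the time-varying changes of a state trajectory [T, n]."""
    d = np.diff(state_traj, axis=0)
    return np.array([np.abs(d).mean(), np.abs(d).max(), d.std()])

# Synthetic dataset: pair trajectory features with the shortest horizon
# (label) that would achieve near-optimal MPC cost; labels here are
# placeholders for the sketch, not values derived from a real model.
rng = np.random.default_rng(0)
X = np.stack([state_features(np.cumsum(rng.normal(0, s, (50, 2)), axis=0))
              for s in rng.uniform(0.01, 1.0, 200)])
y = 5 + 40 * (X[:, 0] / X[:, 0].max())   # placeholder horizon labels

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X, y)
predicted_horizon = int(round(model.predict(X[:1])[0]))
```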
Simultaneous processing of multiple multimedia signals appears in many applications. However, there is a lack of a generalized hardware platform that fits all application needs, from the number to the format of the input and output multimedia signals. The processing is also associated with synchronization problems, such as startup delays and deviating frame rates among the multimedia signals. This paper presents a flexible platform with hardware/software co-design for application-specific needs. On the hardware side, it presents a modular and scalable architecture that considers the required number of input and output multimedia signals; the crosstalk between the mixed analog and digital multimedia signals and their processing hardware components, which is minimized to preserve the signal-to-noise ratio on the platform; and low power consumption. On the software side, a synchronization module is proposed and efficiently implemented to handle the startup delays and the deviating frame rates of the input multimedia signals. The system hardware and software were implemented for two case studies. The first fuses multimedia signals of different modalities (visible and near-infrared (RGBN)), as needed for modern smartphone cameras. The second stitches nine high-definition videos simultaneously to produce the 4K format required for larger displays. The full multimedia pipeline (decoding, processing, and encoding) was realized and implemented successfully. The system performs in real time at 30 frames per second, the platform's end-to-end signal-to-noise ratio was above 56 dB (reaching 102 dB), and the power consumption was below 2 W, making it suitable for real-time embedded multimedia systems.
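One plausible shape for the synchronization module is sketched below: each input stream writes frames into its own slot, and an output clock samples the most recent frame from every stream, which repeats stale frames and implicitly drops early ones, masking startup delays and small frame-rate deviations. This repeat-or-drop policy and all names are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of a multi-stream frame synchronizer (assumed policy).
import time
import threading

class FrameSynchronizer:
    def __init__(self, num_streams):
        self.latest = [None] * num_streams   # most recent frame per stream
        self.lock = threading.Lock()

    def push(self, stream_id, frame):
        with self.lock:
            self.latest[stream_id] = frame

    def sample(self):
        """Called at the output frame rate; repeats stale frames and
        implicitly drops frames arriving faster than the output clock."""
        with self.lock:
            return list(self.latest)

def output_loop(sync, fps=30.0, duration=1.0):
    period, t_end = 1.0 / fps, time.time() + duration
    while time.time() < t_end:
        frames = sync.sample()
        if all(f is not None for f in frames):  # wait out startup delays
            pass  # process/fuse/stitch the time-aligned frames here
        time.sleep(period)
```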
Vehicle tracking in wide area motion imagery (WAMI) relies on associating vehicle detections across multiple WAMI frames to form tracks corresponding to individual vehicles. The temporal window length, i.e., the number M of sequential frames over which associations are collectively estimated, poses a tradeoff between accuracy and computational complexity. A larger M improves performance because the increased temporal context enables the use of motion models and allows occlusions and spurious detections to be handled better. The total number of track hypotheses, on the other hand, grows exponentially with increasing M, making larger values of M computationally challenging to tackle. In this paper, we introduce stochastic progressive association across multiple frames, an iterative approach that progressively grows M with each iteration to improve estimated tracks by exploiting the enlarged temporal context while keeping computation manageable through two novel approaches for pruning association hypotheses. First, guided by a road network accurately co-registered to the WAMI frames, we disregard unlikely associations that do not agree with the road network. Second, as M is progressively enlarged at each iteration, the related increase in association hypotheses is limited by revisiting only the subset of association possibilities rendered open by stochastically determined dis-associations from the previous iteration. The stochastic dis-association at each iteration maintains each estimated association with a probability that reflects its confidence, obtained via a probabilistic model. Associations at each iteration are then estimated globally over the M frames by (approximately) solving a binary integer programming problem that selects a set of compatible tracks. Vehicle tracking results obtained over test WAMI datasets indicate that our proposed approach provides significant performance improvements over state-of-the-art alternatives.
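The stochastic dis-association step could be sketched as follows: each existing association survives an iteration with probability equal to its confidence, and only the broken ones are re-opened as hypotheses over the enlarged temporal window. The data structures and confidences here are illustrative assumptions, not the paper's probabilistic model.

```python
# Minimal sketch of the stochastic dis-association step (assumed structures).
import random

def stochastic_disassociation(associations, rng=random.Random(0)):
    """`associations` maps (detection_a, detection_b) -> confidence in [0, 1]."""
    kept, reopened = {}, []
    for pair, confidence in associations.items():
        if rng.random() < confidence:
            kept[pair] = confidence   # maintained for the next iteration
        else:
            reopened.append(pair)     # hypotheses to revisit with larger M
    return kept, reopened

# Toy usage: detections are (frame_id, detection_index) pairs.
associations = {(("f1", 3), ("f2", 7)): 0.95,
                (("f1", 4), ("f2", 9)): 0.40}
kept, reopened = stochastic_disassociation(associations)
```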
To enrich large-scale visual analytics applications enabled by aerial wide area motion imagery (WAMI), we propose a novel methodology for accurately registering a geo-referenced vector roadmap to WAMI by using the locations of detected vehicles and determining a parametric transform that aligns these locations with the network of roads in the roadmap. Specifically, the problem is formulated in a probabilistic framework that explicitly allows for spurious detections that do not correspond to on-road vehicles. The registration is estimated via the expectation-maximization (EM) algorithm as the planar homography that minimizes the sum of weighted squared distances between the homography-mapped detection locations and the corresponding closest points on the road network, where the weights are estimated posterior probabilities of the detections being on-road vehicles. The weighted distance minimization is efficiently performed using the distance transform with the Levenberg-Marquardt nonlinear least-squares minimization procedure, and the fraction of spurious detections is estimated within the EM framework. The proposed method effectively sidesteps the challenges of feature correspondence estimation, applies directly to different imaging modalities, is robust to spurious detections, and is also more appropriate than feature matching for estimating a planar homography. Results over three WAMI datasets captured by both visual and infrared sensors indicate the effectiveness of the proposed methodology: both visual comparison and numerical metrics of registration accuracy are significantly better for the proposed method than for the existing alternatives.
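A single EM round might be sketched as below, assuming a binary road raster `road_mask` and detection pixel coordinates `dets`: the distance transform supplies point-to-road distances, the E-step turns those distances into posterior on-road weights, and the M-step refines the homography with a Levenberg-Marquardt solver. The Gaussian/uniform mixture constants and the nearest-pixel lookup (rather than interpolated distances) are simplifying assumptions for the sketch.

```python
# Minimal sketch of one EM round of roadmap-to-WAMI registration.
import numpy as np
from scipy.ndimage import distance_transform_edt
from scipy.optimize import least_squares

def apply_homography(h, pts):
    H = np.append(h, 1.0).reshape(3, 3)        # 8 free parameters, h33 = 1
    p = np.c_[pts, np.ones(len(pts))] @ H.T
    return p[:, :2] / p[:, 2:3]

def em_round(h, dets, road_mask, sigma=4.0, prior_onroad=0.9):
    dist = distance_transform_edt(~road_mask)  # distance to nearest road pixel

    def point_dist(h_vec):
        m = np.clip(apply_homography(h_vec, dets).round().astype(int),
                    0, np.array(road_mask.shape)[::-1] - 1)
        return dist[m[:, 1], m[:, 0]]

    # E-step: posterior weight that each detection is an on-road vehicle
    # (Gaussian on-road likelihood vs. an assumed uniform clutter density).
    d = point_dist(h)
    on = prior_onroad * np.exp(-d**2 / (2 * sigma**2))
    w = on / (on + (1 - prior_onroad) * 1e-3)

    # M-step: weighted least squares on distance-transform residuals (LM).
    res = least_squares(lambda hv: np.sqrt(w) * point_dist(hv), h, method="lm")
    return res.x, w
```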
For specular regions (SRs), the assumption of brightness (or other) constancy between images corresponding to multiple views of a scene breaks down. As a consequence, optical-flow (OF) based motion-estimation (ME) algorithms that rely on constancy assumptions fail for specular regions. At the same time, estimation of SRs in an image is also prone to errors, particularly to false positives from bright regions in the scene. In this paper, motivated by the fact that specular regions are typically encountered in image regions corresponding to portions of relatively smooth 3D surfaces, we propose an algorithm for improving ME and SR localization via joint processing. Initial estimates of the OF and of the SRs are obtained by conventional methods. The estimate of the SRs is then updated using the inconsistency of the OF with respect to the neighboring region to reinforce true positives and reject false positives. The OF is then re-computed with a modified energy functional that, in effect, emphasizes regularization in a spatially adaptive neighborhood of the SRs to improve the estimated OF. Experimental results on synthetic and real image pairs demonstrate that the proposed algorithm offers a significant improvement in both SR localization and ME over recently proposed methods for tackling these problems.
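The SR-update idea could be illustrated as below: where the flow disagrees with its local neighborhood, the specular likelihood is reinforced; where it is consistent, bright-but-diffuse false positives are suppressed. The median-filter consistency measure, the exponential evidence mapping, and the parameter values are all assumptions for the sketch, not the paper's energy formulation.

```python
# Minimal sketch of flow-inconsistency-driven specular-mask refinement.
import numpy as np
from scipy.ndimage import median_filter

def refine_specular_mask(flow, sr0, size=7, tau=1.0):
    """`flow` is [H, W, 2]; `sr0` is an initial SR likelihood map in [0, 1]."""
    # Neighborhood-consistent flow: channel-wise local median.
    smooth = np.stack([median_filter(flow[..., c], size=size)
                       for c in range(2)], axis=-1)
    inconsistency = np.linalg.norm(flow - smooth, axis=-1)
    evidence = 1.0 - np.exp(-inconsistency / tau)  # high where flow disagrees
    # Combine prior SR likelihood with flow evidence (Bayes-like update).
    return np.clip(sr0 * evidence / (sr0 * evidence
                   + (1 - sr0) * (1 - evidence) + 1e-9), 0, 1)
```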
Parametric chamfer alignment (PChA) is commonly employed for aligning an observed set of points with a corresponding set of reference points. PChA estimates optimal geometric transformation parameters that minimize an objective function formulated as the sum of the squared distances from each transformed observed point to its closest reference point. A distance transform enables efficient computation of the (squared) distances, and the objective function minimization is commonly performed via the Levenberg-Marquardt (LM) nonlinear least-squares iterative optimization algorithm. The point-wise computations of the objective function, gradient, and Hessian approximation required for the LM iterations make PChA computationally demanding for large-scale datasets. We propose an acceleration of PChA via a parallelized and pipelined realization that is particularly well suited to large-scale datasets and modern GPU architectures. Specifically, we partition the observed points among the GPU blocks and decompose the expensive LM calculations in correspondence with the GPU's single-instruction multiple-thread architecture to significantly speed up this bottleneck step for PChA on large-scale datasets. Additionally, by reordering computations, we propose a novel pipelining of the LM algorithm that offers further speedup by exploiting the low arithmetic latency of the GPU compared with its high global-memory access latency. Results obtained on two different platforms for both 2D and 3D large-scale point datasets from our ongoing research demonstrate that the proposed PChA GPU implementation provides a significant speedup over its single-CPU counterpart.
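The decomposition can be pictured with the CPU-side sketch below: the point set is split into chunks (mirroring GPU blocks), each chunk independently accumulates its contribution to the objective, the gradient (J^T r), and the Gauss-Newton Hessian approximation (J^T J), and the partial results are reduced before solving the damped LM system. The chunking loop stands in for parallel execution; the residual and Jacobian callbacks are assumed, and the actual GPU kernels and pipelining are not shown.

```python
# Minimal sketch of block-partitioned LM accumulation for chamfer alignment.
import numpy as np

def lm_partials(params, chunk, residual_fn, jacobian_fn):
    r = residual_fn(params, chunk)             # [m] residuals for this chunk
    J = jacobian_fn(params, chunk)             # [m, p] Jacobian rows
    return r @ r, J.T @ r, J.T @ J             # objective, gradient, GN Hessian

def lm_step(params, points, residual_fn, jacobian_fn, lam=1e-3, n_blocks=64):
    chunks = np.array_split(points, n_blocks)  # one chunk per "GPU block"
    p = len(params)
    f, g, H = 0.0, np.zeros(p), np.zeros((p, p))
    for c in chunks:                           # executed in parallel on a GPU
        fi, gi, Hi = lm_partials(params, c, residual_fn, jacobian_fn)
        f, g, H = f + fi, g + gi, H + Hi       # tree reduction on a GPU
    delta = np.linalg.solve(H + lam * np.diag(np.diag(H)), -g)
    return params + delta, f
```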
This paper introduces a novel steganographic technique for H.264 video that achieves outstanding performance against state-of-the-art steganalysis techniques while meeting real-time encoding performance constraints. The proposed technique embeds the secret message by altering the motion vectors (MVs) while preserving their local optimality and consistency features to withstand recently emerged steganalysis methods. Thanks to its macroblock (MB)-based architecture, the proposed technique satisfies the real-time constraints, eliminating the need to wait for the whole frame or group of pictures (GOP) and avoiding any additional re-encoding step(s). Additionally, the MVs are altered in the motion estimation (ME) sub-pixel-refinement stage through a rule-based scheme that ensures each MB's compatibility for embedding without detection by the aforementioned steganalysis methods. The proposed technique is integrated with the OpenH264 real-time video encoder and evaluated on widely used video sequences. The results show that the proposed technique achieves significant security performance against the steganalysis methods while maintaining an acceptable embedding rate, outperforming other state-of-the-art MV-based steganographic methods in real-time constrained environments. The proposed technique adds about 1–2% overhead beyond the encoder running time. The source code is publicly available at https://github.com/HassanMohamedGit/OpenH264-RealTime-steg.
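One way the embedding rule could work is sketched below: among sub-pixel MV candidates whose rate-distortion cost is within a margin of the best (so local optimality is approximately preserved), a candidate whose component parity matches the secret bit is selected, and MBs with no suitable candidate are skipped. The parity mapping, the cost margin, and all names are illustrative assumptions, not the paper's actual rule set or the OpenH264 integration.

```python
# Minimal sketch of a rule-based MV embedding step (assumed parity rule).
def embed_bit(candidates, costs, bit, cost_margin=1.02):
    """`candidates`: list of (mvx, mvy) in quarter-pel units; `costs`: RD costs."""
    best = min(costs)
    safe = [mv for mv, c in zip(candidates, costs)
            if c <= cost_margin * best]          # keep near-optimal MVs only
    for mv in safe:
        if (mv[0] + mv[1]) & 1 == bit:           # component parity carries the bit
            return mv, True                      # embedded in this MB
    return candidates[costs.index(best)], False  # skip this MB, keep best MV

# Toy usage: three near-equal candidates, embedding the bit 1.
mv, embedded = embed_bit([(4, 6), (5, 6), (4, 7)], [100.0, 100.5, 101.0], 1)
```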
Near-infrared (NIR) band sensors capture digital images of scenes under special conditions such as haze, fog, overwhelming light, or mist, where visible (VS) band sensors become occluded. However, NIR images contain poor textures and colors of the different objects in the scene, in contrast to VS images. In this article, we propose a simple yet effective fusion approach that combines both VS and NIR images to produce an enhanced fused image that contains better scene details and colors similar to those of the VS image. The proposed approach first estimates a fusion map from the relative difference of the local contrasts of the VS and NIR images. The approach then extracts non-spectral spatial details from the NIR image, and finally, the extracted details are weighted according to the fusion map and injected into the VS image to produce the enhanced fused image. The proposed approach adaptively transfers the useful details from the NIR image that contribute to the enhancement of the fused image. It produces realistic fused images by preserving the colors of the VS image and involves simple, non-iterative calculations with O(n) complexity. The effectiveness of the proposed approach is experimentally verified by comparison with four different state-of-the-art VS-NIR fusion approaches in terms of computational complexity and the quality of the obtained enhanced fused images. The quality is evaluated using two color-distortion measures and a novel aggregation of several blind image quality assessment measures. The proposed approach shows superior performance, as it produces enhanced fused images and preserves their quality even when the NIR images suffer from loss of texture or blurriness, with an acceptably fast execution time. The source code of the proposed approach is available online.
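The pipeline could be illustrated with the minimal sketch below: local contrast is taken as the local standard deviation, the fusion map is the relative contrast advantage of NIR over VS, and the NIR details come from a simple high-pass layer injected into the VS image. These specific contrast and detail operators are assumptions for the sketch; the article's exact operators may differ.

```python
# Minimal sketch of contrast-guided VS-NIR fusion (assumed operators).
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def local_contrast(img, size=7):
    """Local standard deviation as a simple local-contrast measure."""
    mu = uniform_filter(img, size)
    return np.sqrt(np.maximum(uniform_filter(img * img, size) - mu * mu, 0))

def fuse_vs_nir(vs, nir, size=7, sigma=2.0):
    """`vs` is [H, W, 3], `nir` is [H, W]; both float in [0, 1]."""
    luma = vs.mean(axis=-1)
    c_vs, c_nir = local_contrast(luma, size), local_contrast(nir, size)
    # Fusion map: relative contrast advantage of NIR over VS, clipped to [0, 1].
    fmap = np.clip((c_nir - c_vs) / (c_nir + c_vs + 1e-6), 0, 1)
    details = nir - gaussian_filter(nir, sigma)       # non-spectral details
    # Inject weighted NIR details into every VS channel, preserving colors.
    return np.clip(vs + (fmap * details)[..., None], 0, 1)
```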