The remarkable properties of compressed sensing (CS) have led researchers to utilize it in various other fields where a solution to an underdetermined system of linear equations is needed. One such application is in the area of array signal processing, e.g. in signal denoising and Direction of Arrival (DOA) estimation. Of the two prominent categories of CS recovery algorithms, namely convex optimization algorithms and greedy sparse approximation algorithms, we investigate the application of greedy sparse approximation algorithms to estimate DOA in the uniform linear array (ULA) environment. We conduct an empirical investigation into the behavior of two state-of-the-art greedy algorithms, Orthogonal Matching Pursuit (OMP) and Compressive Sampling Matching Pursuit (CoSaMP). This investigation takes into account various scenarios, such as varying noise levels and degrees of coherence between the sources. We perform simulations to demonstrate the performance of these algorithms and give a brief analysis of the results.
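As a rough illustration of the greedy recovery step underlying such DOA estimators, here is a minimal OMP sketch in NumPy; the steering dictionary, grid resolution, and source amplitudes are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily pick k columns of A that
    best explain y, then least-squares fit on the chosen support."""
    residual, support = y.astype(complex), []
    for _ in range(k):
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(A.conj().T @ residual)))
        support.append(idx)
        # least-squares on the selected support, then update the residual
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x = np.zeros(A.shape[1], dtype=complex)
    x[support] = x_s
    return x

# ULA steering dictionary over a 1-degree grid of candidate angles (illustrative)
m = 16                                                  # number of sensors
grid = np.deg2rad(np.arange(-90, 91))
A = np.exp(1j * np.pi * np.outer(np.arange(m), np.sin(grid)))
true_idx = [60, 120]                                    # sources at -30 and +30 degrees
y = A[:, true_idx] @ np.array([1.0, 0.8])               # noiseless snapshot
x = omp(A, y, k=2)
print(sorted(np.flatnonzero(np.abs(x) > 0.5).tolist()))  # → [60, 120]
```

In the noiseless two-source case above, the two selected grid indices coincide with the true source directions; the paper's empirical study concerns how this degrades with noise and source coherence.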
Subspace-based techniques for direction of arrival (DOA) estimation need a large number of snapshots to detect source directions accurately, which imposes a computational burden in practical applications. Using compressive sensing (CS) to address this issue has become the norm over the last decade. In this paper, a novel CS beamformer root-MUSIC algorithm is presented with a revised optimal measurement matrix bound. For this algorithm, the effect of signal subspace deviation under low-snapshot scenarios (e.g. target tracking) is analysed. The CS beamformer greatly reduces computational complexity without affecting the resolution of the algorithm, performs on par with root-MUSIC under low-snapshot scenarios, and, unlike root-MUSIC, allows the use of non-uniform linear arrays of sensors. The effectiveness of the algorithm is demonstrated with simulations under various scenarios.
ALANET Gupta, Akash; Aich, Abhishek; Roy-Chowdhury, Amit K.
Proceedings of the 28th ACM International Conference on Multimedia
10/2020
Conference Proceeding
Open Access
Existing works address the problem of generating high frame-rate sharp videos by separately learning frame deblurring and frame interpolation modules. Most of these approaches rest on the strong prior assumption that all input frames are blurry, whereas in a real-world setting the quality of frames varies. Moreover, such approaches are trained to perform either of the two tasks, deblurring or interpolation, in isolation, while many practical situations call for both. Different from these works, we address the more realistic problem of high frame-rate sharp video synthesis with no prior assumption that the input is always blurry. We introduce a novel architecture, Adaptive Latent Attention Network (ALANET), which synthesizes sharp high frame-rate videos with no prior knowledge of whether input frames are blurry, thereby performing both deblurring and interpolation. We hypothesize that information from the latent representations of consecutive frames can be utilized to generate optimized representations for both frame deblurring and frame interpolation. Specifically, we employ a combination of self-attention and cross-attention modules between consecutive frames in the latent space to generate an optimized representation for each frame. The optimized representations learnt using these attention modules help the model to generate and interpolate sharp frames. Extensive experiments on standard datasets demonstrate that our method performs favorably against various state-of-the-art approaches, even though we tackle a much more difficult problem. The project page is available at https://agupt013.github.io/ALANET.html.
Vision transformer based models bring significant improvements to image segmentation tasks. Although these architectures offer powerful capabilities irrespective of the specific segmentation task, their use of computational resources can be taxing on deployed devices. One way to overcome this challenge is to adapt the computation level to the specific needs of the input image rather than the current one-size-fits-all approach. To this end, we introduce ECO-M2F, or EffiCient TransfOrmer Encoders for Mask2Former-style models. Noting that the encoder module of M2F-style models incurs resource-intensive computations, ECO-M2F provides a strategy to self-select the number of hidden layers in the encoder, conditioned on the input image. To enable this self-selection and balance performance against computational efficiency, we present a three-step recipe. The first step trains the parent architecture to enable early exiting from the encoder. The second step creates a derived dataset of the ideal number of encoder layers required for each training example. The third step uses this derived dataset to train a gating network that predicts the number of encoder layers to be used, conditioned on the input image. Additionally, to change the computation-accuracy tradeoff, only steps two and three need to be repeated, which significantly reduces retraining time. Experiments on public datasets show that the proposed approach reduces the expected encoder computational cost while maintaining performance, adapts to various user compute resources, is flexible in architecture configurations, and can be extended beyond segmentation to object detection.
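The second step of the recipe, deriving the ideal encoder depth per training example, can be sketched as a simple threshold rule over per-exit losses; the tolerance value, toy loss table, and function name below are illustrative assumptions:

```python
import numpy as np

def ideal_layer_count(per_layer_losses, tol=0.01):
    """For each example, pick the smallest number of encoder layers whose
    exit loss is within `tol` of the best loss over all exit points."""
    per_layer_losses = np.asarray(per_layer_losses)   # shape: (examples, exits)
    best = per_layer_losses.min(axis=1, keepdims=True)
    ok = per_layer_losses <= best + tol               # acceptable exits per example
    return ok.argmax(axis=1) + 1                      # first acceptable exit, 1-indexed

# toy losses for 3 examples measured at exits after 1..4 encoder layers
losses = [[0.50, 0.30, 0.29, 0.29],
          [0.20, 0.20, 0.19, 0.19],
          [0.90, 0.60, 0.40, 0.10]]
print(ideal_layer_count(losses, tol=0.02))            # → [2 1 4]
```

The resulting per-example labels are what the third step's gating network would be trained to predict from the input image.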
Reducing the computational burden and complexity of DOA estimation while maintaining adequate resolution has been a major problem in the area. Compressive sensing (CS) has been used to great effect to address this issue. Recent work has shown that, by exploiting the sparsity of the observations obtained from the sensors, the computation can be greatly reduced without affecting the resolution of the algorithm; this is done by introducing CS beamformers (CSB). However, the measurement-matrix dimension suggested by traditional CS theory is found to be sub-optimal, since the CS beamformer does not use the regular CS recovery methods. In this paper, we introduce a new, stricter and improved bound on the dimensions of the measurement matrix used in the CS beamformer MUSIC algorithm, which decreases the number of measurements still further. We provide simulations and a comparison with the original algorithm, showing that this bound is superior for the CSB-MUSIC algorithm.
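As a rough sketch of the CS beamformer MUSIC idea (compress each snapshot with a random measurement matrix, then run MUSIC on the compressed covariance), here is an illustrative NumPy example; the array size, compressed dimension, noise level, and known source count are assumptions, not the paper's settings or its measurement-matrix bound:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, snapshots = 32, 12, 200            # sensors, compressed dimension, snapshots
angles = np.deg2rad([-20.0, 25.0])       # true source directions (illustrative)

def steer(theta):
    """ULA steering vectors, half-wavelength spacing."""
    return np.exp(1j * np.pi * np.outer(np.arange(n), np.sin(theta)))

# noisy array snapshots from two uncorrelated sources
S = rng.standard_normal((2, snapshots)) + 1j * rng.standard_normal((2, snapshots))
N = rng.standard_normal((n, snapshots)) + 1j * rng.standard_normal((n, snapshots))
Y = steer(angles) @ S + 0.1 * N

# CS beamformer: compress each snapshot with a random Gaussian matrix Phi
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
Z = Phi @ Y
Rz = Z @ Z.conj().T / snapshots

# MUSIC on the compressed covariance, assuming the source count (2) is known;
# eigh returns eigenvalues in ascending order, so the first m-2 eigenvectors
# span the (approximate) noise subspace
w, V = np.linalg.eigh(Rz)
En = V[:, : m - 2]
grid = np.deg2rad(np.arange(-90, 91))
A = Phi @ steer(grid)                                     # compressed steering dictionary
P = 1.0 / np.linalg.norm(En.conj().T @ A, axis=0) ** 2    # pseudo-spectrum
peaks = np.rad2deg(grid[np.argsort(P)[-2:]])
print(sorted(peaks.tolist()))                             # sharp peaks near the true directions
```

Note that only the m-dimensional compressed covariance is eigendecomposed, rather than the full n-dimensional one, which is where the computational saving comes from; the paper's contribution is a tighter bound on how small m can be made.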
We aim to train a multi-task model such that users can adjust the desired compute budget and the relative importance of task performances after deployment, without retraining. This enables optimizing performance for dynamically varying user needs without the heavy computational overhead of training and saving models for every scenario. To this end, we propose a multi-task model consisting of a shared encoder and task-specific decoders, where both encoder and decoder channel widths are slimmable. Our key idea is to control task importance by varying the capacities of the task-specific decoders, while controlling the total computational cost by jointly adjusting the encoder capacity. This improves overall accuracy by allowing a stronger encoder for a given budget, increases control over computational cost, and delivers high-quality slimmed sub-architectures based on user constraints. Our training strategy involves a novel 'Configuration-Invariant Knowledge Distillation' loss that enforces backbone representations to be invariant under different runtime width configurations to enhance accuracy. Further, we present a simple but effective search algorithm that translates user constraints into runtime width configurations of both the shared encoder and the task decoders for sampling the sub-architectures. The key rule of the search algorithm is to give a larger computational budget to the decoder of the more highly preferred task, while searching for a shared encoder configuration that enhances overall MTL performance. Experiments on three multi-task benchmarks (PASCAL-Context, NYUD-v2, and CIFAR100-MTL) with diverse backbone architectures demonstrate the advantage of our approach. For example, our method shows higher controllability by ~33.5% on the NYUD-v2 dataset over prior methods, while incurring much less compute cost.
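The key rule of the search algorithm can be sketched as a tiny exhaustive search over width configurations; the width grid, quadratic cost model, and preference-weighted score below are illustrative assumptions rather than the paper's actual procedure:

```python
import itertools

def search_config(enc_widths, dec_widths, prefs, budget, cost):
    """Pick (encoder width, per-task decoder widths) maximizing a simple
    preference-weighted capacity score under a total compute budget."""
    best, best_score = None, -1.0
    for enc in enc_widths:
        for decs in itertools.product(dec_widths, repeat=len(prefs)):
            total = cost(enc) + sum(cost(d) for d in decs)
            if total > budget:
                continue                     # configuration exceeds the budget
            # wider decoders for higher-preferred tasks score higher
            score = cost(enc) + sum(p * cost(d) for p, d in zip(prefs, decs))
            if score > best_score:
                best, best_score = (enc, decs), score
    return best

widths = [0.25, 0.5, 1.0]                    # candidate channel-width fractions
cost = lambda w: w * w                       # toy quadratic cost in channel width
enc, decs = search_config(widths, widths, prefs=[0.8, 0.2], budget=1.4, cost=cost)
print(enc, decs)                             # → 1.0 (0.5, 0.25)
```

With these toy numbers the search keeps the encoder wide and gives the preferred task (weight 0.8) the wider decoder, mirroring the stated rule; a real implementation would score configurations by measured task performance rather than raw capacity.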
Most cross-domain unsupervised Video Anomaly Detection (VAD) works assume that at least a few task-relevant target domain training data are available for adaptation from the source to the target domain. However, this requires laborious model tuning by the end-user, who may prefer a system that works "out of the box." To address such practical scenarios, we identify a novel target domain (inference-time) VAD task where no target domain training data are available. To this end, we propose a new 'Zero-shot Cross-domain Video Anomaly Detection (zxvad)' framework that includes a future-frame prediction generative model setup. Different from prior future-frame prediction models, our model uses a novel Normalcy Classifier module to learn the features of normal event videos by learning how such features differ "relatively" from features in pseudo-abnormal examples. A novel untrained Convolutional Neural Network based Anomaly Synthesis module crafts these pseudo-abnormal examples by adding foreign objects to normal video frames with no extra training cost. With our novel relative normalcy feature learning strategy, zxvad generalizes and learns to distinguish between normal and abnormal frames in a new target domain without adaptation during inference. Through evaluations on common datasets, we show that zxvad outperforms the state-of-the-art (SOTA), regardless of whether task-relevant (i.e., VAD) source training data are available. Lastly, zxvad also beats the SOTA methods in inference-time efficiency metrics, including model size, total parameters, GPU energy consumption, and GMACs.
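The idea of crafting pseudo-abnormal examples by inserting foreign objects into normal frames, with no training involved, can be sketched as a simple patch paste (the paper routes frames through an untrained CNN for this; the version below is a bare illustrative stand-in with made-up shapes):

```python
import numpy as np

def make_pseudo_abnormal(frame, patch, top, left):
    """Craft a pseudo-abnormal frame by pasting a 'foreign object' patch
    into a normal frame; no model training is involved."""
    out = frame.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out

normal = np.zeros((8, 8), dtype=np.uint8)      # toy 8x8 'normal' frame
obj = np.full((3, 3), 255, dtype=np.uint8)     # toy foreign object
abnormal = make_pseudo_abnormal(normal, obj, top=2, left=4)
print(int(abnormal.sum()))                      # → 2295 (9 pasted pixels x 255)
```

Pairs of (normal, abnormal) frames produced this way are what the Normalcy Classifier contrasts to learn relative normalcy features.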
Image enhancement approaches often assume that the noise is signal independent and approximate the degradation model as zero-mean additive Gaussian. However, this assumption does not hold for biomedical imaging systems, where sensor-based sources of noise are proportional to signal strengths and the noise is better represented as a Poisson process. In this work, we explore a sparsity and dictionary learning-based approach and present a novel self-supervised learning method for single-image denoising where the noise is approximated as a Poisson process, requiring no clean ground-truth data. Specifically, we approximate traditional iterative optimization algorithms for image denoising with a recurrent neural network that enforces sparsity with respect to the weights of the network. Since the sparse representations are based on the underlying image, the network is able to suppress spurious components (noise) in the image patches, thereby introducing implicit regularization for denoising tasks through the network structure. Experiments on two bio-imaging datasets demonstrate that our method outperforms state-of-the-art approaches in terms of PSNR and SSIM. Our qualitative results show that, in addition to higher performance on standard quantitative metrics, we are able to recover much more subtle details than the compared approaches. Our code is made publicly available at https://github.com/tacalvin/Poisson2Sparse
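The iterative optimization that such a recurrent network unrolls is in the spirit of ISTA-style sparse coding; the following is a minimal illustrative sketch with an orthonormal toy dictionary and a made-up sparsity pattern, not the paper's Poisson-aware method:

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding, the proximal operator of the l1 penalty."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(D, y, lam, steps=200):
    """ISTA for min_z 0.5*||y - Dz||^2 + lam*||z||_1; unrolling these
    updates into a recurrent network is the idea the paper builds on."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(steps):
        z = soft(z + D.T @ (y - D @ z) / L, lam / L)
    return z

rng = np.random.default_rng(1)
D = np.linalg.qr(rng.standard_normal((64, 64)))[0]   # toy orthonormal dictionary
z_true = np.zeros(64)
z_true[[3, 17, 42]] = [4.0, -3.0, 5.0]               # made-up sparse code
y = D @ z_true + 0.1 * rng.standard_normal(64)       # noisy observation
z_hat = ista(D, y, lam=0.3)
print(sorted(np.flatnonzero(np.abs(z_hat) > 1.0).tolist()))  # → [3, 17, 42]
```

Because the recovered code is sparse, small noise-driven coefficients are thresholded away while the signal-bearing atoms survive, which is the implicit regularization the abstract refers to.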