In this paper, the challenge of fast stereo matching for embedded systems is tackled. Limited resources, e.g., memory and processing power, and most importantly the real-time requirements of embedded systems for robotic applications do not permit the use of the most sophisticated stereo matching approaches. The strengths and weaknesses of different matching approaches have been analyzed, and a well-suited solution has been found in a Census-based stereo matching algorithm. The novelty of the algorithm is the explicit adaptation and optimization of the well-known Census transform with respect to embedded real-time systems in software. The most important change compared with the classic Census transform is the use of a sparse Census mask, which halves the processing time with nearly unchanged matching quality. This is due to the fact that large sparse Census masks perform better than small dense masks with the same processing effort, as shown by experiments with different mask sizes. Another contribution of this work is the presentation of a complete stereo matching system with its correlation-based core algorithm, a detailed analysis and evaluation of the results, and an optimized high-speed realization on different embedded and PC platforms. The algorithm handles areas that are difficult for stereo matching, such as areas with low texture, very well in comparison to state-of-the-art real-time methods. It can successfully eliminate false positives to provide reliable 3D data. The system is robust, easy to parameterize and offers high flexibility. It also achieves high performance on several systems, including resource-limited ones, without losing stereo matching quality. A detailed performance analysis of the algorithm is given for optimized reference implementations on various commercial off-the-shelf (COTS) platforms, e.g., a PC, a DSP and a GPU, reaching a frame rate of up to 75 fps for 640×480 images and 50 disparities.
The matching quality and processing time are compared to other algorithms on the Middlebury stereo evaluation website, where the algorithm reaches a mid-range quality rank and a top performance rank. Additional evaluation is done by comparing the results with a very fast and well-known sum-of-absolute-differences algorithm on several Middlebury datasets and real-world scenarios.
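The sparse-mask idea can be sketched in a few lines: the mask below samples eight neighbours from a 5×5 support, i.e. the same number of comparisons as a dense 3×3 mask but with a larger spatial context. The mask layouts and function names here are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def census_transform(img, offsets):
    """Census transform: each pixel becomes a bit string encoding
    whether each sampled neighbour is darker than the centre pixel."""
    h, w = img.shape
    code = np.zeros((h, w), dtype=np.uint32)
    pad = max(max(abs(dy), abs(dx)) for dy, dx in offsets)
    padded = np.pad(img, pad, mode='edge')
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = padded[pad + dy: pad + dy + h, pad + dx: pad + dx + w]
        code |= (neighbour < img).astype(np.uint32) << bit
    return code

# Dense 3x3 mask: 8 comparisons in the immediate neighbourhood.
dense_3x3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
             if (dy, dx) != (0, 0)]

# Sparse 5x5 mask: also 8 comparisons, but spread over a larger
# support by skipping every other pixel -- the same processing
# effort as the dense 3x3 mask, with wider context per pixel.
sparse_5x5 = [(dy, dx) for dy in (-2, 0, 2) for dx in (-2, 0, 2)
              if (dy, dx) != (0, 0)]

img = np.arange(25, dtype=np.uint8).reshape(5, 5)
print(len(dense_3x3), len(sparse_5x5))  # both cost 8 bits per pixel
```

Because both masks emit the same number of bits, the Hamming-distance matching stage downstream is unchanged; only the sampling pattern differs.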
Many types of Convolutional Neural Network (CNN) models and training methods have been proposed in recent years aiming to provide efficiency for embedded and edge devices with limited computation and memory resources. The wide variety of architectures makes this a complex task that has to balance generality with efficiency. Among the most interesting camera-sensor architectures are Pixel Processor Arrays (PPAs). This study presents two methods that are useful for embedded CNNs in general and particularly suitable for PPAs. The first is for training purely binarized CNNs; the second is for deploying larger models with a model-swapping paradigm that loads model components dynamically. Specifically, this study trains and implements networks with batch normalization and adaptive thresholds for binary activations. We then convert batch normalization and binary activations into a bias matrix, which can be implemented in parallel by an add/sub operation. For dynamic model swapping, we propose to decompose applications that are beyond the capacity of a PPA into sub-tasks that can be solved by tree networks loaded dynamically as needed. We demonstrate our approaches on various tasks including classification, localization, and coarse segmentation on a highly resource-constrained PPA sensor-processor.
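The batch-norm-to-bias conversion can be illustrated as follows: for a positive batch-norm scale, the sign of the normalized activation equals the sign of the raw activation minus a per-channel bias, so the whole BN + binary-activation pair collapses to one subtraction at inference time. All parameter names and values below are illustrative, not the paper's trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-channel batch-norm parameters (4 channels).
gamma = rng.uniform(0.5, 2.0, size=4)   # scale, assumed > 0
beta = rng.normal(0.0, 1.0, size=4)     # shift
mu = rng.normal(0.0, 1.0, size=4)       # running mean
var = rng.uniform(0.5, 2.0, size=4)     # running variance
eps = 1e-5

def bn_then_binarize(x):
    """Reference path: batch normalization followed by a sign activation."""
    y = gamma * (x - mu) / np.sqrt(var + eps) + beta
    return np.where(y >= 0, 1, -1)

# Folding: for gamma > 0,
#   sign(gamma*(x - mu)/sqrt(var+eps) + beta) == sign(x - b)
# with b = mu - beta*sqrt(var+eps)/gamma, so BN plus the binary
# activation reduces to a single add/sub against a bias matrix.
bias = mu - beta * np.sqrt(var + eps) / gamma

def folded_binarize(x):
    return np.where(x - bias >= 0, 1, -1)

x = rng.normal(0.0, 1.0, size=(8, 4))
print(np.array_equal(bn_then_binarize(x), folded_binarize(x)))
```

The subtraction-and-sign form is what makes the operation a good fit for a PPA, where per-pixel add/sub is cheap but a full multiply-accumulate is not.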
In recent years, the demand for alternative medical diagnostics of the human kidney has been growing, partly because such diagnostics can be non-invasive, early, real-time, and pain-free. Chronic kidney disease is one of the major kidney problems and requires early-stage diagnosis. Therefore, in this work, we have proposed and developed an Intelligent Iris-based Chronic Kidney Identification System (ICKIS). ICKIS takes an image of the human iris as input and, on the basis of iridology, applies a deep neural network model on a GPU-based supercomputing machine. The deep neural network models are trained on 2000 subjects, both healthy and with chronic kidney problems. When testing the proposed ICKIS on 2000 separate subjects (1000 healthy and 1000 with chronic kidney problems), the system achieves iris-based chronic kidney assessment with an accuracy of 96.8%. In future work, we will improve our AI algorithm and apply data-set cleaning so that accuracy can be increased by learning the features more efficiently.
Advances in control engineering and material science have made it possible to develop small-scale unmanned aerial vehicles (UAVs) equipped with cameras and sensors. These UAVs enable us to obtain a bird's-eye view of the environment. Having access to an aerial view over large areas is helpful in disaster situations, where often only incomplete and inconsistent information is available to the rescue team. In such situations, airborne cameras and sensors are valuable sources of information that help us build an "overview" of the environment and assess the current situation. This paper reports on our ongoing research on deploying small-scale, battery-powered and wirelessly connected UAVs carrying cameras for disaster management applications. In this "aerial sensor network", several UAVs fly in formations and cooperate to achieve a certain mission. The ultimate goal is an aerial imaging system in which UAVs build a flight formation, fly over a disaster area such as a forest fire or a large traffic accident, and deliver high-quality sensor data such as images or videos. These images and videos are communicated to the ground, fused, analyzed in real-time, and finally delivered to the user. In this paper we introduce our aerial sensor network and its application in disaster situations. We discuss the challenges of such aerial sensor networks and focus on the optimal placement of sensors. We formulate the coverage problem as an integer linear program (ILP) and present initial evaluation results.
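The flavour of the coverage formulation can be sketched on a toy instance: binary selection variables, one per candidate camera position, minimized subject to every ground cell being covered. The instance below is invented for illustration and is solved by brute force rather than an ILP solver, which is enough to show the objective and constraints.

```python
from itertools import combinations

# Toy instance (ours, not from the paper): candidate UAV camera
# positions and the set of ground cells each position can cover.
coverage = {
    "p0": {0, 1, 2},
    "p1": {2, 3},
    "p2": {3, 4, 5},
    "p3": {0, 5},
}
cells = set().union(*coverage.values())

def min_sensors(coverage, cells):
    """Brute-force the set-cover ILP: minimise the number of selected
    positions subject to every cell being covered. Each position is a
    0/1 variable; the loop over k enumerates solutions by cost."""
    positions = list(coverage)
    for k in range(1, len(positions) + 1):
        for combo in combinations(positions, k):
            covered = set().union(*(coverage[p] for p in combo))
            if covered >= cells:
                return combo
    return None

print(min_sensors(coverage, cells))
```

A real instance would hand the same variables and covering constraints to an ILP solver, since brute force is exponential in the number of candidate positions.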
Stereo vision can deliver a dense 3D reconstruction of the environment in real-time for driver assistance as well as autonomous driving. Semi-Global Matching (SGM) is a popular method of choice for this task and is already in use in production vehicles. Despite the enormous progress in the field and the high level of performance of modern stereo methods, one key challenge remains: robust stereo vision in automotive scenarios during rain, snow and darkness. Under these circumstances, current methods generate strong temporal noise, many disparity outliers and false positives on the object level. This work addresses these problems by regularizing stereo vision via prior information. We formulate a temporal prior and a scene prior, which we apply to SGM in order to overcome these deficiencies. The temporal prior integrates knowledge from the previous disparity map to exploit the high temporal correlation, while the scene prior exploits knowledge of a representative traffic scene. Using these priors, the object detection rate improves significantly on a driver assistance dataset of 3000 frames including bad weather, while the rate of erroneous object detections is reduced. We also outperform the ECCV Robust Vision Challenge 2012 winner, iSGM, on this dataset. In addition, results are presented for the KITTI dataset, even showing improvements under good weather conditions when exploiting the temporal prior.
We also show that the temporal and scene priors are easy and efficient to implement on a hybrid CPU/reconfigurable hardware platform. The use of these priors can be extended to other application areas such as mobile robotics.
•A stereo vision solution for adverse weather is proposed.
•Temporal and scene priors stabilize the disparity estimation process.
•A false-positive-rate reduction by a factor of 3 for bad weather is obtained.
•A real-time implementation on an embedded platform is presented.
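The temporal prior described above can be illustrated with a minimal sketch: a per-pixel penalty that grows with the distance between each candidate disparity and the previous frame's estimate is added to the cost volume before the winner is picked. The linear penalty, its weight, and the direct argmin (instead of SGM's path aggregation) are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def apply_temporal_prior(cost_volume, prev_disp, weight=0.5):
    """Add a penalty proportional to the distance between each
    candidate disparity and the previous frame's disparity map."""
    h, w, d = cost_volume.shape
    disparities = np.arange(d)
    # |candidate - previous| per pixel, broadcast over the disparity axis
    penalty = weight * np.abs(disparities[None, None, :]
                              - prev_disp[:, :, None])
    return cost_volume + penalty

# Toy example: a flat cost volume where the data term is completely
# uninformative; the prior then pulls the winner toward the previous
# frame's estimate, suppressing temporal noise.
cost = np.ones((2, 2, 5))
prev = np.array([[3, 3], [1, 1]])
regularized = apply_temporal_prior(cost, prev)
winner = regularized.argmin(axis=2)
print(winner)
```

When the data term is strong (e.g., well-textured pixels in good weather), it dominates the small prior penalty, so the regularization mainly acts where the measurement is unreliable.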
•Developed a low-power and high-performance medical visual processing system (ViPS).
•ViPS takes images with complex data structures from medical imaging interfaces.
•It manages complex images by using on-chip Specialized Medical Memory.
•To process the application, ViPS uses heterogeneous multi-core processors.
•ViPS provides an easy-to-use programming model for medical applications.
Imaging has become an essential tool in modern medical science. Numerous powerful platforms to register, store, analyze and process medical imaging applications have appeared in recent years. In this article, we have designed an advanced visual processing system (ViPS) that stores and processes complex and multi-dimensional medical imaging applications. ViPS provides a user-friendly programming environment and a high-performance architecture for data acquisition, registration, storage and analysis, and performs segmentation, filtering and recognition of complex, multi-dimensional medical images and videos in real time. The proposed architecture is highly efficient in terms of cost, performance, and power. ViPS is designed and evaluated on a Xilinx Virtex-7 FPGA VC707 Evaluation Kit. Its performance is compared with the Heterogeneous Multi-Processing Odroid XU3 board and the GPU-based Jetson TK1 Embedded Development Kit with 192 CUDA cores. The results show that, for an iridology application, ViPS improves system performance by 2.4× and 1.4× over the heterogeneous multi-core and GPU-based graphics systems, respectively. While executing real-time complex image reconstruction at 2× and 1.25× higher frame rates, ViPS achieves 15.2× and 5.26× performance improvements when running various image processing algorithms. ViPS further achieves 3.01× and 1.13× speedups for video processing algorithms and draws 1.55× and 2.27× less dynamic power.
Eyes of Things
Deniz, Oscar; Vallez, Noelia; Espinosa-Aranda, Jose L ...
Sensors (Basel, Switzerland), 05/2017, Volume 17, Issue 5
Journal Article · Peer-reviewed · Open Access
Embedded systems control and monitor a great deal of our reality. While some "classic" features are intrinsically necessary, such as low power consumption, rugged operating ranges, fast response and low cost, these systems have evolved in the last few years to emphasize connectivity functions, thus contributing to the Internet of Things paradigm. A myriad of sensing/computing devices are being attached to everyday objects, each able to send and receive data and to act as a unique node in the Internet. Apart from the obvious necessity to process at least some data at the edge (to increase security and reduce power consumption and latency), a major breakthrough will arguably come when such devices are endowed with some level of autonomous "intelligence". Intelligent computing aims to solve problems for which no efficient exact algorithm can exist or for which we cannot conceive an exact algorithm. Central to such intelligence is Computer Vision (CV), i.e., extracting meaning from images and video. While not everything needs CV, visual information is the richest source of information about the real world: people, places and things. The possibilities of embedded CV are endless if we consider new applications and technologies, such as deep learning, drones, home robotics, intelligent surveillance, intelligent toys, wearable cameras, etc. This paper describes the Eyes of Things (EoT) platform, a versatile computer vision platform tackling those challenges and opportunities.
There is tremendous scope for improving the energy efficiency of embedded vision systems by incorporating programmable region-of-interest (ROI) readout in the image sensor design. In this work, we study how ROI programmability can be leveraged for vision applications by anticipating where the ROI will be located in future frames and switching pixels off outside of this region. We refer to this process of ROI prediction and corresponding sensor configuration as adaptive subsampling. Our adaptive subsampling algorithms comprise an object detector and an ROI predictor (Kalman filter) which operate in conjunction to optimize the energy efficiency of the vision pipeline, with object tracking as the end task. To further facilitate the implementation of our adaptive algorithms in real systems, we select a candidate algorithm and map it onto an FPGA. Leveraging Xilinx Vitis AI tools, we designed and accelerated a YOLO object-detector-based adaptive subsampling algorithm. To further improve the algorithm post-deployment, we evaluated several competing baselines on the OTB100 and LaSOT datasets. We found that coupling the ECO tracker with the Kalman filter yields competitive AUC scores of 0.4568 and 0.3471 on the OTB100 and LaSOT datasets, respectively. Further, the power efficiency of this algorithm is on par with, and in a couple of instances superior to, the other baselines. The ECO-based algorithm incurs a power consumption of approximately 4 W averaged across both datasets, while the YOLO-based approach requires approximately 6 W (as per our power consumption model). In terms of the accuracy-latency tradeoff, the ECO-based algorithm provides near-real-time performance (19.23 FPS) while managing to attain competitive tracking precision.
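The predict-then-readout loop can be sketched with a standard constant-velocity Kalman filter over the ROI centre: the predicted position configures the next frame's readout region, and the detector's measurement then corrects the state. The model, noise covariances, and variable names below are illustrative assumptions, not the paper's tuning.

```python
import numpy as np

# Constant-velocity model over the ROI centre; state is [x, y, vx, vy].
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)  # we only measure position
Q = np.eye(4) * 1e-2                       # process noise (assumed)
R = np.eye(2) * 1.0                        # detector noise (assumed)

def kf_step(x, P, z):
    # Predict where the ROI centre will be in the next frame; this
    # predicted position is what would configure the sensor readout.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Correct the prediction with the detector's measurement z.
    y = z - H @ x_pred
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ y
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new, x_pred[:2]

x = np.zeros(4)
P = np.eye(4)
# Object moving +2 px/frame in x: simulated detector measurements.
for t in range(1, 6):
    x, P, roi_centre = kf_step(x, P, np.array([2.0 * t, 0.0]))
print(x[:2])  # estimate converges toward the moving object
```

In the full pipeline, pixels outside a window around `roi_centre` would be switched off, which is where the energy saving comes from.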
Computer vision applications have a large disparity in operations, data representation and memory access patterns from the early vision stages to the final classification and recognition stages. A hardware system for computer vision has to provide high flexibility without compromising performance, exploiting massively spatial-parallel operations but also keeping a high throughput on data-dependent and complex program flows. Furthermore, the architecture must be modular, scalable and easy to adapt to the needs of different applications. Keeping this in mind, a hybrid SIMD/MIMD architecture for embedded computer vision is proposed. It consists of a coprocessor designed to provide fast and flexible computation of demanding image processing tasks of vision applications. A 32-bit 128-unit device was prototyped on a Virtex-6 FPGA which delivers a peak performance of 19.6 GOP/s and 7.2 W of power dissipation.
Image sensors with programmable region-of-interest (ROI) readout are a new sensing technology important for energy-efficient embedded computer vision. In particular, ROIs can subsample the number of pixels being read out while performing single-object tracking in a video. In this paper, we develop adaptive sampling algorithms which perform joint object tracking and predictive video subsampling. We utilize an object detector consisting of either mean shift tracking or a neural network, coupled with a Kalman filter for prediction. We show that our algorithms achieve a mean average precision of 0.70 or higher on a dataset of 20 videos in software. Further, we implement hardware acceleration of mean shift tracking with Kalman-filter adaptive subsampling on an FPGA. Hardware results show a 23× improvement in clock cycles and latency compared to baseline methods, achieving real-time performance at 38 FPS. This research points to a new domain of hardware-software co-design for adaptive video subsampling in embedded computer vision.
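The mean shift component lends itself to a compact sketch: given a per-pixel likelihood map for the target (in practice, a histogram back-projection), a fixed-size window is repeatedly moved to the centroid of the likelihood mass inside it until it stops moving. The toy likelihood map and window size below are illustrative, not the paper's implementation.

```python
import numpy as np

def mean_shift(weights, start, win=5, iters=20):
    """Mean shift over a likelihood map: move a fixed-size window to
    the centroid of the weights it contains until convergence."""
    cy, cx = start
    h, w = weights.shape
    r = win // 2
    for _ in range(iters):
        y0, y1 = max(cy - r, 0), min(cy + r + 1, h)
        x0, x1 = max(cx - r, 0), min(cx + r + 1, w)
        patch = weights[y0:y1, x0:x1]
        total = patch.sum()
        if total == 0:
            break  # no target likelihood under the window
        ys, xs = np.mgrid[y0:y1, x0:x1]
        ny = int(round((ys * patch).sum() / total))
        nx = int(round((xs * patch).sum() / total))
        if (ny, nx) == (cy, cx):
            break  # converged
        cy, cx = ny, nx
    return cy, cx

# Toy likelihood map with one bright blob; the window starts a few
# pixels away and climbs onto the blob's centre.
weights = np.zeros((20, 20))
weights[11:14, 6:9] = 1.0
print(mean_shift(weights, start=(9, 9)))  # lands on the blob centre
```

Because each iteration is just windowed sums and a centroid, the loop maps naturally onto fixed-point FPGA datapaths, which is consistent with the large clock-cycle improvement reported above.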