Self-driving cars: A survey. Badue, Claudine; Guidolini, Rânik; Carneiro, Raphael Vivacqua ...
Expert Systems with Applications, vol. 165, 03/2021. Journal Article, peer reviewed.
We survey research on self-driving cars published in the literature focusing on autonomous cars developed since the DARPA challenges, which are equipped with an autonomy system that can be categorized as SAE level 3 or higher. The architecture of the autonomy system of self-driving cars is typically organized into the perception system and the decision-making system. The perception system is generally divided into many subsystems responsible for tasks such as self-driving-car localization, static obstacle mapping, moving obstacle detection and tracking, road mapping, traffic signalization detection and recognition, among others. The decision-making system is commonly partitioned as well into many subsystems responsible for tasks such as route planning, path planning, behavior selection, motion planning, and control. In this survey, we present the typical architecture of the autonomy system of self-driving cars. We also review research on relevant methods for perception and decision making. Furthermore, we present a detailed description of the architecture of the autonomy system of the self-driving car developed at the Universidade Federal do Espírito Santo (UFES), named Intelligent Autonomous Robotics Automobile (IARA). Finally, we list prominent self-driving car research platforms developed by academia and technology companies, and reported in the media.
•Recent developments in autonomous driving from academic and industry points of view.
•Breakdown of the main aspects comprising autonomous driving and their evolution.
•Review and proposal of an autonomous driving architecture.
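The modular decomposition the survey describes can be made concrete with a short skeleton. The sketch below is purely illustrative: the class and method names are hypothetical stand-ins for the perception and decision-making subsystems listed in the abstract, not IARA's actual interfaces.

```python
# Illustrative skeleton of the typical autonomy architecture; all names
# are hypothetical, mirroring the subsystem decomposition in the survey.
from dataclasses import dataclass

@dataclass
class Pose:
    x: float
    y: float
    yaw: float

class PerceptionSystem:
    """Subsystems from the abstract: localization, mapping, detection."""
    def localize(self, lidar_scan, offline_map) -> Pose: ...
    def map_static_obstacles(self, lidar_scan, pose: Pose): ...
    def track_moving_obstacles(self, lidar_scan, camera_image): ...
    def detect_traffic_signalization(self, camera_image): ...

class DecisionMakingSystem:
    """Route planning -> behavior selection -> motion planning -> control."""
    def plan_route(self, pose: Pose, goal: Pose) -> list: ...
    def select_behavior(self, pose, obstacles, signals) -> str: ...
    def plan_motion(self, route, behavior) -> list: ...
    def control(self, trajectory) -> tuple: ...
```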
Decreasing costs of vision sensors and advances in embedded hardware have boosted lane-related research – detection, estimation, tracking, etc. – in the past two decades. The interest in this topic has increased even more with the demand for advanced driver assistance systems (ADAS) and self-driving cars. Although extensively studied independently, there is still a need for studies that propose a combined solution for the multiple problems related to the ego-lane, such as lane departure warning (LDW), lane change detection, lane marking type (LMT) classification, road marking detection and classification, and detection of the presence of adjacent lanes (i.e., the immediate left and right lanes). In this paper, we propose a real-time Ego-Lane Analysis System (ELAS) capable of estimating the ego-lane position, classifying LMTs and road markings, performing LDW, and detecting lane change events. The proposed vision-based system works on a temporal sequence of images. Lane marking features are extracted from perspective and Inverse Perspective Mapping (IPM) images and combined to increase robustness. The final estimated lane is modeled as a spline using a combination of methods (Hough lines with a Kalman filter and a spline with a particle filter). Based on the estimated lane, all other events are detected. To validate ELAS and address the lack of lane datasets in the literature, a new dataset with more than 20 different scenes (in more than 15,000 frames), covering a variety of scenarios (urban road, highways, traffic, shadows, etc.), was created. The dataset was manually annotated and made publicly available to enable evaluation of several events that are of interest to the research community (i.e., lane estimation, change, and centering; road markings; intersections; LMTs; crosswalks; and adjacent lanes). Moreover, the system was also validated quantitatively and qualitatively on other public datasets. ELAS achieved high detection rates in all real-world events and proved to be ready for real-time applications.
•An accurate real-time, real-world Ego-Lane Analysis System (ELAS)
•Novel manually annotated lane dataset with more than 20 scenes (+15,000 frames)
•We publicly released the code and the novel dataset.
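To make the "Hough lines with a Kalman filter" stage of ELAS concrete, here is a minimal single-lane sketch using OpenCV. It assumes an already-computed grayscale IPM image in which lane lines are near-vertical; all thresholds and noise covariances are illustrative, and the spline/particle-filter refinement is omitted.

```python
# Sketch of Hough-line lane detection smoothed by a Kalman filter;
# parameters are illustrative, not ELAS's actual values.
import cv2
import numpy as np

# Constant-velocity model on the line parameters [offset, slope].
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], np.float32)
kf.processNoiseCov = 1e-3 * np.eye(4, dtype=np.float32)
kf.measurementNoiseCov = 1e-1 * np.eye(2, dtype=np.float32)
kf.errorCovPost = np.eye(4, dtype=np.float32)

def track_lane(ipm_gray):
    """One frame: detect the dominant lane line in an IPM image, smooth it."""
    edges = cv2.Canny(ipm_gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                            minLineLength=40, maxLineGap=20)
    pred = kf.predict()
    if lines is None:
        return pred[0, 0], pred[1, 0]      # no detection: coast on prediction
    # Take the longest detected segment as the dominant lane line.
    x1, y1, x2, y2 = max(lines[:, 0],
                         key=lambda l: np.hypot(l[2] - l[0], l[3] - l[1]))
    slope = (x2 - x1) / (y2 - y1 + 1e-6)   # lane lines are near-vertical in IPM
    offset = x1 - slope * y1               # x position extrapolated to row 0
    post = kf.correct(np.array([[offset], [slope]], np.float32))
    return post[0, 0], post[1, 0]
```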
•Handling Pedestrians in Self-Driving Cars using Image Tracking and Frenét Frames.
•The method is safer and more efficient than systems without tracking functionality.
•Tracking pedestrians enables early decision capability.
•Our self-driving car was evaluated in both simulated and real-world scenarios.
The development of intelligent autonomous cars is of great interest. A particular and challenging problem is to handle pedestrians, for example, crossing or walking along the road. Since pedestrians are among the most fragile elements in traffic, a reliable pedestrian detection and handling system is mandatory. The current pedestrian handling system of our autonomous cars suffers from the limitation of pure detection-based systems, i.e., it restricts the autonomous car to decisions based only on the present moment. This work improves the pedestrian handling system by incorporating an object tracker with the aim of predicting the pedestrian's behavior. With this knowledge, the autonomous car can better decide when to stop and when to start moving, providing a more comfortable, efficient, and safer driving experience. The proposed method was augmented with a path generator based on Frenét Frames and incorporated into our self-driving car in order to enable better decision making and to allow overtaking pedestrians. The behavior of our self-driving car was evaluated in both simulated and real-world scenarios. Results showed the proposed system is safer and more efficient than the system without tracking functionality, owing to its early decision capability.
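A minimal sketch of the Frenét-frame idea behind the paper's path generator: lateral motion is planned as a polynomial in the lateral offset d(t) relative to the reference path. The quintic-polynomial formulation below is a standard choice for this; the boundary values are illustrative, not the paper's parameters.

```python
# Quintic-polynomial lateral trajectory in the Frenet frame (standard
# formulation); boundary conditions below are illustrative only.
import numpy as np

def quintic_coeffs(d0, dv0, da0, d1, dv1, da1, T):
    """Quintic d(t) matching lateral offset/velocity/acceleration
    at t=0 and t=T."""
    A = np.array([[T**3,    T**4,     T**5],
                  [3*T**2,  4*T**3,   5*T**4],
                  [6*T,     12*T**2,  20*T**3]])
    b = np.array([d1 - (d0 + dv0*T + 0.5*da0*T**2),
                  dv1 - (dv0 + da0*T),
                  da1 - da0])
    c3, c4, c5 = np.linalg.solve(A, b)
    return np.array([d0, dv0, 0.5*da0, c3, c4, c5])

# Example: swerve from the lane center (d=0) to d=1.5 m in 3 s to pass a
# tracked pedestrian, ending parallel to the reference path.
coeffs = quintic_coeffs(0.0, 0.0, 0.0, 1.5, 0.0, 0.0, T=3.0)
t = np.linspace(0.0, 3.0, 30)
d = sum(c * t**i for i, c in enumerate(coeffs))   # lateral offset samples
```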
An important logistics application of robotics involves manipulators that pick and place objects stored on warehouse shelves. A critical aspect of this task is detecting the pose of a known object on the shelf using visual data. Solving this problem can be assisted by the use of an RGBD sensor, which provides depth information beyond visual data. Nevertheless, it remains a challenging problem since multiple issues need to be addressed, such as low illumination inside shelves, clutter, texture-less and reflective objects, as well as the limitations of depth sensors. This letter provides a new rich dataset for advancing the state-of-the-art in RGBD-based 3D object pose estimation, focused on the challenges that arise when solving warehouse pick-and-place tasks. The publicly available dataset includes thousands of images and corresponding ground truth data for the objects used during the first Amazon Picking Challenge at different poses and clutter conditions. Each image is accompanied by ground truth information to assist in the evaluation of algorithms for object detection. To show the utility of the dataset, a recent algorithm for RGBD-based pose estimation is evaluated in this letter. Given the measured performance of the algorithm on the dataset, this letter shows how it is possible to devise modifications and improvements to increase the accuracy of pose estimation algorithms. This process can be easily applied to a variety of different methodologies for object pose detection and improve performance in the domain of warehouse pick-and-place.
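When scoring a 6-DoF pose estimate against such ground truth, the usual metrics are the Euclidean translation error and the geodesic rotation error on SO(3). A minimal sketch (function name is illustrative, not from the dataset's tooling):

```python
# Standard 6-DoF pose-error metrics: translation distance and the
# angle of the relative rotation (geodesic distance on SO(3)).
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """R_*: 3x3 rotation matrices; t_*: 3-vectors (same units as the data)."""
    t_err = np.linalg.norm(t_est - t_gt)
    cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    r_err = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    return t_err, r_err
```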
High-resolution satellite imagery has been increasingly used in remote sensing classification problems, largely because of its availability. Despite this high availability, very little effort has been devoted to the zebra crossing classification problem. In this letter, crowdsourcing systems are exploited in order to enable the automatic acquisition and annotation of a large-scale satellite imagery database for crosswalk-related tasks. This dataset is then used to train deep-learning-based models to accurately classify satellite images that do or do not contain zebra crossings. A novel dataset with more than 240,000 images from 3 continents, 9 countries, and more than 20 cities was used in the experiments. The experimental results showed that freely available crowdsourcing data can be used to train robust models that perform crosswalk classification on a global scale with high accuracy (97.11%).
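A hedged sketch of the kind of binary classifier such a pipeline trains on satellite tiles. The backbone choice (ResNet-18) and the directory layout ("crosswalk/" vs. "no_crosswalk/" subfolders) are assumptions for illustration; the letter's own models and training setup may differ.

```python
# Fine-tuning a small CNN for crosswalk / no-crosswalk tile classification.
# Directory layout and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
data = datasets.ImageFolder("satellite_tiles/", transform=tf)
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
net.fc = nn.Linear(net.fc.in_features, 2)   # crosswalk vs. no crosswalk
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

net.train()
for images, labels in loader:               # one illustrative epoch
    opt.zero_grad()
    loss = loss_fn(net(images), labels)
    loss.backward()
    opt.step()
```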
Unsupervised domain adaptation for object detection addresses the adaptation of detectors trained in a source domain so that they work accurately in an unseen target domain. Recently, methods that align intermediate features have proven promising, achieving state-of-the-art results. However, these methods are laborious to implement and hard to interpret. Although promising, there is still room for improvement to close the performance gap toward the upper bound (training with the target data). In this work, we propose a method to generate an artificial dataset in the target domain to train an object detector. We employ two unsupervised image translators (CycleGAN and an AdaIN-based model) using only annotated data from the source domain and non-annotated data from the target domain. Our key contribution is a less complex yet more effective method that also has improved interpretability. Results on real-world scenarios for autonomous driving show significant improvements, outperforming state-of-the-art methods in most cases and further closing the gap toward the upper bound.
•A simple yet effective method for object detection under unsupervised domain adaptation.
•Artificially generated images are useful for unsupervised domain adaptation.
•An extensive comparison with the state of the art is provided.
•Experiments in three scenarios: synthetic data, adverse weather, and cross-camera.
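The translate-then-train idea is straightforward to sketch: push every annotated source image through a pretrained unsupervised translator and reuse the source bounding boxes unchanged, since the translation preserves scene geometry. In the sketch below, `generator` stands for a CycleGAN- or AdaIN-style generator whose loading is application-specific and assumed; the [-1, 1] input range is a common convention for such models, not a guarantee.

```python
# Build an artificial target-domain dataset by translating source images;
# `generator` is an assumed, already-loaded image-to-image model.
import torch
from pathlib import Path
from PIL import Image
from torchvision.transforms import functional as F

@torch.no_grad()
def build_artificial_dataset(generator, src_dir, out_dir):
    """Translate every source image to the target style; annotations from
    src_dir remain valid because object positions are unchanged."""
    generator.eval()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for img_path in Path(src_dir).glob("*.png"):
        x = F.to_tensor(Image.open(img_path).convert("RGB")).unsqueeze(0)
        y = generator(x * 2 - 1).clamp(-1, 1)   # assumed [-1, 1] convention
        F.to_pil_image((y[0] + 1) / 2).save(out / img_path.name)

# The translated images plus the original source-domain boxes then train a
# standard detector for the target domain.
```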
•Exploitation of crowdsourcing platforms, such as OpenStreetMap and Google Street View.
•Automatic acquisition and annotation of a large-scale database (+500,000 images).
•Deep learning (ConvNet) applied to the crosswalk classification problem.
•Cross-database evaluation indicates the system is ready for real-world applications.
The proposed system exploits crowdsourcing platforms to automatically acquire and annotate a large-scale dataset, and to train a Convolutional Neural Network to perform crosswalk classification.
Correctly identifying crosswalks is an essential task for driving activity and mobility autonomy. Many crosswalk classification, detection, and localization systems have been proposed in the literature over the years. These systems tackle the crosswalk classification problem from different perspectives: satellite imagery, cockpit view (from the top of a car or behind the windshield), and pedestrian perspective. Most works in the literature are designed and evaluated using small and local datasets, i.e., datasets with low diversity. Scaling to large datasets imposes a challenge for the annotation procedure. Moreover, there is still a need for cross-database experiments in the literature, because it is usually hard to collect data in the same place and conditions as the final application. In this paper, we present a crosswalk classification system based on deep learning. For that, crowdsourcing platforms, such as OpenStreetMap and Google Street View, are exploited to enable automatic training via automatic acquisition and annotation of a large-scale database. Additionally, this work proposes a comparison study of models trained using fully automatic data acquisition and annotation against models that were partially annotated. Cross-database experiments were also included in the experimentation to show that the proposed methods are suitable for real-world applications. Our results show that the model trained on the fully automatic database achieved high overall accuracy (94.12%), and that a statistically significant improvement (to 96.30%) can be achieved by manually annotating a specific part of the database. Finally, the results of the cross-database experiments show that both models are robust to many variations of images and scenarios, presenting consistent behavior.
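The OpenStreetMap side of such acquisition can be illustrated with a query to the public Overpass API for nodes tagged highway=crossing. The bounding box below is an arbitrary example (around Vitória, ES); downloading the corresponding Street View or satellite imagery requires a separate, keyed API and is omitted.

```python
# Harvest candidate crosswalk coordinates from OpenStreetMap via the
# public Overpass API; the bounding box is an illustrative example.
import requests

OVERPASS = "https://overpass-api.de/api/interpreter"
query = """
[out:json][timeout:60];
node["highway"="crossing"](-20.35,-40.40,-20.25,-40.25);
out;
"""  # bbox order: south, west, north, east

resp = requests.post(OVERPASS, data={"data": query}, timeout=90)
coords = [(el["lat"], el["lon"]) for el in resp.json()["elements"]]
print(f"{len(coords)} crosswalk nodes fetched")
```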
•Localization with occupancy or reflectivity grid maps is more accurate.
•Semantic grid maps lead to stable and reasonably accurate localization.
•Localization with color grid maps failed due to changes in illumination.
•The entropy correlation coefficient is not a good metric for comparing color maps.
•The two-step mapping technique was successfully employed in all experiments.
The localization of self-driving cars is needed for several tasks, such as keeping maps updated, tracking objects, and planning. Localization algorithms often take advantage of maps for estimating the car pose. Since maintaining and using several maps is computationally expensive, it is important to analyze which type of map is best suited to each application. To contribute to this analysis, in this work we compare the accuracy of a particle filter localization when using occupancy, reflectivity, color, or semantic grid maps. To the best of our knowledge, such an evaluation is missing in the literature. For building semantic and color grid maps, point clouds from a Light Detection and Ranging (LiDAR) sensor are fused with images captured by a front-facing camera. Semantic information is extracted from images with the deep neural network DeepLabv3+. Experiments are performed in varied environments, under diverse conditions of illumination and traffic. Results show that occupancy grid maps lead to more accurate localization, followed by reflectivity grid maps. In most scenarios, localization with semantic grid maps kept the position tracking without catastrophic losses, but with errors 2 to 3 times larger than those of the former maps. Color grid maps led to inaccurate and unstable localization in most scenarios, even using a robust metric, the entropy correlation coefficient, for comparing online data and the map.
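The comparison hinges on the measurement step of a grid-map particle filter, sketched minimally below for an occupancy grid; scoring against reflectivity, color, or semantic cells changes only how a beam endpoint is evaluated. All parameters and the scoring rule are illustrative, not the paper's implementation.

```python
# Measurement update of a grid-map particle filter: project each LiDAR
# beam endpoint into the map and score its occupancy. Illustrative only.
import numpy as np

def weight_particles(particles, scan_xy, grid, resolution):
    """particles: (N,3) rows [x, y, yaw]; scan_xy: (M,2) beam endpoints in
    the sensor frame; grid: 2-D occupancy map with cell values in [0,1]."""
    log_w = np.empty(len(particles))
    for i, (x, y, yaw) in enumerate(particles):
        c, s = np.cos(yaw), np.sin(yaw)
        gx = ((x + c * scan_xy[:, 0] - s * scan_xy[:, 1]) / resolution).astype(int)
        gy = ((y + s * scan_xy[:, 0] + c * scan_xy[:, 1]) / resolution).astype(int)
        ok = (gx >= 0) & (gx < grid.shape[1]) & (gy >= 0) & (gy < grid.shape[0])
        log_w[i] = np.log(grid[gy[ok], gx[ok]] + 1e-6).sum()
    w = np.exp(log_w - log_w.max())   # stabilize before normalizing
    return w / w.sum()
```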
•Use of non-realistic computer graphics to generate training samples for object detection.
•Investigation of the impact of context when training deep models with synthetic samples.
•Experiments are performed on several well-known traffic light datasets.
•Our approach achieves results comparable to those that use real-world training data.
Deep neural networks are an effective solution to many problems associated with autonomous driving. By providing real image samples with traffic context to the network, the model learns to detect and classify elements of interest, such as pedestrians, traffic signs, and traffic lights. However, acquiring and annotating real data can be extremely costly in terms of time and effort. In this context, we propose a method to generate artificial traffic-related training data for deep traffic light detectors. This data is generated using basic non-realistic computer graphics to blend fake traffic scenes on top of arbitrary image backgrounds that are not related to the traffic domain. Thus, a large amount of training data can be generated without annotation effort. The method also tackles the intrinsic data imbalance problem in traffic light datasets, caused mainly by the low number of samples of the yellow state. Experiments show that it is possible to achieve results comparable to those obtained with real training data from the problem domain, yielding an average mAP and an average F1-score that are each nearly 4 p.p. higher than the respective metrics obtained with a real-world reference model.
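The blending step can be sketched with basic 2D drawing: paste a crudely drawn traffic light at a random location on an arbitrary background and emit the bounding box and state label for free. Colors, sizes, and the background source below are illustrative assumptions; the paper's own rendering is more elaborate. The sketch assumes backgrounds larger than the pasted light.

```python
# Generate one annotated synthetic sample: a non-realistic traffic light
# drawn onto an arbitrary background. All dimensions/colors are illustrative.
import random
from PIL import Image, ImageDraw

def synth_sample(background: Image.Image, state: str = "yellow"):
    bg = background.convert("RGB")
    w, h = random.randint(20, 40), random.randint(50, 100)
    x, y = random.randint(0, bg.width - w), random.randint(0, bg.height - h)
    draw = ImageDraw.Draw(bg)
    draw.rectangle([x, y, x + w, y + h], fill=(20, 20, 20))        # housing
    color = {"red": (255, 0, 0), "yellow": (255, 200, 0),
             "green": (0, 220, 0)}[state]
    rel_y = {"red": 0.2, "yellow": 0.5, "green": 0.8}[state]
    r = w // 3
    cx, cy = x + w // 2, y + int(rel_y * h)
    draw.ellipse([cx - r, cy - r, cx + r, cy + r], fill=color)     # lit bulb
    return bg, (x, y, x + w, y + h, state)   # image + box label, no labeling cost
```

Oversampling the "yellow" state when calling such a generator is one direct way to counter the class imbalance the abstract mentions.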
Agricultural losses due to post-harvest diseases can reach up to 30% of total production. Detecting diseases in fruits at an early stage is crucial to mitigate losses and ensure the quality and health of fruits. However, this task is challenging due to the different formats, sizes, shapes, and colors that the same disease can present. Convolutional neural networks have been proposed to address this issue, but most studies use self-built datasets with few samples per disease, hindering reproducibility and comparison of techniques. To address these challenges, the authors propose a novel image dataset comprising 23,158 examples divided into nine classes of papaya fruit diseases, and a robust papaya fruit disease detector, called Yolo-Papaya, based on the YoloV7 detector with a convolutional block attention module (CBAM) attention mechanism. This detector achieved an overall mAP (mean average precision) of 86.2%, with performance above 98% on classes such as "healthy fruits" and "Phytophthora blight". The proposed detector and dataset can be used in practical applications for fruit quality control and constitute a robust benchmark for the task of papaya fruit disease detection. The image dataset and all source code used in this study are available to the academic community on the project page, enabling reproducibility of the study and the advancement of research in this domain.
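CBAM itself is a published, well-defined block: channel attention from a shared MLP applied to average- and max-pooled descriptors, followed by spatial attention from a 7x7 convolution over channel-wise average/max maps. A compact PyTorch rendering is below; how Yolo-Papaya wires the block into YoloV7's layers is not shown here.

```python
# Convolutional Block Attention Module (CBAM): channel attention then
# spatial attention, as in Woo et al.; reduction ratio is the usual default.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x):
        # Channel attention: shared MLP over avg- and max-pooled features.
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # Spatial attention: 7x7 conv over channel-wise avg/max maps.
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa
```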