Decreasing costs of vision sensors and advances in embedded hardware have boosted lane-related research – detection, estimation, tracking, etc. – in the past two decades. The interest in this topic has increased even more with the demand for advanced driver assistance systems (ADAS) and self-driving cars. Although extensively studied independently, there is still a need for studies that propose a combined solution for the multiple problems related to the ego-lane, such as lane departure warning (LDW), lane change detection, lane marking type (LMT) classification, road marking detection and classification, and detection of the presence of adjacent lanes (i.e., immediate left and right lanes). In this paper, we propose a real-time Ego-Lane Analysis System (ELAS) capable of estimating ego-lane position, classifying LMTs and road markings, performing LDW, and detecting lane change events. The proposed vision-based system works on a temporal sequence of images. Lane marking features are extracted in perspective and Inverse Perspective Mapping (IPM) images, which are combined to increase robustness. The final estimated lane is modeled as a spline using a combination of methods (Hough lines with a Kalman filter and a spline with a particle filter). Based on the estimated lane, all other events are detected. To validate ELAS and address the lack of lane datasets in the literature, a new dataset with more than 20 different scenes (more than 15,000 frames) covering a variety of scenarios (urban roads, highways, traffic, shadows, etc.) was created. The dataset was manually annotated and made publicly available to enable evaluation of several events that are of interest to the research community (i.e., lane estimation, change, and centering; road markings; intersections; LMTs; crosswalks; and adjacent lanes). Moreover, the system was also validated quantitatively and qualitatively on other public datasets.
ELAS achieved high detection rates in all real-world events and proved to be ready for real-time applications.
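The Hough-plus-Kalman stage mentioned above can be illustrated with a minimal scalar Kalman filter that smooths a noisy per-frame lane measurement; the single-offset state and the noise values here are illustrative assumptions, not the actual ELAS parameters.

```python
import numpy as np

def kalman_step(x, P, z, Q=1e-4, R=0.05**2):
    """One predict/update cycle for a static scalar state.

    x, P : current estimate and its variance
    z    : noisy measurement (e.g., lane offset from a Hough line)
    Q, R : process/measurement noise variances (illustrative values)
    """
    P = P + Q               # predict: state assumed static, uncertainty grows
    K = P / (P + R)         # Kalman gain
    x = x + K * (z - x)     # correct the estimate toward the measurement
    P = (1.0 - K) * P       # uncertainty shrinks after the update
    return x, P

rng = np.random.default_rng(0)
x, P = 0.0, 1.0             # vague initial guess
true_offset = 1.5           # hypothetical lateral offset in metres
for _ in range(200):
    z = true_offset + rng.normal(0.0, 0.05)   # noisy Hough-based measurement
    x, P = kalman_step(x, P, z)
```

After a few hundred frames the estimate settles near the true offset with small residual variance, which is the kind of temporal stabilization the abstract's Hough-with-Kalman combination provides before spline fitting.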
•An accurate real-time, real-world Ego-Lane Analysis System (ELAS).
•A novel manually annotated lane dataset with more than 20 scenes (+15,000 frames).
•We publicly released the code and the novel dataset.
High-resolution satellite imagery has been increasingly used in remote sensing classification problems, largely because of the availability of this kind of data. Despite this high availability, very little effort has been devoted to the zebra crossing classification problem. In this letter, crowdsourcing systems are exploited to enable the automatic acquisition and annotation of a large-scale satellite imagery database for crosswalk-related tasks. This dataset is then used to train deep-learning-based models to accurately classify satellite images that do or do not contain zebra crossings. A novel dataset with more than 240,000 images from 3 continents, 9 countries, and more than 20 cities was used in the experiments. The experimental results showed that freely available crowdsourcing data can be used to train robust models that perform crosswalk classification on a global scale with high accuracy (97.11%).
Unsupervised domain adaptation for object detection addresses the adaptation of detectors trained in a source domain to work accurately in an unseen target domain. Recently, methods that align intermediate features have proven promising, achieving state-of-the-art results. However, these methods are laborious to implement and hard to interpret. Although promising, there is still room for improvement to close the performance gap toward the upper bound (training with target data). In this work, we propose a method to generate an artificial dataset in the target domain to train an object detector. We employ two unsupervised image translators (CycleGAN and an AdaIN-based model) using only annotated data from the source domain and non-annotated data from the target domain. Our key contribution is a less complex yet more effective method that also offers improved interpretability. Results on real-world scenarios for autonomous driving show significant improvements, outperforming state-of-the-art methods in most cases and further closing the gap toward the upper bound.
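The core idea above – translate annotated source images to the target style while reusing the original annotations – can be sketched as follows. The `fake_translate` function is a hypothetical stand-in for CycleGAN/AdaIN (here just a brightness shift), not the paper's actual translator.

```python
import numpy as np

def fake_translate(img):
    # Placeholder for an unsupervised image-to-image translator
    # (e.g., CycleGAN or an AdaIN-based model in the paper's setup).
    return np.clip(img * 0.7 + 30, 0, 255).astype(np.uint8)

def build_artificial_dataset(source_images, source_boxes):
    """Pair each translated image with its ORIGINAL source annotations:
    style transfer changes appearance, not object geometry, so the
    source bounding boxes remain valid in the translated image."""
    return [(fake_translate(img), boxes)
            for img, boxes in zip(source_images, source_boxes)]

rng = np.random.default_rng(1)
imgs = [rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8) for _ in range(4)]
boxes = [[(1, 1, 5, 5)], [(0, 2, 3, 6)], [], [(2, 2, 7, 7)]]
artificial = build_artificial_dataset(imgs, boxes)
```

The resulting (image, boxes) pairs would then train a standard detector on the target-styled data, which is what makes the approach simpler to implement and interpret than feature-alignment methods.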
•A simple yet effective method for object detection under unsupervised domain adaptation.
•Artificially generated images are useful for unsupervised domain adaptation.
•An extensive comparison with the state of the art is provided.
•Experiments in three scenarios: synthetic data, adverse weather, and cross-camera.
•Exploitation of crowdsourcing platforms, such as OpenStreetMap and Google Street View.
•Automatic acquisition and annotation of a large-scale database (+500,000 images).
•Deep learning (ConvNet) applied to the crosswalk classification problem.
•Cross-database evaluation indicates the system is ready for real-world applications.
The proposed system exploits crowdsourcing platforms to automatically acquire and annotate a large-scale dataset, and train a Convolutional Neural Network to perform crosswalk classification.
Correctly identifying crosswalks is an essential task for driving and autonomous mobility. Many crosswalk classification, detection, and localization systems have been proposed in the literature over the years. These systems tackle the crosswalk classification problem from different perspectives: satellite imagery, cockpit view (from the top of a car or behind the windshield), and pedestrian perspective. Most works in the literature are designed and evaluated using small and local datasets, i.e., datasets with low diversity. Scaling to large datasets imposes a challenge on the annotation procedure. Moreover, cross-database experiments are still needed in the literature, because it is usually hard to collect data in the same place and conditions as the final application. In this paper, we present a crosswalk classification system based on deep learning. Crowdsourcing platforms, such as OpenStreetMap and Google Street View, are exploited to enable automatic training via automatic acquisition and annotation of a large-scale database. Additionally, this work presents a comparison of models trained using fully automatic data acquisition and annotation against models trained on partially manually annotated data. Cross-database experiments were also included to show that the proposed methods are suitable for real-world applications. Our results show that the model trained on the fully automatic database achieved high overall accuracy (94.12%), and that a statistically significant improvement (to 96.30%) can be achieved by manually annotating a specific part of the database. Finally, the cross-database experiments show that both models are robust to many variations of image and scenario, presenting consistent behavior.
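The OpenStreetMap side of the automatic annotation can be sketched as an Overpass QL query for crossing nodes inside a bounding box; `highway=crossing` is the standard OSM tag for crosswalks, but the exact queries and endpoints the paper used are not stated, so this is an assumption for illustration.

```python
def crossing_query(south, west, north, east):
    """Build an Overpass QL query returning OSM nodes tagged as crossings
    inside the given bounding box (lat/lon order follows Overpass QL)."""
    bbox = f"{south},{west},{north},{east}"
    return (
        "[out:json][timeout:25];"
        f'node["highway"="crossing"]({bbox});'
        "out;"
    )

# Illustrative bounding box (made-up coordinates):
q = crossing_query(-20.32, -40.34, -20.27, -40.28)
```

Posting such a query to an Overpass API endpoint yields node coordinates around which satellite or street-level imagery could be cropped as automatically labeled positive samples.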
•Handling Pedestrians in Self-Driving Cars using Image Tracking and Frenét Frames.
•The method is safer and more efficient than systems without tracking functionality.
•Tracking pedestrians enables early decision capability.
•Our self-driving car was evaluated in both simulated and real-world scenarios.
The development of intelligent autonomous cars is of great interest. A particular and challenging problem is handling pedestrians, for example, crossing or walking along the road. Since pedestrians are among the most fragile elements in traffic, a reliable pedestrian detection and handling system is mandatory. The current pedestrian handling system of our autonomous cars suffers from the limitation of purely detection-based systems, i.e., it restricts the car to decisions based only on the present moment. This work improves the pedestrian handling system by incorporating an object tracker to predict the pedestrian's behavior. With this knowledge, the autonomous car can better decide when to stop and when to start moving, providing a more comfortable, efficient, and safer driving experience. The proposed method was augmented with a path generator based on Frenét frames and incorporated into our self-driving car to enable better decision making and the overtaking of pedestrians. The behavior of our self-driving car was evaluated in both simulated and real-world scenarios. Results showed that the proposed system is safer and more efficient than the system without tracking functionality, due to its early decision capability.
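The early-decision benefit of tracking can be illustrated with a toy constant-velocity predictor: extrapolate a tracked pedestrian's last positions and stop if any predicted position enters the car's lane corridor. The lane corridor, horizon, and motion model are simplifying assumptions, not the system's actual tracker or Frenét-frame planner.

```python
def predict_positions(track, horizon):
    """Constant-velocity extrapolation of a tracked pedestrian.
    track: last two observed (x, y) positions; horizon: number of steps."""
    (x0, y0), (x1, y1) = track
    vx, vy = (x1 - x0), (y1 - y0)
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]

def should_stop(track, lane_y=(-1.0, 1.0), horizon=20):
    """Decide to stop if any predicted position falls inside the car's
    lane corridor (a band of y values in this toy coordinate frame)."""
    return any(lane_y[0] <= y <= lane_y[1]
               for _, y in predict_positions(track, horizon))

# Pedestrian walking toward the lane (y decreasing toward the corridor):
crossing = should_stop([(5.0, 3.0), (5.0, 2.8)])
# Pedestrian walking parallel to the road, outside the corridor:
parallel = should_stop([(5.0, 3.0), (5.2, 3.0)])
```

A pure detection-based system only reacts once the pedestrian is already in the corridor; the predictor flags the crossing trajectory several steps earlier, which is what enables the earlier, smoother stop described in the abstract.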
•Use non-realistic computer graphics to generate training samples for object detection.
•Investigate the impact of context when training deep models with synthetic samples.
•Experiments are performed in several well-known traffic light datasets.
•Our approach achieves results comparable to those that use real-world training data.
Deep neural networks come as an effective solution to many problems associated with autonomous driving. By providing real image samples with traffic context to the network, the model learns to detect and classify elements of interest, such as pedestrians, traffic signs, and traffic lights. However, acquiring and annotating real data can be extremely costly in terms of time and effort. In this context, we propose a method to generate artificial traffic-related training data for deep traffic light detectors. This data is generated using basic non-realistic computer graphics to blend fake traffic scenes on top of arbitrary image backgrounds that are not related to the traffic domain. Thus, a large amount of training data can be generated without annotation efforts. Furthermore, it also tackles the intrinsic data imbalance problem in traffic light datasets, caused mainly by the low amount of samples of the yellow state. Experiments show that it is possible to achieve results comparable to those obtained with real training data from the problem domain, yielding an average mAP and an average F1-score which are each nearly 4 p.p. higher than the respective metrics obtained with a real-world reference model.
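The blending step described above can be sketched as pasting a crude, non-realistic "traffic light" onto an arbitrary background while recording its bounding box and state as the free annotation. The shapes, colors, and sizes below are illustrative assumptions, not the paper's actual rendering.

```python
import numpy as np

def blend_fake_light(background, x, y, w=4, h=10, state="red"):
    """Paste a crude traffic light (dark housing + colored lamp) onto a
    background image; return the image and its (x1, y1, x2, y2, state)
    annotation, obtained for free since we placed the object ourselves."""
    colors = {"red": (255, 0, 0), "yellow": (255, 255, 0), "green": (0, 255, 0)}
    img = background.copy()
    img[y:y + h, x:x + w] = (20, 20, 20)                 # housing
    lamp_y = {"red": y + 1, "yellow": y + h // 2, "green": y + h - 3}[state]
    img[lamp_y:lamp_y + 2, x + 1:x + w - 1] = colors[state]
    return img, (x, y, x + w, y + h, state)

rng = np.random.default_rng(2)
bg = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # non-traffic background
img, ann = blend_fake_light(bg, x=10, y=8, state="yellow")
```

Because the generator controls the state of every pasted light, it can also oversample the rare yellow state, which addresses the class imbalance the abstract mentions.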
•Simple, yet powerful, method to copy a black-box CNN model with random natural images.
•Some constraints are waived and copy attacks are performed with less information.
•Understanding copy attacks with random natural images.
•Thorough evaluation of copycat models created with random natural images.
Convolutional neural networks have lately been successful, enabling companies to develop neural-based products. This demands an expensive process involving data acquisition and annotation, and model generation, which usually requires experts. Given all these costs, companies are concerned about the security of their models against copies and deliver them as black boxes accessed through APIs. Nonetheless, we argue that even black-box models still have vulnerabilities. In a preliminary work, we presented a simple, yet powerful, method to copy black-box models by querying them with natural random images. In this work, we consolidate and extend the copycat method: (i) some constraints are waived; (ii) an extensive evaluation with several problems is performed; (iii) models are copied between different architectures; and (iv) a deeper analysis is performed by looking at the copycat's behavior. Results show that natural random images are effective in generating copycats for several problems.
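The copycat principle – label random inputs with the victim's outputs, then fit a substitute on those pairs – can be shown on a toy scalar problem. The threshold "model" stands in for a CNN, and all numbers are illustrative; this is not the paper's experimental setup.

```python
import numpy as np

def target_model(x):
    """The black-box victim: only its hard-label outputs are observable."""
    return (x > 0.5).astype(int)

rng = np.random.default_rng(3)
queries = rng.uniform(0.0, 1.0, size=1000)   # stand-in for random natural images
stolen_labels = target_model(queries)        # labels obtained via the "API"

# Fit the copycat from the (query, stolen label) pairs alone:
# place the decision boundary midway between the two classes' closest samples.
threshold = (queries[stolen_labels == 0].max()
             + queries[stolen_labels == 1].min()) / 2.0
copycat = lambda x: (x > threshold).astype(int)

probe = rng.uniform(0.0, 1.0, size=1000)     # fresh, unseen inputs
agreement = (copycat(probe) == target_model(probe)).mean()
```

With enough random queries, the copy agrees with the victim almost everywhere despite never seeing the victim's training data or parameters, which is the vulnerability the abstract argues black-box delivery does not remove.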
•An interactive framework for the reconstruction of strip-shredded documents.
•The user locks and forbids pairs automatically selected by the recommender module.
•Four query strategies for recommending the pairs of shreds to be annotated.
•A novel methodology to assess the human impact on the quality of a reconstruction.
•Annotating 25% of the shreds can yield an error reduction of more than 40%.
The advances in machine learning – particularly in deep learning – have enabled automating the reconstruction of shredded documents with significant accuracy. However, despite the recent remarkable results, the state of the art in fully automatic reconstruction still has room for improvement, mainly due to imprecision in evaluating how well the shreds fit each other (compatibility/cost evaluation). To tackle this problem, we propose a human-in-the-loop reconstruction framework that takes user inputs to improve the solutions (permutations of shreds). In our approach, the user verifies whether adjacent shreds in a solution are also adjacent in the original document. Unlike the current literature, our framework includes a recommender module that automatically selects pairs of shreds to be analyzed by a human. Four recommendation strategies were proposed and evaluated. Results achieved by coupling deep learning reconstruction methods with our framework show that introducing the human in the loop can reduce errors by more than 40%.
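One plausible query strategy for such a recommender module is uncertainty sampling over the pairwise compatibility scores: pairs scored near 0.5 are the ones the automatic evaluator is least sure about, so they are offered to the human first. This sketch and its scores are illustrative assumptions; the paper's four actual strategies are not reproduced here.

```python
def recommend_pairs(scores, k=2):
    """scores: {(shred_i, shred_j): compatibility in [0, 1]}.
    Return the k pairs whose scores are closest to 0.5, i.e., the pairs
    where a human verdict (lock or forbid) is most informative."""
    return sorted(scores, key=lambda pair: abs(scores[pair] - 0.5))[:k]

# Made-up compatibility scores for four candidate adjacencies:
scores = {(0, 1): 0.97, (1, 2): 0.52, (2, 3): 0.10, (3, 4): 0.45}
picks = recommend_pairs(scores)
```

Confident scores like 0.97 or 0.10 are left to the automatic solver, while the ambiguous 0.52 and 0.45 pairs are routed to the annotator, concentrating the limited human effort where it reduces reconstruction error the most.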
•Paper shred matching via self-supervised deep learning.
•Training with simulated cuts is effective for real-shredded documents.
•A new public dataset with 100 strip-shredded documents (2292 shreds).
•Accurate (over 90% accuracy) reconstruction of 100 mixed shredded documents.
The reconstruction of shredded documents consists of coherently arranging fragments of paper (shreds) to recover the original document(s). A great challenge in computational reconstruction is to properly evaluate the compatibility between shreds. While traditional pixel-based approaches are not robust to real shredding, more sophisticated solutions significantly compromise time performance. The solution presented in this work extends our previous deep learning method for single-page reconstruction to a more realistic and complex scenario: the reconstruction of several mixed shredded documents at once. In our approach, compatibility evaluation is modeled as a two-class (valid or invalid) pattern recognition problem. The model is trained in a self-supervised manner on samples extracted from simulated-shredded documents, which obviates manual annotation. Experimental results on three datasets – including a new collection of 100 strip-shredded documents produced for this work – show that the proposed method outperforms competing ones in complex scenarios, achieving accuracy above 90%.
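For contrast with the learned compatibility model above, here is a minimal version of the traditional pixel-based scoring the abstract calls not robust to real shredding: score a candidate pair of shreds by how much their touching border columns differ. The smooth synthetic "page" is a made-up example.

```python
import numpy as np

def border_cost(left_shred, right_shred):
    """Sum of squared differences between the right edge of one shred and
    the left edge of the other (grayscale arrays of equal height).
    Lower cost = more compatible under this naive pixel criterion."""
    diff = left_shred[:, -1].astype(float) - right_shred[:, 0].astype(float)
    return float((diff ** 2).sum())

# A smooth synthetic page (horizontal gradient), cut into three strips:
page = np.tile(np.arange(12, dtype=np.uint8) * 20, (16, 1))
a, b, c = page[:, :4], page[:, 4:8], page[:, 8:]

true_pair = border_cost(a, b)    # truly adjacent strips
wrong_pair = border_cost(a, c)   # non-adjacent strips
```

On clean simulated cuts the true neighbor scores lower, but real shredders destroy the border pixels, which is why the work replaces this criterion with a two-class classifier trained on simulated cuts.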