The separation of music signals is a very challenging task, especially for polyphonic chamber music, because the instruments to be separated have similar frequency ranges and sound characteristics. In this work, a joint separation approach in the time domain with a U-Net architecture is extended to incorporate additional time-dependent instrument activity information for improved instrument track extraction. Different stages of integrating the additional information are investigated; an input before the deepest encoder block achieves the best separation results as well as the highest robustness against randomly wrong labels. This approach outperforms label integration by multiplication and the input of a static instrument label. Targeted data augmentation by incoherent mixtures is used for a trio example of violin, trumpet, and flute to improve separation results. Moreover, an alternative separation approach with one independent separation model per instrument is investigated, which enables a more flexible architecture. In this case, an input after the deepest encoder block achieves the best separation results, but the robustness is slightly reduced compared to the joint model. The improvements gained from additional information on active instruments are verified by using real instrument activity predictions for both the joint and the independent separation approaches.
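The label integration described above can be illustrated as a minimal sketch: time-dependent activity labels are concatenated channel-wise with the latent features at the deepest encoder block. All shapes and names here are hypothetical, not taken from the paper's architecture.

```python
import numpy as np

def inject_activity_labels(features, activity):
    """Concatenate per-frame instrument activity labels with encoder features.

    features: (channels, frames) latent features at the deepest encoder block
    activity: (instruments, frames) binary labels (1 = instrument audible)
    """
    assert features.shape[1] == activity.shape[1], "frame counts must match"
    return np.concatenate([features, activity], axis=0)

# hypothetical bottleneck features and activity labels for violin, trumpet, flute
feats = np.random.randn(64, 128)
labels = np.random.randint(0, 2, (3, 128)).astype(float)
conditioned = inject_activity_labels(feats, labels)
print(conditioned.shape)  # (67, 128)
```

In a real network the concatenated tensor would be passed to the deepest encoder block; concatenation (rather than multiplication) lets the network learn how strongly to trust the labels, which may explain the robustness against randomly wrong labels reported above.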
Multi-frequency techniques with temporally encoded pattern sequences are used in phase-measuring methods of 3D optical metrology to suppress phase noise, but they lead to ambiguities that can only be resolved by phase unwrapping. However, classical phase unwrapping methods do not use all available information to unwrap all measurements simultaneously and do not consider the periodicity of the phase, which can lead to errors. We present an approach that optimally reconstructs the phase on a pixel-by-pixel basis using probabilistic modeling. The individual phase measurements are modeled using circular probability densities. Maximizing the compound density of all measurements yields the optimal decoding. Since the entire information of all phase measurements is used simultaneously and the wrapping of the phases is compensated implicitly, the reliability can be greatly increased. In addition, a spatio-temporal phase unwrapping is introduced by probabilistically modeling the local pixel neighborhoods. This leads to even higher robustness against noise than conventional methods and thus to better measurement results.
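The core idea of maximizing a compound circular density can be sketched for a single pixel: each wrapped phase measurement contributes a von-Mises-like term, and the position that maximizes the sum of these terms is the decoded result. The frequencies, concentration parameters, and grid search used here are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def decode_position(phases, freqs, kappas, xs):
    """Maximum-likelihood decoding of a position from wrapped phases.

    Each measurement is modeled by a von Mises density; the compound
    log-density is maximized over candidate positions xs. The cosine
    handles the 2*pi periodicity, so no explicit unwrapping is needed.
    """
    ll = sum(k * np.cos(2 * np.pi * f * xs - p)
             for p, f, k in zip(phases, freqs, kappas))
    return xs[np.argmax(ll)]

# hypothetical example: true position 0.37 on [0, 1), two pattern frequencies
x_true = 0.37
freqs = [1.0, 8.0]
phases = [(2 * np.pi * f * x_true) % (2 * np.pi) for f in freqs]
xs = np.linspace(0.0, 1.0, 10001)
x_hat = decode_position(phases, freqs, [1.0, 1.0], xs)
print(round(x_hat, 3))  # ~0.37
```

The coarse frequency alone would be noisy and the fine frequency alone ambiguous; the compound density combines both so that only the true position maximizes all terms at once.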
Machine learning (ML) is a key technology in smart manufacturing as it provides insights into complex processes without requiring deep domain expertise. This work deals with deep learning algorithms to determine a 3D reconstruction from a single 2D grayscale image. The potential of 3D reconstruction can be used for quality control because the height values contain relevant information that is not visible in 2D data. Instead of 3D scans, estimated depth maps based on a 2D input image can be used, with the advantage of a simple setup and a short recording time. Determining a 3D reconstruction from a single input image is a difficult task for which many algorithms and methods have been proposed in the past decades. In this work, three deep learning methods, namely stacked autoencoders (SAEs), generative adversarial networks (GANs) and U-Nets, are investigated, evaluated and compared for 3D reconstruction from a 2D grayscale image of laser-welded components. Different variants of GANs are tested, with the conclusion that Wasserstein GANs (WGANs) are the most robust approach among them. To the best of our knowledge, the present paper considers for the first time the U-Net, which achieves outstanding results in semantic segmentation, in the context of 3D reconstruction tasks. Unlike the U-Net, which uses standard convolutions, the stacked dilated U-Net (SDU-Net) applies stacked dilated convolutions. Of all the 3D reconstruction approaches considered in this work, the SDU-Net shows the best performance, not only in terms of evaluation metrics but also in terms of computation time. Due to the comparably small number of trainable parameters and the suitability of the architecture for strong data augmentation, a robust model can be generated from only a small amount of training data.
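The advantage of stacked dilated convolutions over standard convolutions can be made concrete with a small receptive-field calculation (the dilation rates below are illustrative, not the SDU-Net's actual configuration): dilated layers enlarge the receptive field much faster at the same parameter count.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of stride-1 dilated convolutions.

    Each layer extends the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# four standard 3x3 layers vs. four dilated 3x3 layers (hypothetical rates)
print(receptive_field(3, [1, 1, 1, 1]))  # 9
print(receptive_field(3, [1, 2, 4, 8]))  # 31
```

Both stacks have the same number of trainable parameters, which is consistent with the SDU-Net's small parameter count and good computation time reported above.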
Deep learning has undoubtedly had a huge impact on the computer vision community in recent years. In light field imaging, machine learning-based applications have significantly outperformed their conventional counterparts. Furthermore, multi- and hyperspectral light fields have shown promising results in light field-related applications such as disparity or shape estimation. Yet, a multispectral light field dataset enabling data-driven approaches is missing. Therefore, we propose a new synthetic multispectral light field dataset with depth and disparity ground truth. The dataset consists of a training, validation and test dataset, containing light fields of randomly generated scenes, as well as a challenge dataset rendered from hand-crafted scenes enabling detailed performance assessment. Additionally, we present a Python framework for light field deep learning. The goal of this framework is to ensure reproducibility of light field deep learning research and to provide a unified platform to accelerate the development of new architectures. The dataset is made available under dx.doi.org/10.21227/y90t-xk47. The framework is maintained at gitlab.com/iiit-public/lfcnn.
The technology of hairpin welding, which is frequently used in the automotive industry, entails high-quality requirements in the welding process. It can be difficult to trace a defect back to the affected weld if a non-functioning stator is detected during the final inspection. Often, a visual assessment of a cooled weld seam does not provide any information about its strength. However, based on the behavior during welding, especially the occurrence of spatter, conclusions can be drawn about the quality of the weld. In addition, spatter on the component can have serious consequences. In this paper, we present in-process monitoring of laser-based hairpin welding. Using an in-process image analyzed by a neural network, we present a spatter detection method that allows conclusions to be drawn about the quality of the weld. In this way, faults caused by spattering can be detected at an early stage and the affected components sorted out. The implementation is based on a small data set and designed for fast processing times on hardware with limited computing power. With a network architecture that uses dilated convolutions, we obtain a large receptive field and can therefore consider feature interrelations in the image. As a result, we obtain a pixel-wise classifier, which allows us to infer the spatter areas directly on the production lines.
Privacy-preserving high-quality people detection is a vital computer vision task for various indoor scenarios, e.g. people counting, customer behavior analysis, ambient assisted living or smart homes. In this work, a novel approach for people detection in multiple overlapping depth images is proposed. We present a probabilistic framework utilizing a generative scene model to jointly exploit the multi-view image evidence, allowing us to detect people from arbitrary viewpoints. Our approach makes use of mean-field variational inference to not only estimate the maximum a posteriori (MAP) state but also approximate the posterior probability distribution of the people present in the scene. Evaluation shows state-of-the-art results on a novel data set for indoor people detection and tracking in depth images from the top view with high perspective distortions. Furthermore, we demonstrate that our approach (compared to the mono-view setup) successfully exploits the multi-view image evidence and robustly converges in only a few iterations.
Dynamic Vision Sensors differ from conventional cameras in that only intensity changes of individual pixels are perceived and transmitted as an asynchronous stream instead of an entire frame. The technology promises, among other things, high temporal resolution and low latencies and data rates. While such sensors currently enjoy much scientific attention, there are only few publications on practical applications. One field of application that has hardly been considered so far, yet potentially fits the sensor principle well due to its special properties, is automatic visual inspection. In this paper, we evaluate current state-of-the-art processing algorithms in this new application domain. We further propose an algorithmic approach for the identification of ideal time windows within an event stream for object classification. For the evaluation of our method, we acquire two novel datasets that contain typical visual inspection scenarios, i.e., the inspection of objects on a conveyor belt and during free fall. The success of our algorithmic extension for data processing is demonstrated on these new datasets by showing that the classification accuracy of current algorithms is significantly increased. By making our new datasets publicly available, we intend to stimulate further research on the application of Dynamic Vision Sensors in machine vision.
As mixed traffic between automated vehicles and human drivers in inner cities becomes more prevalent in the near future, understanding and predicting drivers' behavior is important. Additionally, there is a wide variety of inner city intersections. They can differ greatly in traffic density, visibility, number of objects and many other aspects. This difference in complexity has an influence on the behavior of human drivers at intersections. To further understand the effect of complexity, we conducted a naturalistic driving field study in inner city traffic with 34 participants. We focused on unsignalized intersections because there is a greater range of possibly ambiguous situations at such intersections than at, e.g., an intersection regulated by traffic lights. Features describing the behavior (commit distance, drop in velocity and the minimal velocity) are extracted from the driven trajectories. Additionally, we define intersection complexity by several features describing an intersection. These features include both the static environment (street, visible and driveable width, the visibility of the other streets and the number of trees) and the dynamic environment (entry location and turning direction, numbers of vehicles, vehicles with interaction, vehicles with priority, vehicles having to yield and pedestrians). Based on these features, we show that the entry location and the turning direction have a significant effect on the behavior features. Additionally, we show that the typical behavior of human drivers can be predicted by the features describing an intersection's complexity. Finally, the feature set is reduced in dimensionality for a more condensed intersection description. For that, we test reduced feature sets as well as feature sets from an autoencoder and show that prediction is feasible with them as well.
Due to decreasing hardware prices, machine learning is becoming increasingly interesting for industrial applications such as automatic visual inspection (AVI). This paper presents a metaheuristic approach to the automatic generation of a well-suited convolutional neural network (CNN) based on differential evolution. This makes it possible to find a suitable CNN architecture for a given task with little prior knowledge. Another aim is to reduce the resources needed for inference as much as possible. Therefore, we choose a fitness function that considers both the accuracy and the resources used by a CNN. For typical industrial datasets, we obtain CNNs with an accuracy of more than 98% on average within a relatively short processing time.
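A minimal sketch of the optimization scheme: a standard DE/rand/1/bin loop minimizing a fitness that combines an error term with a resource penalty. The encoding of CNN hyperparameters into real vectors and the toy fitness below are assumptions for illustration, not the paper's actual search space.

```python
import random

def differential_evolution(fitness, bounds, pop_size=20, F=0.8, CR=0.9, gens=60):
    """Minimal DE/rand/1/bin: mutate with a scaled difference vector,
    crossover with the current individual, keep the trial if it is fitter."""
    dim = len(bounds)
    clip = lambda v: [min(max(x, lo), hi) for x, (lo, hi) in zip(v, bounds)]
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
            mutant = clip([a[d] + F * (b[d] - c[d]) for d in range(dim)])
            trial = [mutant[d] if random.random() < CR else pop[i][d]
                     for d in range(dim)]
            if fitness(trial) < fitness(pop[i]):  # greedy selection
                pop[i] = trial
    return min(pop, key=fitness)

# toy stand-in for (1 - accuracy) + resource penalty; optimum at (0.7, 0.0)
fit = lambda v: (v[0] - 0.7) ** 2 + 0.1 * abs(v[1])
random.seed(0)
best = differential_evolution(fit, [(0.0, 1.0), (-5.0, 5.0)])
print(best)  # close to [0.7, 0.0]
```

In the CNN setting, evaluating the fitness of one individual means training and validating the decoded network, so the population size and generation count dominate the overall search cost.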
This work is concerned with model-based remote gaze estimation through monocular video oculography using the pupil center and corneal reflections in the context of manual work. Based on simulations, the influence of undetected corneal glints on the estimated direction of gaze is quantified and discussed.