Neural Architecture Search (NAS) has demonstrated state-of-the-art performance on various computer vision tasks. Despite this superior performance, existing methods remain limited in efficiency and generality owing to their high computational complexity. In this paper, we propose an efficient and unified NAS framework, termed DDPNAS, based on dynamic distribution pruning, which admits a theoretical bound on accuracy and efficiency. In particular, we first sample architectures from a joint categorical distribution. The search space is then dynamically pruned, and its distribution is updated every few epochs. With the proposed efficient network-generation method, we directly obtain optimal neural architectures under given constraints, which is practical for on-device models across diverse search spaces and constraints. The architectures found by our method achieve remarkable top-1 accuracies of 97.56% and 77.2% on CIFAR-10 and ImageNet (mobile setting), respectively, with the fastest search process, i.e., only 1.8 GPU hours on a Tesla V100. Code for searching and network generation is available at:
https://openi.pcl.ac.cn/PCL_AutoML/XNAS.
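The search procedure described above (sampling architectures from a joint categorical distribution, updating it, and pruning the space every few epochs) can be sketched in miniature. This is a toy illustration only: the function name, the multiplicative update rule, and the reward shaping are assumptions, not the authors' DDPNAS implementation.

```python
import random

def ddpnas_sketch(ops, num_layers, epochs, prune_every, reward_fn, seed=0):
    """Toy sketch of NAS by dynamic distribution pruning (hypothetical).

    Each layer keeps a categorical distribution over candidate ops.
    Architectures are sampled, scored by reward_fn, the distribution is
    nudged toward high-reward choices, and every `prune_every` epochs
    the lowest-probability op is removed from each layer's search space.
    """
    rng = random.Random(seed)
    # per-layer categorical distribution over the surviving candidate ops
    dists = [{op: 1.0 / len(ops) for op in ops} for _ in range(num_layers)]
    for epoch in range(1, epochs + 1):
        # sample one architecture from the joint categorical distribution
        arch = [rng.choices(list(d), weights=list(d.values()))[0] for d in dists]
        reward = reward_fn(arch)
        # multiplicative update toward the sampled ops (an assumption)
        for d, op in zip(dists, arch):
            d[op] *= (1.0 + 0.5 * reward)
            total = sum(d.values())
            for k in d:
                d[k] /= total
        # dynamic pruning: drop the least likely op every few epochs
        if epoch % prune_every == 0:
            for d in dists:
                if len(d) > 1:
                    del d[min(d, key=d.get)]
                    total = sum(d.values())
                    for k in d:
                        d[k] /= total
    # final architecture: the most probable op in each layer
    return [max(d, key=d.get) for d in dists]
```

With a reward that favors one op, the per-layer distributions concentrate and the space shrinks until a single architecture remains.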
Rain streaks, particularly in heavy rain, not only degrade visibility but also cause many computer vision algorithms to fail. In this paper, we address this visibility problem by focusing on single-image rain removal, even in the presence of dense rain streaks and rain-streak accumulation, which is visually similar to mist or fog. To achieve this, we introduce a new rain model and a deep learning architecture. Our rain model incorporates a binary rain map indicating rain-streak regions, and accommodates various shapes, directions, and sizes of overlapping rain streaks, as well as rain accumulation, to model heavy rain. Based on this model, we construct a multi-task deep network that jointly learns three targets: the binary rain-streak map, the rain-streak layers, and the clean background, which is our ultimate output. To generate features that are invariant to rain streaks, we introduce a contextual dilated network, which is able to exploit regional contextual information. To handle various shapes and directions of overlapping rain streaks, our strategy is to utilize a recurrent process that progressively removes rain streaks. Our binary map provides a constraint and thus additional information to train our network. Extensive evaluation on real images, particularly in heavy rain, shows the effectiveness of our model and architecture.
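A rain model of the kind described above, with a binary rain map, multiple overlapping streak layers, and a global atmospheric term for rain accumulation, can be sketched as a simple forward synthesis. The exact parameterization (`alpha`, `atmosphere`) is an assumption for illustration, not necessarily the paper's precise formulation.

```python
import numpy as np

def synthesize_heavy_rain(background, streak_layers, rain_map,
                          alpha=0.8, atmosphere=1.0):
    """Sketch of a heavy-rain image model (hypothetical parameterization).

    background:    H x W clean image B
    streak_layers: list of H x W rain-streak layers S_t
    rain_map:      H x W binary map R marking rain-streak regions
    alpha, atmosphere: global mixing for rain accumulation (mist/fog veil)
    Composes O = alpha * (B + sum_t S_t * R) + (1 - alpha) * A, clipped to [0, 1].
    """
    rain = sum(s * rain_map for s in streak_layers)
    observed = alpha * (background + rain) + (1.0 - alpha) * atmosphere
    return np.clip(observed, 0.0, 1.0)
```

A deraining network would learn to invert this composition, predicting R, the S_t layers, and B from O.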
•We provide a synthesis of the literature on ML for reliability and safety applications.
•ML can provide novel, more accurate insights than traditional reliability tools.
•We outline future opportunities and challenges for ML in these applications.
•We include a discussion of deep learning to highlight its popularity and advantages.
Machine learning (ML) pervades an increasing number of academic disciplines and industries. Its impact is profound, and several fields, autonomy and computer vision for example, have been fundamentally altered by it; reliability engineering and safety will undoubtedly follow suit. There is already a large but fragmented literature on ML for reliability and safety applications, and it can be overwhelming to navigate and integrate into a coherent whole. In this work, we facilitate this task by providing a synthesis of, and a roadmap to, this ever-expanding analytical landscape, highlighting its major landmarks and pathways. We first provide an overview of the different ML categories and sub-categories or tasks, and we note several of the corresponding models and algorithms. We then look back and review the use of ML in reliability and safety applications. We examine several publications in each category/sub-category, and we include a short discussion on the use of Deep Learning to highlight its growing popularity and distinctive advantages. Finally, we look ahead and outline several promising future opportunities for leveraging ML in service of advancing reliability and safety considerations. Overall, we argue that ML is capable of providing novel insights and opportunities to solve important challenges in reliability and safety applications. It is also capable of teasing out more accurate insights from accident datasets than traditional analysis tools can, and this in turn can lead to better-informed decision-making and more effective accident prevention.
Machine Vision (MV), like other digital technologies, is a critical component of Industry 4.0. The high volume of data accessible to visual equipment makes it possible to quickly detect and flag faulty goods while recognising their defects, thereby allowing rapid and efficient intervention in Industry 4.0. MV systems are essential for efficient production at scale, with applications in quality assurance, enforcement, and inventory management. Removing the human element simultaneously minimises the probability of error. This paper briefly discusses MV and how it helps Industry 4.0. Various collaborative features and smart technologies of MV for Industry 4.0 are diagrammatically presented. Further, the authors have identified and discussed twenty significant applications of MV for Industry 4.0. In Industry 4.0 and the associated digital industry transition, every step in the process, including manufacturing, inventory control of the supply chain, and more, involves a different and innovative approach. One of the aims is to develop MV capable of seeing, communicating, and working with greater accuracy than human beings. Enabling robots to perceive and help people in dynamic systems paves the way for many opportunities. In the smart plant of the future, MV plays a significant role, in which automated production lines will adapt themselves to optimise productivity, performance, and profitability.
Dioptric cameras with conventional perspective projection have well-established analytical properties. However, they suffer from perspective distortions and have only a limited field of view. Catadioptric cameras offer panoramic imaging: their extensive field of view, together with projection-specific image analysis, can simplify many computer vision tasks. Several properties of catadioptric projection for geometric primitives such as points and lines have been addressed and have also been used for calibration. However, higher-order geometric properties are yet to be investigated. Such analysis is complicated by the specifics of the warping of the scene by catadioptric projection. One such property, the subject of this work, is Regiomontanus angle maximization relative to the effective viewpoint of the sensor. This work considers catadioptric sensors with paraboloidal mirrors, that is, paracatadioptric sensors. Analytical ray tracing of a simplified 1D world object gives its projection in the image and an expression for its length. Optimizing the length of the projection results in a third-degree equation for the Regiomontanus distance that can be solved explicitly. The Khayyam geometric solution of this equation provides the Regiomontanus distance of maximum subtended projection for these cameras. Applications of these results in various contexts are presented and discussed.
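For intuition, the classical (perspective-camera) Regiomontanus problem can be checked numerically: for a vertical segment with endpoints at heights a and b above the line of sight, the subtended viewing angle is maximized at horizontal distance sqrt(a*b). This is the standard planar result only; the paracatadioptric case studied in the paper leads to a cubic equation instead.

```python
import math

def subtended_angle(d, a, b):
    """Angle subtended at the viewpoint by a vertical segment whose
    endpoints sit at heights a < b above the line of sight, seen from
    horizontal distance d (classical perspective case, not the mirror)."""
    return math.atan(b / d) - math.atan(a / d)

def regiomontanus_distance(a, b):
    """Closed-form maximizer of the subtended angle: d* = sqrt(a * b)."""
    return math.sqrt(a * b)
```

For a = 1 and b = 4, the maximizing distance is 2, and the angle there exceeds the angle at both nearer and farther viewpoints.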
In agricultural science, automation increases quality, economic growth, and the productivity of a country. The export market and quality evaluation are affected by the sorting of fruits and vegetables. The crucial sensory characteristic of fruits and vegetables is appearance, which impacts their market value and the consumer's preference and choice. Although sorting and grading can be done by humans, the process is inconsistent, time-consuming, variable, subjective, onerous, expensive, and easily influenced by the surroundings. Hence, an astute fruit grading system is needed. In recent years, various algorithms for sorting and grading using computer vision have been proposed by researchers. This paper presents a detailed overview of the relevant methods, i.e., preprocessing, segmentation, feature extraction, and classification, which address fruit and vegetable quality based on color, texture, size, shape, and defects. In this paper, a critical comparison of the different algorithms proposed by researchers for quality inspection of fruits and vegetables is carried out.
•We propose a flexible HSCF neuron model, which adaptively changes the positions and directions of the one-dimensional simplex, as well as the radius of the hyperspheres. The higher variability in constructing the geometries helps to mine the underlying data distribution.
•A novel CE_VC loss function is proposed by constructing a volume-coverage loss term, which compresses the volume of the hyper-sausage to the hilt, thus ensuring the intra-class compactness of samples.
•We introduce a network learning algorithm that primarily conducts a divisive iteration method to determine the optimal hyperparameters adaptively.
•Experiments on several datasets demonstrate the effectiveness and generalization ability of the proposed HSCF neuron, which achieves excellent performance in terms of classification accuracy, complexity, and computation.
Recently, deep neural networks (DNNs) have advanced mainly through network architectures and loss functions; however, the development of neuron models has been quite limited. In this study, inspired by the mechanism of human cognition, a hyper-sausage coverage function (HSCF) neuron model possessing highly flexible plasticity is proposed. Then, a novel cross-entropy and volume-coverage (CE_VC) loss is defined, which compresses the volume of the hyper-sausage to the hilt and helps alleviate confusion among different classes, thus ensuring the intra-class compactness of the samples. Finally, a divisive iteration method is introduced, which considers each neuron model as a weak classifier and iteratively increases the number of weak classifiers. Thus, the optimal number of HSCF neurons is adaptively determined and an end-to-end learning framework is constructed. In particular, to improve classification performance, the HSCF neuron can be applied to classical DNNs. Comprehensive experiments on eight datasets in several domains demonstrate the effectiveness of the proposed method, which exhibits the feasibility of boosting DNNs with neuron plasticity and provides a novel perspective for further developments in DNNs. The source code is available at https://github.com/Tough2011/HSCFNet.git .
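The hyper-sausage geometry itself (all points within a radius r of a one-dimensional simplex, i.e., a line segment) reduces to a point-to-segment distance. The sketch below illustrates that geometry only; `hsc_activation` is a hypothetical soft coverage score for illustration, not the authors' neuron model.

```python
import numpy as np

def point_to_segment_distance(x, p, q):
    """Euclidean distance from point x to the segment [p, q], the
    one-dimensional simplex at the core of a hyper-sausage."""
    x, p, q = map(np.asarray, (x, p, q))
    v = q - p
    denom = float(v @ v)
    # project x onto the segment, clamping to the endpoints
    t = 0.0 if denom == 0.0 else float(np.clip((x - p) @ v / denom, 0.0, 1.0))
    return float(np.linalg.norm(x - (p + t * v)))

def hsc_activation(x, p, q, r):
    """Soft coverage score: 1 inside the sausage of radius r, decaying outside."""
    d = point_to_segment_distance(x, p, q)
    return float(np.exp(-max(d - r, 0.0) ** 2))
```

Points inside the tube score 1; points farther from the segment are penalized smoothly, which is the kind of shape a volume-coverage loss can compress.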
Computer vision is an interdisciplinary domain for object detection. Object detection plays a vital part in assisting surveillance, vehicle detection, and pose estimation. In this work, we propose a novel deep you-only-look-once (deep YOLO V3) approach to detect multiple objects. This approach looks at the entire frame during the training and test phases. It follows a regression-based technique that uses a probabilistic model to locate objects. We construct 106 convolution layers followed by 2 fully connected layers, with an input size of 812 × 812 × 3, to detect drones of small size. We pre-train the convolution layers for classification at half the resolution and then double the resolution for detection. The number of filters in each layer is set to 16, except in the last scale layer, where more than 16 filters are used to improve small-object detection. This construction uses up-sampling techniques to merge spectral image information into the existing signal and to rescale the features at specific locations; the up-sampling helps detect small objects by effectively increasing the sampling rate. This YOLO architecture is preferred because it requires less memory and computation than architectures with a larger number of filters. The proposed system is designed and trained for a single class, drone, and object detection and tracking are performed with an embedded-system-based deep YOLO. The proposed YOLO approach predicts multiple bounding boxes per grid cell with better accuracy. The proposed model has been trained on a large number of small drones under different conditions, such as open field and marine environments with complex backgrounds.
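The regression-based localization in YOLO-v3-style detectors decodes each grid cell's raw outputs into a box via sigmoid center offsets and exponential anchor scaling. The sketch below shows the standard YOLO v3 decode for one cell; the parameter names are illustrative, and this is the generic scheme rather than this paper's exact variant.

```python
import math

def decode_yolo_box(tx, ty, tw, th, cell_x, cell_y, grid_size,
                    anchor_w, anchor_h):
    """Decode one YOLO-v3-style grid-cell regression into a box (sketch).

    Offsets tx, ty pass through a sigmoid so the box center stays inside
    the responsible cell; tw, th scale a prior anchor exponentially.
    Returns (bx, by, bw, bh) in normalized [0, 1] image coordinates.
    """
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    bx = (cell_x + sig(tx)) / grid_size   # center x, normalized
    by = (cell_y + sig(ty)) / grid_size   # center y, normalized
    bw = anchor_w * math.exp(tw)          # width from anchor prior
    bh = anchor_h * math.exp(th)          # height from anchor prior
    return bx, by, bw, bh
```

With all raw outputs at zero, the decoded box sits at the cell center with exactly the anchor's size, which is the network's prior before training.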
Accurate discriminative-region proposal has an important effect on fine-grained image recognition. The vision transformer (ViT) has had a striking effect in computer vision due to its innate multi-head self-attention mechanism. However, the attention maps become gradually similar after certain layers, and since ViT adds a classification token to perform classification, it is unable to effectively select discriminative image patches for fine-grained image classification. To accurately detect discriminative regions, we propose a novel network, AMTrans, which efficiently increases layers to learn diverse features and utilizes integrated raw attention maps to capture more salient features. Specifically, we employ DeepViT as the backbone to address the attention-collapse issue. Then, we fuse the attention weights of all heads within each layer to produce an attention weight map. After that, we alternately use recurrent residual refinement blocks to promote salient-feature detection, and then utilize a semantic grouping method to propose the discriminative feature region. Extensive experiments show that AMTrans achieves state-of-the-art performance on three widely used fine-grained datasets under the same settings: Stanford-Cars, Stanford-Dogs, and CUB-200-2011.
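The head-fusion step described above (combining each head's attention weights within a layer into a single map, then ranking patches by the attention the classification token pays them) can be sketched as follows. The mean over heads and the top-k selection are simplifying assumptions for illustration, not AMTrans's exact fusion.

```python
import numpy as np

def fuse_head_attention(attn):
    """Fuse multi-head attention weights into one per-layer map (sketch).

    attn: array of shape (heads, tokens, tokens) with softmax rows.
    Returns the head-averaged (tokens, tokens) attention map."""
    return attn.mean(axis=0)

def top_patches(attn_map, k):
    """Rank patches by the attention the class token (row 0) pays them."""
    cls_to_patches = attn_map[0, 1:]   # drop attention to the class token itself
    return np.argsort(cls_to_patches)[::-1][:k]
```

Given the fused map, the most-attended patch indices are candidate discriminative regions.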
In this article, we propose a novel method for infrared and visible image fusion in which we develop a nest-connection-based network and spatial/channel attention models. The nest-connection-based network can preserve significant amounts of information from the input data in a multiscale perspective. The approach comprises three key elements: an encoder, a fusion strategy, and a decoder. In our proposed fusion strategy, spatial attention models and channel attention models are developed that describe the importance of each spatial position and of each channel in the deep features. First, the source images are fed into the encoder to extract multiscale deep features. The novel fusion strategy is then developed to fuse these features at each scale. Finally, the fused image is reconstructed by the nest-connection-based decoder. Experiments performed on publicly available data sets show that our proposed approach achieves better fusion performance than other state-of-the-art methods, a claim justified through both subjective and objective evaluations. The code of our fusion method is available at https://github.com/hli1221/imagefusion-nestfuse .
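One plausible form of a spatial attention fusion strategy like the one described above is to weight each spatial position by the relative channel-wise L1 activity of the two feature maps. This sketch is an assumption for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def spatial_attention_fuse(f1, f2, eps=1e-8):
    """Fuse two deep feature maps with spatial attention (sketch).

    f1, f2: arrays of shape (C, H, W), e.g. encoder features of the
    infrared and visible images at one scale.
    Per-position weights come from the channel-wise L1 activity of each
    map, so locations where one modality responds strongly dominate.
    Returns the fused (C, H, W) feature map.
    """
    a1 = np.abs(f1).sum(axis=0)        # (H, W) activity map of f1
    a2 = np.abs(f2).sum(axis=0)        # (H, W) activity map of f2
    w1 = a1 / (a1 + a2 + eps)          # per-position fusion weight for f1
    w2 = 1.0 - w1
    return w1[None] * f1 + w2[None] * f2
```

A channel attention model would do the analogous weighting per channel instead of per position; the two fused results can then be combined before decoding.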