Ghosting artifacts caused by moving objects and misalignments are a key challenge in constructing high dynamic range (HDR) images. Current methods first register the input low dynamic range (LDR) images using optical flow before merging them. This process is error-prone and often causes ghosting in the resulting merged image. We propose a novel dual-attention-guided end-to-end deep neural network, called DAHDRNet, which produces high-quality ghost-free HDR images. Unlike previous methods that directly stack the LDR images or features for merging, we use dual-attention modules to guide the merging according to the reference image. DAHDRNet thus exploits both spatial attention and feature channel attention to achieve ghost-free merging. The spatial attention modules automatically suppress undesired components caused by misalignments and saturation, and enhance the fine details in the non-reference images. The channel attention modules adaptively rescale channel-wise features by considering the inter-dependencies between channels. The dual-attention approach is applied recurrently to further improve feature representation, and thus alignment. A dilated residual dense block is devised to make full use of the hierarchical features and increase the receptive field when hallucinating missing details. We employ a hybrid loss function, which consists of a perceptual loss, a total variation loss, and a content loss to recover photo-realistic images. Although DAHDRNet is not flow-based, it can be applied to flow-based registration to reduce artifacts caused by optical-flow estimation errors. Experiments on different datasets show that the proposed DAHDRNet achieves state-of-the-art quantitative and qualitative results.
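To make the dual-attention idea concrete, the following is a minimal PyTorch-style sketch of a spatial-attention gate conditioned on the reference features plus a squeeze-and-excitation style channel attention. The module names, layer sizes, and composition are illustrative assumptions, not the published DAHDRNet architecture.

```python
# Minimal sketch (assumed module names/shapes; not the published DAHDRNet code).
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Predicts a per-pixel gate for a non-reference feature map, conditioned on the reference."""
    def __init__(self, channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, non_ref, ref):
        gate = self.net(torch.cat([non_ref, ref], dim=1))  # values in (0, 1)
        return non_ref * gate                              # suppress misaligned/saturated regions

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style rescaling of channel-wise features."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))                    # global average pool -> channel weights
        return x * w[:, :, None, None]

# Usage: gate a non-reference LDR feature map before merging with the reference stream.
ref, non_ref = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
merged_in = ChannelAttention(64)(SpatialAttention(64)(non_ref, ref))
```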
Genomic prediction has become a powerful modelling tool for assessing line performance in plant and livestock breeding programmes. Among the genomic prediction modelling approaches, linear-based models have proven to provide accurate predictions even when the number of genetic markers exceeds the number of data samples. However, breeding programmes are now compiling data from large numbers of lines and test environments for analyses, rendering these approaches computationally prohibitive. Machine learning (ML) now offers a solution to this problem through the construction of fully connected deep learning architectures and high parallelisation of the predictive task. However, the fully connected nature of these architectures immediately generates an over-parameterisation of the network that needs addressing for efficient and accurate predictions. In this research, we explore the use of an ML architecture governed by variational Bayesian sparsity in its initial layers, which we have called VBS-ML. The use of VBS-ML provides a mechanism for feature selection of important markers linked to the trait, immediately reducing the network over-parameterisation. Selected markers then propagate to the remaining fully connected feed-forward components of the ML network to form the final genomic prediction. We illustrate the approach with four large Australian wheat breeding data sets that range from 2665 lines to 10375 lines genotyped across a large set of markers. For all data sets, the use of the VBS-ML architecture improved genomic prediction accuracy over legacy linear-based modelling approaches. An ML architecture governed under a variational Bayesian paradigm was shown to improve genomic prediction accuracy over legacy modelling approaches. This VBS-ML approach can be used to dramatically decrease the parameter burden on the network and provide a computationally feasible approach for improving genomic prediction conducted with large breeding population numbers and genetic markers.
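As an illustration of how a sparsity-inducing variational first layer can act as a marker filter, here is a small sketch using a variational-dropout style per-marker gate followed by a fully connected network. The gate parameterisation, KL approximation, and layer sizes are assumptions for illustration only, not the exact VBS-ML formulation.

```python
# Illustrative sparsity-inducing variational first layer for marker selection
# (assumed formulation: per-marker multiplicative Gaussian noise with a variational-dropout
#  style KL penalty; NOT the exact VBS-ML architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationalMarkerGate(nn.Module):
    def __init__(self, n_markers):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.full((n_markers,), -3.0))  # per-marker noise-to-signal ratio

    def forward(self, x):
        if self.training:
            eps = torch.randn_like(x)
            x = x * (1.0 + torch.exp(0.5 * self.log_alpha) * eps)      # multiplicative Gaussian noise
        return x

    def kl(self):
        # Approximate KL against a log-uniform prior (Molchanov et al., 2017 approximation).
        k1, k2, k3 = 0.63576, 1.8732, 1.48695
        la = self.log_alpha
        return -(k1 * torch.sigmoid(k2 + k3 * la) - 0.5 * F.softplus(-la) - k1).sum()

class SparseGenomicNet(nn.Module):
    def __init__(self, n_markers, hidden=256):
        super().__init__()
        self.gate = VariationalMarkerGate(n_markers)
        self.mlp = nn.Sequential(nn.Linear(n_markers, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):
        return self.mlp(self.gate(x)).squeeze(-1)

# Training objective (sketch): phenotype MSE plus a weighted KL term, e.g.
#   loss = F.mse_loss(model(snp_matrix), phenotype) + beta * model.gate.kl()
# Markers with large log_alpha (high dropout rate) are effectively switched off.
```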
Exploiting intrinsic structures in sparse signals underpins the recent progress in compressive sensing (CS). The key for exploiting such structures is to achieve two desirable properties: generality (i.e., the ability to fit a wide range of signals with diverse structures) and adaptability (i.e., being adaptive to a specific signal). Most existing approaches, however, often only achieve one of these two properties. In this paper, we propose a novel adaptive Markov random field sparsity prior for CS, which not only is able to capture a broad range of sparsity structures, but also can adapt to each sparse signal through refining the parameters of the sparsity prior with respect to the compressed measurements. To maximize the adaptability, we also propose a new sparse signal estimation framework, in which sparse signal, support, noise, and signal parameter estimation are unified into a variational optimization problem, which can be effectively solved with an alternating minimization scheme. Extensive experiments on three real-world datasets demonstrate the effectiveness of the proposed method in recovery accuracy, noise tolerance, and runtime.
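The general shape of such an alternating scheme is sketched below: alternate between updating the sparse signal under the current prior/noise parameters and re-estimating those parameters from the residual. This toy version uses a fixed L1 penalty and ISTA steps purely for illustration; the paper's actual updates involve the adaptive MRF sparsity prior and support estimation.

```python
# Generic alternating-minimisation scaffold for sparse recovery (illustrative only;
# the paper's updates use an adaptive MRF sparsity prior, not a fixed L1 penalty).
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def alternating_sparse_recovery(A, y, lam=0.1, n_outer=10, n_inner=50):
    m, n = A.shape
    x = np.zeros(n)
    sigma2 = np.var(y)                        # initial noise-variance estimate
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # ISTA step size (1 / spectral norm squared)
    for _ in range(n_outer):
        # (1) Signal update under the current prior/noise parameters (ISTA steps).
        for _ in range(n_inner):
            x = soft_threshold(x - step * A.T @ (A @ x - y), step * lam * sigma2)
        # (2) Parameter update given the current signal estimate.
        sigma2 = np.mean((y - A @ x) ** 2) + 1e-12
    support = np.flatnonzero(np.abs(x) > 1e-6)
    return x, support, sigma2

# Usage on a toy problem:
rng = np.random.default_rng(0)
A = rng.standard_normal((80, 200))
x_true = np.zeros(200)
x_true[rng.choice(200, 8, replace=False)] = rng.standard_normal(8)
y = A @ x_true + 0.01 * rng.standard_normal(80)
x_hat, support, sigma2 = alternating_sparse_recovery(A, y)
```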
Multi-principal element alloys (MPEAs) comprise an atypical class of metal alloys. MPEAs have been demonstrated to possess several exceptional properties, including, as most relevant to the present study, a high corrosion resistance. In the context of MPEA design, the vast number of potential alloying elements and the staggering number of elemental combinations favours a computational alloy design approach. In order to computationally assess the prospective corrosion performance of MPEAs, an approach was developed in this study. A density functional theory (DFT)-based Monte Carlo method was used for the development of the MPEA ‘structure’, with the AlCrTiV alloy used as a model. High-throughput DFT calculations were performed to create training datasets for surface activity/selectivity towards different adsorbate species: O²⁻, Cl⁻ and H⁺. Machine learning (ML) with a combined representation was then utilised to predict the adsorption and vacancy energies as descriptors for surface activity/selectivity. The capability of the combined computational methods of MC, DFT and ML as a virtual electrochemical performance simulator for MPEAs was established, and may be useful in exploring other MPEAs.
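A minimal sketch of the ML surrogate step is shown below: a regressor maps local-environment descriptors of surface sites to DFT-computed adsorption energies, so that Monte Carlo-generated surfaces can be screened without new DFT runs. The descriptor features and data here are placeholders, not the paper's combined representation or training set.

```python
# Illustrative surrogate model for adsorption energies (placeholder descriptors and data;
# the paper's "combined representation" and DFT training set are not reproduced here).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Each row: an assumed local-environment descriptor for one surface site
# (e.g. nearest-neighbour element fractions plus a coordination measure).
X = rng.random((500, 5))
y_ads = rng.normal(size=500)       # stand-in for DFT adsorption energies (eV)

model = GradientBoostingRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)
scores = cross_val_score(model, X, y_ads, cv=5, scoring="neg_mean_absolute_error")
print("CV MAE (eV):", -scores.mean())

# Once trained on real DFT data, the surrogate predicts site-resolved adsorption/vacancy
# energies for MC-generated surfaces far faster than running new DFT calculations.
model.fit(X, y_ads)
```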
Channel pruning is attracting increasing attention in the deep model compression community due to its capability of significantly reducing computation and memory footprints without special support from specific software and hardware. A challenge of channel pruning is designing efficient and effective criteria to select channels to prune. A widely used criterion is minimal performance degeneration, e.g., loss changes before and after pruning being the smallest. Accurately evaluating the true performance degeneration requires retraining the surviving weights to convergence, which is prohibitively slow. Hence, existing pruning methods settle for using the previous weights (without retraining) to evaluate the performance degeneration. However, we observe that the loss changes differ significantly with and without retraining. This motivates us to develop a technique to evaluate true loss changes without retraining, which we use to select channels to prune with greater reliability and confidence. We first derive a closed-form estimator of the true loss change per mask change, using influence functions without retraining. The influence function is a classic technique from robust statistics that reveals the impact of a training sample on the model's predictions; we repurpose it to assess impacts on true loss changes. We then show how to assess the importance of all channels simultaneously and develop a novel global channel pruning algorithm accordingly. We conduct extensive experiments to verify the effectiveness of the proposed algorithm, which significantly outperforms the competing channel pruning methods on both image classification and object detection tasks. One of the attractive properties of our algorithm is that it automatically obtains the prune percentage without the cumbersome yet commonly used sensitivity analysis required by local pruning. To the best of our knowledge, we are the first to show that evaluating true loss changes for pruning without retraining is possible. This finding will open up opportunities for a series of new paradigms to emerge that differ from existing pruning methods. The code is available at https://github.com/hrcheng1066/IFSO
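To illustrate the idea of scoring channels by predicted loss change rather than retraining, here is a first-order (Taylor-style) sketch that gates each output channel and reads the gradient with respect to the gate as a proxy for the loss change if that channel were removed. This is an illustrative proxy, not the paper's exact influence-function estimator; all names and shapes are placeholders.

```python
# First-order sketch of estimating the loss change from pruning each channel via a
# differentiable channel gate (illustrative proxy, not the paper's influence-function estimator).
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    def __init__(self, conv):
        super().__init__()
        self.conv = conv
        self.gate = nn.Parameter(torch.ones(conv.out_channels), requires_grad=True)

    def forward(self, x):
        return self.conv(x) * self.gate.view(1, -1, 1, 1)

# Tiny model with one gated layer for illustration.
conv = GatedConv(nn.Conv2d(3, 16, 3, padding=1))
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))

loss = nn.functional.cross_entropy(head(conv(x)), y)
loss.backward()

# delta_loss_c ~= d(loss)/d(gate_c) * (0 - 1): predicted loss change if channel c is removed.
scores = -conv.gate.grad            # larger score => pruning this channel hurts more
prune_order = torch.argsort(scores) # prune least-important channels first
print(prune_order[:4])
```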
In this work, we present a method to improve the efficiency and robustness of previous model-free Reinforcement Learning (RL) algorithms for the task of object-goal visual navigation. Despite achieving state-of-the-art results, one of the major drawbacks of those approaches is the lack of a forward model that informs the agent about the potential consequences of its actions, i.e., being model-free. In this work, we augment model-free RL with such a forward model that can predict a representation of a future state, from the beginning of a navigation episode, if the episode were to be successful. Furthermore, for efficient training, we develop an algorithm to integrate a replay buffer into the model-free RL that alternates between training the policy and the forward model. We call our agent ForeSI; ForeSI is trained to imagine a future latent state that leads to success. By explicitly imagining such a state during navigation, our agent is able to take better actions, leading to two main advantages: first, in the absence of an object detector, ForeSI presents a more robust policy, i.e., it leads to about 5% absolute improvement on the Success Rate (SR); second, when combined with an off-the-shelf object detector to help better distinguish the target object, our method leads to about 3% absolute improvement on the SR and about 2% absolute improvement on Success weighted by inverse Path Length (SPL), i.e., presents higher efficiency.
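A rough sketch of the forward-model side of such a setup is given below: a small network is trained from a replay buffer of successful episodes to map the current state and goal latents to an imagined success-state latent, which then conditions the policy's action choice. Shapes, module names, and the buffer format are assumptions; the policy's own RL update is omitted, so this is not the exact ForeSI training procedure.

```python
# Minimal sketch (assumed shapes/module names) of a forward model trained from a replay
# buffer and used to condition the policy; not the exact ForeSI procedure.
import random
import torch
import torch.nn as nn

latent_dim, action_dim = 128, 6
forward_model = nn.Sequential(nn.Linear(2 * latent_dim, 256), nn.ReLU(),
                              nn.Linear(256, latent_dim))       # imagines a success-state latent
policy = nn.Sequential(nn.Linear(2 * latent_dim, 256), nn.ReLU(),
                       nn.Linear(256, action_dim))              # acts on [state, imagined] latents
fm_opt = torch.optim.Adam(forward_model.parameters(), lr=1e-4)

replay = []   # entries: (state_latent, goal_latent, success_state_latent) from successful episodes

def train_forward_model(batch_size=32):
    s, g, succ = map(torch.stack, zip(*random.sample(replay, batch_size)))
    pred = forward_model(torch.cat([s, g], dim=-1))
    loss = nn.functional.mse_loss(pred, succ)
    fm_opt.zero_grad(); loss.backward(); fm_opt.step()
    return loss.item()

def act(state_latent, goal_latent):
    imagined = forward_model(torch.cat([state_latent, goal_latent], dim=-1))
    logits = policy(torch.cat([state_latent, imagined], dim=-1))
    return torch.distributions.Categorical(logits=logits).sample()

# Toy usage: fill the buffer with random latents, run one model update, and pick an action.
for _ in range(64):
    replay.append(tuple(torch.randn(latent_dim) for _ in range(3)))
train_forward_model()
print(act(torch.randn(1, latent_dim), torch.randn(1, latent_dim)))
```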
In this paper we propose a dilated convolutional model for music melody extraction. Taking variable-Q transforms (VQTs) as inputs, it first uses consecutive layers of convolution to capture local temporal-frequency patterns, and then a single layer of dilated convolution to capture global frequency patterns contributed by the pitches and harmonics of active notes. Compared with a counterpart model without dilation, the proposed model can remarkably cut down the computational cost, and at the same time does not compromise the performance. Its advantages over existing models are twofold. First, it performs best on most datasets, for both general and vocal melody extraction. Second, it can achieve the best performance with the least training data.
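The benefit of dilation along the frequency axis can be seen in a short sketch: with the same number of weights, a dilated kernel spans a far wider range of frequency bins, which is what lets it relate pitches to their harmonics. The layer sizes, dilation rate, and VQT dimensions below are placeholders, not the paper's configuration.

```python
# Illustrative comparison (placeholder layer sizes): a dilated convolution covers a much
# wider frequency span than a standard one with the same number of weights.
import torch
import torch.nn as nn

vqt = torch.randn(1, 1, 360, 256)        # (batch, channel, frequency bins, time frames)

local = nn.Conv2d(1, 16, kernel_size=(3, 3), padding=(1, 1))             # local patterns
dilated = nn.Conv2d(16, 16, kernel_size=(5, 1), dilation=(60, 1),
                    padding=(120, 0))                                     # spans distant harmonics

h = dilated(local(vqt))
print(h.shape)  # torch.Size([1, 16, 360, 256])

# Effective receptive field along frequency: (5 - 1) * 60 + 1 = 241 bins,
# versus 5 bins for the same kernel without dilation -- at identical parameter count.
```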
•We propose a novel Transformer-based deep neural network, named TransHRNet, which connects the different-resolution streams in parallel and repeatedly exchanges information across resolutions.
•An Effective Transformer (EffTrans) block is introduced to improve performance; it uses group linear transformations with an expand-reduce strategy and a spatial-reduction attention layer to further reduce the resource cost.
•Our proposed method achieves higher performance than SoTA efficient medical image segmentation methods with comparable computation cost.
Most recent 3D medical image segmentation methods adopt convolutional neural networks (CNNs) that rely on deep feature representation and achieve adequate performance. However, because convolutional architectures have limited receptive fields, they cannot explicitly model the long-range dependencies in a medical image. Recently, Transformers have been shown to capture global dependencies using self-attention mechanisms and learn highly expressive representations. Some works have been designed based on Transformers, but existing Transformers suffer from extreme computational and memory costs, and they cannot take full advantage of the powerful feature representations in 3D medical image segmentation. In this paper, we aim to connect the different resolution streams in parallel and propose a novel network, named Transformer based High Resolution Network (TransHRNet), with an Effective Transformer (EffTrans) block, which has sufficient feature representation even at high feature resolutions. Given a 3D image, the encoder first utilizes a CNN to extract feature representations that capture local information; the different feature maps are then reshaped into tokens that are fed into each Transformer stream in parallel to learn global information and repeatedly exchange information across streams. Because this framework based on the standard Transformer would require a huge amount of computation, we introduce a deep and effective Transformer to deliver better performance with fewer parameters. The proposed TransHRNet is evaluated on the Multi-Atlas Labeling Beyond the Cranial Vault (BCV) dataset that consists of 11 major human organs and the Medical Segmentation Decathlon (MSD) dataset for brain tumor and spleen segmentation tasks. Experimental results show that it performs better than the convolutional and other related Transformer-based methods on the 3D multi-organ segmentation tasks. Code is available at https://github.com/duweidai/TransHRNet.
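One of the cost-reduction ingredients named above, spatial-reduction attention, can be sketched in a few lines: keys and values are spatially downsampled before attention while queries keep full resolution, so the attention cost drops roughly by the square of the reduction ratio. The sketch below is 2D for brevity and uses assumed names and sizes; it is not the exact EffTrans block.

```python
# Minimal spatial-reduction attention sketch (PVT-style K/V downsampling; illustrative,
# 2D for brevity, not the exact EffTrans block from TransHRNet).
import torch
import torch.nn as nn

class SpatialReductionAttention(nn.Module):
    def __init__(self, dim, heads=4, sr_ratio=4):
        super().__init__()
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)  # shrink K/V tokens
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, h, w):
        # x: (B, h*w, dim) tokens from one resolution stream
        b, n, c = x.shape
        kv = self.sr(x.transpose(1, 2).reshape(b, c, h, w))   # (B, C, h/r, w/r)
        kv = kv.flatten(2).transpose(1, 2)                    # (B, h*w/r^2, C)
        out, _ = self.attn(x, kv, kv)                         # queries keep full resolution
        return out

tokens = torch.randn(2, 32 * 32, 64)
print(SpatialReductionAttention(64)(tokens, 32, 32).shape)    # torch.Size([2, 1024, 64])
```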
•A practical vibration-based CNN combining an FE model and field-testing data is proposed for in-service steel railway bridges.
•A data augmentation strategy is proposed to classify various acceleration responses related to damage scenarios.
•t-Distributed Stochastic Neighbour Embedding (t-SNE) and Gradient-weighted Class Activation Mapping (Grad-CAM) are used for visualization of feature extraction and feature mapping.
•The proposed method can be used to assess the health state of a structure to fulfil regulatory requirements, such as the Australian Standard, and safety aspects of steel railway bridges.
•The proposed techniques can be utilized for rail infrastructure management without the need for specialized expertise or complex signal processing.
Railway bridges exposed to extreme environmental conditions can gradually lose their effective cross-section at critical locations, which can cause catastrophic failure. This paper proposes a practical vibration-based deep learning approach, built on Convolutional Neural Networks (CNNs), for classifying various extents of cross-section loss caused by damage such as corrosion in operational railway bridges. Firstly, field testing of an in-service railway bridge is conducted and the modal parameters of the bridge are obtained to validate the developed Finite Element (FE) model of the bridge. In the next phase, corrosion scenarios for the main bridge members are generated with the validated FE model as various degrees of cross-section loss in these members, following the Australian Standard AS7636. In the deep learning part, a 1D CNN combined with novel data augmentation strategies is developed to classify the acceleration responses associated with each damage scenario simulated by the validated FE model. Furthermore, a visualization of feature extraction and feature mapping using t-Distributed Stochastic Neighbour Embedding (t-SNE) and Gradient-weighted Class Activation Mapping (Grad-CAM) is presented. Case studies on the field-data-validated FE model results with added background noise and variations, as well as on real field testing data, suggest that the proposed method achieves damage classification accuracy close to 100%.
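A minimal sketch of the classification step is given below: a small 1D CNN maps acceleration windows to damage classes, with a simple additive-noise augmentation standing in for the strategies used to bridge simulated and field responses. The window length, number of classes, layer sizes, and augmentation are placeholder assumptions, not the paper's exact network or augmentation scheme.

```python
# Minimal 1D-CNN sketch for classifying acceleration windows into damage classes, with a
# simple additive-noise augmentation (placeholder shapes/classes; not the paper's exact model).
import torch
import torch.nn as nn

n_classes, window = 5, 1024   # assumed: 5 damage scenarios, 1024-sample acceleration windows

model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=64, stride=4), nn.ReLU(), nn.MaxPool1d(4),
    nn.Conv1d(16, 32, kernel_size=16, stride=2), nn.ReLU(), nn.AdaptiveAvgPool1d(1),
    nn.Flatten(), nn.Linear(32, n_classes),
)

def augment(x, noise_level=0.05):
    """Add scaled Gaussian noise so simulated responses better cover field conditions."""
    return x + noise_level * x.std(dim=-1, keepdim=True) * torch.randn_like(x)

signals = torch.randn(8, 1, window)          # stand-in for FE-simulated acceleration responses
labels = torch.randint(0, n_classes, (8,))
loss = nn.functional.cross_entropy(model(augment(signals)), labels)
loss.backward()
```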