Single image haze removal is a challenging ill-posed problem. Existing methods use various constraints/priors to get plausible dehazing solutions. The key to achieve haze removal is to estimate a ...medium transmission map for an input hazy image. In this paper, we propose a trainable end-to-end system called DehazeNet, for medium transmission estimation. DehazeNet takes a hazy image as input, and outputs its medium transmission map that is subsequently used to recover a haze-free image via atmospheric scattering model. DehazeNet adopts convolutional neural network-based deep architecture, whose layers are specially designed to embody the established assumptions/priors in image dehazing. Specifically, the layers of Maxout units are used for feature extraction, which can generate almost all haze-relevant features. We also propose a novel nonlinear activation function in DehazeNet, called bilateral rectified linear unit, which is able to improve the quality of recovered haze-free image. We establish connections between the components of the proposed DehazeNet and those used in existing methods. Experiments on benchmark images show that DehazeNet achieves superior performance over existing methods, yet keeps efficient and easy to use.
Unsupervised domain adaptation (UDA) is to make predictions for unlabeled data on a target domain, given labeled data on a source domain whose distribution shifts from the target one. Mainstream UDA ...methods learn aligned features between the two domains, such that a classifier trained on the source features can be readily applied to the target ones. However, such a transferring strategy has a potential risk of damaging the intrinsic discrimination of target data. To alleviate this risk, we are motivated by the assumption of structural domain similarity, and propose to directly uncover the intrinsic target discrimination via discriminative clustering of target data. We constrain the clustering solutions using structural source regularization that hinges on our assumed structural domain similarity. Technically, we use a flexible framework of deep network based discriminative clustering that minimizes the KL divergence between predictive label distribution of the network and an introduced auxiliary one; replacing the auxiliary distribution with that formed by ground-truth labels of source data implements the structural source regularization via a simple strategy of joint network training. We term our proposed method as Structurally Regularized Deep Clustering (SRDC), where we also enhance target discrimination with clustering of intermediate network features, and enhance structural regularization with soft selection of less divergent source examples. Careful ablation studies show the efficacy of our proposed SRDC. Notably, with no explicit domain alignment, SRDC outperforms all existing methods on three UDA benchmarks.
Generalized Face Super-Resolution Jia, Kui; Gong, Shaogang
IEEE transactions on image processing,
06/2008, Letnik:
17, Številka:
6
Journal Article
Recenzirano
Odprti dostop
Existing learning-based face super-resolution (hallucination) techniques generate high-resolution images of a single facial modality (i.e., at a fixed expression, pose and illumination) given one or ...set of low-resolution face images as probe. Here, we present a generalized approach based on a hierarchical tensor (multilinear) space representation for hallucinating high-resolution face images across multiple modalities, achieving generalization to variations in expression and pose. In particular, we formulate a unified tensor which can be reduced to two parts: a global image-based tensor for modeling the mappings among different facial modalities, and a local patch-based multiresolution tensor for incorporating high-resolution image details. For realistic hallucination of unregistered low-resolution faces contained in raw images, we develop an automatic face alignment algorithm capable of pixel-wise alignment by iteratively warping the probing face to its projection in the space of training face images. Our experiments show not only performance superiority over existing benchmark face super-resolution techniques on single modal face hallucination, but also novelty of our approach in coping with multimodal hallucination and its robustness in automatic alignment under practical imaging conditions.
Multi-Task CNN Model for Attribute Prediction Abdulnabi, Abrar H.; Gang Wang; Jiwen Lu ...
IEEE transactions on multimedia,
2015-Nov., 2015-11-00, 20151101, Letnik:
17, Številka:
11
Journal Article
Recenzirano
Odprti dostop
This paper proposes a joint multi-task learning algorithm to better predict attributes in images using deep convolutional neural networks (CNN). We consider learning binary semantic attributes ...through a multi-task CNN model, where each CNN will predict one binary attribute. The multi-task learning allows CNN models to simultaneously share visual knowledge among different attribute categories. Each CNN will generate attribute-specific feature representations, and then we apply multi-task learning on the features to predict their attributes. In our multi-task framework, we propose a method to decompose the overall model's parameters into a latent task matrix and combination matrix. Furthermore, under-sampled classifiers can leverage shared statistics from other classifiers to improve their performance. Natural grouping of attributes is applied such that attributes in the same group are encouraged to share more knowledge. Meanwhile, attributes in different groups will generally compete with each other, and consequently share less knowledge. We show the effectiveness of our method on two popular attribute datasets.
In this paper, we propose a very simple deep learning network for image classification that is based on very basic data processing components: 1) cascaded principal component analysis (PCA); 2) ...binary hashing; and 3) blockwise histograms. In the proposed architecture, the PCA is employed to learn multistage filter banks. This is followed by simple binary hashing and block histograms for indexing and pooling. This architecture is thus called the PCA network (PCANet) and can be extremely easily and efficiently designed and learned. For comparison and to provide a better understanding, we also introduce and study two simple variations of PCANet: 1) RandNet and 2) LDANet. They share the same topology as PCANet, but their cascaded filters are either randomly selected or learned from linear discriminant analysis. We have extensively tested these basic networks on many benchmark visual data sets for different tasks, including Labeled Faces in the Wild (LFW) for face verification; the MultiPIE, Extended Yale B, AR, Facial Recognition Technology (FERET) data sets for face recognition; and MNIST for hand-written digit recognition. Surprisingly, for all tasks, such a seemingly naive PCANet model is on par with the state-of-the-art features either prefixed, highly hand-crafted, or carefully learned by deep neural networks (DNNs). Even more surprisingly, the model sets new records for many classification tasks on the Extended Yale B, AR, and FERET data sets and on MNIST variations. Additional experiments on other public data sets also demonstrate the potential of PCANet to serve as a simple but highly competitive baseline for texture classification and object recognition.
Many machine learning problems are concerned with discovering or associating common patterns in data of multiple views or modalities. Multi-view learning is one of the methods to achieve such goals. ...Recent methods propose deep multi-view networks via adaptation of generic deep neural networks (DNNs), which concatenate features of individual views at intermediate network layers (i.e., fusion layers). In this paper, we study the problem of multi-view learning in such end-to-end networks. We take a regularization approach via multi-view learning criteria, and propose a novel, effective, and efficient neuron-wise correlation-maximizing regularizer. We implement our proposed regularizers collectively as a correlation-regularized network layer (CorrReg). CorrReg can be applied to either fully-connected or convolutional fusion layers, simply by replacing them with their CorrReg counterparts. By partitioning neurons of a hidden layer in generic DNNs into multiple subsets, we also consider a multi-view feature learning perspective of generic DNNs. Such a perspective enables us to study deep multi-view learning in the context of regularized network training, for which we present control experiments of benchmark image classification to show the efficacy of our proposed CorrReg. To investigate how CorrReg is useful for practical multi-view learning problems, we conduct experiments of RGB-D object/scene recognition and multi-view-based 3D object recognition, using networks with fusion layers that concatenate intermediate features of individual modalities or views for subsequent classification. Applying CorrReg to fusion layers of these networks consistently improves classification performance. In particular, we achieve the new state of the art on the benchmark RGB-D object and RGB-D scene datasets. We make the implementation of CorrReg publicly available.
Tibial dyschondroplasia (TD) is the most-prevalent leg disorder in fast-growing chickens; it is intractable and characterized by abnormal endochondral bone formation of proximal tibial growth-plates ...(TGPs). Previous studies have shown that bone is a highly vascularized tissue dependent on the coordinated coupling between angiogenesis and osteogenesis, but the underlying mechanisms of bone formation and bone remodeling are poorly defined in TD chickens. Here, we observed that inhibition of vasculogenesis and angiogenesis remarkably impaired vascular invasion in the hypertrophic chondrocyte zone of the TGPs, resulting in the massive death of chondrocytes due to a shortage of blood vessels and nutrients. Moreover, the balance of the OPG (osteoprotegerin)/RANKL (receptor activator of nuclear factor-kB ligand) system is also severely disrupted during the osteogenesis process while coupling with angiogenesis, both of which eventually lead to abnormal endochondral bone formation in TD chickens. Thus, the process of vascular formation in endochondral bone appears to initiate the pathological changes in TD, and improvement of this process during coupling with osteogenesis may be a potential therapeutic approach to treat this intractable disease.
In this paper, we propose a framework of transforming images from a source image space to a target image space, based on learning coupled dictionaries from a training set of paired images. The ...framework can be used for applications such as image super-resolution and estimation of image intrinsic components (shading and albedo). It is based on a local parametric regression approach, using sparse feature representations over learned coupled dictionaries across the source and target image spaces. After coupled dictionary learning, sparse coefficient vectors of training image patch pairs are partitioned into easily retrievable local clusters. For any test image patch, we can fast index into its closest local cluster and perform a local parametric regression between the learned sparse feature spaces. The obtained sparse representation (together with the learned target space dictionary) provides multiple constraints for each pixel of the target image to be estimated. The final target image is reconstructed based on these constraints. The contributions of our proposed framework are three-fold. 1) We propose a concept of coupled dictionary learning based on coupled sparse coding which requires the sparse coefficient vectors of a pair of corresponding source and target image patches to have the same support, i.e., the same indices of nonzero elements. 2) We devise a space partitioning scheme to divide the high-dimensional but sparse feature space into local clusters. The partitioning facilitates extremely fast retrieval of closest local clusters for query patches. 3) Benefiting from sparse feature-based image transformation, our method is more robust to corrupted input data, and can be considered as a simultaneous image restoration and transformation process. Experiments on intrinsic image estimation and super-resolution demonstrate the effectiveness and efficiency of our proposed method.
A significant proportion of patients (10%-20%) with acute pancreatitis develop severe acute pancreatitis characterized by pancreatic necrosis, systemic inflammation, and organ failure, commonly ...requiring intensive care unit (ICU) admission. In this specific population, nutrition therapy is more challenging than that in the general ICU population, primarily because of inevitable gastrointestinal involvement by pancreatic inflammation. In this review, we discussed several key aspects of nutrition therapy in this population, including key pathophysiology that may impede nutrition therapy, the timing and implementation of enteral nutrition and parenteral nutrition, the importance of specific nutrient supplements, and the long-term outcomes that may be addressed by nutrition therapy.
Human actions in video sequences are three-dimensional (3D) spatio-temporal signals characterizing both the visual appearance and motion dynamics of the involved humans and objects. Inspired by the ...success of convolutional neural networks (CNN) for image classification, recent attempts have been made to learn 3D CNNs for recognizing human actions in videos. However, partly due to the high complexity of training 3D convolution kernels and the need for large quantities of training videos, only limited success has been reported. This has triggered us to investigate in this paper a new deep architecture which can handle 3D signals more effectively. Specifically, we propose factorized spatio-temporal convolutional networks (FstCN) that factorize the original 3D convolution kernel learning as a sequential process of learning 2D spatial kernels in the lower layers (called spatial convolutional layers), followed by learning 1D temporal kernels in the upper layers (called temporal convolutional layers). We introduce a novel transformation and permutation operator to make factorization in FstCN possible. Moreover, to address the issue of sequence alignment, we propose an effective training and inference strategy based on sampling multiple video clips from a given action video sequence. We have tested FstCN on two commonly used benchmark datasets (UCF-101 and HMDB-51). Without using auxiliary training videos to boost the performance, FstCN outperforms existing CNN based methods and achieves comparable performance with a recent method that benefits from using auxiliary training videos.