Dynamic texture (DT) is an extension of texture to the temporal domain. Description and recognition of DTs have attracted growing attention. In this paper, a novel approach for recognizing DTs is proposed, and its simplifications and extensions to facial image analysis are also considered. First, the textures are modeled with volume local binary patterns (VLBP), an extension of the LBP operator widely used in ordinary texture analysis, combining motion and appearance. To make the approach computationally simple and easy to extend, only the co-occurrences of the local binary patterns on three orthogonal planes (LBP-TOP) are then considered. A block-based method is also proposed to deal with specific dynamic events, such as facial expressions, in which local information and its spatial locations should also be taken into account. In experiments with two DT databases, DynTex and Massachusetts Institute of Technology (MIT), both the VLBP and LBP-TOP clearly outperformed the earlier approaches. The proposed block-based method was evaluated with the Cohn-Kanade facial expression database with excellent results. The advantages of our approach include local processing, robustness to monotonic gray-scale changes, and simple computation.
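As a rough illustration of the LBP-TOP idea (not the authors' exact implementation), the sketch below computes 8-neighbour LBP codes on a 2-D plane and concatenates normalized LBP histograms from one XY, one XT, and one YT slice of a video volume; the full method accumulates histograms over all pixels rather than single central slices, and the parameter choices here are illustrative:

```python
import numpy as np

def lbp_codes(img, radius=1):
    """8-neighbour LBP codes for the interior pixels of a 2-D array."""
    img = np.asarray(img, dtype=float)
    c = img[radius:-radius, radius:-radius]          # centre pixels
    codes = np.zeros(c.shape, dtype=int)
    # offsets of the 8 neighbours, clockwise from top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        n = img[radius + dy: img.shape[0] - radius + dy,
                radius + dx: img.shape[1] - radius + dx]
        codes |= (n >= c).astype(int) << bit         # threshold at centre
    return codes

def lbp_top_histogram(volume):
    """Concatenated LBP histograms from an XY (appearance), XT and YT
    (motion) slice of a T x H x W video volume."""
    T, H, W = volume.shape
    planes = [volume[T // 2, :, :],   # XY plane
              volume[:, H // 2, :],   # XT plane
              volume[:, :, W // 2]]   # YT plane
    hists = []
    for p in planes:
        h = np.bincount(lbp_codes(p).ravel(), minlength=256).astype(float)
        hists.append(h / max(h.sum(), 1.0))          # normalize per plane
    return np.concatenate(hists)
```

Concatenating the three per-plane histograms (rather than the full joint VLBP co-occurrence) is exactly what makes LBP-TOP computationally lighter than VLBP.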
Low-rank and sparse representation based methods, which make few specific assumptions about the background, have recently attracted wide attention in background modeling. With these methods, moving objects in the scene are modeled as pixel-wise sparse outliers. However, in many practical scenarios, the distributions of these moving parts are not truly pixel-wise sparse but structurally sparse. Meanwhile, a robust analysis mechanism is required to handle background regions or foreground movements with varying scales. Based on these two observations, we first introduce a class of structured sparsity-inducing norms to model moving objects in videos. In our approach, we regard the observed sequence as being constituted of two terms: a low-rank matrix (background) and a structured sparse outlier matrix (foreground). Next, by virtue of adaptive parameters for dynamic videos, we propose a saliency measurement to dynamically estimate the support of the foreground. Experiments on challenging, well-known datasets demonstrate that the proposed approach outperforms state-of-the-art methods and works effectively on a wide range of complex videos.
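A minimal sketch of the low-rank-plus-sparse decomposition underlying this family of methods, using the standard inexact-ALM robust PCA with plain ℓ1 sparsity rather than the structured sparsity-inducing norms of the paper; function name and parameter defaults are illustrative:

```python
import numpy as np

def rpca_ialm(D, lam=None, n_iter=200, tol=1e-7):
    """Decompose D into a low-rank background L and a sparse foreground S
    (D ~ L + S) by inexact augmented Lagrange multipliers."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))       # classical RPCA weight
    sigma1 = np.linalg.norm(D, 2)
    Y = D / sigma1                           # scaled dual variable
    mu, rho = 1.25 / sigma1, 1.5             # penalty and its growth rate
    S = np.zeros_like(D)
    for _ in range(n_iter):
        # low-rank update: singular-value thresholding of D - S + Y/mu
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = U @ (np.maximum(sig - 1.0 / mu, 0.0)[:, None] * Vt)
        # sparse update: entrywise soft-thresholding of the residual
        R = D - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        Z = D - L - S                        # primal residual
        Y, mu = Y + mu * Z, mu * rho
        if np.linalg.norm(Z) <= tol * np.linalg.norm(D):
            break
    return L, S
```

In the video setting, each column of `D` is a vectorized frame; the paper replaces the entrywise soft-thresholding step with a proximal step for a structured norm so that foreground support stays spatially coherent.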
Visual speech information plays an important role in lipreading under noisy conditions or for listeners with a hearing impairment. In this paper, we present local spatiotemporal descriptors to represent and recognize spoken isolated phrases based solely on visual input. Spatiotemporal local binary patterns extracted from mouth regions are used for describing isolated phrase sequences. In our experiments with 817 sequences from ten phrases and 20 speakers, promising accuracies of 62% and 70% were obtained in speaker-independent and speaker-dependent recognition, respectively. In comparison with other methods on the AVLetters database, our method's accuracy of 62.8% clearly outperforms the others. Analysis of the confusion matrix for 26 English letters shows the good clustering characteristics of visemes for the proposed descriptors. The advantages of our approach include local processing and robustness to monotonic gray-scale changes. Moreover, no error-prone segmentation of moving lips is needed.
Recently, the task of recognizing spontaneous facial micro-expressions has attracted much attention owing to its various real-world applications. Plenty of handcrafted or learned features have been employed with a variety of classifiers and have achieved promising performance for recognizing micro-expressions. However, micro-expression recognition is still challenging due to the subtle spatiotemporal changes of micro-expressions. To exploit the merits of deep learning, we propose a novel micro-expression recognition approach based on deep recurrent convolutional networks, capturing the spatiotemporal deformations of the micro-expression sequence. Specifically, the proposed deep model consists of several recurrent convolutional layers for extracting visual features and a classification layer for recognition. It is optimized in an end-to-end manner and obviates manual feature design. To handle sequential data, we exploit two ways to extend the connectivity of convolutional networks across the temporal domain, in which the spatiotemporal deformations are modeled from the views of facial appearance and geometry separately. In addition, to overcome the shortcomings of limited and imbalanced training samples, two temporal data augmentation strategies as well as a balanced loss are jointly used for our deep network. Through experiments on three spontaneous micro-expression datasets, we verify the effectiveness of our proposed micro-expression recognition approach compared to state-of-the-art methods.
Automatic understanding of human affect using visual signals is of great importance in everyday human–machine interactions. Appraising human emotional states, behaviors, and reactions displayed in real-world settings can be accomplished using latent continuous dimensions (e.g., the circumplex model of affect). Valence (i.e., how positive or negative an emotion is) and arousal (i.e., the power of the activation of the emotion) constitute popular and effective representations for affect. Nevertheless, the majority of datasets collected thus far, although containing naturalistic emotional states, have been captured in highly controlled recording conditions. In this paper, we introduce the Aff-Wild benchmark for training and evaluating affect recognition algorithms. We also report on the results of the First Affect-in-the-wild Challenge (Aff-Wild Challenge), which was recently organized in conjunction with CVPR 2017 on the Aff-Wild database and was the first ever challenge on the estimation of valence and arousal in-the-wild. Furthermore, we design and extensively train an end-to-end deep neural architecture that predicts continuous emotion dimensions based on visual cues. The proposed deep learning architecture, AffWildNet, includes convolutional and recurrent neural network layers, exploiting the invariant properties of convolutional features while also modeling the temporal dynamics that arise in human behavior via the recurrent layers. The AffWildNet produced state-of-the-art results on the Aff-Wild Challenge. We then exploit the Aff-Wild database for learning features, which can be used as priors for achieving the best performance for both dimensional and categorical emotion recognition on the RECOLA, AFEW-VA, and EmotiW 2017 datasets, compared to all other methods designed for the same goal. The database and emotion recognition models are available at http://ibug.doc.ic.ac.uk/resources/first-affect-wild-challenge.
Micro-expression recognition aims to infer genuine emotions that people try to conceal from facial video clips. It is a very challenging task because micro-expressions have a very low intensity and short duration, which makes them difficult to observe. Recently, researchers have designed various spatiotemporal descriptors to describe micro-expressions. Notably, to better capture the low-intensity facial muscle movements, a fixed spatial division grid, <inline-formula><tex-math notation="LaTeX">8\times 8</tex-math></inline-formula> for example, is commonly used to partition the facial images into facial blocks before extracting descriptors. However, it is hard to choose an ideal division grid for different micro-expression samples because the division grid affects the discriminative ability of spatiotemporal descriptors to distinguish micro-expressions. To address this problem, in this paper we design a hierarchical spatial division scheme for spatiotemporal descriptor extraction. With the proposed scheme, determining which division grid is most suitable for a given micro-expression sample is no longer a problem. Furthermore, we propose a kernelized group sparse learning (KGSL) model to process the hierarchical-scheme-based spatiotemporal descriptors so that they are more effective for micro-expression recognition tasks. To evaluate the performance of the proposed micro-expression recognition method, consisting of the hierarchical-scheme-based spatiotemporal descriptors and KGSL, extensive experiments are conducted on two public micro-expression databases: CASME II and SMIC. Compared with many recent state-of-the-art approaches, our method achieves more promising recognition results.
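The hierarchical division idea can be sketched as follows: instead of committing to one grid, block descriptors from several division grids are extracted and concatenated. This toy Python version uses plain intensity histograms as a stand-in for the spatiotemporal descriptors; the function names, grid sizes, and bin count are illustrative, not the paper's settings:

```python
import numpy as np

def grid_histograms(img, grid, bins=16):
    """Per-block intensity histograms over a grid x grid partition
    (a stand-in for a per-block spatiotemporal descriptor)."""
    H, W = img.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = img[i * H // grid:(i + 1) * H // grid,
                        j * W // grid:(j + 1) * W // grid]
            h = np.histogram(block, bins=bins, range=(0, 256))[0].astype(float)
            feats.append(h / max(h.sum(), 1.0))      # normalize per block
    return np.concatenate(feats)

def hierarchical_descriptor(img, grids=(1, 2, 4, 8), bins=16):
    """Concatenate block descriptors from every division grid, so no
    single grid has to be chosen in advance."""
    return np.concatenate([grid_histograms(img, g, bins) for g in grids])
```

The group structure induced by this concatenation (one group of dimensions per grid/block) is what the KGSL model then exploits to weight the informative divisions.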
Micro-expression recognition is a challenging task in the computer vision field due to the repressed facial appearance and short duration of micro-expressions. Previous work on micro-expression recognition has used hand-crafted features such as LBP-TOP, Gabor filters, and optical flow. This paper is the first work to explore the possible use of deep learning for the micro-expression recognition task. Due to the lack of micro-expression data, training a CNN model from scratch on micro-expression data is not feasible. Instead, transfer learning from CNN models trained on objects and facial expressions is used. The aim is to use feature selection to remove the deep features that are irrelevant to our task. This work extends evolutionary algorithms to search for an optimal set of deep features so that the model does not overfit the training data and generalizes well to the test data. Promising results are presented for various micro-expression datasets.
The alkylation of isobutane with butene is an important refining process for the production of a complex mixture of branched alkanes, which is an ideal blending component for gasoline. The current catalysts used in industrial processes are concentrated H₂SO₄ and HF, which have problems including serious environmental pollution, equipment corrosion, potential safety hazards, high energy consumption in waste acid recycling, etc. Solid catalysts are another type of catalyst for this alkylation; however, they suffer from problems related to rapid deactivation. Ionic liquids (ILs) can be considered as third-generation catalysts to replace traditional catalysts in isobutane/butene alkylation to produce clean oil. In this review, alkylation catalyzed by various kinds of acidic ILs, including Lewis acidic ILs (such as chloroaluminate ones) and ILs containing Brønsted acidic functional groups (e.g., -SO₃H, HSO₄⁻), is reviewed. The currently reported ILs used in the catalysis of isobutane alkylation and their corresponding catalytic activity are summarized and compared. This will help readers to know what kinds of ILs are effective for the alkylation of isobutane with butene and to understand which factors affect the catalytic performance. The advantages of the catalysis of isobutane/butene alkylation by ILs include tunable acidity of the catalyst by varying the ion structure, limited solubility of the products in the IL phase and therefore easy separation of the alkylate from the catalyst, environmental friendliness, less corrosion of equipment, etc., thus making catalysis by ILs greener. The mechanism and kinetics of the alkylation catalyzed by ILs are discussed. Finally, perspectives and challenges of the isobutane/butene alkylation catalyzed by ILs are given.
This article provides a comprehensive review on the catalysis of isobutane/butene alkylation by ionic liquids for clean oil production.
Micro-expressions (MEs) are rapid, involuntary facial expressions which reveal emotions that people do not intend to show. Studying MEs is valuable, as recognizing them has many important applications, particularly in forensic science and psychotherapy. However, analyzing spontaneous MEs is very challenging due to their short duration and low intensity. Automatic ME analysis includes two tasks: ME spotting and ME recognition. For ME spotting, previous studies have focused on posed rather than spontaneous videos. For ME recognition, the performance of previous studies is low. To address these challenges, we make the following contributions: (i) We propose the first method for spotting spontaneous MEs in long videos (by exploiting feature difference contrast). This method is training-free and works on arbitrary unseen videos. (ii) We present an advanced ME recognition framework, which outperforms previous work by a large margin on two challenging spontaneous ME databases (SMIC and CASME II). (iii) We propose the first automatic ME analysis system (MESR), which can spot and recognize MEs from spontaneous video data. Finally, we show that our method outperforms humans in the ME recognition task by a large margin, and achieves comparable performance to humans at the very challenging task of spotting and then recognizing spontaneous MEs.
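A highly simplified sketch of spotting by feature difference contrast, assuming per-frame feature vectors are already available; the interval `k`, the contrast computation, and the thresholding rule below are illustrative, not the paper's exact formulation:

```python
import numpy as np

def spot_scores(features, k=2):
    """For each frame i, score the contrast between its feature and the
    average of the features k frames before and after. A micro-expression
    produces a brief spike in this score. features: (T, d) array; the k
    boundary frames at each end get score 0."""
    T = features.shape[0]
    scores = np.zeros(T)
    for i in range(k, T - k):
        avg = 0.5 * (features[i - k] + features[i + k])
        scores[i] = np.linalg.norm(features[i] - avg)
    return scores

def spot_peaks(scores, p=0.6):
    """Flag frames whose score exceeds a fraction p of the score range,
    a stand-in for the paper's contrast-based thresholding."""
    thr = scores.min() + p * (scores.max() - scores.min())
    return np.flatnonzero(scores >= thr)
```

Because the score depends only on differences of already-computed frame features, the spotting step needs no training, which is what lets it run on arbitrary unseen videos.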
In this paper, we propose a novel approach to compute rotation-invariant features from histograms of local noninvariant patterns. We apply this approach to both static and dynamic local binary pattern (LBP) descriptors. For static-texture description, we present LBP histogram Fourier (LBP-HF) features, and for dynamic-texture recognition, we present two rotation-invariant descriptors computed from the LBPs from three orthogonal planes (LBP-TOP) features in the spatiotemporal domain. LBP-HF is a novel rotation-invariant image descriptor computed from discrete Fourier transforms of LBP histograms. The approach can also be generalized to embed any uniform features into this framework, and combining supplementary information, e.g., the sign and magnitude components of the LBP, can improve the description ability. Moreover, two rotation-invariant variants are proposed for the LBP-TOP, which is an effective descriptor for dynamic-texture recognition, as shown by its recent success in different application problems, but which is not rotation invariant. In the experiments, it is shown that the LBP-HF and its extensions outperform noninvariant and earlier rotation-invariant versions of the LBP in rotation-invariant texture classification. In experiments on two dynamic-texture databases with rotations or view variations, the proposed video features can effectively deal with rotation variations of dynamic textures (DTs). They are also robust with respect to changes in viewpoint, outperforming recent methods proposed for view-invariant recognition of DTs.
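The core LBP-HF trick can be illustrated in a few lines: uniform LBP patterns with the same number of 1-bits form a rotation orbit, an image rotation cyclically shifts the histogram bins within each orbit, and Fourier magnitudes taken along the orbit are therefore invariant to that shift. A toy sketch, with the histogram layout assumed for illustration rather than taken from the paper:

```python
import numpy as np

def lbp_hf(uniform_hist):
    """LBP-HF sketch. `uniform_hist` holds one row per rotation orbit of
    uniform patterns (row n = counts of the pattern with n one-bits at
    each of its P rotations). An image rotation cyclically shifts every
    row, so DFT magnitudes along each row are rotation invariant."""
    return np.concatenate([np.abs(np.fft.fft(row)) for row in uniform_hist])
```

The invariance follows directly from the shift theorem: a circular shift multiplies each DFT coefficient by a unit-modulus phase factor, leaving the magnitudes unchanged.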