In the past few years, deep learning-based models have been very successful in achieving state-of-the-art results in many tasks in computer vision, speech recognition, and natural language processing. These models seem to be a natural fit for handling the ever-increasing scale of biometric recognition problems, from cellphone authentication to airport security systems. Deep learning-based models have increasingly been leveraged to improve the accuracy of different biometric recognition systems in recent years. In this work, we provide a comprehensive survey of more than 150 promising works on biometric recognition (including face, fingerprint, iris, palmprint, ear, voice, signature, and gait recognition) that deploy deep learning models, and show their strengths and potential in different applications. For each biometric, we first introduce the datasets that are widely used in the literature and their characteristics. We then discuss several promising deep learning works developed for that biometric and report their performance on popular public benchmarks. We also discuss some of the main challenges in using these models for biometric recognition, and possible future directions toward which research in this area is headed.
Image-based sequence recognition has been a long-standing research topic in computer vision. In this paper, we investigate the problem of scene text recognition, which is among the most important and challenging tasks in image-based sequence recognition. A novel neural network architecture, which integrates feature extraction, sequence modeling, and transcription into a unified framework, is proposed. Compared with previous systems for scene text recognition, the proposed architecture possesses four distinctive properties: (1) It is end-to-end trainable, in contrast to most existing algorithms whose components are separately trained and tuned. (2) It naturally handles sequences of arbitrary length, involving no character segmentation or horizontal scale normalization. (3) It is not confined to any predefined lexicon and achieves remarkable performance in both lexicon-free and lexicon-based scene text recognition tasks. (4) It generates an effective yet much smaller model, which is more practical for real-world application scenarios. Experiments on standard benchmarks, including the IIIT-5K, Street View Text, and ICDAR datasets, demonstrate the superiority of the proposed algorithm over prior art. Moreover, the proposed algorithm performs well on image-based music score recognition, which clearly verifies its generality.
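As a rough illustration of how such a unified feature-extraction, sequence-modeling, and transcription pipeline can be wired together, the following sketch (in PyTorch, an assumed framework; layer sizes and names are illustrative, not the authors' configuration) stacks a small CNN over the image, treats the resulting feature columns as a sequence for a bidirectional LSTM, and emits per-column class scores suitable for a CTC-style transcription loss:

```python
import torch
import torch.nn as nn

class ConvRecurrentRecognizer(nn.Module):
    """Illustrative end-to-end text recognizer: CNN features -> BiLSTM -> per-column class scores."""

    def __init__(self, num_classes):
        super().__init__()
        # Convolutional feature extraction: collapses image height, keeps width as the sequence axis.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # height -> 1, width preserved as sequence length
        )
        # Sequence modeling over the horizontal feature columns.
        self.rnn = nn.LSTM(256, 256, bidirectional=True, batch_first=True)
        # Transcription: per-column scores over characters plus a CTC "blank" symbol.
        self.fc = nn.Linear(512, num_classes + 1)

    def forward(self, images):                    # images: (batch, 1, H, W)
        feats = self.cnn(images)                  # (batch, 256, 1, W')
        seq = feats.squeeze(2).permute(0, 2, 1)   # (batch, W', 256)
        seq, _ = self.rnn(seq)                    # (batch, W', 512)
        return self.fc(seq)                       # logits for nn.CTCLoss

# The CTC-style loss over per-column logits is what removes the need for
# character segmentation or horizontal scale normalization.
```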
Interpretation of images and videos containing humans interacting with different objects is a daunting task. It involves understanding the scene or event, analyzing human movements, recognizing manipulable objects, and observing the effect of the human movement on those objects. While each of these perceptual tasks can be conducted independently, recognition rates improve when interactions between them are considered. Motivated by psychological studies of human perception, we present a Bayesian approach which integrates the various perceptual tasks involved in understanding human-object interactions. Previous approaches to object and action recognition rely on static shape or appearance feature matching and motion analysis, respectively. Our approach goes beyond these traditional approaches and applies spatial and functional constraints on each of the perceptual elements for coherent semantic interpretation. Such constraints allow us to recognize objects and actions when the appearances are not discriminative enough. We also demonstrate the use of such constraints in recognizing actions from static images without using any motion information.
This article primarily concerns the problem of isolated handwritten numeral recognition of major Indian scripts. The principal contributions presented here are (a) the pioneering development of two databases of handwritten numerals for the two most popular Indian scripts, (b) a multistage cascaded recognition scheme using wavelet-based multiresolution representations and multilayer perceptron classifiers, and (c) the application of (b) to the recognition of mixed handwritten numerals of three scripts used in India: Devanagari, Bangla, and English. The present databases include 22,556 and 23,392 handwritten isolated numeral samples of Devanagari and Bangla, respectively, collected from real-life situations, and they can be made available free of cost to researchers of other academic institutions. In the proposed scheme, a numeral is subjected to three multilayer perceptron classifiers corresponding to three coarse-to-fine resolution levels in a cascaded manner. If the numeral is rejected even at the highest resolution, another multilayer perceptron makes a final attempt to recognize it by combining the outputs of the three classifiers of the previous stages. This scheme has been extended to the situation where the script of a document is not known a priori or the numerals written on a document belong to different scripts. Handwritten numerals in mixed scripts are frequently found in Indian postal mail and table-form documents.
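A minimal sketch of the coarse-to-fine cascade with rejection described above, assuming three pretrained multilayer perceptrons that expose scikit-learn-style predict_proba and using a simple confidence threshold as the rejection rule (the abstract does not specify the exact rejection criterion):

```python
import numpy as np

def cascaded_recognize(wavelet_features, classifiers, fallback, threshold=0.9):
    """Coarse-to-fine cascade: query each resolution level's classifier in turn,
    accept only confident decisions, and otherwise fall back to a combiner MLP
    trained on the concatenated outputs of all three stages.

    wavelet_features: list of three feature vectors, coarsest resolution first (illustrative format).
    classifiers: three trained MLPs with predict_proba (scikit-learn style, an assumption).
    fallback: final MLP trained on the concatenated stage outputs.
    """
    stage_outputs = []
    for feats, clf in zip(wavelet_features, classifiers):
        probs = clf.predict_proba(np.asarray(feats).reshape(1, -1))[0]
        stage_outputs.append(probs)
        if probs.max() >= threshold:          # confident enough: accept at this resolution
            return int(np.argmax(probs))
    # Rejected at all three resolutions: combine the stage outputs for a final attempt.
    combined = np.concatenate(stage_outputs).reshape(1, -1)
    return int(np.argmax(fallback.predict_proba(combined)[0]))
```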
Dignity is increasingly central to the justificatory logic of US Supreme Court decisions. Yet the perils inherent in this jurisprudence of dignity, which we argue frames the right to dignity as a right to recognition, have been overlooked. Understanding dignity as synonymous with recognition clarifies its effects: dignity dethrones the autonomous, rights-bearing individual, instead figuring individuals as intersubjectively vulnerable and dependent upon institutional recognition. Dignity also casts state action as innocent, elides structural harms, and exacerbates injuries of marginalization. Applying our theoretical frame to Obergefell v. Hodges, we argue that the effects of the emerging jurisprudence of dignity are troubling.
Automatic license plate recognition (ALPR) is the extraction of vehicle license plate information from an image or a sequence of images. The extracted information can be used, with or without a database, in many applications, such as electronic payment systems (toll payment, parking fee payment) and freeway and arterial monitoring systems for traffic surveillance. ALPR uses either a color, black-and-white, or infrared camera to take images. The quality of the acquired images is a major factor in the success of ALPR. As a real-life application, ALPR has to quickly and reliably process license plates under different environmental conditions, such as indoors, outdoors, and during the day or at night. It should also generalize to license plates from different nations, provinces, or states. These plates usually contain different colors, are written in different languages, and use different fonts; some plates have a single-color background and others have background images. License plates can be partially occluded by dirt, lighting, and towing accessories on the car. In this paper, we present a comprehensive review of the state-of-the-art techniques for ALPR. We categorize different ALPR techniques according to the features they use at each stage, and compare them in terms of pros, cons, recognition accuracy, and processing speed. Forecasts of future ALPR research are given at the end.
The palmprint possesses a number of unique features for reliable personal recognition. However, different types of palmprint images contain different dominant features: only some features of the palmprint are visible in a given palmprint image, whereas the others may not be salient. For example, a low-resolution palmprint image has visible principal lines and wrinkles. By contrast, a high-resolution palmprint image contains clear ridge patterns and minutiae points. In addition, a three-dimensional (3-D) palmprint image captures the curvature of the palmprint surface. So far, no work has summarized feature extraction for the different types of palmprint images. In this paper, we aim to comprehensively study palmprint feature extraction and recognition. We propose a unified framework that classifies palmprint images into four categories: 1) contact-based; 2) contactless; 3) high-resolution; and 4) 3-D palmprint images. We then analyze the motivations and theories of the representative feature extraction and matching methods for each type of palmprint image. Finally, we compare and test the state-of-the-art methods on widely used palmprint databases, and point out some potential directions for future research.
Emotion recognition is an active research topic in modern intelligent systems. The technique is pervasively used in autonomous vehicles, remote medical services, and human–computer interaction (HCI). Traditional speech emotion recognition algorithms do not generalize well because they assume that training and testing data come from the same domain and share the same distribution. In practice, however, speech data is acquired from different devices and recording environments, and may therefore differ significantly in terms of language, emotion categories, and labels. To address this problem, we propose a bimodal fusion algorithm for speech emotion recognition in which facial expression and speech information are optimally fused. We first combine a CNN and an RNN to achieve facial emotion recognition. We then use MFCCs to convert the speech signal into images, so that an LSTM and a CNN can be leveraged to recognize speech emotion. Finally, we apply weighted decision fusion to combine the facial expression and speech predictions. Comprehensive experimental results demonstrate that, compared with uni-modal emotion recognition, bimodal feature-based emotion recognition achieves better performance.
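A minimal sketch of the weighted decision-fusion step, assuming each modality branch already produces a per-class probability vector; the weight, class count, and scores below are illustrative, not values from the paper:

```python
import numpy as np

def fuse_decisions(face_probs, speech_probs, face_weight=0.6):
    """Weighted decision-level fusion of two emotion classifiers.

    face_probs, speech_probs: per-class probability vectors from the facial (CNN+RNN)
    branch and the speech (MFCC image -> CNN+LSTM) branch.
    face_weight: tunable fusion weight for the facial branch (an assumed value).
    """
    face_probs = np.asarray(face_probs, dtype=float)
    speech_probs = np.asarray(speech_probs, dtype=float)
    fused = face_weight * face_probs + (1.0 - face_weight) * speech_probs
    return int(np.argmax(fused)), fused

# Example with hypothetical scores over four emotion classes:
label, scores = fuse_decisions([0.1, 0.7, 0.1, 0.1], [0.2, 0.4, 0.3, 0.1])
```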
Which facial features allow human observers to successfully recognize expressions of emotion? While the eyes and mouth have frequently been shown to be of high importance, research on facial action units has made more precise predictions about the areas involved in displaying each emotion. The present research investigated, at a fine-grained level, which physical features observers rely on most when decoding facial expressions. In the experiment, individual faces expressing the basic emotions according to Ekman were hidden behind a mask of 48 tiles, which was sequentially uncovered. Participants were instructed to stop the sequence as soon as they recognized the facial expression and assign it the correct label. For each part of the face, its contribution to successful recognition was computed, allowing us to visualize the importance of different face areas for each expression. Overall, observers relied mostly on the eye and mouth regions when successfully recognizing an emotion. Furthermore, the difference in the importance of eyes and mouth allowed us to group the expressions in a continuous space, ranging from sadness and fear (reliance on the eyes) to disgust and happiness (reliance on the mouth). The face parts with the highest diagnostic value for expression identification were typically located in areas corresponding to action units from the Facial Action Coding System. A similarity analysis of the usefulness of different face parts for expression recognition demonstrated that faces cluster according to the emotion they express, rather than by low-level physical features. Also, expressions relying more on the eye or mouth region were in close proximity in the constructed similarity space. These analyses help to better understand how human observers process expressions of emotion, by delineating the mapping from facial features to psychological representations.
Recognizing 3D objects in the presence of noise, varying mesh resolution, occlusion, and clutter is a very challenging task. This paper presents a novel method named Rotational Projection Statistics (RoPS). It has three major modules: local reference frame (LRF) definition, RoPS feature description, and 3D object recognition. We propose a novel technique to define the LRF by calculating the scatter matrix of all points lying on the local surface. RoPS feature descriptors are obtained by rotationally projecting the neighboring points of a feature point onto 2D planes and calculating a set of statistics (including low-order central moments and entropy) of the distribution of these projected points. Using the proposed LRF and RoPS descriptor, we present a hierarchical 3D object recognition algorithm. The performance of the proposed LRF, RoPS descriptor, and object recognition algorithm was rigorously tested on a number of popular and publicly available datasets. Our proposed techniques exhibited superior performance compared to existing techniques, and our method is robust to noise and varying mesh resolution. Our RoPS-based algorithm achieved recognition rates of 100%, 98.9%, 95.4%, and 96.0% on the Bologna, UWA, Queen’s, and Ca’ Foscari Venezia datasets, respectively.
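A rough sketch of an LRF construction in the spirit described above, obtained by eigen-decomposing the scatter matrix of the points on the local surface; note that sign disambiguation is omitted, and the area-weighted scatter over mesh triangles used in RoPS is simplified here to an unweighted point scatter:

```python
import numpy as np

def local_reference_frame(neighbors, center):
    """Illustrative LRF: eigen-decompose the scatter matrix of the local surface
    points around a feature point (simplified; no triangle-area weighting, no
    axis sign disambiguation).

    neighbors: (N, 3) array of points on the local surface.
    center: (3,) coordinates of the feature point.
    """
    diffs = np.asarray(neighbors) - np.asarray(center)   # offsets from the feature point
    scatter = diffs.T @ diffs / len(diffs)                # 3x3 scatter (covariance-like) matrix
    eigvals, eigvecs = np.linalg.eigh(scatter)            # eigenvalues in ascending order
    # Columns reordered by decreasing eigenvalue give the three LRF axes.
    return eigvecs[:, ::-1]

# Usage: transform the neighboring points into this LRF before rotationally
# projecting them onto 2D planes and computing the RoPS statistics.
```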