This paper proposes a new method, weighted hierarchical depth motion maps (WHDMM) combined with three-channel deep convolutional neural networks (3ConvNets), for human action recognition from depth maps on small training datasets. Three strategies are developed to leverage the capability of ConvNets in mining discriminative features for recognition. First, different viewpoints are mimicked by rotating the 3-D points of the captured depth maps. This not only synthesizes more data, but also makes the trained ConvNets view-tolerant. Second, WHDMMs at several temporal scales are constructed to encode the spatiotemporal motion patterns of actions into 2-D spatial structures. The 2-D spatial structures are further enhanced for recognition by converting the WHDMMs into pseudocolor images. Finally, the three ConvNets are initialized with the models obtained from ImageNet and fine-tuned independently on the color-coded WHDMMs constructed in three orthogonal planes. The proposed algorithm was evaluated on the MSRAction3D, MSRAction3DExt, UTKinect-Action, and MSRDailyActivity3D datasets using cross-subject protocols. In addition, the method was evaluated on a large dataset constructed by combining the above datasets. The proposed method achieved 2-9% better results on most of the individual datasets. Furthermore, the proposed method maintained its performance on the large dataset, whereas the performance of existing methods decreased as the number of actions increased.
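The viewpoint-mimicking step described above can be illustrated with a minimal sketch. It assumes the depth map has already been converted to a list of (x, y, z) points and that rotation is about the vertical y-axis only; the function names, the choice of axis, and the angle values are illustrative assumptions, not details given in the abstract.

```python
import math

def rotate_points_y(points, angle_deg):
    """Rotate (x, y, z) points about the vertical y-axis by angle_deg degrees."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y, z in points:
        # Standard rotation about the y-axis; the y coordinate is unchanged.
        rotated.append((c * x + s * z, y, -s * x + c * z))
    return rotated

def synthesize_views(points, angles_deg):
    """Return one rotated copy of the point set per angle (hypothetical helper)."""
    return {angle: rotate_points_y(points, angle) for angle in angles_deg}

# Toy point set standing in for the 3-D points of one depth frame.
cloud = [(1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
views = synthesize_views(cloud, [-30, 0, 30])
```

Each rotated point set would then be re-projected to a depth map before motion maps are built; that projection step is omitted here.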
Human detection, the problem of automatically locating people in an image or video sequence, has been actively researched over the past decade. This paper aims to provide a comprehensive survey of recent developments and challenges in human detection. Unlike previous surveys, this survey is organised around human object descriptors. This approach has advantages in providing a thorough analysis of the state-of-the-art human detection methods and a guide to the selection of appropriate methods in practical applications. In addition, challenges such as occlusion and real-time human detection are analysed. Commonly used evaluation resources for human detection methods, such as datasets, tools, and performance measures, are presented, and future research directions are highlighted.
• A review of the state of the art in human detection.
• This review is organised around human object descriptors.
• Challenges such as occlusion and real-time human detection are analysed.
• The commonly used datasets, tools, and performance measures are presented.
• Open issues and future research directions are highlighted.
• A guide to the selection of detection methods for applications is provided.
This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as dynamic depth images (DDI), dynamic depth normal images (DDNI), and dynamic depth motion normal images (DDMNI), for both isolated and continuous action recognition. These dynamic images are constructed from a segmented sequence of depth maps using hierarchical bidirectional rank pooling to effectively capture the spatial-temporal information. Specifically, DDI exploits the dynamics of postures over time, and DDNI and DDMNI exploit the 3-D structural information captured by depth maps. Based on the proposed representations, a convolutional neural network (ConvNet)-based method is developed for action recognition. The image-based representations enable us to fine-tune the existing ConvNet models trained on image data without training a large number of parameters from scratch. The proposed method achieved state-of-the-art results on three large datasets, namely, the large-scale continuous gesture recognition dataset (mean Jaccard index of 0.4109), the large-scale isolated gesture recognition dataset (59.21%), and the NTU RGB+D dataset (87.08% cross-subject and 84.22% cross-view), even though only the depth modality was used.
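The Jaccard index reported above for continuous gesture recognition measures the overlap between predicted and ground-truth frame intervals. The following is a minimal sketch of the index for a single gesture instance, assuming frames are identified by integer indices; how the benchmark averages the index over classes and sequences is not described in the abstract and is not reproduced here.

```python
def jaccard_index(predicted_frames, true_frames):
    """Intersection over union of two collections of frame indices."""
    predicted, truth = set(predicted_frames), set(true_frames)
    union = predicted | truth
    if not union:
        # Convention chosen for this sketch: two empty sets score 0.0.
        return 0.0
    return len(predicted & truth) / len(union)

# Hypothetical example: the gesture truly spans frames 10..29,
# while the recognizer predicts frames 15..34.
truth = range(10, 30)
predicted = range(15, 35)
score = jaccard_index(predicted, truth)
```

In this toy case the two intervals share 15 frames out of a union of 25, so a temporal segmentation error lowers the score even when the gesture label is correct.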
Human action recognition from RGB-D (Red, Green, Blue and Depth) data has attracted increasing attention since the first work reported in 2010. Over this period, many benchmark datasets have been created to facilitate the development and evaluation of new algorithms. This raises the question of which dataset to select and how to use it in providing a fair and objective comparative evaluation against state-of-the-art methods. To address this issue, this paper provides a comprehensive review of the most commonly used RGB-D video datasets related to action recognition, including 27 single-view datasets, 10 multi-view datasets, and 7 multi-person datasets. The detailed information and analysis of these datasets is a useful resource for guiding the selection of datasets in future research. In addition, the issues with current algorithm evaluation vis-à-vis the limitations of the available datasets and evaluation protocols are highlighted, resulting in a number of recommendations for the collection of new datasets and the use of evaluation protocols.
• A detailed review and in-depth analysis of 44 publicly available RGB-D-based action datasets.
• Recommendations on the selection of datasets and evaluation protocols for use in future research.
• Identification of some limitations of these datasets and evaluation protocols.
• Recommendations on future creation of datasets and use of evaluation protocols.
This paper proposes novel methods for detecting and separating smoke from a single image frame. Specifically, an image formation model is derived based on atmospheric scattering models. The separation of a frame into quasi-smoke and quasi-background components is formulated as a convex optimization problem that solves a sparse representation problem using dual dictionaries for the smoke and background components, respectively. A novel feature is constructed as a concatenation of the respective sparse coefficients for detection. In addition, a method based on the concept of image matting is developed to separate the true smoke and background components from the smoke detection results. Extensive experiments on detection were conducted and the results showed that the proposed feature significantly outperforms existing features for smoke detection. In particular, the proposed method is able to differentiate smoke from other challenging objects with similar visual appearance in a gray-scale frame (e.g., fog, haze, and cloud). Experiments on smoke separation also demonstrated that the proposed separation method can effectively estimate and separate the true smoke and background components.
Motivated by the discriminative ability of shape information and local patterns in object recognition, this paper proposes a window-based object descriptor that integrates both cues. In particular, contour templates representing object shape are used to derive a set of so-called key points at which local appearance features are extracted. These key points are located using an improved template matching method that utilises both spatial and orientation information in a simple and effective way. At each of the extracted key points, a new local appearance feature, namely non-redundant local binary pattern (NR-LBP), is computed. An object descriptor is formed by concatenating the NR-LBP features from all key points to encode the shape as well as the appearance of the object. The proposed descriptor was extensively tested in the task of detecting humans from static images on the commonly used MIT and INRIA datasets. The experimental results have shown that the proposed descriptor can effectively describe non-rigid objects with high articulation and improve the detection rate compared to other state-of-the-art object descriptors.
► A shape-based sparse object descriptor is proposed.
► Object shape is modelled by templates and detected using template matching.
► Non-redundant local binary pattern is proposed as the local appearance feature.
► The proposed descriptor was evaluated in the task of human detection.
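The abstract above does not define NR-LBP. The sketch below assumes the commonly cited formulation, in which a standard 8-neighbour LBP code and its bitwise complement are mapped to the same value by taking the smaller of the two. The 3x3 patch layout, the neighbour ordering, and the function names are assumptions made for illustration.

```python
# Clockwise neighbour offsets (row, col) around the centre of a 3x3 patch.
NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]

def lbp_code(patch):
    """Standard 8-bit LBP code of a 3x3 patch (list of three rows)."""
    centre = patch[1][1]
    code = 0
    for bit, (r, c) in enumerate(NEIGHBOURS):
        if patch[r][c] >= centre:
            code |= 1 << bit
    return code

def nr_lbp_code(patch):
    """Assumed NR-LBP: the smaller of the LBP code and its 8-bit complement."""
    code = lbp_code(patch)
    return min(code, 255 - code)

# A patch and its contrast-inverted counterpart (no neighbour ties the centre).
bright_on_dark = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
dark_on_bright = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
same_code = nr_lbp_code(bright_on_dark) == nr_lbp_code(dark_on_bright)
```

Under this assumption, a structure and its contrast-inverted version share one code, which halves the number of histogram bins and makes the feature insensitive to whether a person appears darker or brighter than the background.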
Non-proliferative diabetic retinopathy is the early stage of diabetic retinopathy. Automatic detection of non-proliferative diabetic retinopathy is important for clinical diagnosis, early screening, and monitoring of disease progression in patients.
This paper introduces the design and implementation of an automatic system for screening non-proliferative diabetic retinopathy based on color fundus images. Firstly, the fundus structures, including blood vessels, optic disc and macula, are extracted and located, respectively. In particular, a new optic disc localization method using parabolic fitting is proposed based on the physiological structure characteristics of optic disc and blood vessels. Then, early lesions, such as microaneurysms, hemorrhages and hard exudates, are detected based on their respective characteristics. An equivalent optical model simulating human eyes is designed based on the anatomical structure of retina. Main structures and early lesions are reconstructed in the 3D space for better visualization. Finally, the severity of each image is evaluated based on the international criteria of diabetic retinopathy.
The system has been tested on public databases and images from hospitals. Experimental results demonstrate that the proposed system achieves high accuracy for main structures and early lesions detection. The results of severity classification for non-proliferative diabetic retinopathy are also accurate and suitable.
Our system can assist ophthalmologists in clinical diagnosis, automatic screening, and monitoring of disease progression in patients.
As a noninvasive and quantitative method, fluorescence molecular tomography (FMT) has many potential applications in the biomedical field. In both theory and practice, it can resolve molecular processes in small animals in vivo in three dimensions (3D). This paper proposes to address the problems of reconstruction error and speed by using stacked auto-encoders (SAE). A finite element method (FEM) solution to the Laplace-transformed time-domain coupled diffusion equations is employed as the forward model. The reconstruction model is formulated under the framework of SAE. Numerical simulation experiments were conducted to compare the reconstruction results of SAE and the algebraic reconstruction technique (ART). We demonstrated that the proposed reconstruction algorithm can retrieve the positions and shapes of the targets more accurately than ART. This advantage of SAE is especially evident in the reconstruction of small targets with radii of 2 mm and 3 mm.
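For context on the ART baseline mentioned above, the following is a minimal sketch of the classic row-action (Kaczmarz-style) form of ART for a linear system A x = b. The toy matrix, the number of sweeps, and the relaxation factor are illustrative assumptions; the paper's actual forward model and ART settings are not given in the abstract.

```python
def art_solve(A, b, sweeps=200, relaxation=1.0):
    """Iteratively project the estimate onto each row's hyperplane a_i . x = b_i."""
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, b_i in zip(A, b):
            norm_sq = sum(a * a for a in row)
            if norm_sq == 0.0:
                continue  # an all-zero row carries no information
            residual = b_i - sum(a * x_j for a, x_j in zip(row, x))
            step = relaxation * residual / norm_sq
            x = [x_j + step * a for a, x_j in zip(row, x)]
    return x

# Toy consistent system standing in for a discretised forward model.
A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 5.0]
x = art_solve(A, b)
```

For a consistent system, the iterates converge to a solution. In a real tomography setting the system is large, ill-posed, and noisy, so the relaxation factor and stopping rule matter far more than in this toy case.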
Automatic detection of micro-aneurysms in color retinal images is important for early screening and diagnosis of diabetic retinopathy. In this paper, a new method is proposed for micro-aneurysm detection based on circular bilateral Gabor filtering. Firstly, a circular bilateral Gabor filter is developed to extract micro-aneurysm candidates. Secondly, false positives are reduced by eliminating small vessels through a process involving local gradient analysis. The proposed method is tested on retinal images from the Retinopathy Online Challenge database and from Tianjin Medical University Metabolic Diseases Hospital. Evaluation results at both the image and lesion levels demonstrate the efficacy of the proposed method in detecting micro-aneurysms accurately.
• Automatic generation of SSMs does not satisfy the needs of clinical applications.
• A prototype-based SSM building method is proposed to make landmarks configurable.
• Remeshing and diffeomorphic registration are used to optimize the correspondence.
• The SSMs built by the proposed method are as good as those built manually.
Automatic segmentation of organs from medical images is indispensable for computer-assisted medical applications. Schemes based on Statistical Shape Models (SSMs) have been developed as an accurate and robust approach for the extraction of anatomical structures, in which a crucial step is placing the sampled points (landmarks) with good correspondence across the whole training set. On the one hand, the correspondence of landmarks is related to the quality of the shape model. On the other hand, in clinical applications the positions of some key landmarks should be specified by physicians with reference to the anatomical structure. In this paper, we develop an interactive method for building SSMs in which the landmark distribution can be modified manually without affecting the model quality. We extend an existing remeshing method to produce a model prototype in advance and use surface-feature-driven registration to ensure that the correspondence is optimized globally. The key landmarks are fixed during the prototype generation. We evaluated the proposed SSM method on lung regions, whose deformations are considerably large.