Shadows reduce the legibility and contrast of document images, leading to poor recognition rates in Optical Character Recognition (OCR). Shadows are cast during photo capture when objects interfere with the light projected from illumination sources, and they are challenging for OCR to handle. This work proposes an automated technique to detect the shadows caused by object interference and eliminate them from document images. The proposed algorithm generates two reference masks, one for shadow and one for non-shadow regions, using a global thresholding method to detect and remove shadows. The document image is first pre-processed; masks are then generated for the shadow and non-shadow regions; finally, a global thresholding strategy removes the shadow using an image mapping technique. To evaluate the effectiveness of the proposed approach, a dataset of about 300 real-world smartphone-captured document images is considered. The dataset encompasses Mobile Dataset 3 and Mobile Dataset 4 from the DIBCO datasets. Experiments quantify the efficiency of the proposed approach using performance measures such as OCR recognition rate, presentation score, legibility score, skew, and overall document quality score. The proposed approach produces average word-level recognition rates of 87.15% and 81.50% on Mobile Datasets 3 and 4, respectively. After shadow removal, the text quality of the document images achieves a higher performance score, with an average presentation score of 0.35, legibility score of 0.78, skew of 73.45 degrees from the horizontal axis, and overall document quality score above six on the datasets considered. Qualitative and comparative analysis indicates that the proposed approach is simple yet more proficient than existing methods.
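The mask-then-map pipeline described above can be illustrated with a toy sketch. The abstract does not state how the threshold is chosen or how the mapping works, so everything here is an assumption: the threshold is the global mean intensity, and shadow pixels are brightened by the ratio of the two region means. The function names are hypothetical, and a real document image would also need text pixels handled separately.

```python
def global_threshold(gray):
    """Assumed threshold: mean intensity of the whole image (not from the paper)."""
    pixels = [p for row in gray for p in row]
    return sum(pixels) / len(pixels)


def shadow_masks(gray, threshold):
    """Return (shadow, non_shadow) boolean masks from one global threshold."""
    shadow = [[p < threshold for p in row] for row in gray]
    non_shadow = [[not s for s in row] for row in shadow]
    return shadow, non_shadow


def masked_mean(gray, mask):
    """Mean intensity of the pixels selected by mask, or 0.0 if none are selected."""
    vals = [p for row, mrow in zip(gray, mask) for p, m in zip(row, mrow) if m]
    return sum(vals) / len(vals) if vals else 0.0


def remove_shadow(gray):
    """Brighten shadow pixels so their mean matches the non-shadow mean."""
    threshold = global_threshold(gray)
    shadow, non_shadow = shadow_masks(gray, threshold)
    lit = masked_mean(gray, non_shadow)
    dark = masked_mean(gray, shadow)
    if dark == 0.0:
        # No shadow region found (or it is pure black): return an unchanged copy.
        return [row[:] for row in gray]
    gain = lit / dark  # assumed mapping: a single multiplicative gain
    return [
        [min(255, round(p * gain)) if s else p for p, s in zip(row, srow)]
        for row, srow in zip(gray, shadow)
    ]


# Toy 2x4 grayscale "page": left half lit, right half shadowed.
page = [[200, 200, 100, 100],
        [200, 200, 100, 100]]
cleaned = remove_shadow(page)
```

A single gain is the simplest possible mapping; it only works when the shadow is roughly uniform, which is why this should be read as an illustration rather than the authors' method.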
•A deep multimodal fusion (DMF) model is developed for sand gravel moisture content prediction.
•The DMF model effectively extracts and fuses information from image, spectral, and dielectric data.
•The DMF model shows superior performance on sand gravel multimodal datasets.
•The DMF model maintains good robustness when the dataset is disturbed by general noise.
Fast and accurate measurement of the moisture content (MC) of sand gravel is essential at hydraulic engineering project sites. Most existing measurement methods are unimodal and are not robust against external interference. To address this issue, a deep multimodal fusion (DMF) model is proposed for measuring the MC of sand gravel using images, near-infrared (NIR) spectra, and dielectric data. A modified bottleneck transformer network (BoTNet) with an added extremely efficient spatial pyramid (EESP) block is first proposed to extract image features from different receptive fields. An improved convolutional neural network with attention blocks (A-CNN) and a gated recurrent unit network with attention blocks (A-GRU) are then adopted to extract local and sequential features from the NIR spectra, respectively. The square root of the dielectric data and the multimodal features above are fused in the Fusion module according to their contribution to the target indicator. Compared with the other models, the DMF model yielded the best performance (R² = 0.962, RMSE = 0.645, RPD = 5.124) on the original sand gravel dataset, and it still maintained the best accuracy (the average R² and RPD mostly exceeded 0.85 and 2.5, respectively) under general external noise.
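For readers unfamiliar with the three reported metrics, the sketch below computes them from reference and predicted values. It assumes the common definitions: R² as one minus the ratio of residual to total sum of squares, RMSE over all samples, and RPD as the sample standard deviation of the reference values divided by the RMSE. The abstract does not say which variant of the standard deviation the authors used, so the `n - 1` denominator is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE and RPD for paired reference/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    # Assumption: RPD uses the sample standard deviation (n - 1 denominator).
    sd = math.sqrt(ss_tot / (n - 1))
    rpd = sd / rmse if rmse > 0 else math.inf
    return {"r2": 1.0 - ss_res / ss_tot, "rmse": rmse, "rpd": rpd}


# Made-up moisture contents (%), for illustration only.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics = regression_metrics(reference, predicted)
```

Because RPD scales roughly with 1 / sqrt(1 - R²), the two reported values (0.962 and 5.124) are mutually consistent under these definitions.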
We conducted an extensive and robust analysis of 28 convolutional neural network (CNN)-based methods for the detection and counting of tilapia larvae (Oreochromis niloticus niloticus (Linnaeus, 1758)) in Petri dishes. Experiments were carried out in the western region of Paraná, Brazil, using a smartphone positioned in a prototype developed specifically to support this application. A data set comprising 301 images and 6,195 larvae in the fish reproduction phase was built. These images were divided using stratified five-fold cross-validation. Among the methods evaluated across 140 experiments, Faster R-CNN R50-FPN 2X and Grid R-CNN-X101–32X4d-FPN 2X provided the best results, with a mean average precision (mAP50) of 97.30%. Given the wide availability of smartphones, we conclude that the presented procedure can be a valuable tool for detecting and counting tilapia larvae.
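A stratified five-fold split can be sketched as follows. The abstract does not say which variable the images were stratified on, so the stratum labels here (a hypothetical "low"/"high" larvae-density bin per image) are an assumption, as is the round-robin assignment. Note that 28 methods evaluated on five folds is consistent with the 140 experiments reported.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Assign sample indices to k folds so each stratum is spread evenly."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # keeps total fold sizes balanced across strata
    for label in sorted(by_label):
        for i, idx in enumerate(by_label[label]):
            folds[(offset + i) % k].append(idx)
        offset += len(by_label[label])
    return folds


# Hypothetical strata: 10 low-density and 5 high-density images.
strata = ["low"] * 10 + ["high"] * 5
folds = stratified_folds(strata, k=5)
```

Each fold then serves once as the test set while the remaining four are used for training.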
This work proposes a novel hierarchical classification framework designed to categorize one hundred Indian medicinal plant species. The innovation lies in introducing a comprehensive feature representation that integrates convolutional features with geometric, texture, shape, and multispectral features for classification tasks. A two-level hierarchical plant classification model is proposed to address the challenges of inter-class similarity and intra-class variation. The first level classifies the 100 medicinal plant species into 11 groups based on visual similarities among the plants. At the second level, the specific plant species contained in each group are predicted using a Random Forest classifier. The evaluation is performed at both levels to analyze the effectiveness of the proposed model. The performance analysis compares the effectiveness of individual feature types against the composite feature model. Performance is also evaluated separately on specific groups that exhibit high inter-class similarity and intra-class variation among the plant species. Furthermore, the generality of the model is tested using two self-created datasets, RTL80 and RTP40, which required more than 300 man-hours to collect. Experimental results demonstrate a promising accuracy of 94.54% on the GSL100 leaf dataset and 75.46% on the RTL80 and RTP40 real-time datasets, reflecting the superiority of the proposed hierarchical model over state-of-the-art methods.
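The two-level routing described above can be sketched in a few lines. This is only an illustration of the control flow: the class name, the toy group and species models, and the two-element feature vectors are all hypothetical stand-ins, whereas the paper uses fused features and a Random Forest per group.

```python
class HierarchicalClassifier:
    """Route a sample to a group model first, then to that group's species model."""

    def __init__(self, group_model, species_models):
        self.group_model = group_model        # callable: features -> group id
        self.species_models = species_models  # dict: group id -> callable

    def predict(self, features):
        group = self.group_model(features)
        species = self.species_models[group](features)
        return group, species


# Toy stand-ins for trained models (hypothetical rules, not learned).
def toy_group_model(features):
    return "group_a" if features[0] < 0.5 else "group_b"


toy_species_models = {
    "group_a": lambda f: "species_1",
    "group_b": lambda f: "species_2" if f[1] < 0.5 else "species_3",
}

clf = HierarchicalClassifier(toy_group_model, toy_species_models)
prediction = clf.predict([0.8, 0.9])
```

One consequence of this design is that a wrong group at level one cannot be corrected at level two, which is why the abstract reports evaluation at both levels.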
•A hierarchical classification model is proposed for one hundred Indian medicinal plant species.
•A robust fusion feature model is devised to handle inter-class similarities and intra-class variations.
•A self-created database of 13,536 image samples of one hundred distinct plant species.
•Level-wise and feature-wise evaluation of the hierarchical classification model.
•Assessment of the proposed model on real-world plant datasets captured against natural backgrounds.
•Using smartphones in aquaculture research for faster and more efficient processes.
•Feasibility of using photos from smartphones in oocyte counting.
•New approach based on the combination of SLIC and SVM for counting lambari oocytes.
•Accuracy rate above 97% for counting oocytes from the Astyanax bimaculatus species.
This work proposes a computer vision procedure for counting Twospot astyanax (Astyanax bimaculatus) oocytes in Petri dishes using images captured by smartphone. First, the proposed procedure uses simple linear iterative clustering (SLIC) to divide the images into groups of pixels (superpixels). Then, based on their color and spatial characteristics, the superpixels are classified as light background, dark background, dirt, or oocyte by a machine learning algorithm. Five machine learning algorithms were tested: support vector machines (SVM), decision trees using the J48 algorithm, random forest, k-nearest neighbors (k-NN), and Naive Bayes. To train the algorithms, 8,578 superpixels were classified by an expert as oocyte (n = 354), dirt (n = 651), dark background (n = 3,622), or light background (n = 3,951). Of the five learning algorithms, SVM obtained the best result, with 97% correct oocyte recognition. Given the wide availability of smartphones, we conclude that the presented procedure can be a valuable tool in future experiments and studies on fertilization and hatching success in Twospot astyanax.
This study assessed the reliability of smartphone images of plaque-disclosed anterior teeth for evaluating plaque scores among preschool children. Additionally, the reliability of plaque scores recorded from smartphone images of anterior teeth in representing the overall clinical plaque score was also assessed. Fifteen preschool children were recruited for this pilot study. The Simplified Debris Index (DI-S), the debris component of the Simplified Oral Hygiene Index, was used to record the plaque score. A plaque-disclosing tablet was used to disclose the plaque before the plaque score recording. Following that, an image of the anterior teeth (canine to canine) of both the upper and lower arches was captured using a smartphone. Each child had three different DI-S recorded. For the first recording, the overall clinical DI-S, the plaque score was recorded clinically from index teeth 55 (buccal), 51 (labial), 65 (buccal), 71 (labial), 75 (lingual) and 85 (lingual). For the second recording, the anterior clinical DI-S, the plaque score was recorded clinically from the labial surfaces of six anterior teeth only (53, 51, 63, 73, 71 and 83). Two weeks later, the anterior photographic DI-S (third recording) was recorded from the smartphone images of the same index teeth used for the second recording. The intra-class correlation coefficient (ICC) was calculated to evaluate the reliability of smartphone images in assessing plaque scores. The results showed high reliability (ICC = 0.987) between anterior clinical and anterior photographic examinations, indicating that smartphone images are highly reliable for evaluating plaque scores. Similarly, high reliability (ICC = 0.981) was also found for the comparison between overall clinical DI-S and anterior photographic DI-S, indicating that plaque scores recorded from smartphone images of anterior teeth alone can represent the overall clinical plaque score.
This study suggests that smartphone images can be a valuable tool for remote screening and monitoring of oral hygiene in preschool children, contributing to better oral health outcomes.
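To make the reliability statistic concrete, the sketch below computes one common form of the intra-class correlation coefficient. The abstract does not state which ICC form was used, so ICC(2,1) (two-way random effects, absolute agreement, single measurement) is an assumption, and the scores are invented for illustration.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per subject and one column per method."""
    n = len(scores)      # subjects (e.g. children)
    k = len(scores[0])   # measurement methods (e.g. clinical vs photographic)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-method mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Invented DI-S scores: column 0 = clinical, column 1 = photographic.
identical = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
offset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
icc_identical = icc_2_1(identical)
icc_offset = icc_2_1(offset)
```

The second table shows why the choice of form matters: a constant offset between the two methods leaves the ranking of children unchanged but lowers an absolute-agreement ICC.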
Prior aquatic animal image classification research focused on distinguishing external features in controlled settings, utilizing either digital cameras or webcams. Identifying visually similar species, like Short mackerel (Rastrelliger brachysoma) and Indian mackerel (Rastrelliger kanagurta), is challenging without specialized knowledge. However, advancements in computer technology have paved the way for leveraging machine learning and deep learning systems to address such challenges. In this study, transfer learning techniques were employed, utilizing established pre-trained models such as ResNet50, Xception, InceptionV3, VGG19, VGG16, and MobileNetV3Small. These models were applied to differentiate between the two species using raw images captured by a smartphone under uncontrolled conditions. The core architecture of the pre-trained models remained unchanged, except for the removal of the final fully connected layer. In its place, a global average pooling layer and two dense layers, comprising 1024 units and a single unit, respectively, were appended. To mitigate overfitting, early stopping was implemented. The results revealed that, among the models assessed, the Xception model exhibited the most promising predictive performance. It achieved the highest average accuracy levels of 0.849 and 0.754 during training and validation, surpassing the other models. Furthermore, fine-tuning the Xception model by extending the number of epochs yielded more impressive outcomes. After 30 epochs of fine-tuning, the Xception model demonstrated optimal performance, reaching an accuracy of 0.843 and displaying an 11.508% improvement in predictions compared to the model without fine-tuning. These findings highlight the efficacy of transfer learning, particularly with the Xception model, in accurately distinguishing visually similar aquatic species using smartphone-captured images, even in uncontrolled conditions.
Stay cables are the major load-carrying elements in cable-stayed bridges. Monitoring cable forces helps ensure the safety of bridges. The conventional sensor-based approaches to measuring stay cable forces are complicated to operate, time-consuming, and relatively expensive. To address these disadvantages, a lightweight measurement method using smartphone imagery is proposed in this paper. The video data acquisition process was first standardized by using a pre-designed target. Then, a novel algorithm was developed to extract the vibration displacement of stay cables under complex conditions. An automatic correction algorithm was provided to further improve the displacement results. In addition, smartphone-based software for determining cable forces was developed and tested on a real-life bridge. The results showed a maximum error of 1.99% compared with the cable force obtained by using a dynamic tester. The developed software is shown to be feasible in real-life projects and can achieve high accuracy in cable force determination. At the same time, the proposed method does not require a fixed camera for measurement and is not limited by personnel experience and measurement time, facilitating real-time monitoring of multiple projects, multiple cable surfaces and multiple personnel in a visual vibration environment.
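The abstract does not give the relation used to convert vibration displacement into cable force. A minimal sketch, assuming the classical taut-string model T = 4 m L² (f_n / n)² (which neglects sag and bending stiffness) and a naive discrete Fourier transform to find the dominant frequency, could look like this. The function names, the 30 Hz frame rate, and the cable properties are all hypothetical.

```python
import cmath
import math


def dominant_frequency(displacement, sample_rate):
    """Return the frequency (Hz) of the largest non-DC DFT component."""
    n = len(displacement)
    mean = sum(displacement) / n
    centered = [d - mean for d in displacement]
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centered)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def cable_force(freq_hz, length_m, mass_per_m, mode=1):
    """Taut-string estimate of cable tension in newtons (assumed model)."""
    return 4.0 * mass_per_m * length_m ** 2 * (freq_hz / mode) ** 2


# Synthetic 2 Hz vibration sampled at an assumed 30 frames per second for 2 s.
fps = 30
signal = [math.sin(2 * math.pi * 2.0 * i / fps) for i in range(60)]
f1 = dominant_frequency(signal, fps)
force = cable_force(f1, length_m=100.0, mass_per_m=50.0)
```

Since force scales with the square of frequency, a 1% frequency error becomes roughly a 2% force error, which gives a sense of how accurate the displacement extraction must be to reach the reported 1.99% maximum error.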
A complete building model reconstruction needs data collected from both air and ground. The former often has sparse coverage of building façades, while the latter is usually unable to observe building rooftops. To solve the missing-data issues in building reconstruction from a single data source, we describe an approach for complete building reconstruction that integrates airborne LiDAR data and ground smartphone imagery. First, by taking advantage of the GPS and digital compass information embedded in the image metadata of smartphones, we are able to find the airborne LiDAR point clouds for the corresponding buildings in the images. In the next step, Structure-from-Motion and dense multi-view stereo algorithms are applied to generate a building point cloud from multiple ground images. The third step extracts building outlines from the LiDAR point cloud and from the ground image point cloud. An automated correspondence between these two sets of building outlines allows us to achieve a precise registration and combination of the two point clouds, which ultimately results in a complete, full-resolution building model. The developed approach overcomes the problem of sparse points on building façades in airborne LiDAR and the lack of rooftops in ground images, such that the merits of both datasets are utilized.
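The registration step can be illustrated with a generic least-squares rigid fit between matched outline vertices. The abstract does not describe the authors' registration algorithm, so this is a stand-in under stated assumptions: the outlines are 2D, vertex correspondences are already known, and both point clouds share the same scale. The function names are hypothetical.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping src onto dst."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n

    s_cos = s_sin = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        xs, ys, xd, yd = xs - cx_s, ys - cy_s, xd - cx_d, yd - cy_d
        s_cos += xs * xd + ys * yd   # dot products of centered pairs
        s_sin += xs * yd - ys * xd   # 2D cross products of centered pairs
    theta = math.atan2(s_sin, s_cos)

    c, s = math.cos(theta), math.sin(theta)
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, (tx, ty)


def apply_rigid_2d(points, theta, translation):
    """Rotate points by theta about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    tx, ty = translation
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


# Hypothetical outlines: a 2 x 1 footprint, rotated 90 degrees and shifted.
ground_outline = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
lidar_outline = [(5.0, -3.0), (5.0, -1.0), (4.0, -1.0), (4.0, -3.0)]
theta, shift = fit_rigid_2d(ground_outline, lidar_outline)
aligned = apply_rigid_2d(ground_outline, theta, shift)
```

In practice the image-derived point cloud from Structure-from-Motion has an arbitrary scale, so a real pipeline would also need to estimate a scale factor before or during this fit.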