•We proposed a new algorithm to preprocess huge and imbalanced data.
•This algorithm, based on distance calculations, reduces both size and imbalance.
•The selective sampling method was conceived for parallel and distributed computing.
•It was combined with SVM, obtaining optimized classification performance.
•Synthetic and real data sets were used to evaluate the classifiers' performance.
Several applications aim to identify rare events in very large data sets. Classification algorithms may be severely limited on large data sets and show degraded performance due to class imbalance. Many solutions have been presented in the literature to deal separately with either very large data volumes or class imbalance. In this paper we assessed the performance of a novel method, Parallel Selective Sampling (PSS), which selects data from the majority class to reduce imbalance in large data sets. PSS was combined with Support Vector Machine (SVM) classification. PSS-SVM showed excellent performance on synthetic data sets, much better than SVM. Moreover, we showed that on real data sets PSS-SVM classifiers performed slightly better than SVM and RUSBoost classifiers, with reduced processing times. In fact, the proposed strategy was conceived and designed for parallel and distributed computing. In conclusion, PSS-SVM is a valuable alternative to SVM and RUSBoost for the classification of huge and imbalanced data, due to its accurate statistical predictions and low computational complexity.
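The abstract does not spell out the steps of PSS, so the following is only a minimal sketch of the general idea of distance-based selection from the majority class, not the PSS algorithm itself. The function name `select_majority_near_minority`, the `radius` parameter, and the toy points are all hypothetical and introduced purely for illustration.

```python
import math

def select_majority_near_minority(majority, minority, radius):
    """Keep majority points lying within `radius` of at least one minority point.

    Hypothetical illustration of distance-based undersampling; NOT the PSS algorithm.
    """
    kept = []
    for p in majority:
        # Each majority point is checked independently of the others.
        if any(math.dist(p, q) <= radius for q in minority):
            kept.append(p)
    return kept

# Toy data (made up): five majority points, one minority point.
majority = [(0.0, 0.0), (0.5, 0.5), (5.0, 5.0), (9.0, 9.0), (1.0, 0.0)]
minority = [(1.0, 1.0)]

reduced = select_majority_near_minority(majority, minority, radius=1.5)
print(len(majority), "->", len(reduced), "majority points kept")
```

Because every majority point is tested independently, a loop of this kind could in principle be split across workers, which fits the parallel and distributed setting the abstract mentions.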
The exploitation of fishery resources acts as a driving force on cetaceans both directly, by causing their mortality or injury as by-catch species, and indirectly, by lowering the availability of their prey. This competitive overlap between fishing and cetaceans often results in inadequate solutions, and in some cases cetaceans have been intentionally culled to maximize fishing production. A modelling approach applied to investigate the ecological roles of cetaceans in the food web could prove more effective in integrating ecological and fishing aspects and in providing suggestions for management. The comparative analysis carried out in the Gulf of Taranto (Northern Ionian Sea, Central Mediterranean Sea) showed that fishing exploitation has a greater impact on the investigated food web than cetacean predation. Trawling was estimated to be the most negatively impacting fishing gear considering the mortality rates and consumption flows. On the other hand, the striped dolphin had the main impact on the food web because it has the highest consumption flows. The analysis showed a negative and non-selective impact of the fishing gears on the exploited species, while the odontocetes proved to select their prey species and to have a positive impact on the assemblage. In particular, while the fishing gears are primarily size selective, targeting mostly large and economically valuable fish, the odontocetes seem to follow a co-evolution process with their prey, developing a specialization in their resources, providing control of the meso-consumers and ensuring trophic stability in the ecosystem.
Although the Mediterranean Sea is a crucial hotspot of marine biodiversity, it is threatened by numerous anthropogenic pressures. As flagship species, cetaceans are exposed to those anthropogenic impacts and to global changes. Assessing their conservation status is strategic for setting effective management plans. The aim of this paper is to understand the habitat requirements of cetaceans, exploiting the advantages of a machine-learning framework. To this end, 28 physical and biogeochemical variables were identified as environmental predictors related to the abundance of three odontocete species in the Northern Ionian Sea (Central-eastern Mediterranean Sea). In fact, habitat models were built using sighting data collected for striped dolphins Stenella coeruleoalba, common bottlenose dolphins Tursiops truncatus, and Risso's dolphins Grampus griseus between July 2009 and October 2021. Random Forest proved to be a suitable machine learning algorithm for estimating cetacean abundance. Nitrate, phytoplankton carbon biomass, temperature, and salinity were the most common influential predictors, followed by latitude, 3D-chlorophyll and density. The habitat models proposed here were validated using sighting data acquired during 2022 in the study area, confirming the good performance of the strategy. This study provides valuable information to support management decisions and conservation measures in the EU marine spatial planning context.
Little is known about the immunoediting process in precancerous lesions. We explored this aspect of benign colorectal adenomas with a descriptive analysis of the immune pathways and immune cells whose regulation is linked to the morphology and size of these lesions. Two series of polypoid and nonpolypoid colorectal adenomas were used in this study: 1) 84 samples (42 lesions, each with matched samples of normal mucosa) whose gene expression data were used to quantify the tumor morphology- and size-related dysregulation of immune pathways collected in the Molecular Signature Database, using Gene Set Enrichment Analysis; 2) 40 other lesions examined with immunohistochemistry to quantify the presence of immune cells in the stromal compartment. In the analysis of transcriptomic data, 429 immune pathways displayed significant differential regulation in neoplasms of different morphology and size. Most pathways were significantly upregulated or downregulated in polypoid lesions versus nonpolypoid lesions (regardless of size). Differential pathway regulation associated with lesion size was observed only in polypoid neoplasms. These findings were mirrored by tissue immunostaining with CD4, CD8, FOXP3, MHC-I, CD68, and CD163 antibodies: stromal immune cell counts (mainly T lymphocytes and macrophages) were significantly higher in polypoid lesions. Certain markers displayed significant size-related differences regardless of lesion morphology. Multivariate analysis of variance showed that the marker panel clearly discriminated between precancerous lesions of different morphologies and sizes. Statistical analysis of immunostained cell counts fully supports the results of the transcriptomic data analysis: the density of infiltration of most immune cells in the stroma of polypoid precancerous lesions was significantly higher than that observed in nonpolypoid lesions. Large neoplasms also have more immune cells in their stroma than small lesions.
Immunoediting in precancerous colorectal tumors may vary with lesion morphology and stage of development, and this variability could influence a given lesion's trajectory to cancer.
Anemia is a global public health problem that affects children and pregnant women. It occurs when the number of red blood cells in the body decreases, when the structure of the red blood cells is destroyed, or when the hemoglobin (Hb) level in the red blood cells falls below the normal threshold. It results from one or more of the following: increased red cell destruction, blood loss, defective cell production, or a depleted number of red blood cells.
The method used in this study is divided into three phases. First, a dataset of palm images was gathered. Second, the images were pre-processed, which comprised extracting and augmenting the images, segmenting the Region of Interest, and acquiring the components of the CIE L*a*b* colour space (also referred to as CIELAB). Finally, the proposed models for the detection of anemia were developed using several algorithms: CNN, k-NN, Naïve Bayes, SVM, and Decision Tree. The experiment started from 527 initial images; rotation, flipping and translation were used to augment the dataset to 2635 images. We randomly divided the augmented dataset into 70%, 10%, and 20% portions to train, validate and test the models, respectively.
The results of the study indicate that the models performed well when palm images are used to detect anemia: Naïve Bayes achieved an accuracy of 99.96% and the CNN an accuracy of 99.92%, while the SVM achieved the lowest accuracy, 96.34%.
The invasive method of detecting anemia is expensive and time-consuming; however, anemia can be detected through non-invasive methods such as machine learning algorithms, which are efficient, cost-effective and take less time. In this work, we compared machine learning models such as CNN, k-NN, Decision Tree, Naïve Bayes, and SVM to detect anemia using images of the palm. Finally, the study supports other similar studies on the potency of machine learning algorithms as a non-invasive method for detecting iron deficiency anemia.
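The dataset figures quoted in this abstract can be checked with a short sketch: 2635 is exactly five times 527, and a random 70%/10%/20% split of 2635 items cannot be exact, since 70% and 10% of 2635 are not whole numbers. The abstract does not say how fractional counts were rounded, so the rule used here (floor the training and validation counts and give the remainder to the test set), the seed, and the function name `split_indices` are assumptions made for illustration.

```python
import random

def split_indices(n_items, train_pct=70, val_pct=10, seed=0):
    """Randomly split range(n_items) into train/validation/test index lists.

    Assumption: train and validation sizes are floored; the test set gets the rest.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is reproducible
    n_train = n_items * train_pct // 100
    n_val = n_items * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

n_initial = 527
n_augmented = 2635
print(n_augmented / n_initial)  # augmentation factor implied by the two figures

train, val, test = split_indices(n_augmented)
print(len(train), len(val), len(test))
```

Under this rounding rule the three subsets are disjoint and together cover all 2635 items; a different rounding rule would shift one or two items between subsets.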
Photo identification is an essential method to identify cetaceans by using natural marks on their bodies, and it allows experts to acquire straightforward information on these animals. The importance of cetaceans lies in the fact that they play a crucial role in maintaining the health of marine ecosystems. However, they are exposed to several anthropogenic stressors, under which their populations could collapse, with extreme consequences for marine ecosystem functioning. Hence, obtaining new knowledge on their status is extremely urgent for marine biodiversity conservation. The smart use of technology to automate individual recognition can speed up the photo identification process, opening the door to large-scale studies that are unfeasible manually. We performed a systematic review of systems based on machine learning and statistical methods for cetacean photo identification, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement. This review highlights that interest has been increasing in recent years and that several intelligent systems have been presented. However, there are still some open questions, and further efforts to develop more effective automated systems for cetacean photo identification are recommended.
The analysis of high-throughput gene expression data with respect to sets of genes rather than individual genes has many advantages. A variety of methods have been developed for assessing the enrichment of sets of genes with respect to differential expression. In this paper we provide a comparative study of four of these methods: Fisher's exact test, Gene Set Enrichment Analysis (GSEA), Random-Sets (RS), and Gene List Analysis with Prediction Accuracy (GLAPA). The first three methods use associative statistics, while the fourth uses predictive statistics. We first compare all four methods on simulated data sets to verify that Fisher's exact test is markedly worse than the other three approaches. We then validate the other three methods on seven real data sets with known genetic perturbations and then compare the methods on two cancer data sets where our a priori knowledge is limited.
The simulation study highlights that none of the three methods consistently outperforms the others. GSEA and RS are able to detect weak signals of deregulation, and they perform differently when genes in a gene set are both differentially up- and down-regulated. GLAPA is more conservative, and large differences between the two phenotypes are required for the method to detect differential deregulation in gene sets. This is because the enrichment statistic in GLAPA is the prediction error, which is a stronger criterion than the classical two-sample statistics used in RS and GSEA. This was reflected in the analysis of real data sets, as GSEA and RS were seen to be significant for particular gene sets while GLAPA was not, suggesting a small effect size. We find that the ranking of gene set enrichment induced by GLAPA is more similar to that of RS than to that of GSEA. More importantly, the rankings of the three methods share significant overlap.
The three methods considered in our study recover relevant gene sets known to be deregulated in the experimental conditions and pathologies analyzed. There are differences between the three methods, and GSEA seems to be more consistent in finding enriched gene sets, although no method uniformly dominates over all data sets. Our analysis highlights the deep difference between associative and predictive methods for detecting enrichment, and the value of using both to better interpret the results of pathway analysis. We close with suggestions for users of gene set methods.
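For readers unfamiliar with the first of the four methods, Fisher's exact test for gene set over-representation reduces to a hypergeometric tail probability. The sketch below shows that standard one-sided form; the abstract does not state how the test was configured in the study, and the function name `fisher_enrichment_pvalue` and the gene counts are made up for illustration.

```python
from math import comb

def fisher_enrichment_pvalue(n_genes, n_in_set, n_de, n_overlap):
    """One-sided Fisher's exact test p-value for over-representation.

    Returns P(X >= n_overlap), where X is the number of differentially
    expressed (DE) genes falling in the gene set under the hypergeometric null.
    """
    upper = min(n_in_set, n_de)      # largest possible overlap
    total = comb(n_genes, n_de)      # ways to choose the DE genes
    tail = sum(
        comb(n_in_set, k) * comb(n_genes - n_in_set, n_de - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 20 genes measured, a gene set of 5, 5 DE genes,
# 3 of which fall inside the gene set.
p_value = fisher_enrichment_pvalue(n_genes=20, n_in_set=5, n_de=5, n_overlap=3)
print(p_value)
```

The test only uses counts from a thresholded gene list, whereas GSEA, RS and GLAPA work from statistics computed over the expression data, which is one way to read the contrast drawn in the abstract.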
The residency pattern and site fidelity of Risso’s dolphin Grampus griseus were studied using sightings data collected during standardized vessel-based surveys carried out from 2013 to 2018 in the Gulf of Taranto (Northern Ionian Sea, Central-Eastern Mediterranean Sea). The photo-identification of 91 individuals of G. griseus and the occurrence of dolphins re-sighted up to 8 times provide evidence that this mid-sized odontocete persistently occurred in the study area. The presence of transient individuals, visitors, and seasonal residents in the Gulf of Taranto, along with the 29 multi-year resident Risso’s dolphins observed from 2 to 4 times in different years, indicated some degree of inter-annual variability in the species' temporal use of the area. Moreover, the occurrence of newborns and calves throughout the study period suggests that the Gulf of Taranto could be considered both a nursery and a feeding area for females caring for their offspring. The importance of the Gulf of Taranto as critical habitat for the Risso’s dolphin, as well as for the common bottlenose dolphin and the striped dolphin, makes this area a candidate Important Marine Mammal Area, in which specific conservation measures aimed at mitigating the anthropogenic pressure on different cetacean species should be enforced, according to the ACCOBAMS and MMPATF indications.
Starting from a digital image that represents the dolphin's body, distinctive features are extracted and used to find the identity of the unknown dolphin in a set of known individuals. This process is called photo identification; it is used by experts to monitor dolphins and provides relevant data for preserving the environment and its biodiversity. In this work, we show how semantic segmentation can be used to automatically extract a dolphin's fin contour starting from a cropped photo of the fin, and how this contour can be used for individual identification. A novel contour-based system, called ARIANNA, for automated cetacean photo identification was designed, developed and tested. The novelty of this system is the adoption of two original modules. The first one, which takes as input a new cropped fin image of an unknown dolphin, is devoted to the extraction of a mask that depicts the outline of the unknown fin; the core of this module is a trained neural network specialized in the semantic segmentation of images. The second module is designed to compare the outline of the unknown fin with the outlines of all known dolphins, collected in a reference catalogue, returning a ranked list of the best matches in which to search for the dolphin's identity. The experiments were conducted on images collected between 2013 and 2020 in the Northern Ionian Sea (Central-eastern Mediterranean Sea), which showed cropped fins of Risso's dolphin Grampus griseus, one of the least-known cetacean species on a global and Mediterranean scale. The results suggest that ARIANNA provides advances over state-of-the-art methods, can efficiently assist researchers in the photo identification of dolphins, and can be a starting point for further studies on the photo identification of different species based on semantic segmentation.
•Development of a new intelligent system for the re-identification of unknown dolphins.
•Use of a neural network specialized in semantic segmentation of dolphin images.
•Automated extraction of the fin contour of an unknown dolphin, used for matching with known individuals.
•High performance of the proposed system on images from real-life scenarios.