Deep learning (DL) algorithms are the state of the art in automated classification of wildlife camera trap images. The challenge is that the ecologist cannot know in advance how many images per species they need to collect for model training in order to achieve their desired classification accuracy. In fact, there is limited empirical evidence in the context of camera trapping to demonstrate that increasing sample size will lead to improved accuracy.
In this study we explore in depth how deep learning model performance changes as the per-class (species) sample size is progressively increased. We also provide ecologists with an approximation formula to estimate, a priori, how many images per animal species they need to reach a given accuracy level. This will help ecologists allocate resources and work optimally and design studies efficiently.
To investigate the effect of the number of training images, seven training sets with 10, 20, 50, 150, 500, 1000 images per class were designed. Six deep learning architectures, namely ResNet-18, ResNet-50, ResNet-152, DenseNet-121, DenseNet-161, and DenseNet-201, were trained and tested on a common exclusive testing set of 250 images per class. The whole experiment was repeated on three similar datasets from Australia, Africa and North America and the results were compared. Simple regression equations for use by practitioners to approximate model performance metrics are provided. Generalised additive models (GAM) are shown to be effective in modelling DL performance metrics based on the number of training images per class, tuning scheme and dataset.
Overall, our trained models classified images with 0.94 accuracy (ACC), 0.73 precision (PRC), 0.72 true positive rate (TPR), and 0.03 false positive rate (FPR). Variation in model performance metrics among datasets, species and deep learning architectures exists and is shown in detail in the discussion section. The ordinary least squares regression models explained 57%, 54%, 52%, and 34% of expected variation of ACC, PRC, TPR, and FPR according to the number of images available for training. Generalised additive models explained 77%, 69%, 70%, and 53% of deviance for ACC, PRC, TPR, and FPR, respectively.
Predictive models were developed linking the number of training images per class, model and dataset to performance metrics. The ordinary least squares regression and generalised additive models developed provide a practical toolbox to estimate model performance with respect to different numbers of training images.
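The four metrics reported above follow the standard definitions based on confusion counts. The sketch below computes them for a single class treated one-vs-rest; the function name, the ten-class setup and the counts are hypothetical and are not taken from the study, which does not state in this abstract how the metrics were aggregated.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return ACC, PRC, TPR and FPR from one-vs-rest confusion counts."""
    total = tp + fp + tn + fn
    return {
        "ACC": (tp + tn) / total,                       # share of all decisions that are correct
        "PRC": tp / (tp + fp) if (tp + fp) else 0.0,    # precision
        "TPR": tp / (tp + fn) if (tp + fn) else 0.0,    # recall / sensitivity
        "FPR": fp / (fp + tn) if (fp + tn) else 0.0,    # false alarms among true negatives
    }


# Hypothetical counts for one species in an assumed 10-class test set
# with 250 images per class (250 positives, 2250 negatives).
metrics = binary_metrics(tp=180, fp=70, tn=2180, fn=70)
print(metrics)
```

With these made-up counts, accuracy comes out well above precision and recall because the negatives dominate the one-vs-rest table, which is one reason the four metrics are worth reading together.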
• Deep Network classifier sample size requirements investigated for long-term wildlife monitoring sites.
• Training sample size was linked to model performance metrics such as accuracy.
• Logarithmic trends observed between training sample size and accuracy, precision, recall and false positive rate.
• Model performance asymptotes with 150–500 images per class providing good accuracy.
• The effects of data set, samples per class, model architectures and tuning strategies were investigated and compared.
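The logarithmic trend noted in the highlights suggests how an a priori estimate could work: fit a metric against the logarithm of the per-class sample size, then invert the fitted line for a target value. The following is a minimal sketch assuming the form ACC ≈ a + b·ln(n). The function names, coefficients and accuracy values are hypothetical; the study's own fitted equations are not reproduced here.

```python
import math


def fit_log_curve(sizes, scores):
    """Ordinary least squares fit of score = a + b * ln(size)."""
    xs = [math.log(n) for n in sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(scores) / len(scores)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, scores))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def images_needed(a, b, target):
    """Smallest integer n with a + b * ln(n) >= target (requires b > 0)."""
    return math.ceil(math.exp((target - a) / b))


# Per-class sizes named in the text; the accuracies are synthetic and
# noise-free, generated only to exercise the fit (not study results).
sizes = [10, 20, 50, 150, 500, 1000]
scores = [0.80 + 0.02 * math.log(n) for n in sizes]

a, b = fit_log_curve(sizes, scores)
n_required = images_needed(a, b, target=0.92)
print(round(a, 3), round(b, 3), n_required)
```

A straight line in ln(n) keeps rising without limit, so an estimate of this kind is only sensible inside the range of sample sizes that were actually tested.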
When making the decision about whether or not to breed a given cow, knowledge about the expected outcome would have an economic impact on profitability of the breeding program and net income of the farm. The outcome of each breeding can be affected by many management and physiological features that vary between farms and interact with each other. Hence, the ability of machine learning algorithms to accommodate complex relationships in the data and missing values for explanatory variables makes these algorithms well suited for investigation of reproduction performance in dairy cattle. The objective of this study was to develop a user-friendly and intuitive on-farm tool to help farmers make reproduction management decisions. Several different machine learning algorithms were applied to predict the insemination outcomes of individual cows based on phenotypic and genotypic data. Data from 26 dairy farms in the Alta Genetics (Watertown, WI) Advantage Progeny Testing Program were used, representing a 10-yr period from 2000 to 2010. Health, reproduction, and production data were extracted from on-farm dairy management software, and estimated breeding values were downloaded from the US Department of Agriculture Agricultural Research Service Animal Improvement Programs Laboratory (Beltsville, MD) database. The edited data set consisted of 129,245 breeding records from primiparous Holstein cows and 195,128 breeding records from multiparous Holstein cows. Each data point in the final data set included 23 and 25 explanatory variables and 1 binary outcome for of 0.756±0.005 and 0.736±0.005 for primiparous and multiparous cows, respectively. The naïve Bayes algorithm, Bayesian network, and decision tree algorithms showed somewhat poorer classification performance.
An information-based variable selection procedure identified herd average conception rate, incidence of ketosis, number of previous (failed) inseminations, days in milk at breeding, and mastitis as the most effective explanatory variables in predicting pregnancy outcome.
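The abstract does not say which information-based criterion was used for variable selection. As one common illustration, information gain measures how much knowing a variable reduces the entropy of the pregnancy outcome. The sketch below is written under that assumption; the function names and the eight toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature, outcome):
    """Reduction in outcome entropy after splitting on a categorical feature."""
    total = len(outcome)
    groups = {}
    for value, label in zip(feature, outcome):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(outcome) - conditional


# Made-up records: outcome 1 = pregnant, 0 = open; features are 0/1 flags.
outcome = [1, 1, 1, 0, 0, 0, 1, 0]
ketosis = [0, 0, 0, 1, 1, 1, 0, 1]    # separates the toy outcomes perfectly
mastitis = [0, 1, 0, 1, 0, 1, 0, 1]   # only partly informative in the toy data

gain_ketosis = information_gain(ketosis, outcome)
gain_mastitis = information_gain(mastitis, outcome)
print(gain_ketosis, gain_mastitis)
```

Ranking variables by a score of this kind gives an ordering like the one reported above, although the real ranking depends on the study's data and chosen criterion.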
Dairy farm decision support systems (DSS) are tools that help dairy farmers solve complex problems by improving their decision-making processes. In this paper, we are interested in newer-generation, integrated DSS (IDSS), which additionally and concurrently: (1) receive a continuous data feed from on-farm and off-farm data collection systems and (2) integrate more than one data stream to produce insightful outcomes. The scientific community and the allied dairy community have not been successful in developing, disseminating, and promoting a sustained adoption of IDSS. Thus, this paper identifies barriers to adoption as well as factors that would promote the sustained adoption of IDSS. The main barriers to adoption discussed include perceived lack of a good value proposition, complexities of practical application, and ease of use; and IDSS challenges related to data collection, data standards, data integration, and data shareability. Success in the sustainable adoption of IDSS depends on solving these problems and also addressing intrinsic issues related to the development, maintenance, and functioning of IDSS. There is a need for coordinated action by all the main stakeholders in the dairy sector to realize the potential benefits of IDSS, including all important players in the dairy industry production and distribution chain.
• Machine learning approaches were successfully employed to predict carcass traits in sheep.
• Random Forest was the best approach for prediction of carcass traits.
• Prediction models introduced in this paper are available in ASKBILL™ to assist sheep producers.
Currently, hot carcass weight (HCW) and fat score jointly indicate the price grid for sheep meat in Australia. However, experts in the field believe that soon, yield and quality traits such as intramuscular fat (IMF), greville rule fat depth (GRFAT), computed tomography lean meat yield (CTLEAN), and loin weight (LW) are likely to play a role in pricing. Having an accurate prediction of these traits earlier in the life of an animal will allow sheep producers to adjust their management practices in order to achieve the target market requirements. Management, genetics, pasture and climate factors influence these traits directly and epistatically. Traditional prediction methods may not be powerful enough to capture complex interactions while avoiding overfitting. In this case, learning algorithms that can learn from the current data to predict the animal’s future performance offer promise. In this study, five different types of Machine Learning (ML) algorithms, namely Deep Learning (DL), Gradient Boosting Tree (GBT), K-Nearest Neighbour (KNN), Model Tree (MT), and Random Forest (RF), were employed to predict HCW, IMF, GRFAT, LW and CTLEAN, and their performances were compared against linear regression (LR) as the gold standard of multinomial prediction. Four scenarios representing different numbers of weight recordings, drawn from a total of 9 weight measures taken between birth (WT1) and pre-slaughter (WT9), were used to inform the algorithms, and all models were trained and tested under equal conditions with identical training and testing sets. Selection of the most effective subset of predictor features was completed via greedy stepwise search among all the available features jointly with expert opinion. In predicting all the traits, RF was superior while LR and KNN showed the lowest prediction performance. When using the final model for predicting on an independent test set, the scenario with the most accurate prediction performance differed across traits.
IMF and GRFAT were most accurately predicted when using birth, weaning, and pre-slaughter weights, while the most accurate scenario for HCW, LW and CTLEAN utilised weaning, six monthly weight measures after weaning and pre-slaughter weight. Across all scenarios the least accurate prediction was for IMF.
Abstract
Background
Late-maturity alpha-amylase (LMA) is a wheat genetic defect causing the synthesis of high isoelectric point alpha-amylase following a temperature shock during mid-grain development or prolonged cold throughout grain development, both leading to starch degradation. While the physiology is well understood, the biochemical mechanisms involved in grain LMA response remain unclear. We have applied high-throughput proteomics to 4,061 wheat flours displaying a range of LMA activities. Using an array of statistical analyses to select LMA-responsive biomarkers, we have mined them using a suite of tools applicable to wheat proteins.
Results
We observed that LMA-affected grains activated their primary metabolisms such as glycolysis and gluconeogenesis; TCA cycle, along with DNA- and RNA- binding mechanisms; and protein translation. This logically transitioned to protein folding activities driven by chaperones and protein disulfide isomerase, as well as protein assembly via dimerisation and complexing. The secondary metabolism was also mobilized with the upregulation of phytohormones and chemical and defence responses. LMA further invoked cellular structures, including ribosomes, microtubules, and chromatin. Finally, and unsurprisingly, LMA expression greatly impacted grain storage proteins, as well as starch and other carbohydrates, with the upregulation of alpha-gliadins and starch metabolism, whereas LMW glutenin, stachyose, sucrose, UDP-galactose, and UDP-glucose were downregulated.
Conclusions
To our knowledge, this is not only the first proteomics study tackling the wheat LMA issue but also the largest plant-based proteomics study published to date. Logistics, technicalities, requirements, and bottlenecks of such an ambitious large-scale high-throughput proteomics experiment along with the challenges associated with big data analyses are discussed.
Replacement decisions have a major effect on dairy farm profitability. Dynamic programming (DP) has been widely studied to find the optimal replacement policies in dairy cattle. However, DP models are computationally intensive and might not be practical for daily decision making. Hence, the ability of machine learning applied to a prerun DP model to provide fast and accurate predictions of nonlinear and intercorrelated variables makes it an ideal methodology. Milk class (1 to 5), lactation number (1 to 9), month in milk (1 to 20), and month of pregnancy (0 to 9) were used to describe all cows in a herd in a DP model. Twenty-seven scenarios based on all combinations of 3 levels (base, 20% above, and 20% below) of milk production, milk price, and replacement cost were solved with the DP model, resulting in a data set of 122,716 records, each with a calculated retention pay-off (RPO). Then, a machine learning model tree algorithm was used to mimic the RPO evaluated with DP. The correlation coefficient was used to observe the concordance of RPO evaluated by DP and RPO predicted by the model tree. The obtained correlation coefficient was 0.991, with a corresponding value of 0.11 for relative absolute error. At least 100 instances were required per model constraint, resulting in 204 total equations (models). When these models were used for binary classification of positive and negative RPO, error rates were 1% false negatives and 9% false positives. Applying this model, trained on simulated data, to predict RPO for 102 actual replacement records from the University of Wisconsin-Madison dairy herd resulted in a 0.994 correlation with a 0.10 relative absolute error rate. Overall, the results showed that the model tree has the potential to be used in conjunction with DP to assist farmers in their replacement decisions.
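The scenario grid and the two agreement measures described above can be sketched as follows. The 27 scenarios are simply all combinations of 3 levels across 3 factors. Relative absolute error is taken here as the total absolute error divided by the total absolute deviation of the reference values from their mean, a common definition that the abstract does not confirm. All names and RPO values are hypothetical.

```python
import math
from itertools import product

# Three levels (base, 20% above, 20% below) applied to three factors.
LEVELS = {"base": 1.0, "plus20": 1.2, "minus20": 0.8}
FACTORS = ("milk_production", "milk_price", "replacement_cost")
scenarios = [dict(zip(FACTORS, combo)) for combo in product(LEVELS, repeat=len(FACTORS))]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_absolute_error(actual, predicted):
    """Total absolute error relative to always predicting the mean of `actual`."""
    mean_actual = sum(actual) / len(actual)
    numerator = sum(abs(p - a) for a, p in zip(actual, predicted))
    denominator = sum(abs(a - mean_actual) for a in actual)
    return numerator / denominator


# Made-up retention pay-offs: DP reference values vs. model tree predictions.
rpo_dp = [-200.0, -50.0, 100.0, 400.0, 750.0]
rpo_tree = [-180.0, -70.0, 120.0, 380.0, 770.0]

r = pearson(rpo_dp, rpo_tree)
rae = relative_absolute_error(rpo_dp, rpo_tree)
print(len(scenarios), round(r, 4), round(rae, 4))
```

A relative absolute error below 1 means the predictions beat the trivial strategy of always predicting the mean RPO.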
The common practice on most commercial dairy farms is to inseminate all cows that are eligible for breeding, while ignoring (or absorbing) the costs associated with semen and labor directed toward low-fertility cows that are unlikely to conceive. Modern analytical methods, such as machine learning algorithms, can be applied to cow-specific explanatory variables for the purpose of computing probabilities of success or failure associated with upcoming insemination events. Lift chart analysis can identify subsets of high-fertility cows that are likely to conceive and are therefore appropriate targets for insemination (e.g., with conventional artificial insemination semen or expensive sex-enhanced semen), as well as subsets of low-fertility cows that are unlikely to conceive and should therefore be passed over at that point in time. Although such a strategy might be economically viable, the management, environmental, and financial conditions on one farm might differ widely from conditions on the next, and hence the reproductive management recommendations derived from such a tool may be suboptimal for specific farms. When coupled with cost-sensitive evaluation of misclassified and correctly classified insemination events, the strategy can be a potentially powerful tool for optimizing the reproductive management of individual farms. In the present study, lift chart analysis and cost-sensitive evaluation were applied to a data set consisting of 54,806 insemination events of primiparous Holstein cows on 26 Wisconsin farms, as well as a data set with 17,197 insemination events of primiparous Holstein cows on 3 Wisconsin farms, where the latter had more detailed information regarding health events of individual cows.
In the first data set, the gains in profit associated with limiting inseminations to subsets of 79 to 97% of the most fertile eligible cows ranged from $0.44 to $2.18 per eligible cow in a monthly breeding period, depending on days in milk at breeding and milk yield relative to contemporaries. In the second data set, the strategy of inseminating only a subset consisting of 59% of the most fertile cows conferred a gain in profit of $5.21 per eligible cow in a monthly breeding period. These results suggest that, when used with a powerful classification algorithm, lift chart analysis and cost-sensitive evaluation of correctly classified and misclassified insemination events can enhance the performance and profitability of reproductive management programs on commercial dairy farms.
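The idea of combining a fertility ranking with cost-sensitive evaluation can be illustrated with a small sketch: rank eligible cows by predicted conception probability, inseminate only the top fraction, and value each decision with a simple payoff. This is a simplified expected-value model rather than the study's evaluation; the probabilities, the payoff per pregnancy, the insemination cost and the function names are all hypothetical.

```python
def expected_profit(probabilities, fraction, value_pregnancy, cost_insemination):
    """Expected profit per eligible cow when only the top `fraction` is bred.

    Cows are ranked by predicted conception probability; cows that are
    passed over contribute neither cost nor return in this simplified model.
    """
    ranked = sorted(probabilities, reverse=True)
    n_bred = round(fraction * len(ranked))
    gain = sum(p * value_pregnancy - cost_insemination for p in ranked[:n_bred])
    return gain / len(ranked)


def best_fraction(probabilities, value_pregnancy, cost_insemination, steps=10):
    """Grid search over breeding fractions 0, 1/steps, ..., 1."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(
        candidates,
        key=lambda f: expected_profit(probabilities, f, value_pregnancy, cost_insemination),
    )


# Made-up predicted conception probabilities for ten eligible cows.
probs = [0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.15, 0.10, 0.05]

breed_all = expected_profit(probs, 1.0, value_pregnancy=100.0, cost_insemination=20.0)
fraction = best_fraction(probs, value_pregnancy=100.0, cost_insemination=20.0)
breed_top = expected_profit(probs, fraction, value_pregnancy=100.0, cost_insemination=20.0)
print(fraction, breed_all, breed_top)
```

In this toy herd, breeding only the top 70% gives a higher expected profit per eligible cow than breeding everyone, because the bottom three cows cost more to inseminate than they are expected to return. The cut-off moves with the payoff and cost assumptions, which is why the result is farm-specific.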
The development of machine learning and soft computing techniques has provided many opportunities for researchers to establish new analytical methods in different areas of science. The objective of this study is to investigate the potential of two types of intelligent learning methods, artificial neural networks (ANN) and neuro-fuzzy systems (NFS), to estimate breeding values (EBV) of Iranian dairy cattle. Initially, the breeding values of lactating Holstein cows for milk and fat yield were estimated using conventional best linear unbiased prediction (BLUP) with an animal model. Once that was established, a multilayer perceptron was used to build an ANN to predict breeding values from the performance data of selection candidates. Subsequently, fuzzy logic was used to form an NFS, a hybrid intelligent system that was implemented via a local linear model tree algorithm. For milk yield the correlations between EBV and EBV predicted by the ANN and NFS were 0.92 and 0.93, respectively. Corresponding correlations for fat yield were 0.93 and 0.93, respectively. Correlations between multitrait predictions of EBVs for milk and fat yield when predicted simultaneously by ANN were 0.93 and 0.93, respectively, whereas corresponding correlations with reference EBV for multitrait NFS were 0.94 and 0.95, respectively, for milk and fat production.
Fast and cost-effective prediction models are increasingly in demand for commercial use. Prediction of the outcomes of insemination events as successes or failures based on explanatory variables related to genetic predisposition, health history, and lactation performance can have an impact on decision-making on dairy farms. However, interactions between management and physiological features are very complex. Machine learning algorithms can be useful for understanding these complex interactions and developing tools that will help farmers make accurate reproductive management decisions. Results of this study showed that random forests have the best performance in predicting the outcome of an insemination event and that health records of the cow are very important in this prediction. Optimizing classification rate without taking into account the cost of classification errors can be misleading. Nevertheless, the cost of not breeding a cow that would have conceived is much higher than the cost of breeding a cow that would not conceive. The common practice on most commercial dairy farms is to inseminate all cows that are eligible for breeding, which is debatable. In conjunction with a lift chart analysis, which guides selection of subsets of highly or lowly fertile animals with highest and lowest probabilities of conception, the approach described herein could successfully stratify the pool of eligible cows in order to use different breeding strategies or use semen with different prices in different subsets of eligible cows in order to maximize total economic gain, as well as profit per eligible cow. This approach can enhance profitability of the dairy farm if sufficient data regarding variables that affect insemination outcomes are available. Fuzzy expert systems are distinguished from other black-box non-parametric methods, such as random forests and artificial neural networks, because they are easy to understand and interpret.
There is a lack of research on rule-based methods for genomic selection, because knowledge acquisition in such a complex and highly dimensional space is a limiting factor. In this dissertation, a hybrid fuzzy expert system, which uses genetic algorithms and particle swarm optimization as tools for acquiring knowledge from the data, was introduced for prediction of daughter pregnancy rate in Holstein bulls.
Deep learning (DL) algorithms are the state of the art in automated classification of wildlife camera trap images. The challenge is that the ecologist cannot know in advance how many images per species they need to collect for model training in order to achieve their desired classification accuracy. In fact, there is limited empirical evidence in the context of camera trapping to demonstrate that increasing sample size will lead to improved accuracy. In this study we explore in depth how deep learning model performance changes as the per-class (species) sample size is progressively increased. We also provide ecologists with an approximation formula to estimate, a priori, how many images per animal species they need to reach a given accuracy level. This will help ecologists allocate resources and work optimally and design studies efficiently. To investigate the effect of the number of training images, seven training sets with 10, 20, 50, 150, 500, 1000 images per class were designed. Six deep learning architectures, namely ResNet-18, ResNet-50, ResNet-152, DenseNet-121, DenseNet-161, and DenseNet-201, were trained and tested on a common exclusive testing set of 250 images per class. The whole experiment was repeated on three similar datasets from Australia, Africa and North America and the results were compared. Simple regression equations for use by practitioners to approximate model performance metrics are provided. Generalised additive models (GAM) are shown to be effective in modelling DL performance metrics based on the number of training images per class, tuning scheme and dataset. Key-words: Camera Traps, Deep Learning, Ecological Informatics, Generalised Additive Models, Learning Curves, Predictive Modelling, Wildlife.