Abstract
Successful development of biological databases requires accommodation of the burgeoning amounts of data from high-throughput genomics pipelines. As the volume of curated data in Animal QTLdb (https://www.animalgenome.org/QTLdb) increases exponentially, the resulting challenges must be met with rapid infrastructure development to effectively accommodate abundant data curation and make metadata analysis more powerful. The development of Animal QTLdb and CorrDB over the past 15 years has provided valuable tools for researchers to utilize a wealth of phenotype/genotype data to study the genetic architecture of livestock traits. We have focused our efforts on data curation, improved data quality maintenance, new tool development, and database co-development, in order to provide convenient platforms for users to query and analyze data. The database currently has 158 499 QTL/associations, 10 482 correlations and 1977 heritability data as a result of an average 32% data increase per year. In addition, we have made >14 functional improvements or new tool implementations since our last report. Our ultimate goals of database development are to provide infrastructure for data collection, curation, and annotation and, more importantly, to support innovative data structures for new types of data mining, data reanalysis, and networked genetic analysis that lead to the generation of new knowledge.
Gram-negative bacteria such as Escherichia coli (E. coli) are assumed to be among the main agents that cause severe mastitis with clinical signs in dairy cattle. Rapid detection of this disease is important in order to prevent transmission to other cows, and it helps to reduce inappropriate use of antibiotics. With the rapid progress in high-throughput technologies, and the accumulation of various kinds of '-omics' data in public repositories, there is an opportunity to retrieve, integrate, and reanalyze these resources to improve the diagnosis and treatment of different diseases and to provide mechanistic insights into host resistance in an efficient way. Meta-analysis is a relatively inexpensive option with good potential to increase the statistical power and generalizability of single-study analysis. In the current meta-analysis, six microarray-based studies that investigated the transcriptome profile of mammary gland tissue after mastitis induced by E. coli infection were used. This meta-analysis not only reinforced the findings of individual studies, but also revealed several novel terms enriched by up-regulated genes, including response to hypoxia, response to drug, anti-apoptosis and positive regulation of transcription from RNA polymerase II promoter. Finally, in order to identify the small sets of genes that are sufficiently informative in E. coli mastitis, the differentially expressed genes identified by meta-analysis were prioritized using ten different attribute weighting algorithms. Twelve meta-genes were detected by the majority of attribute weighting algorithms (with weight above 0.7) as the most informative genes, including CXCL8 (IL8), NFKBIZ, HP, ZC3H12A, PDE4B, CASP4, CXCL2, CCL20, GRO1 (CXCL1), CFB, S100A9, and S100A8. Interestingly, the results demonstrated that all of these genes are key genes in the immune response, inflammation or mastitis.
The decision tree models efficiently discovered the best combination of the meta-genes as a bio-signature and confirmed some of the top-ranked genes (ZC3H12A, CXCL2, GRO, CFB) as biomarkers for E. coli mastitis (with an average accuracy of 83%). This research indicated that combining two novel data mining tools, meta-analysis and machine learning, increased the power to detect the most informative genes, which can help to improve the diagnosis and treatment strategies for E. coli-associated mastitis in cattle.
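The majority-vote selection described above (a gene is kept when most attribute weighting algorithms assign it a weight above 0.7) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `select_meta_genes`, the gene labels and all weight values are hypothetical, and weights are assumed to be normalized to the range 0 to 1.

```python
from typing import Dict, List


def select_meta_genes(weights: Dict[str, List[float]],
                      threshold: float = 0.7) -> List[str]:
    """Return genes whose weight exceeds `threshold` in a strict majority
    of the weighting algorithms (one weight per algorithm per gene)."""
    selected = []
    for gene, gene_weights in weights.items():
        # Count the algorithms that rank this gene above the threshold.
        votes = sum(1 for w in gene_weights if w > threshold)
        if votes > len(gene_weights) / 2:
            selected.append(gene)
    return sorted(selected)


# Hypothetical weights from four algorithms (illustrative values only).
example_weights = {
    "GENE_A": [0.95, 0.81, 0.74, 0.40],
    "GENE_B": [0.72, 0.30, 0.65, 0.10],
    "GENE_C": [0.88, 0.91, 0.20, 0.79],
}

print(select_meta_genes(example_weights))
```

In this toy input, GENE_A and GENE_C pass the threshold in three of four algorithms and are retained, while GENE_B passes in only one and is dropped.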
The domestication and development of cattle has considerably impacted human societies, but the histories of cattle breeds and populations have been poorly understood, especially for African, Asian, and American breeds. Using genotypes from 43,043 autosomal single nucleotide polymorphism markers scored in 1,543 animals, we evaluate the population structure of 134 domesticated bovid breeds. Regardless of the analytical method or sample subset, the three major groups of Asian indicine, Eurasian taurine, and African taurine were consistently observed. Patterns of geographic dispersal resulting from co-migration with humans and exportation are recognizable in phylogenetic networks. All analytical methods reveal patterns of hybridization which occurred after divergence. Using 19 breeds, we map the cline of indicine introgression into Africa. We infer that African taurine possess a large portion of wild African aurochs ancestry, causing their divergence from Eurasian taurine. We detect exportation patterns in Asia and identify a cline of Eurasian taurine/indicine hybridization in Asia. We also identify the influence of species other than Bos taurus taurus and B. t. indicus in the formation of Asian breeds. We detect the pronounced influence of Shorthorn cattle in the formation of European breeds. Iberian and Italian cattle possess introgression from African taurine. American Criollo cattle originate from Iberia, and not directly from Africa, with African ancestry inherited via Iberian ancestors. Indicine introgression into American cattle occurred in the Americas, and not Europe. We argue that cattle migration, movement and trading followed by admixture have been important forces in shaping modern bovine genomic variation.
Abstract
The Animal QTLdb (https://www.animalgenome.org/QTLdb) and CorrDB (https://www.animalgenome.org/CorrDB) are unique resources for livestock animal genetics and genomics research which have been used extensively by the international livestock genome research community. This is largely due to the active development of the databases over the years to keep up with the rapid advancement of genome sciences. The ongoing development has ensured that these databases provide researchers not only with continually updated data but also with new web tools to disseminate the data. Through our continued efforts, the databases have evolved from the original Pig QTLdb for cross-experiment QTL data comparisons to an Animal QTLdb hosting 220 401 QTL, SNP association and eQTL data linking phenotype to genotype for 2210 traits. In addition, there are 23 552 correlations for 866 traits and 4273 heritability data on 1069 traits in CorrDB. All these data were curated from 3157 publications that cover seven livestock species. Along with the continued data curation, new species, additional genome builds, and new functions and features have been built into the databases as well. Standardized procedures to support data mapping on multiple species/genome builds and the ability to browse data based on linked ontology terms are highlights of the recent developments.
The Animal QTL database (QTLdb; http://www.animalgenome.org/QTLdb) is designed to house all publicly available QTL and single-nucleotide polymorphism/gene association data on livestock animal species. An earlier version was published in the Nucleic Acids Research Database issue in 2007. Since then, we have continued our efforts to develop new and improved database tools to allow more data types, parameters and functions. Our efforts have transformed the Animal QTLdb into a tool that actively serves the research community as a quality data repository and, more importantly, a provider of easily accessible tools and functions to disseminate QTL and gene association information. The QTLdb has been heavily used by the livestock genomics community since its first public release in 2004. To date, there are 5920 cattle, 3442 chicken, 7451 pig, 753 sheep and 88 rainbow trout data points in the database, and at least 290 publications that cite use of the database. The rapid advancement in genomic studies of cattle, chicken, pigs, sheep and other livestock animals has presented us with challenges, as well as opportunities for the QTLdb to meet the evolving needs of the research community. Here, we report our progress over the recent years and highlight new functions and services available to the general public.
The Animal QTL Database (QTLdb; http://www.animalgenome.org/QTLdb) has undergone dramatic growth in recent years in terms of new data curated, data downloads and new functions and tools. We have focused our development efforts to cope with challenges arising from rapid growth of newly published data and end users' data demands, and to optimize data retrieval and analysis to facilitate users' research. Evidenced by the 27 releases in the past 11 years, the growth of the QTLdb has been phenomenal. Here we report our recent progress which is highlighted by addition of one new species, four new data types, four new user tools, a new API tool set, numerous new functions and capabilities added to the curator tool set, expansion of our data alliance partners and more than 20 other improvements. In this paper we present a summary of our progress to date and an outlook regarding future directions.
The availability of the bovine genome sequence and SNP panels has improved various genomic analyses, from exploring genetic diversity to aiding genetic selection. However, few of the SNPs on the bovine chips are polymorphic in buffalo; therefore, a panel of single nucleotide DNA markers exclusive to buffalo was necessary for molecular genetic analyses and to develop genomic selection approaches for water buffalo. The creation of a 90K SNP panel for river buffalo and testing in a genome wide association study for milk production is described here.
The genomes of 73 buffaloes of 4 different breeds were sequenced and aligned against the bovine genome, which facilitated the identification of 22 million sequence variants among the buffalo genomes. Based on frequencies of variants within and among buffalo breeds, and their distribution across the genome, inferred from the bovine genome sequence, 90,000 putative single nucleotide polymorphisms were selected to create an Axiom® Buffalo Genotyping Array 90K.
This 90K "SNP-Chip" was tested in several river buffalo populations and found to have ∼70% high quality and polymorphic SNPs. Of the 90K SNPs about 24K were also found to be polymorphic in swamp buffalo. The SNP chip was used to investigate the structure of buffalo populations, and could distinguish buffalo from different farms. A Genome Wide Association Study identified genomic regions on 5 chromosomes putatively involved in milk production.
The 90K buffalo SNP chip described here is suitable for the analysis of the genomes of river buffalo breeds, and could be used for genetic diversity studies and potentially as a starting point for genome-assisted selection programmes. This SNP Chip could also be used to analyse swamp buffalo, but many loci are not informative and creation of a revised SNP set specific for swamp buffalo would be advised.
Intramuscular fat (IMF) content is related to insulin resistance, which is an important predictive factor for disorders such as cardiovascular disease, obesity and type 2 diabetes in humans. At the same time, it is an economically important trait, which influences the sensorial and nutritional value of meat. The deposition of IMF is influenced by many factors such as sex, age, nutrition, and genetics. In this study Nellore steers (Bos taurus indicus subspecies) were used to better understand the molecular mechanisms involved in IMF content. This was accomplished by identifying differentially expressed genes (DEG), biological pathways and putative regulatory factors. Animals included in this study had extreme genomic estimated breeding value (GEBV) for IMF. RNA-seq analysis, gene set enrichment analysis (GSEA) and co-expression network methods, such as partial correlation coefficient with information theory (PCIT), regulatory impact factor (RIF) and phenotypic impact factor (PIF) were utilized to better understand intramuscular adipogenesis. A total of 16,101 genes were analyzed in both groups (high (H) and low (L) GEBV) and 77 DEG (FDR 10%) were identified between the two groups. Pathway Studio software identified 13 significantly over-represented pathways, functional classes and small molecule signaling pathways within the DEG list. PCIT analyses identified genes with a difference in the number of gene-gene correlations between the H and L groups and detected putative regulatory factors involved in IMF content. Candidate genes identified by PCIT include ANKRD26, HOXC5 and PPAPDC2. RIF and PIF analyses identified several candidate genes: GLI2 and IGF2 (RIF1), MPC1 and UBL5 (RIF2) and a host of small RNAs, including miR-1281 (PIF). These findings contribute to a better understanding of the molecular mechanisms that underlie fat content and energy balance in muscle and provide important information for the production of healthier beef for human consumption.
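The abstract reports DEG at an FDR of 10% but does not name the correction procedure. As a sketch only, assuming the widely used Benjamini-Hochberg step-up method (an assumption, not something the abstract confirms), FDR control over a list of per-gene p-values could look like this. The function names and the p-values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_at_fdr(p_values: List[float], fdr: float = 0.10) -> List[bool]:
    """Flag tests whose adjusted p-value is at or below the FDR level."""
    return [q <= fdr for q in benjamini_hochberg(p_values)]


# Hypothetical per-gene p-values (illustrative values only).
example_p = [0.01, 0.04, 0.03, 0.5]
print(significant_at_fdr(example_p, fdr=0.10))
```

With these toy values the first three tests are retained at the 10% level and the last is not.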
Integration of high throughput DNA genotyping and RNA-sequencing data allows for the identification of genomic regions that control gene expression, known as expression quantitative trait loci (eQTL), on a whole genome scale. Intramuscular fat (IMF) content and carcass composition play important roles in metabolic and physiological processes in mammals because they influence insulin sensitivity and consequently prevalence of metabolic diseases such as obesity and type 2 diabetes. However, limited information is available on the genetic variants and mechanisms associated with IMF deposition in mammals. Thus, our hypothesis was that eQTL analyses could identify putative regulatory regions and transcription factors (TFs) associated with intramuscular fat (IMF) content traits.
We performed an integrative eQTL study in skeletal muscle to identify putative regulatory regions and factors associated with intramuscular fat content traits. Data obtained from skeletal muscle samples of 192 animals was used for association analysis between 461,466 SNPs and the transcription level of 11,808 genes. This yielded 1268 cis- and 10,334 trans-eQTLs, among which we identified nine hotspot regions that each affected the expression of > 119 genes. These putative regulatory regions overlapped with previously identified QTLs for IMF content. Three of the hotspots respectively harbored the transcription factors USF1, EGR4 and RUNX1T1, which are known to play important roles in lipid metabolism. From co-expression network analysis, we further identified modules significantly correlated with IMF content and associated with relevant processes such as fatty acid metabolism, carbohydrate metabolism and lipid metabolism.
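To make the cis/trans distinction and the notion of a hotspot concrete, the sketch below classifies SNP-gene associations and counts how many distinct genes each genomic bin affects in trans. It rests on assumptions the abstract does not state: a 1 Mb cis window, hotspots defined by fixed-size bins of SNP position, and hypothetical record fields and identifiers. The abstract's own threshold is more than 119 genes per hotspot; the toy example uses a much smaller one.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Association(NamedTuple):
    snp_id: str
    snp_chrom: str
    snp_pos: int
    gene_id: str
    gene_chrom: str
    gene_start: int


CIS_WINDOW_BP = 1_000_000  # assumed window; the source does not give one


def classify(assoc: Association, window: int = CIS_WINDOW_BP) -> str:
    """Label an association 'cis' if SNP and gene share a chromosome and
    lie within `window` base pairs of each other, otherwise 'trans'."""
    same_chrom = assoc.snp_chrom == assoc.gene_chrom
    if same_chrom and abs(assoc.snp_pos - assoc.gene_start) <= window:
        return "cis"
    return "trans"


def find_hotspots(assocs: List[Association], min_genes: int,
                  bin_size: int = CIS_WINDOW_BP) -> Dict[Tuple[str, int], int]:
    """Return bins (chromosome, bin index) whose SNPs affect more than
    `min_genes` distinct genes in trans, with the gene count per bin."""
    genes_per_bin = defaultdict(set)
    for a in assocs:
        if classify(a) == "trans":
            genes_per_bin[(a.snp_chrom, a.snp_pos // bin_size)].add(a.gene_id)
    return {region: len(genes) for region, genes in genes_per_bin.items()
            if len(genes) > min_genes}


# Hypothetical associations (illustrative values only).
example = [
    Association("snp1", "1", 500_000, "geneA", "1", 900_000),
    Association("snp2", "2", 1_200_000, "geneB", "5", 100),
    Association("snp2", "2", 1_200_000, "geneC", "7", 100),
    Association("snp3", "2", 1_800_000, "geneD", "1", 100),
    Association("snp4", "3", 10, "geneE", "3", 5_000_000),
]

print([classify(a) for a in example])
print(find_hotspots(example, min_genes=2))
```

Here the bin covering 1 to 2 Mb on chromosome 2 is reported as a hotspot because its SNPs are associated in trans with three distinct genes.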
This study provides novel insights into the link between genotype and IMF content as evident from the expression level. It thereby identifies genomic regions of particular importance and associated regulatory factors. These findings provide new knowledge about the biological processes linked to the genetic variants and mechanisms associated with IMF deposition in mammals.
Furthermore, to date, most of the datasets are from tissues consisting of heterogeneous cell populations, hindering the resolution of functional information and limiting our ability to understand the fundamental cellular and subcellular processes underlying phenotypes. Since the original FAANG white paper was published in 2015 [2], exciting new opportunities have arisen to tackle these challenges. Most of these causal variants, with small effects, are likely to be located in regulatory sequences and impact complex traits through changes in gene expression [4]. ...it is expected that improvements in prediction accuracy can be achieved by filtering the genetic marker information based upon whether the genetic variants reside in functional sequences and developing robust prediction models that can accommodate the biological priors. The GTEx consortium (https://gtexportal.org/home/) has achieved this very effectively across human tissues, enabling expression QTL (eQTL) studies linking gene expression to genetic variation [7] and providing a framework for FAANG to develop a similar project for farmed animals (FAANGGTEx). ...providing new opportunities for informed management decisions during an animal’s lifetime (e.g. to optimise diets or for steering animals into the most appropriate production systems).