Abstract
The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) represents the 2019 merger of VectorBase with the EuPathDB projects. As a Bioinformatics Resource Center funded by the National Institutes of Health, with additional support from the Wellcome Trust, VEuPathDB supports >500 organisms comprising invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Designed to empower researchers with access to Omics data and bioinformatic analyses, VEuPathDB projects integrate >1700 pre-analysed datasets (and associated metadata) with advanced search capabilities, visualizations, and analysis tools in a graphic interface. Diverse data types are analysed with standardized workflows including an in-house OrthoMCL algorithm for predicting orthology. Comparisons are easily made across datasets, data types and organisms in this unique data mining platform. A new site-wide search facilitates access for both experienced and novice users. Upgraded infrastructure and workflows support numerous updates to the web interface, tools, searches and strategies, and the Galaxy workspace, where users can privately analyse their own data. Forthcoming upgrades include cloud-ready application architecture, expanded support for the Galaxy workspace, tools for interrogating host-pathogen interactions, improved interactions with affiliated databases (ClinEpiDB, MicrobiomeDB) and other scientific resources, and increased interoperability with the Bacterial & Viral BRC.
The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host-pathogen interactions.
Mapping genome-wide data to human subtelomeres has been problematic due to the incomplete assembly and challenges of low-copy repetitive DNA elements. Here, we provide updated human subtelomere sequence assemblies that were extended by filling telomere-adjacent gaps using clone-based resources. A bioinformatic pipeline incorporating multiread mapping for annotation of the updated assemblies using short-read data sets was developed and implemented. Annotation of subtelomeric sequence features as well as mapping of CTCF and cohesin binding sites using ChIP-seq data sets from multiple human cell types confirmed that CTCF and cohesin bind within 3 kb of the start of terminal repeat tracts at many, but not all, subtelomeres. CTCF and cohesin co-occupancy was also enriched near internal telomere-like sequence (ITS) islands and the nonterminal boundaries of subtelomere repeat elements (SREs) in transformed lymphoblastoid cell lines (LCLs) and human embryonic stem cell (ES) lines, but was not significantly enriched in the primary fibroblast IMR90 cell line. Subtelomeric CTCF and cohesin sites predicted by ChIP-seq using our bioinformatics pipeline (but not predicted when only uniquely mapping reads were considered) were consistently validated by ChIP-qPCR. The colocalized CTCF and cohesin sites in SRE regions are candidates for mediating long-range chromatin interactions in the transcript-rich SRE region. A public browser for the integrated display of short-read sequence-based annotations relative to key subtelomere features such as the start of each terminal repeat tract, SRE identity and organization, and subtelomeric gene models was established.
Abstract
The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) is a Bioinformatics Resource Center funded by the National Institutes of Health with additional funding from the Wellcome Trust. VEuPathDB supports >600 organisms that comprise invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Since 2004, VEuPathDB has analyzed omics data from the public domain using contemporary bioinformatic workflows, including orthology predictions via OrthoMCL, and integrated the analysis results with analysis tools, visualizations, and advanced search capabilities. The unique data mining platform coupled with >3000 pre-analyzed data sets facilitates the exploration of pertinent omics data in support of hypothesis-driven research. Comparisons are easily made across data sets, data types and organisms. A Galaxy workspace offers the opportunity for the analysis of private large-scale datasets and for porting to VEuPathDB for comparisons with integrated data. The MapVEu tool provides a platform for exploration of spatially resolved data such as vector surveillance and insecticide resistance monitoring. To address the growing body of omics data and advances in laboratory techniques, VEuPathDB has added several new data types, searches and features, improved the Galaxy workspace environment, redesigned the MapVEu interface and updated the infrastructure to accommodate these changes.
EuPathDB (http://eupathdb.org) resources include 11 databases supporting eukaryotic pathogen genomic and functional genomic data, isolate data and phylogenomics. EuPathDB resources are built using the same infrastructure and provide a sophisticated search strategy system enabling complex interrogations of underlying data. Recent advances in EuPathDB resources include the design and implementation of a new data loading workflow, a new database supporting Piroplasmida (i.e. Babesia and Theileria), the addition of large amounts of new data and data types and the incorporation of new analysis tools. New data include genome sequences and annotation, strand-specific RNA-seq data, splice junction predictions (based on RNA-seq), phosphoproteomic data, high-throughput phenotyping data, single nucleotide polymorphism data based on high-throughput sequencing (HTS) and expression quantitative trait loci data. New analysis tools enable users to search for DNA motifs and define genes based on their genomic colocation, view results from searches graphically (i.e. genes mapped to chromosomes or isolates displayed on a map) and analyze data from columns in result tables (word cloud and histogram summaries of column content). The manuscript herein describes updates to EuPathDB since the previous report published in NAR in 2010.
Human subtelomeric segmental duplications ('subtelomeric repeats') comprise about 25% of the most distal 500 kb and 80% of the most distal 100 kb in human DNA. A systematic analysis of the duplication substructure of human subtelomeric regions was done in order to develop a detailed understanding of subtelomeric sequence organization and a nucleotide sequence-level characterization of subtelomeric duplicon families.
The extent of nucleotide sequence divergence within subtelomeric duplicon families varies considerably, as does the organization of duplicon blocks at subtelomere alleles. Subtelomeric internal (TTAGGG)n-like tracts occur at duplicon boundaries, suggesting their involvement in the generation of the complex sequence organization. Most duplicons have copies at both subtelomere and non-subtelomere locations, but a class of duplicon blocks is identified that are subtelomere-specific. In addition, a group of six subterminal duplicon families are identified that, together with six single-copy telomere-adjacent segments, include all of the (TTAGGG)n-adjacent sequence identified so far in the human genome.
Identification of a class of duplicon blocks that is subtelomere-specific will facilitate high-resolution analysis of subtelomere repeat copy number variation as well as studies involving somatic subtelomere rearrangements. The significant levels of nucleotide sequence divergence within many duplicon families as well as the differential organization of duplicon blocks on subtelomere alleles may provide opportunities for allele-specific subtelomere marker development; this is especially true for subterminal regions, where divergence and organizational differences are the greatest. These subterminal sequence families comprise the immediate cis-elements for (TTAGGG)n tracts, and are prime candidates for subtelomeric sequences regulating telomere-specific (TTAGGG)n tract length in humans.
Indian muntjac (Muntiacus muntjak vaginalis) has an extreme mammalian karyotype, with only six and seven chromosomes in the female and male, respectively. Chinese muntjac (Muntiacus reevesi) has a more typical mammalian karyotype, with 46 chromosomes in both sexes. Despite this disparity, the two muntjac species are morphologically similar and can even interbreed to produce viable (albeit sterile) offspring. Previous studies have suggested that a series of telocentric chromosome fusion events involving telomeric and/or satellite repeats led to the extant Indian muntjac karyotype.
We used a comparative mapping and sequencing approach to characterize the sites of ancestral chromosomal fusions in the Indian muntjac genome. Specifically, we screened an Indian muntjac bacterial artificial-chromosome library with a telomere repeat-specific probe. Isolated clones found by fluorescence in situ hybridization to map to interstitial regions on Indian muntjac chromosomes were further characterized, with a subset then subjected to shotgun sequencing. Subsequently, we isolated and sequenced overlapping clones extending from the ends of some of these initial clones; we also generated orthologous sequence from isolated Chinese muntjac clones. The generated Indian muntjac sequence has been analyzed for the juxtaposition of telomeric and satellite repeats and for synteny relationships relative to other mammalian genomes, including the Chinese muntjac.
The generated sequence data and comparative analyses provide a detailed genomic context for seven ancestral chromosome fusion sites in the Indian muntjac genome, which further supports the telocentric fusion model for the events leading to the unusual karyotypic differences among muntjac species.
Considered in this article is the Cauchy problem of a generalized Korteweg-de Vries equation $\begin{cases} u_t + u_{xxx} + u u_x + \left| D_x \right|^{2\alpha} u = 0, & t \in \mathbb{R}^+,\ x \in \mathbb{R}, \\ u(x,0) = \varphi(x) \end{cases}$ with $0 \le \alpha \le 1$. The local well-posedness of the Cauchy problem in the homogeneous Sobolev space $H^s(\mathbb{R})$ for $s \in \left( \frac{\alpha - 3}{2(2 - \alpha)}, 0 \right)$ is proved. In addition, the mapping that associates to appropriate initial data the corresponding solution is analytic as a function between appropriate Banach spaces.
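As a quick check of the stated regularity range, the lower endpoint of the interval for $s$ can be evaluated at the two extremes of $0 \le \alpha \le 1$. The symbol $s_{\min}$ is introduced here only for illustration. This is plain arithmetic on the formula quoted above, not an additional result of the article:

```latex
\[
  s_{\min}(\alpha) = \frac{\alpha - 3}{2(2 - \alpha)}, \qquad
  s_{\min}(0) = \frac{-3}{4} = -\frac{3}{4}, \qquad
  s_{\min}(1) = \frac{-2}{2} = -1 .
\]
\[
  \frac{d s_{\min}}{d\alpha}
  = \frac{(2 - \alpha) + (\alpha - 3)}{2(2 - \alpha)^2}
  = -\frac{1}{2(2 - \alpha)^2} < 0 .
\]
```

Since $s_{\min}$ is decreasing in $\alpha$, the admissible interval widens from $(-3/4, 0)$ at $\alpha = 0$ to $(-1, 0)$ at $\alpha = 1$.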
The distribution characteristics of the oil-water contact are the basis for reservoir exploration and development and for reserves evaluation. A reservoir with a tilted oil-water contact has a unique formation mechanism, and the understanding of its distribution and formation mechanism directly affects the evaluation of the reservoir type, well deployment, selection of well pattern and type, determination of test section, and reserves evaluation. Based on the analysis of reservoir characteristics, petrophysical properties and geological structure in 40 reservoirs worldwide with tilted oil-water contacts, the progress of research on the formation mechanisms of tilted oil-water contacts is summarized in terms of hydrodynamic conditions, reservoir heterogeneity, neotectonic movement and oil-gas exploitation. According to the formation mechanism of tilted oil-water contacts and the needs of exploration research, research methods are summarized and classified, including the calculation of equipotential surfaces for oil and water in the formation, analysis of formation pressure, and analysis of reservoir physical properties. Based upon statistical analysis, it is suggested that the degree of inclination of the oil-water contact be classified by the dip of the oil-water contact (DipTOWC). The tilted oil-water contact is divided into three categories: large dip (DipTOWC ≥ 55 m/km), medium dip (4 m/km ≤ DipTOWC < 55 m/km), and small dip (DipTOWC < 4 m/km). The classification and evaluation method can be combined with structure amplitude and reservoir property. This paper summarizes the formation mechanisms of domestic and international reservoirs with tilted oil-water contacts, which is significant for guiding the exploration and development of oilfields with tilted oil-water contacts, reserves evaluation, and well deployment.
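The three-way dip classification can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `classify_towc_dip` is hypothetical, and the thresholds assume the category boundaries are large (≥ 55 m/km), medium (from 4 m/km up to but not including 55 m/km) and small (below 4 m/km).

```python
def classify_towc_dip(dip_m_per_km: float) -> str:
    """Classify a tilted oil-water contact by its dip (DipTOWC, in m/km).

    Assumed boundaries (illustrative reading of the abstract):
      large  : DipTOWC >= 55
      medium : 4 <= DipTOWC < 55
      small  : DipTOWC < 4
    """
    if dip_m_per_km < 0:
        # A dip magnitude cannot be negative; direction is handled elsewhere.
        raise ValueError("dip must be a non-negative magnitude in m/km")
    if dip_m_per_km >= 55:
        return "large"
    if dip_m_per_km >= 4:
        return "medium"
    return "small"


if __name__ == "__main__":
    # Hypothetical dip values, chosen only to exercise each category.
    for dip in (2.5, 4.0, 30.0, 55.0, 80.0):
        print(f"{dip:5.1f} m/km -> {classify_towc_dip(dip)}")
```

The boundary values are assigned to the higher category (4 m/km is medium, 55 m/km is large), which follows from the "≥" and "≤" signs in the stated ranges.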