Peer-reviewed, Open access
  • Ten simple rules for annota...
    Stevens, Irene; Mukarram, Abdul Kadir; Hörtenhuber, Matthias; Meehan, Terrence F; Rung, Johan; Daub, Carsten O

    PLOS Computational Biology, 10/2020, Volume: 16, Issue: 10
    Journal Article

    About the Authors:

    Irene Stevens (* E-mail: irene.stevens@ki.se). Affiliations: Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden; Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden. ORCID: http://orcid.org/0000-0003-3823-1499

    Abdul Kadir Mukarram. Affiliation: Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden. ORCID: http://orcid.org/0000-0002-9726-0399

    Matthias Hörtenhuber. Affiliation: Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden. ORCID: http://orcid.org/0000-0002-5599-5565

    Terrence F. Meehan. Affiliation: European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

    Johan Rung. Affiliations: Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden; Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden. ORCID: http://orcid.org/0000-0001-5875-8429

    Carsten O. Daub. Affiliations: Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden; Science for Life Laboratory, Karolinska Institutet, Stockholm, Sweden. ORCID: http://orcid.org/0000-0002-3295-8729

    Introduction

    A file of nucleic acid sequences is not, by itself, descriptive. Furthermore, metadata provides the basis for supervised machine learning algorithms that use labeled data, and for indexing Next Generation Sequencing datasets in public repositories to support database queries and data discovery. ...metadata is key for making data Findable, Accessible, Interoperable, and Reusable (FAIR) [1].
Several large-scale sequencing projects, such as the Functional Annotation of the Mammalian Genome (FANTOM5) [13], the Encyclopedia of DNA Elements (ENCODE) [14], and the Danio Rerio Encyclopedia of DNA Elements (DANIO-CODE) [15], have established additional, custom metadata models that describe their data systematically and so allow integrative analysis of disparate datasets. Under each section, we defined weights on the terms: required (e.g., biosample type), conditionally required (e.g., target of a chromatin immunoprecipitation sequencing (ChIP-seq) assay), and optional (e.g., chemistry version used for sequencing).
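
The three term weights described above can be illustrated with a minimal Python sketch. It is not the authors' implementation or the schema of any of the named projects: the field names (`biosample_type`, `assay_type`, `chipseq_target`, `chemistry_version`), the `Term` class, and the `missing_terms` helper are all hypothetical, chosen only to mirror the examples in the text. The sketch assumes that a conditionally required term becomes mandatory when a predicate on the rest of the record holds.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical labels for the three weights named in the text.
REQUIRED = "required"
CONDITIONAL = "conditionally_required"
OPTIONAL = "optional"


@dataclass(frozen=True)
class Term:
    name: str
    weight: str
    # For conditional terms only: returns True when the term becomes mandatory.
    applies: Optional[Callable[[dict], bool]] = None


# Illustrative schema; every field name here is an assumption.
SCHEMA = [
    Term("biosample_type", REQUIRED),
    Term("chipseq_target", CONDITIONAL,
         lambda record: record.get("assay_type") == "ChIP-seq"),
    Term("chemistry_version", OPTIONAL),
]


def missing_terms(record, schema=SCHEMA):
    """Return the names of mandatory terms that are absent or empty."""
    missing = []
    for term in schema:
        if record.get(term.name) not in (None, ""):
            continue  # term is filled in
        if term.weight == REQUIRED:
            missing.append(term.name)
        elif (term.weight == CONDITIONAL
              and term.applies is not None
              and term.applies(record)):
            missing.append(term.name)
        # Optional terms are never reported as missing.
    return missing
```

Under these assumptions, a ChIP-seq record without a target is flagged, while an RNA-seq record with only a biosample type passes; a missing optional term such as the chemistry version is never reported.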