Ten simple rules for organizing a webinar series. Fadlelmola, Faisal M; Panji, Sumir; Ahmed, Azza E; ...
PLOS Computational Biology, 04/2019, Volume 15, Issue 4
Journal Article
Peer reviewed
Open access
Affiliations: Biosciences eastern and central Africa (BecA-ILRI Hub), International Livestock Research Institute, Nairobi, Kenya; South African MRC Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, Cape Town, South Africa (ORCID: 0000-0002-8282-1325). Oussema Souiai, affiliations: Laboratory of BioInformatics, bioMathematics and bioStatistics, Institut Pasteur de Tunis, Tunis, Tunisia; Institut supérieur des technologies médicales, Université de Tunis El Manar, Tunis, Tunisia. Nicola Mulder, affiliation: Computational Biology Division, Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa (ORCID: 0000-0003-4905-0941). H3ABioNet Research working group, as members of the H3Africa Consortium. ¶The membership list of the H3ABioNet Research working group can be found in the Acknowledgments section.
The webinar coordination team assists with all the planning and logistics for hosting a webinar (Rule 1); choosing webinar themes requires mapping the target audience's needs and interests (Rule 2); drafting a webinar planning checklist through regular planning meetings as well as post-webinar meetings (Rule 3); decentralizing the organization of webinar tasks and resources through an accessible shared space (Rule 4); planning early and settling on the provisional dates, times, and themes of the webinar events (Rule 5); choosing and settling on a convenient and user-friendly webinar platform (Rule 6); approaching and confirming potential speakers (Rule 7); obtaining the webinar title, abstract, and presenter's biography for creating the webinar announcement, distributed through email and social media channels (Rule 8); allocating time for platform orientation (Rule 9); and keeping close track of webinar metrics for regular assessment and evaluation (Rule 10).
Bioinformatics research is frequently performed using complex workflows with multiple steps, fans, merges, and conditionals. This complexity makes management of the workflow difficult on a computer cluster, especially when running in parallel on large batches of data: hundreds or thousands of samples at a time. Scientific workflow management systems could help with that. Many are now being proposed, but is there yet a "best" workflow management system for bioinformatics? Such a system would need to satisfy numerous, sometimes conflicting requirements, from ease of use to seamless deployment at peta- and exa-scale and portability to the cloud. We evaluated Swift/T as a candidate for such a role by implementing a primary genomic variant calling workflow in the Swift/T language, focusing on workflow management, performance, and scalability issues that arise from production-grade big data genomic analyses. In the process we introduced novel features into the language, which are now part of its open repository. Additionally, we formalized a set of design criteria for quality, robust, maintainable workflows that must function at scale in a production setting, such as a large genomic sequencing facility or a major hospital system. The use of Swift/T conveys two key advantages. (1) It operates transparently in multiple cluster scheduling environments (PBS Torque, SLURM, the Cray aprun environment, etc.), so a single workflow is trivially portable across numerous clusters. (2) The leaf functions of Swift/T permit developers to easily swap executables in and out of the workflow, which makes it easy to maintain and to request resources optimal for each stage of the pipeline. While Swift/T's data-level parallelism eliminates the need to code parallel analysis of multiple samples, it does make debugging more difficult, as is common for implicitly parallel code.
Nonetheless, the language gives users a powerful and portable way to scale up analyses in many computing architectures. The code for our implementation of a variant calling workflow using Swift/T can be found on GitHub at https://github.com/ncsa/Swift-T-Variant-Calling, with full documentation provided at http://swift-t-variant-calling.readthedocs.io/en/latest/.
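The data-level parallelism described above can be sketched in plain Python (a minimal illustration of the pattern, not Swift/T itself; the sample names and the stand-in pipeline stages are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

def call_variants(sample):
    # Stand-in for the per-sample stages (alignment, deduplication,
    # variant calling); a real workflow would invoke external tools here.
    aligned = f"{sample}.bam"               # alignment stage output
    return aligned.replace(".bam", ".vcf")  # variant-calling stage output

def run_batch(samples):
    # Every sample is independent, so the whole batch is dispatched at
    # once -- the pattern Swift/T expresses implicitly through dataflow
    # semantics (on a cluster, each call would be a separate job).
    with ThreadPoolExecutor() as pool:
        return list(pool.map(call_variants, samples))
```

The trade-off noted in the abstract shows up here too: because all samples run concurrently, stepping through one sample's failure in isolation requires extra effort.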
The changing landscape of genomics research and clinical practice has created a need for computational pipelines capable of efficiently orchestrating complex analysis stages while handling large volumes of data across heterogeneous computational environments. Workflow Management Systems (WfMSs) are the software components employed to fill this gap. This work provides an approach and systematic evaluation of key features of popular bioinformatics WfMSs in use today: Nextflow, CWL, and WDL, along with some of their executors, as well as Swift/T, a workflow manager commonly used in high-scale physics applications. We employed two use cases: a variant-calling genomic pipeline and a scalability-testing framework, where both were run locally, on an HPC cluster, and in the cloud. This allowed for evaluation of those four WfMSs in terms of language expressiveness, modularity, scalability, robustness, reproducibility, interoperability, and ease of development, along with adoption and usage in research labs and healthcare settings. This article attempts to answer the question: which WfMS should be chosen for a given bioinformatics application, regardless of analysis type? The choice of a given WfMS is a function of both its intrinsic language and engine features. Within bioinformatics, where analysts are a mix of dry and wet lab scientists, the choice is also governed by collaborations and adoption within large consortia and by the technical support provided by the WfMS team/community. As the community and its needs continue to evolve along with computational infrastructure, WfMSs will also evolve, especially those with permissive licenses that allow commercial use. In much the same way as the dataflow paradigm and containerization are now well understood to be very useful in bioinformatics applications, we will continue to see innovations of tools and utilities for other purposes, like big data technologies, interoperability, and provenance.
•Research funders currently require the development of a DMP within a proposal.
•A DMP is essential for tracking, organizing and maintaining large-scale data.
•A DMP is a document developed at the start of a research project to describe how project data will be managed.
•A DMP details how research data will be collected, processed, stored and shared.
•Researchers who develop a DMP should involve library representatives in the process.
Drafting and writing a data management plan (DMP) is increasingly seen as a key part of the academic research process. A DMP is a document that describes how a researcher will collect, document, describe, share, and preserve the data that will be generated as part of a research project. The DMP illustrates the importance of utilizing best practices through all stages of working with data while ensuring accessibility, quality, and longevity of the data. The benefits of writing a DMP include compliance with funder and institutional mandates; making research more transparent (for reproduction and validation purposes) and FAIR (findable, accessible, interoperable, reusable); and protecting data subjects through compliance with the General Data Protection Regulation (GDPR) and/or local data protection policies. In this review, we highlight the importance of a DMP in modern biomedical research, explaining both the rationale and current best practices associated with DMPs. In addition, we outline various funders' requirements concerning DMPs and discuss open-source tools that facilitate the development and implementation of a DMP. Finally, we discuss DMPs in the context of African research, and the considerations that need to be made in this regard.
The Pan-African bioinformatics network, H3ABioNet, comprises 27 research institutions in 17 African countries. H3ABioNet is part of the Human Heredity and Health in Africa program (H3Africa), an African-led research consortium funded by the US National Institutes of Health and the UK Wellcome Trust, aimed at using genomics to study and improve the health of Africans. A key role of H3ABioNet is to support H3Africa projects by building bioinformatics infrastructure such as portable and reproducible bioinformatics workflows for use on heterogeneous African computing environments. Processing and analysis of genomic data is an example of a big data application requiring complex interdependent data analysis workflows. Such bioinformatics workflows take the primary and secondary input data through several computationally intensive processing steps using different software packages, where some of the outputs form inputs for other steps. Implementing scalable, reproducible, portable and easy-to-use workflows is particularly challenging.
H3ABioNet has built four workflows to support (1) the calling of variants from high-throughput sequencing data; (2) the analysis of microbial populations from 16S rDNA sequence data; (3) genotyping and genome-wide association studies; and (4) single nucleotide polymorphism imputation. A week-long hackathon was organized in August 2016 with participants from six African bioinformatics groups, and US and European collaborators. Two of the workflows are built using the Common Workflow Language framework (CWL) and two using Nextflow. All the workflows are containerized for improved portability and reproducibility using Docker, and are publicly available for use by members of the H3Africa consortium and the international research community.
The H3ABioNet workflows have been implemented in view of offering ease of use for the end user and high levels of reproducibility and portability, all while following modern state of the art bioinformatics data processing protocols. The H3ABioNet workflows will service the H3Africa consortium projects and are currently in use. All four workflows are also publicly available for research scientists worldwide to use and adapt for their respective needs. The H3ABioNet workflows will help develop bioinformatics capacity and assist genomics research within Africa and serve to increase the scientific output of H3Africa and its Pan-African Bioinformatics Network.
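The containerization described above can be illustrated with a minimal sketch: each workflow step runs inside its Docker image so the software environment travels with the workflow. This is plain Python rather than CWL or Nextflow, and the image name and command are hypothetical examples:

```python
import subprocess  # used for a real invocation; here we only build argv

def containerized_step(image, command, workdir="/data"):
    # Assemble a docker invocation that mounts the working directory
    # into the container, so inputs and outputs persist on the host.
    argv = [
        "docker", "run", "--rm",
        "-v", f"{workdir}:{workdir}", "-w", workdir,
        image,
    ] + command
    # A real workflow engine would now execute it, e.g.:
    #   subprocess.run(argv, check=True)
    return argv

# Example: a hypothetical alignment step pinned to a tool image.
argv = containerized_step("biocontainers/bwa:v0.7.17",
                          ["bwa", "mem", "ref.fa", "reads.fq"])
```

Pinning the image tag is what makes the step reproducible: any cluster or cloud node that can pull the image runs the identical software stack.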
Genomics policy development involves assessing a wide range of issues extending from specimen collection and data sharing to whether and how to utilize advanced technologies in clinical practice and public health initiatives. A survey was conducted among African scientists and stakeholders with an interest in genomic medicine, seeking to evaluate: 1) their knowledge and understanding of the field; 2) the institutional environment and infrastructure available to them; 3) the state and awareness of the field in their country; and 4) their perception of potential barriers to implementation of precision medicine. We discuss how the information gathered in the survey could instruct the policies of African institutions seeking to implement precision medicine, and more specifically genomic medicine, approaches in their health care systems in the following areas: 1) prioritization of infrastructure; 2) need for translational research; 3) information dissemination to potential users; 4) training programs for specialized personnel; and 5) engaging political stakeholders and the public. A checklist with key requirements to assess readiness for implementation of genomic medicine programs is provided to guide the process from scientific discovery to clinical application.
Investigating variation in genes involved in the absorption, distribution, metabolism, and excretion (ADME) of drugs is key to characterizing pharmacogenomic (PGx) relationships. ADME gene variation is relatively well characterized in European and Asian populations, but data from African populations are understudied, which has implications for drug safety and effective use in Africa.
We identified significant ADME gene variation in African populations using data from 458 high-coverage whole genome sequences, 412 of which are novel, and from previously available African sequences from the 1,000 Genomes Project. ADME variation was not uniform across African populations, particularly within high impact coding variation. Copy number variation was detected in 116 ADME genes, with equal ratios of duplications/deletions. We identified 930 potential high impact coding variants, most of which are discrete to a single African population cluster. Large frequency differences (i.e., >10%) were seen in common high impact variants between clusters. Several novel variants are predicted to have a significant impact on protein structure, but additional functional work is needed to confirm the outcome of these for PGx use. Most variants of known clinical outcome are rare in Africa compared to European populations, potentially reflecting a clinical PGx research bias to European populations.
The genetic diversity of ADME genes across sub-Saharan African populations is large. The Southern African population cluster is most distinct from that of far West Africa. PGx strategies based on European variants will be of limited use in African populations. Although established variants are important, PGx must take into account the full range of African variation. This work urges further characterization of variants in African populations, including in vitro and in vivo studies, and consideration of the unique African ADME landscape when developing precision medicine guidelines and tools for African populations.
There is growing evidence that comprehensive and harmonized metadata are fundamental for effective public data reusability. However, it is often challenging to extract accurate metadata from public repositories. Of particular concern is the metagenomic data related to African individuals, which often omit important information about the particular features of these populations. As part of a collaborative consortium, H3ABioNet, we created a web portal, namely the African Human Microbiome Portal (AHMP), exclusively dedicated to metadata related to African human microbiome samples. Metadata were collected from various public repositories prior to cleaning, curation and harmonization according to a pre-established guideline and using ontology terms. These metadata sets can be accessed at https://microbiome.h3abionet.org/. This web portal is open access and offers an interactive visualization of 14,889 records from 70 bioprojects associated with 72 peer-reviewed research articles. It also offers the ability to download harmonized metadata according to the user's applied filters. The AHMP thereby supports metadata search and retrieval operations, thus facilitating access to relevant studies linked to the African human microbiome. Database URL: https://microbiome.h3abionet.org/.
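Once a harmonized metadata export has been downloaded from the portal's filter/download feature, it can be queried locally. A minimal sketch using only Python's standard library, where the export content and the field names (`country`, `body_site`) are hypothetical placeholders, not the portal's actual schema:

```python
import csv
import io

# Tiny stand-in for a downloaded AHMP metadata export (CSV-like).
EXPORT = """sample_id,country,body_site
S001,Nigeria,gut
S002,Tunisia,skin
S003,Nigeria,oral
"""

def filter_records(text, **criteria):
    # Keep only the rows whose fields match every supplied criterion,
    # mirroring the portal's filter-then-download workflow offline.
    rows = csv.DictReader(io.StringIO(text))
    return [row for row in rows
            if all(row.get(field) == value
                   for field, value in criteria.items())]

nigerian = filter_records(EXPORT, country="Nigeria")
```

Because the portal harmonizes fields against ontology terms, equality filters like this remain meaningful across bioprojects that originally used inconsistent labels.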
Scientific research plays a key role in the advancement of human knowledge and pursuit of solutions to important societal challenges. Typically, research occurs within specific institutions where data are generated and subsequently analyzed. Although collaborative science bringing together multiple institutions is now common, in such collaborations the analytical processing of the data is often performed by individual researchers within the team, with only limited internal oversight and critical analysis of the workflow prior to publication. Here, we show how hackathons can be a means of enhancing collaborative science: by enabling peer review before results of analyses are published, by cross-validating the design of studies or underlying data sets, and by driving reproducibility of scientific analyses. Traditionally, in data analysis processes, data generators and bioinformaticians are divided and do not collaborate on analyzing the data. Hackathons are a good strategy to build bridges over this traditional divide and are potentially a great agile extension to the more structured collaborations between multiple investigators and institutions.