Abstract
Background
Batch effects are notoriously common technical variations in multiomics data and may lead to misleading outcomes if left uncorrected or if over-corrected. A plethora of batch-effect correction algorithms have been proposed to facilitate data integration. However, their respective advantages and limitations have not been adequately assessed across omics types, performance metrics, and application scenarios.
Results
As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch-effect correction algorithms using performance metrics of clinical relevance, i.e., the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability to accurately cluster cross-batch samples by donor. The ratio-based method, i.e., scaling the absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than the others, especially when batch effects are completely confounded with the biological factors of interest. We further provide practical guidelines for implementing the ratio-based approach in increasingly large-scale multiomics studies.
Conclusions
Multiomics measurements are prone to batch effects, which can be effectively corrected using ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
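The ratio-based scaling described above can be illustrated with a minimal sketch. It assumes each batch is a mapping from sample ID to feature values and that the reference-material replicates in the batch are known by ID; the function name `ratio_scale` and the toy data layout are hypothetical and not taken from the study.

```python
from statistics import mean


def ratio_scale(batch, reference_ids):
    """Scale study samples relative to the reference replicates of the same batch.

    batch: dict mapping sample ID -> dict mapping feature -> absolute value.
    reference_ids: set of sample IDs that are reference-material replicates.
    Returns ratios for the study (non-reference) samples only.
    """
    features = list(next(iter(batch.values())).keys())
    # Per-feature mean of the concurrently profiled reference replicates.
    ref_mean = {f: mean(batch[r][f] for r in reference_ids) for f in features}
    return {
        sample: {f: values[f] / ref_mean[f] for f in features}
        for sample, values in batch.items()
        if sample not in reference_ids
    }


# Toy data: batch 2 carries a twofold multiplicative batch effect.
refs = {"ref_a", "ref_b"}
batch1 = {
    "ref_a": {"g1": 10.0, "g2": 4.0},
    "ref_b": {"g1": 10.0, "g2": 4.0},
    "s1": {"g1": 20.0, "g2": 2.0},
}
batch2 = {
    sample: {f: 2.0 * v for f, v in values.items()}
    for sample, values in batch1.items()
}

scaled1 = ratio_scale(batch1, refs)
scaled2 = ratio_scale(batch2, refs)
```

In this toy case the multiplicative batch effect cancels, so the same study sample receives the same ratios in both batches. In practice, ratios are often analysed on a log scale, which is a design choice outside this sketch.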
Various laboratory-developed metabolomic methods lead to big challenges in inter-laboratory comparability and effective integration of diverse datasets.
As part of the Quartet Project, we establish a publicly available suite of four metabolite reference materials derived from B lymphoblastoid cell lines from a family of parents and monozygotic twin daughters. We generate comprehensive LC-MS-based metabolomic data from the Quartet reference materials using targeted and untargeted strategies in different laboratories. The Quartet multi-sample-based signal-to-noise ratio enables objective assessment of the reliability of intra-batch and cross-batch metabolomics profiling in detecting intrinsic biological differences among the four groups of samples. Significant variations in the reliability of the metabolomics profiling are identified across laboratories. Importantly, ratio-based metabolomics profiling, by scaling the absolute values of a study sample relative to those of a common reference sample, enables cross-laboratory quantitative data integration. Thus, we construct the ratio-based high-confidence reference datasets between two reference samples, providing "ground truth" for inter-laboratory accuracy assessment, which enables objective evaluation of quantitative metabolomics profiling using various instruments and protocols.
Our study provides the community with rich resources and best practices for inter-laboratory proficiency tests and data integration, ensuring reliability of large-scale and longitudinal metabolomic studies.
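The abstract does not state how the multi-sample signal-to-noise ratio is computed. As a rough illustration only, the sketch below assumes one plausible formulation: the mean squared distance between replicates of different sample groups ("signal") divided by the mean squared distance between replicates of the same group ("noise"), expressed in decibels. The function names and the group labels are hypothetical.

```python
import math
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def snr_db(groups):
    """Assumed SNR: between-group over within-group mean squared distance, in dB.

    groups: dict mapping group label -> list of replicate profiles.
    Requires at least two groups, at least two replicates in some group,
    and non-identical replicates (otherwise the noise term is zero).
    """
    labelled = [(g, v) for g, reps in groups.items() for v in reps]
    within, between = [], []
    for (g1, v1), (g2, v2) in combinations(labelled, 2):
        (within if g1 == g2 else between).append(sq_dist(v1, v2))
    signal = sum(between) / len(between)
    noise = sum(within) / len(within)
    return 10 * math.log10(signal / noise)


# Replicates cluster by group: high SNR.
tight = {"father": [[0, 0], [0, 1]], "mother": [[10, 0], [10, 1]]}
# Replicates of a group are far apart: low SNR.
noisy = {"father": [[0, 0], [10, 1]], "mother": [[10, 0], [0, 1]]}

snr_tight = snr_db(tight)
snr_noisy = snr_db(noisy)
```

Under this assumed definition, a batch in which replicates of the same reference sample cluster tightly and the groups separate well scores higher than one in which technical noise dominates.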
Abstract
Background
Genomic DNA reference materials are widely recognized as essential for ensuring data quality in omics research. However, relying solely on reference datasets to evaluate the accuracy of variant calling results is incomplete, as they are limited to benchmark regions. Therefore, it is important to develop DNA reference materials that enable the assessment of variant detection performance across the entire genome.
Results
We established a DNA reference material suite from four immortalized cell lines derived from a family of parents and monozygotic twins. Comprehensive reference datasets of 4.2 million small variants and 15,000 structural variants were integrated and certified for evaluating the reliability of germline variant calls inside the benchmark regions. Importantly, the genetic built-in truth of the Quartet family design enables estimation of the precision of variant calls outside the benchmark regions. Using the Quartet reference materials along with study samples, batch effects are objectively monitored and alleviated by training a machine learning model with the Quartet reference datasets to remove potential artifact calls. Moreover, the matched RNA and protein reference materials and datasets from the Quartet Project enable cross-omics validation of variant calls from multiomics data.
Conclusions
The Quartet DNA reference materials and reference datasets provide a unique resource for objectively assessing the quality of germline variant calls throughout the whole-genome regions and improving the reliability of large-scale genomic profiling.
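The "built-in truth" of a parents-plus-monozygotic-twins design can be sketched as a genotype consistency check. The example below is a simplified illustration, not the study's pipeline: it assumes diploid autosomal genotypes encoded as allele pairs, ignores true de novo mutations, and uses hypothetical function names.

```python
def mendelian_consistent(father, mother, twin1, twin2):
    """Check one site for consistency with the family design.

    Each genotype is a pair of allele indices, e.g. (0, 1).
    Monozygotic twins must share a genotype, and that genotype must be
    formed by one allele from each parent.
    """
    if sorted(twin1) != sorted(twin2):
        return False
    a, b = twin1
    return (a in father and b in mother) or (b in father and a in mother)


def consistency_rate(sites):
    """Fraction of sites whose four genotypes are consistent with the pedigree."""
    ok = sum(mendelian_consistent(*site) for site in sites)
    return ok / len(sites)


# Sites as (father, mother, twin1, twin2) genotype tuples.
sites = [
    ((0, 1), (0, 0), (0, 1), (0, 1)),  # consistent: alt allele inherited from father
    ((0, 0), (0, 0), (0, 1), (0, 1)),  # inconsistent: alt allele absent in both parents
    ((0, 1), (0, 1), (1, 1), (0, 1)),  # inconsistent: twins disagree
    ((1, 1), (0, 1), (1, 0), (0, 1)),  # consistent: allele order does not matter
]
rate = consistency_rate(sites)
```

A high proportion of inconsistent sites points to likely artifact calls, which is why such a rate can serve as a proxy for precision where no benchmark calls exist.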
Staphylococcus aureus biofilms are a serious problem in the food industry. Wall teichoic acid (WTA) is crucial in S. aureus biofilm formation. Overexpression of the WTA-hydrolyzing enzyme glycerophosphoryl diester phosphodiesterase (GlpQ), induced by lactobionic acid (LBA), may be related to biofilm formation. We investigated the relationship between LBA-regulated degradation of WTA by GlpQ and S. aureus biofilm formation. The minimum inhibitory concentration of LBA for S. aureus was 12.5 mg/mL. Crystal violet staining revealed LBA-mediated inhibition of S. aureus adhesion and biofilm formation. RT-qPCR revealed that LBA repressed the expression of adhesion-related genes. Scanning electron microscopy revealed obvious disruption of the S. aureus surface structure, confirming the repression of S. aureus adhesion and biofilm formation by LBA. Native-PAGE results suggested that the WTA content of S. aureus was reduced by LBA treatment. Additionally, LBA induced the overexpression of glpQ. Combined with our previous work, these results suggest that the addition of LBA induces glpQ in S. aureus to function in WTA degradation, resulting in decreased WTA content and a subsequent reduction of adhesion and biofilm formation. The findings provide new insight into the degradation mechanism of S. aureus WTA and indicate the potential of LBA as an anti-biofilm agent.
Riemerella anatipestifer is reported worldwide as a cause of septicemic and exudative diseases of domestic ducks. In this study, we identified a mutant strain, RA2640, by Tn4351 transposon mutagenesis, in which the AS87_04050 gene was inactivated by insertion of the transposon. Southern blot analysis indicated that only one insertion was present in the genome of the mutant strain RA2640. SDS-PAGE followed by silver staining showed that the lipopolysaccharide (LPS) pattern of mutant strain RA2640 differed from that of its wild-type strain Yb2, suggesting that the LPS was defective. In addition, the phenotype of the mutant strain RA2640 changed to rough-type, as evidenced by altered colony morphology, autoaggregation ability and crystal violet staining characteristics. Bacterial LPS is a key factor in virulence as well as in both innate and acquired host responses to infection. The rough-type mutant strain RA2640 showed higher sensitivity to antibiotics, disinfectants and normal duck serum, and a higher capability of adherence to and invasion of Vero cells, compared to its wild-type strain Yb2. Moreover, the mutant strain RA2640 lost the ability of its wild-type strain Yb2 to agglutinate with R. anatipestifer serotype 2 positive sera, suggesting that the O-antigen is defective. Animal experiments indicated that the virulence of the mutant strain RA2640 was attenuated by more than 100,000-fold compared to its wild-type strain Yb2. These results suggest that the AS87_04050 gene in R. anatipestifer is associated with LPS biosynthesis and bacterial pathogenicity.
The rat is one of the most widely used models in chemical safety evaluation and biomedical research. However, knowledge about its microRNA (miRNA) expression patterns across multiple organs and various developmental stages is still limited. Here, we constructed a comprehensive rat miRNA expression BodyMap using a diverse collection of 320 RNA samples from 11 organs of both sexes of juvenile, adolescent, adult and aged Fischer 344 rats with four biological replicates per group. Following the Illumina TruSeq Small RNA protocol, an average of 5.1 million 50 bp single-end reads was generated per sample, yielding a total of 1.6 billion reads. The quality of the resulting miRNA-seq data was judged to be high in terms of raw sequences, mapped sequences, and biological reproducibility. Importantly, aliquots of the same RNA samples have previously been used to construct the mRNA BodyMap. The miRNA-seq dataset presented here, along with the existing mRNA-seq dataset from the same RNA samples, provides a unique resource for studying the expression characteristics of existing and novel miRNAs, and for integrative analysis of miRNA-mRNA interactions, thereby facilitating better utilization of rats for biomarker discovery.
We identified a rare missense germline mutation in BARD1 (c.403G>A or p.Asp135Asn) as pathogenic using integrated genomics and transcriptomics profiling of germline and tumor samples from an early-onset triple-negative breast cancer patient who was later treated with a PARP inhibitor for 2 months. We demonstrated in cell and mouse models that, compared to the wild-type, (1) c.403G>A mutant cell lines were more sensitive to irradiation, a DNA-damaging agent, and a PARP inhibitor; (2) the c.403G>A mutation inhibited the interaction between BARD1 and RAD51 (but not BRCA1); and (3) c.403G>A mutant mice were hypersensitive to ionizing radiation. Our study sheds light on the clinical interpretation of rare germline mutations of BARD1.
Abstract
Molecular subtyping of triple-negative breast cancer (TNBC) is essential for understanding the mechanisms and discovering actionable targets of this highly heterogeneous type of breast cancer. We previously performed a large single-center multiomics study consisting of genomics, transcriptomics, and clinical information from 465 patients with primary TNBC. To facilitate reuse of this unique dataset, we provide a detailed description of the dataset in this study, with special attention to data quality. The multiomics data were generally of high quality, but a few sequencing datasets had quality issues that should be noted in subsequent data reuse. Furthermore, we reconducted the data analyses with updated pipelines and an updated version of the human reference genome, moving from hg19 to hg38. The updated profiles were in good concordance with those previously published in terms of gene quantification, variant calling, and copy number alteration. Additionally, we developed a user-friendly web-based database for convenient access and interactive exploration of the dataset. Our work will facilitate reuse of the dataset, maximize the value of the data, and further accelerate cancer research.
Riemerella anatipestifer is one of the most important bacterial pathogens of ducks and causes a contagious septicemia. R. anatipestifer infection causes serositis syndromes similar to those of other bacterial infections in ducks, including infection by Escherichia coli, Salmonella enterica and Pasteurella multocida. Clinically differentiating R. anatipestifer infections from infections by other bacterial pathogens is usually difficult. In this study, MAb 1G2F10, a monoclonal antibody against R. anatipestifer GroEL, was used to develop a colloidal gold immunochromatographic strip. Colloidal gold particles were prepared by chemical synthesis, with an average diameter of 20 ± 5.26 nm as measured by transmission electron microscope imaging. MAb 1G2F10 was conjugated to colloidal gold particles and the formation of antibody-colloidal gold conjugates was monitored by UV/Vis spectroscopy. Immunochromatographic strips were assembled by affixing the different components in regular sequence to a PVC plate. Strips specifically detected R. anatipestifer within 10 min, but did not detect E. coli, S. enterica or P. multocida. The detection limit for R. anatipestifer was 1 × 10^6 colony forming units, making the strip 500 times more sensitive than a conventional agglutination test. Results agreed 100% with multiplex PCR. Assay stability and reproducibility were excellent after storage at 4°C for 6 months. The immunochromatographic strips prepared in this study offer a specific, sensitive, and rapid detection method for R. anatipestifer, which is of great importance for the prevention and control of R. anatipestifer infections.
Edge intelligent computing devices are often deployed in extreme environments, where the transmission network bandwidth is low or the network environment changes greatly. Therefore, traditional queue scheduling algorithms cannot guarantee the QoS of edge intelligent computing. WF2Q+ allocates bandwidth according to fixed weights, which causes the delay of real-time data flows to increase when the network is unstable. The dynamic perception scheduling strategy proposed in this paper dynamically changes the weights of WF2Q+ by sensing the backlog length of each queue. At the same time, by combining it with the PQ queue scheduling algorithm, the strategy can prioritize the transmission of real-time data while retaining a degree of fairness. In addition, the token bucket algorithm is used to limit the sending rate of the device and prevent network congestion caused by bursts of data being injected into the network. Simulation experiments show that the improved algorithm achieves good results on the delay index.
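The two mechanisms mentioned above, backlog-driven weights and token-bucket rate limiting, can be sketched as follows. The abstract does not give the weight-update rule, so the rule used here (scaling each base weight by one plus the queue's share of the total backlog, then normalizing) is purely an assumption for illustration; `dynamic_weights` and `TokenBucket` are hypothetical names, and the full WF2Q+ virtual-time machinery and PQ integration are not modelled.

```python
def dynamic_weights(base_weights, backlogs):
    """Assumed rule: boost each queue's weight by its share of the total backlog.

    base_weights, backlogs: dicts keyed by queue name.
    Returns weights normalized to sum to 1.
    """
    total = sum(backlogs.values())
    if total == 0:
        raw = dict(base_weights)
    else:
        raw = {q: base_weights[q] * (1 + backlogs[q] / total) for q in base_weights}
    norm = sum(raw.values())
    return {q: w / norm for q, w in raw.items()}


class TokenBucket:
    """Standard token bucket: tokens refill at `rate` per second up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = 0.0

    def allow(self, size, now):
        """Return True and consume tokens if a packet of `size` may be sent at time `now`."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


# The real-time queue has the longer backlog, so it receives the larger weight.
weights = dynamic_weights({"realtime": 1.0, "bulk": 1.0}, {"realtime": 30, "bulk": 10})

bucket = TokenBucket(rate=100, capacity=200)
burst_ok = bucket.allow(200, now=0.0)    # a burst up to the capacity passes
extra_ok = bucket.allow(1, now=0.0)      # bucket is empty, so this is rejected
refill_ok = bucket.allow(100, now=1.0)   # one second later, 100 tokens have refilled
```

Passing the clock in as `now` keeps the sketch deterministic; a deployed limiter would typically read a monotonic clock instead.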