The non-linear least squares (NLLS) algorithm is widely used in localization systems. Its performance approaches the Cramér-Rao lower bound under i.i.d. additive white Gaussian noise. However, when the initial position is not close enough to the actual target position, the NLLS algorithm is very likely to diverge. The non-iterative method-of-moments estimator does not have this divergence problem, but it performs worse than NLLS and requires at least one more anchor to linearize the range measurement equations. In this paper, we develop a coarse position estimation algorithm for time-difference-of-arrival (TDOA) localization based on scaling by majorizing a complicated function. The algorithm is robust to the choice of initial position and does not require redundant receivers.
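The initialization issue is easier to see with a concrete solver. Below is a minimal Python sketch of a plain Gauss-Newton NLLS solver for 2-D TDOA range differences, using anchors[0] as the reference anchor. It is the NLLS baseline the abstract discusses, not the proposed coarse estimator. The function name, the noise-free measurements, and the anchor layout in the usage are illustrative assumptions. Re-running it with different starting points x0 is one way to probe its sensitivity to initialization.

```python
import math


def tdoa_nlls(anchors, range_diffs, x0, iters=50, tol=1e-9):
    """Gauss-Newton NLLS for 2-D TDOA (hypothetical helper).

    range_diffs[i-1] is the measured ||x - a_i|| - ||x - a_0|| for anchor i >= 1.
    The iterate must never land exactly on an anchor (division by zero).
    """
    x, y = x0
    ax0, ay0 = anchors[0]
    for _ in range(iters):
        # Accumulate the 2x2 normal equations J^T J (a11, a12, a22) and J^T r (b1, b2).
        a11 = a12 = a22 = b1 = b2 = 0.0
        r0 = math.hypot(x - ax0, y - ay0)
        for (ax, ay), d in zip(anchors[1:], range_diffs):
            ri = math.hypot(x - ax, y - ay)
            resid = d - (ri - r0)
            # Gradient of the model ||x - a_i|| - ||x - a_0|| with respect to (x, y).
            gx = (x - ax) / ri - (x - ax0) / r0
            gy = (y - ay) / ri - (y - ay0) / r0
            a11 += gx * gx
            a12 += gx * gy
            a22 += gy * gy
            b1 += gx * resid
            b2 += gy * resid
        det = a11 * a22 - a12 * a12
        if abs(det) < 1e-12:
            raise ArithmeticError("singular normal equations (poor geometry or start point)")
        # Solve the 2x2 system for the Gauss-Newton step.
        dx = (a22 * b1 - a12 * b2) / det
        dy = (a11 * b2 - a12 * b1) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < tol:
            break
    return x, y


if __name__ == "__main__":
    anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    target = (3.0, 4.0)
    dist = [math.hypot(target[0] - ax, target[1] - ay) for ax, ay in anchors]
    diffs = [d - dist[0] for d in dist[1:]]
    print(tdoa_nlls(anchors, diffs, (5.0, 5.0)))
```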
During the last several years, high-density SNP genotyping arrays have facilitated genome-wide association studies (GWAS) that successfully identified common genetic variants associated with a variety of phenotypes. However, each identified variant explains only a very small fraction of the genetic contribution to the studied phenotypic trait. Moreover, discordant results between independent GWAS indicate the potential for Type I and Type II errors. Highly reliable genotyping technology is needed to have confidence in SNP data and in the interpretation of GWAS results. We therefore assessed the reproducibility of two widely used genotyping platforms, from Affymetrix and Illumina, by analyzing four technical replicates of each of the six individuals in five laboratories. We observed genotype concordance of 99.40% to 99.87% within a laboratory for the same platform, 98.59% to 99.86% across laboratories for the same platform, and 98.80% across genotyping platforms. Moreover, comparing genotype data from technical replicates revealed arrays with low-quality data that could not be detected by following the vendors' quality control (QC) recommendations. Our results demonstrate the technical reliability of currently available genotyping platforms. They also indicate the importance of including technical replicates in genotyping QC to improve the reliability of GWAS results. We also simulated the impact of discordant genotypes on association analysis results. This impact could explain, at least in part, the irreproducibility of some GWAS findings when the effect size (i.e., the odds ratio) and the minor allele frequency are low.
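The abstract does not state exactly how concordance was computed. Under the assumption that it is the percentage of SNPs with identical calls between two replicates, with no-calls excluded, a minimal Python sketch looks like this. The genotype encoding ("AA", "AB", "BB", and "NC" for a no-call) and both function names are hypothetical.

```python
from itertools import combinations


def genotype_concordance(calls_a, calls_b, no_call="NC"):
    """Percent of SNPs with identical calls in two replicates, skipping no-calls."""
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same SNPs in the same order")
    compared = agree = 0
    for a, b in zip(calls_a, calls_b):
        if a == no_call or b == no_call:
            continue  # a missing call in either replicate cannot be compared
        compared += 1
        agree += a == b
    return 100.0 * agree / compared if compared else float("nan")


def mean_pairwise_concordance(replicates, no_call="NC"):
    """Average concordance over all pairs of technical replicates."""
    values = [genotype_concordance(a, b, no_call) for a, b in combinations(replicates, 2)]
    return sum(values) / len(values)
```

Averaging over all replicate pairs is one plausible way to summarize four replicates per individual. Reporting the minimum and maximum pair, as the ranges in the abstract suggest, would work equally well.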
Large amounts of mammalian protein-protein interaction (PPI) data have been generated and are available for public use. From a systems biology perspective, protein/gene interactions encode the key mechanisms that distinguish disease from health, and such mechanisms can be uncovered through network analysis. An effective network analysis tool should integrate different content-specific PPI databases into a comprehensive network on a user-friendly platform, so that users can identify key functional modules/pathways and the underlying mechanisms of disease and toxicity.
atBioNet integrates seven publicly available PPI databases into a network-specific knowledge base. Knowledge expansion is achieved by expanding a user-supplied protein/gene list with interactions from the integrated PPI network. Statistically significant functional modules are determined by applying a fast network-clustering algorithm (SCAN: a Structural Clustering Algorithm for Networks). The functional modules can be visualized either separately or together in the context of the whole network. Integrated pathway information enables enrichment analysis and assessment of the biological function of modules. Three case studies are presented that use publicly available disease gene signatures as a basis for discovering new biomarkers for acute leukemia, systemic lupus erythematosus, and breast cancer. The results demonstrate that atBioNet not only identifies functional modules and pathways related to the studied diseases but also provides information that can be used to hypothesize novel biomarkers for future analysis.
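SCAN groups nodes whose neighborhoods overlap strongly. Each node's neighborhood Γ(v) includes v itself. The structural similarity of two nodes is |Γ(u) ∩ Γ(w)| / sqrt(|Γ(u)|·|Γ(w)|). A node is a core if at least μ nodes (itself included) have similarity ≥ ε with it. Clusters grow outward from the cores. The following is a minimal Python sketch of that idea. The parameter values are arbitrary, and it does not implement SCAN's hub/outlier labelling or atBioNet's test for statistically significant modules.

```python
import math
from collections import deque


def scan(edges, eps=0.7, mu=3):
    """Minimal SCAN sketch: returns {node: cluster_id}; unclustered nodes are omitted."""
    nbrs = {}
    for u, v in edges:
        # Gamma(v) contains v itself plus its direct neighbours.
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)

    def sigma(u, v):
        return len(nbrs[u] & nbrs[v]) / math.sqrt(len(nbrs[u]) * len(nbrs[v]))

    def eps_neighborhood(u):
        return {v for v in nbrs[u] if sigma(u, v) >= eps}

    labels, next_id = {}, 0
    for start in nbrs:
        if start in labels or len(eps_neighborhood(start)) < mu:
            continue  # already clustered, or not a core node
        labels[start] = next_id
        queue = deque([start])
        while queue:
            u = queue.popleft()
            neigh = eps_neighborhood(u)
            if len(neigh) < mu:
                continue  # border node: joins the cluster but does not expand it
            for v in neigh:
                if v not in labels:
                    labels[v] = next_id
                    queue.append(v)
        next_id += 1
    return labels
```

For example, two triangles joined by a single bridging edge come out as two clusters with eps=0.7 and mu=3, because the bridge endpoints have low structural similarity.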
atBioNet is a free web-based network analysis tool that provides systematic insight into protein/gene interactions by examining significant functional modules. The identified functional modules are useful for determining the underlying mechanisms of disease and for biomarker discovery. It can be accessed at: http://www.fda.gov/ScienceResearch/BioinformaticsTools/ucm285284.htm.
Several different microarray platforms are available for measuring gene expression. There is disagreement within the microarray scientific community about the intra- and inter-platform consistency of these platforms. Both high and low consistency across platforms have been reported in terms of genes with significantly differential expression. Because gene expression array studies are used to explore biological causes and effects, consistency should ultimately be evaluated in a biological setting that reveals the functional differences between the examined samples, not just by comparing lists of differentially expressed genes (DEG). In this study, we investigated whether different platforms are highly consistent from a biological functional perspective.
Kidney samples from rats treated with the kidney carcinogen aristolochic acid were analyzed at five test sites using microarrays from Affymetrix, Applied Biosystems, Agilent, and GE Healthcare. Two of the sites used Affymetrix, which allowed an intra-platform comparison. The resulting DEG lists were generated without filtering out probes that differ between platforms and were input into the Ingenuity Pathway Analysis (IPA) system for functional analysis. The functions of the DEG lists determined by IPA were compared across the four platforms and between the two Affymetrix test sites. The results showed a very high level of consistency both between the two test sites using the same platform and among the different platforms. The top functions determined by the different platforms were very similar and reflected the carcinogenicity and toxicity of aristolochic acid in the rat kidney.
Our results demonstrate that highly consistent biological information can be generated from different microarray platforms.
Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms, SNPs) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling is paramount, since errors introduced by calling algorithms can inflate false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS make calls jointly across multiple arrays. Because a GWAS generates hundreds of gigabytes (GB) of raw data, the samples are typically partitioned for genotype calling into batches, each containing a subset of the entire dataset. High call rates and accuracies have been achieved. However, two effects have not been investigated. The first is the effect of batch size (i.e., the number of chips analyzed together) and batch composition (i.e., the choice of chips in a batch) on call rate and accuracy. The second is how these effects propagate into the lists of significantly associated SNPs. In this paper, we analyzed the effects of batch size and batch composition on the genotype calling algorithm BRLMM using raw data from 270 HapMap samples analyzed with the Affymetrix Human Mapping 500 K array set.
Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500 K array set, we made genotype calls with the BRLMM algorithm under three different batch sizes and three different batch compositions. We compared the calling results and the corresponding lists of significant SNPs identified through association analysis. Both batch size and composition affected the genotype calls and the significantly associated SNPs. These effects were more severe for samples and SNPs with lower call rates than for those with higher call rates, and for heterozygous genotype calls than for homozygous genotype calls.
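The heterozygous-versus-homozygous comparison can be tabulated as in the minimal Python sketch below. It computes discordance between two calling runs of the same samples (for example, two batch configurations), stratified by the genotype class of the first run. The "AA"/"AB"/"BB"/"NC" encoding, the choice to stratify by the first run's call, and the function name are all assumptions, not the paper's procedure.

```python
HETEROZYGOUS = {"AB"}  # assumed encoding: "AA"/"BB" homozygous, "AB" heterozygous


def discordance_by_class(calls_run1, calls_run2, no_call="NC"):
    """Discordance rate between two calling runs, split by the run-1 genotype class."""
    counts = {"het": [0, 0], "hom": [0, 0]}  # class -> [compared, discordant]
    for a, b in zip(calls_run1, calls_run2):
        if a == no_call or b == no_call:
            continue  # only SNPs called in both runs are compared
        cls = "het" if a in HETEROZYGOUS else "hom"
        counts[cls][0] += 1
        counts[cls][1] += a != b
    return {cls: (d / n if n else float("nan")) for cls, (n, d) in counts.items()}
```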
Batch size and composition affect the genotype calling results in GWAS using BRLMM. The larger the differences in batch sizes, the larger the effect. The more homogeneous the samples in the batches, the more consistent the genotype calls. The inconsistency propagates to the lists of significantly associated SNPs identified in downstream association analysis. Thus, uniform and large batch sizes should be used to make genotype calls for GWAS. In addition, samples of high homogeneity should be placed into the same batch.
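The two recommendations can be combined into one batching rule: keep homogeneous samples together and make batch sizes as uniform as possible. One possible rule is sketched below in Python. It assumes a hypothetical group_of mapping (for example, a population label) as the proxy for homogeneity and a hypothetical max_batch_size cap. The paper does not prescribe this procedure.

```python
import math
from collections import defaultdict


def make_batches(samples, group_of, max_batch_size):
    """Split samples into batches that never mix groups and have near-equal sizes."""
    groups = defaultdict(list)
    for s in samples:
        groups[group_of[s]].append(s)
    batches = []
    for members in groups.values():
        # Fewest batches that respect the cap, then spread members as evenly as possible.
        n_batches = math.ceil(len(members) / max_batch_size)
        base, extra = divmod(len(members), n_batches)
        start = 0
        for i in range(n_batches):
            size = base + (1 if i < extra else 0)
            batches.append(members[start:start + size])
            start += size
    return batches
```

Batch sizes are near-equal within each group but can still differ between groups of very different sizes. Enforcing both homogeneity and a single global batch size is not always possible.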
The accuracy of indoor positioning systems can be significantly reduced by non-line-of-sight (NLOS) propagation. Most existing work on NLOS error mitigation for time-of-arrival (TOA) systems assumes that the NLOS links can be identified and/or that the NLOS error statistics are known. Such information is often unavailable in practice, so recent work has applied convex optimization to NLOS error mitigation instead. However, convex optimization for NLOS error mitigation in TOA systems is often infeasible. One strategy to reduce the probability of infeasibility is to relax the optimization constraints at the expense of reduced positioning accuracy. In this paper, we develop a soft-minimum method for NLOS error mitigation in TOA systems. The major advantages of the proposed method are: 1) like existing convex optimization schemes, it does not require any a priori information about NLOS links or NLOS error statistics; 2) unlike existing convex optimization schemes, it does not suffer from infeasibility; and 3) it achieves higher positioning accuracy than existing convex optimization schemes.
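The abstract does not give the formulation of the proposed method, so the sketch below does not attempt to reproduce it. It only illustrates what a soft-minimum operator is in general, using the common log-sum-exp form. The temperature tau is a hypothetical parameter: the result is always at most min(values) and at least min(values) - tau·log(n), and it approaches the true minimum as tau shrinks. Unlike min, it is differentiable everywhere.

```python
import math


def soft_min(values, tau=1.0):
    """Generic log-sum-exp soft-minimum: -tau * log(sum(exp(-v / tau)))."""
    m = min(values)
    # Shift by the hard minimum so that no exp() call can overflow.
    return m - tau * math.log(sum(math.exp(-(v - m) / tau) for v in values))
```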
Protein-protein interactions (PPIs) are a critical component of many underlying biological processes. A PPI network can provide insight into the mechanisms of these processes, as well as into the relationships among different proteins and toxicants that are potentially involved in them. Many PPI databases are publicly available, each with a specific focus. The challenge is how to combine their contents effectively to generate a robust and biologically relevant PPI network.
In this study, seven public PPI databases (BioGRID, DIP, HPRD, IntAct, MINT, REACTOME, and SPIKE) were used to explore a powerful approach for combining multiple PPI databases into an integrated PPI network. We developed a novel method, called k-votes, to create seven different integrated networks using values of k from 1 to 7. Functional modules were mined using SCAN, a Structural Clustering Algorithm for Networks. Overall module quality was evaluated for each integrated network using the following statistical and biological measures: (1) modularity, (2) similarity-based modularity, (3) clustering score, and (4) enrichment.
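The first quality measure can be illustrated with the standard Newman-Girvan modularity, assuming that is the definition meant here: Q = Σ_c [l_c/m − (d_c/2m)²]. Here m is the number of edges, l_c is the number of edges inside module c, and d_c is the total degree of the nodes in c. The Python sketch below assumes an undirected graph without duplicate edges and a label for every node. Nodes that SCAN leaves unclustered would need their own singleton labels. The other three measures are not sketched.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman-Girvan modularity Q of a node partition of an undirected graph."""
    m = len(edges)
    internal = defaultdict(int)    # module -> number of intra-module edges
    degree_sum = defaultdict(int)  # module -> total degree of its members
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two triangles joined by one edge and split into their natural modules, Q = 2 × (3/7 − 1/4) = 5/14.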
Each integrated human PPI network was constructed based on the number of votes (k) that a particular interaction received from the committee of the original seven PPI databases. The performance of the functional modules obtained by SCAN from each integrated network was evaluated, and the optimal value of k was determined by this functional module analysis. Our results demonstrate that the k-votes method outperforms the traditional union approach in terms of both statistical significance and biological meaning. The best network is achieved at k = 2 and is composed of interactions that are confirmed in at least two PPI databases. In contrast, the traditional union approach yields an integrated network that contains all interactions from the seven PPI databases and may therefore contain many false positives.
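The voting rule itself is small. In the minimal Python sketch below, each database casts at most one vote per undirected interaction, and only interactions with at least k votes are kept. With k = 1 it reproduces the union approach, and with k equal to the number of databases it gives their intersection. The sketch assumes that identifiers have already been mapped to a common namespace across databases. The database names and the function name are hypothetical.

```python
from collections import Counter


def k_votes(databases, k):
    """Keep interactions reported by at least k of the source databases.

    databases: {name: iterable of (protein_a, protein_b) pairs}
    """
    votes = Counter()
    for interactions in databases.values():
        # frozenset makes (A, B) and (B, A) the same undirected interaction;
        # the set comprehension lets each database vote at most once per pair.
        votes.update({frozenset(pair) for pair in interactions})
    return {pair for pair, n in votes.items() if n >= k}
```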
We determined that the k-votes method for constructing a robust PPI network by integrating multiple public databases outperforms previously reported approaches, and that a value of k = 2 provides the best results. The developed strategies for combining databases show promise in the advancement of network construction and modeling.
The acceptance of microarray technology in regulatory decision-making is being challenged by the existence of various platforms and data analysis methods. A recent report (E. Marshall, Science, 306, 630-631, 2004), which extensively cites the study of Tan et al. (Nucleic Acids Res., 31, 5676-5684, 2003), portrays a disturbingly negative picture of cross-platform comparability and, hence, of the reliability of microarray technology.
We reanalyzed Tan's dataset and found that the intra-platform consistency was low, indicating a problem in the experimental procedures used to generate the dataset. We then applied three gene selection methods to the same dataset: p-value ranking, fold-change ranking, and Significance Analysis of Microarrays (SAM). P-value ranking (the method emphasized by Tan et al.) resulted in much lower cross-platform concordance than fold-change ranking or SAM. Therefore, the low cross-platform concordance reported in Tan's study appears to be mainly due to a combination of low intra-platform consistency and a poor choice of data analysis procedures. It does not appear to reflect inherent technical differences among platforms, as suggested by Tan et al. and Marshall.
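The two ranking strategies can behave very differently, as the minimal Python sketch below shows. It ranks genes either by absolute log2 fold change or by p-value, then measures concordance as the percentage overlap of two top-n lists. The overlap metric, the data layout, and the function names are assumptions, and SAM is not implemented. Instead of computing p-values, the sketch ranks by |t| from Student's pooled two-sample t-test. Every gene is assumed to have the same group sizes, so all genes share the same degrees of freedom and this ordering equals the two-sided p-value ordering.

```python
import math
from statistics import mean, variance


def t_statistic(x, y):
    """Student's two-sample t with pooled variance (fails if both groups are constant)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


def top_genes(data, n, method):
    """data: {gene: (log2 values in group A, log2 values in group B)}."""
    def fold_change(g):
        return abs(mean(data[g][0]) - mean(data[g][1]))  # |log2 fold change|

    def t_rank(g):
        # Same degrees of freedom for every gene, so |t| order == p-value order.
        return abs(t_statistic(*data[g]))

    scores = {"fold_change": fold_change, "p_value": t_rank}
    if method not in scores:
        raise ValueError(f"unknown method: {method}")
    return set(sorted(data, key=scores[method], reverse=True)[:n])


def overlap_percent(list_a, list_b):
    """Percentage of genes in list_a that also appear in list_b (equal-length lists)."""
    return 100.0 * len(list_a & list_b) / len(list_a)
```

On a toy dataset, a gene with a large but noisy fold change tops the fold-change list, while a gene with a small, very consistent change can rank high by p-value. The two top-n lists can then disagree even though they were computed from the same measurements.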
Our results illustrate the importance of establishing calibrated RNA samples and reference datasets to objectively assess the performance of different microarray platforms, the proficiency of individual laboratories, and the merits of various data analysis procedures. We are therefore coordinating the MAQC project, a community-wide effort for microarray quality control.
Realizing personalized medicine requires integrating diverse data types with bioinformatics. The most vital data are individuals' genomic information, which is currently generated by advanced next-generation sequencing (NGS) technologies. These technologies continue to advance, with decreasing cost and increasing sequencing speed, and with a concomitant increase in the amount and complexity of the data. The prodigious data, together with the computational pipelines required for their analysis and interpretation, place stress on IT infrastructure and on the scientists conducting the work alike. Bioinformatics is increasingly becoming the rate-limiting step, with numerous challenges to be overcome in translating NGS data into personalized medicine. We review some key bioinformatics tasks, issues, and challenges in the contexts of IT requirements, data quality, analysis tools and pipelines, and biomarker validation.
Timing-based indoor positioning requires synchronization of anchors that are distributed in space. Many synchronization techniques have been proposed, but implementing them in practice is extremely challenging, especially for networks with a large number of anchors. In this paper, we propose a new synchronization-free localization model for timing-based indoor positioning. The scheme employs a repeater for each network; if there are many anchors, multiple repeaters can be used, each working with only a subset of its nearby anchors. We develop the mathematical model for the proposed scheme and derive its Cramér-Rao lower bound (CRLB). To evaluate the localization performance lost in exchange for the synchronization-free property, we compare the CRLB of the proposed scheme with the CRLBs of time-of-arrival (TOA) and time-difference-of-arrival (TDOA) schemes that assume perfect anchor synchronization. The simulated mean squared error (MSE) of the proposed scheme with the nonlinear least-squares (NLLS) method closely approaches the CRLB.
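The perfectly synchronized TOA baseline has a simple closed-form CRLB. The sketch assumes 2-D positioning and range measurements r_i = ||x − a_i|| + n_i with i.i.d. zero-mean Gaussian errors of standard deviation σ, in distance units. Under these assumptions, the Fisher information matrix is J = (1/σ²) Σ u_i u_iᵀ, where u_i is the unit vector from anchor i to the target. The bound on the position MSE is trace(J⁻¹). The minimal Python sketch below computes this bound. It does not reproduce the CRLB of the proposed repeater-based model, and the function name is hypothetical.

```python
import math


def toa_crlb_position(anchors, target, sigma):
    """Lower bound on E||x_hat - x||^2 for 2-D TOA with i.i.d. N(0, sigma^2) range errors.

    The target must not coincide with an anchor.
    """
    j11 = j12 = j22 = 0.0
    for ax, ay in anchors:
        dx, dy = target[0] - ax, target[1] - ay
        d = math.hypot(dx, dy)
        ux, uy = dx / d, dy / d  # unit vector from anchor to target
        j11 += ux * ux
        j12 += ux * uy
        j22 += uy * uy
    scale = 1.0 / sigma ** 2
    j11, j12, j22 = scale * j11, scale * j12, scale * j22
    det = j11 * j22 - j12 * j12
    if det <= 0.0:
        raise ValueError("singular Fisher information (e.g., collinear geometry)")
    return (j11 + j22) / det  # trace of the inverse of the 2x2 Fisher matrix
```

For four anchors placed symmetrically around the target at unit distance, the bound equals σ², which matches the intuition that each axis is observed by two anchors.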