In microarray data analysis, we are often required to combine several dependent partial test results. To address this problem, many suggestions have been made in the previous literature; Tippett's test and Fisher's omnibus test are the most popular. Both tests have known null distributions when the partial tests are independent. However, for dependent tests, their (even asymptotic) null distributions are unknown and additional numerical procedures are required. In this paper, we revisited Stouffer's test based on z-scores and showed its advantage over the two aforementioned methods in the analysis of large-scale microarray data. The combined statistic in Stouffer's test has a normal distribution with mean 0 from the normality of the z-scores. Its variance can be estimated from the scores of genes in the experiment without an additional numerical procedure. We numerically compared the errors of Stouffer's test and the two p-value based methods, Tippett's test and Fisher's omnibus test. We also analyzed our microarray data to find genes differentially expressed by non-genotoxic and genotoxic carcinogen compounds. Both the numerical study and the real application showed that Stouffer's test performed better than Tippett's method and Fisher's omnibus method with additional permutation steps.
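As a rough illustration of the idea in this abstract, the following minimal sketch sums the z-scores of each gene and standardizes the sum by a variance estimated across genes. It is not the authors' implementation: the function name `stouffer_dependent`, the unweighted sum, and the assumption that most genes are null (so that the across-gene second moment about 0 approximates the null variance) are all choices made here for illustration.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_dependent(z_rows):
    """Combine dependent z-scores per gene (illustrative sketch).

    z_rows: list of per-gene lists, each holding the k partial-test z-scores.
    Returns a list of (standardized statistic, two-sided p-value) per gene.
    """
    # Unweighted Stouffer-type sum of the partial z-scores for each gene.
    sums = [sum(row) for row in z_rows]

    # Assumption: most genes are null, so the variance of the sum can be
    # estimated across genes, with the mean fixed at 0.
    var_hat = sum(s * s for s in sums) / len(sums)
    if var_hat == 0:
        raise ValueError("all combined scores are zero; variance is not estimable")
    sd = sqrt(var_hat)

    std_normal = NormalDist()
    results = []
    for s in sums:
        t = s / sd
        p = 2.0 * (1.0 - std_normal.cdf(abs(t)))
        results.append((t, p))
    return results
```

With independent partial tests the estimated variance should be close to k, the number of partial tests; positive dependence among the tests inflates it, which is what the across-gene estimate is meant to capture.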
Summary
The fused lasso signal approximator (FLSA) is a smoothing procedure for noisy observations that uses a fused lasso penalty on unobserved mean levels to find sparse signal blocks. Several path algorithms have been developed to obtain the whole solution path of the FLSA. However, it is known that the FLSA has model selection inconsistency when the underlying signals have a stair‐case block, where three consecutive signal blocks are either strictly increasing or decreasing. Modified path algorithms for the FLSA have been proposed to guarantee model selection consistency regardless of the stair‐case block. In this paper, we provide a comprehensive review of the path algorithms for the FLSA and prove the properties of the recently modified path algorithms' hitting times. Specifically, we reinterpret the modified path algorithm as the path algorithm for local FLSA problems and reveal the condition under which the hitting time for the fusion of the modified path algorithm is not monotone in a tuning parameter. To recover the monotonicity of the solution path, we propose a pathwise adaptive FLSA that is monotone and performs similarly to the modified solution path algorithm. Finally, we apply the proposed method to the number of daily‐confirmed cases of COVID‐19 in Korea to identify the change points of its spread.
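For readers unfamiliar with the FLSA, the sketch below evaluates the standard one-dimensional FLSA objective: a squared-error fit term, a lasso penalty on the mean levels, and a fusion penalty on consecutive differences. The function name `flsa_objective` and the parameter names `lam1` and `lam2` are illustrative; the sketch only evaluates the criterion and does not implement any of the path algorithms discussed in the abstract.

```python
def flsa_objective(y, beta, lam1, lam2):
    """Evaluate the standard 1-D FLSA criterion at a candidate solution.

    y:    observed noisy signal
    beta: candidate mean levels, same length as y
    lam1: tuning parameter of the sparsity (lasso) penalty
    lam2: tuning parameter of the fusion penalty
    """
    if len(y) != len(beta):
        raise ValueError("y and beta must have the same length")
    # Squared-error fit between observations and mean levels.
    fit = 0.5 * sum((yi - bi) ** 2 for yi, bi in zip(y, beta))
    # Lasso penalty encourages mean levels to be exactly zero.
    sparsity = lam1 * sum(abs(b) for b in beta)
    # Fusion penalty encourages neighbouring mean levels to be equal.
    fusion = lam2 * sum(abs(b1 - b0) for b0, b1 in zip(beta, beta[1:]))
    return fit + sparsity + fusion
```

For example, with `y = [0, 0, 2, 2]`, `lam1 = 0.5` and `lam2 = 1`, the shrunken block solution `[0, 0, 1, 1]` has a smaller objective value than the unshrunken `[0, 0, 2, 2]`, which shows how the penalties pull block levels toward zero and toward each other.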
Rare binary events data arise frequently in medical research. Due to lack of statistical power in individual studies involving such data, meta‐analysis has become an increasingly important tool for combining results from multiple independent studies. However, traditional meta‐analysis methods often report severely biased estimates in such rare‐event settings. Moreover, many rely on models assuming a pre‐specified direction for variability between control and treatment groups for mathematical convenience, which may be violated in practice. Based on a flexible random‐effects model that removes the assumption about the direction, we propose new Bayesian procedures for estimating and testing the overall treatment effect and inter‐study heterogeneity. Our Markov chain Monte Carlo algorithm employs Pólya‐Gamma augmentation so that all conditionals are known distributions, greatly facilitating computational efficiency. Our simulation shows that the proposed approach generally reports less biased and more stable estimates compared to existing methods. We further illustrate our approach using two real examples, one using rosiglitazone data from 56 studies and the other using stomach ulcers data from 41 studies.
In multiple instance learning (MIL), a bag represents a sample that has a set of instances, each of which is described by a vector of explanatory variables, but the entire bag only has one label/response. Though many methods for MIL have been developed to date, few have paid attention to interpretability of models and results. The proposed Bayesian regression model stands on two levels of hierarchy, which transparently show how explanatory variables explain and instances contribute to bag responses. Moreover, two selection problems are simultaneously addressed; the instance selection to find out the instances in each bag responsible for the bag response, and the variable selection to search for the important covariates. To explore a joint discrete space of indicator variables created for selection of both explanatory variables and instances, the shotgun stochastic search algorithm is modified to fit in the MIL context. Also, the proposed model offers a natural and rigorous way to quantify uncertainty in coefficient estimation and outcome prediction, which many modern MIL applications call for. The simulation study shows the proposed regression model can select variables and instances with high performance (AUC greater than 0.86), thus predicting responses well. The proposed method is applied to the musk data for prediction of binding strengths (labels) between molecules (bags) with different conformations (instances) and target receptors. It outperforms all existing methods, and can identify variables relevant in modeling responses.
The real-time reverse-transcription polymerase chain reaction (RT-PCR) test is a widely used laboratory technique that is highly sensitive and reliable for quantifying gene expression levels and diagnosing various diseases, including COVID-19. RT-PCR experiments often have correlated technical replicates of a small number of samples. However, current statistical analysis of RT-PCR assumes a large sample size and does not account for the correlated structure across the replicates. In this paper, we review popular statistical methods for analyzing RT-PCR data and propose a permutation method that accounts for the small sample size and the correlated structure of RT-PCR data. Our proposed method provides a more accurate and efficient analysis of RT-PCR data. We provide an R program to implement our method for practitioners.
• Current statistical analysis of RT-PCR assumes a large sample size and does not account for correlated structure across the replicates.
• We propose a permutation method that accounts for both the small sample size and replicated structure of the RT-PCR data.
• We have created a GitHub website where practitioners can obtain an R program used in this paper.
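To make the permutation idea concrete, here is a minimal sketch of an exact two-group permutation test for replicated measurements. It is not the authors' R program or their exact procedure: the function name `permutation_test`, the choice to average technical replicates within each sample before permuting, and the use of the absolute difference in group means as the test statistic are all assumptions made for illustration. Permuting whole samples rather than individual replicates is one simple way to respect the correlation among replicates of the same sample.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in group means.

    group_a, group_b: lists of samples; each sample is a list of technical
    replicate measurements (hypothetical input layout).
    """
    # Collapse correlated replicates to one value per sample, so that the
    # permutation unit is the sample, not the replicate.
    a = [mean(reps) for reps in group_a]
    b = [mean(reps) for reps in group_b]
    pooled = a + b
    n_a = len(a)
    observed = abs(mean(a) - mean(b))

    extreme = 0
    total = 0
    # Enumerate every way of assigning n_a samples to the first group;
    # feasible because the number of samples is small.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        perm_a = [pooled[i] for i in idx]
        perm_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if abs(mean(perm_a) - mean(perm_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total
```

With three samples per group there are only 20 possible label assignments, so the smallest attainable two-sided p-value is 2/20 = 0.1. This limitation is inherent to very small sample sizes and is independent of the number of technical replicates.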
The medium‐throughput mRNA abundance platform NanoString nCounter has gained great popularity in the past decade, due to its high sensitivity and technical reproducibility as well as remarkable applicability to ubiquitous formalin fixed paraffin embedded (FFPE) tissue samples. Based on RCRnorm developed for normalizing NanoString nCounter data and Bayesian LASSO for variable selection, we propose a fully integrated Bayesian method, called RCRdiff, to detect differentially expressed (DE) genes between different groups of tissue samples (e.g., normal and cancer). Unlike existing methods that often require normalization performed beforehand, RCRdiff directly handles raw read counts and jointly models the behaviors of different types of internal controls along with DE and non‐DE gene patterns. Doing so avoids the efficiency loss caused by ignoring estimation uncertainty from the normalization step in a sequential approach and thus can offer more reliable statistical inference. We also propose clustering‐based strategies for DE gene selection, which do not require any external dataset and are free of any arbitrary cutoff. Empirical evidence of the attractiveness of RCRdiff is demonstrated via extensive simulation and data examples.
Parenting practices, whether effective or ineffective, play an essential role in children's mental health. The use of ineffective parenting practices is no longer encouraged in the West; however, it remains a common practice among Asian households. Ineffective parenting consists of inconsistent discipline, corporal punishment, and poor monitoring, which may result in mental health consequences. Thus, this study assessed the mediating effects of adolescents' self-efficacy and parental acceptance-rejection on the relationship between ineffective parenting practices and adolescents' mental health. The current study involved a total of 761 school-going Malaysian adolescents aged 13-18 (38.5% males; M = 15.65; SD = 1.43). This study utilized a cross-sectional design in which it measured adolescents' mental health, ineffective parenting practices, parental acceptance-rejection, and adolescents' self-efficacy. Both paternal and maternal parenting practices and acceptance-rejection were measured independently. Adolescents' self-efficacy and perceived paternal and maternal acceptance-rejection were found to be significant mediators between ineffective parenting practices and adolescents' mental health. Our findings suggest that ineffective parenting practices result in perceived parental rejection and lower self-efficacy, which in turn result in poorer mental health among adolescents. This means parents should be mindful of their parenting approaches, as they have both a direct and an indirect impact on the mental health of their offspring.
The recent controversy about the size of crowds at candlelight protests in Korea raises an interesting question regarding the methods used to estimate crowd size. Protest organizers tend to count all participants in the event from its start to finish, while the police usually report the crowd size at its peak. While several counting methods are available to estimate the size of a crowd at a given time, counting the total number of the participants at a protest is not straightforward. In this paper, we propose a new estimator to count the total number of participants, which we call the size of a dynamic crowd. We assume that the arrival and departure times of the crowd are randomly observed and that the number of the attendees in the crowd at a specific time is estimable. We estimate the number of total attendees during the entire gathering based on the capture-recapture model. We also propose a bootstrap procedure to construct a confidence interval for the crowd size. We demonstrate the performance of the proposed method with simulation studies and the data from Korea's March for Science, part of a global event held on Earth Day, April 22, 2017.
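The abstract does not state the form of the proposed dynamic-crowd estimator, so the sketch below shows only the classical two-occasion capture-recapture estimators (Lincoln-Petersen and Chapman's bias-corrected variant) on which capture-recapture models are commonly built. The function names and arguments are illustrative, and this is not the estimator proposed in the paper.

```python
def lincoln_petersen(n_first, n_second, n_both):
    """Classical Lincoln-Petersen estimate of a closed population size.

    n_first:  individuals observed on the first occasion
    n_second: individuals observed on the second occasion
    n_both:   individuals observed on both occasions
    """
    if n_both <= 0:
        raise ValueError("at least one recapture is needed")
    return n_first * n_second / n_both


def chapman(n_first, n_second, n_both):
    """Chapman's bias-corrected variant; defined even with zero recaptures."""
    return (n_first + 1) * (n_second + 1) / (n_both + 1) - 1
```

For instance, if 200 people are observed at one time, 150 at a later time, and 30 at both, the Lincoln-Petersen estimate of the total is 200 × 150 / 30 = 1000.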
A key objective in many microarray association studies is the identification of individual genes associated with clinical outcome. It is often of additional interest to identify sets of genes, known a priori to have similar biologic function, associated with the outcome.
In this paper, we propose a general permutation-based framework for gene set testing that controls the false discovery rate (FDR) while accounting for the dependency among the genes within and across each gene set. The application of the proposed method is demonstrated using three public microarray data sets. The performance of our proposed method is contrasted to two other existing Gene Set Enrichment Analysis (GSEA) and Gene Set Analysis (GSA) methods.
Our simulations show that the proposed method controls the FDR at the desired level. Through simulations and case studies, we observe that our method performs better than GSEA and GSA, especially when the number of prognostic gene sets is large.
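As a generic illustration of how permutations can be used to estimate the FDR, the sketch below computes a common plug-in estimate at a fixed threshold: the average number of permutation-null statistics exceeding the threshold, divided by the number of observed statistics exceeding it. This is a simplified stand-in rather than the framework proposed in the paper; the function name `permutation_fdr` and the input layout are assumptions. Permuting outcome labels jointly for all genes, rather than gene by gene, is what preserves the dependency among genes in the null statistics.

```python
def permutation_fdr(observed, null_stats, threshold):
    """Plug-in permutation estimate of the FDR at a given threshold.

    observed:   list of gene-set statistics from the real data
    null_stats: list of lists; each inner list holds the statistics for all
                gene sets under one permutation of the outcome labels
    threshold:  gene sets with a statistic >= threshold are called significant
    """
    called = sum(1 for s in observed if s >= threshold)
    if called == 0:
        # No discoveries, so no false discoveries.
        return 0.0
    # Average number of null statistics that would have been called.
    expected_false = sum(
        sum(1 for s in perm if s >= threshold) for perm in null_stats
    ) / len(null_stats)
    return min(1.0, expected_false / called)
```

In practice the threshold would be chosen as the least stringent value whose estimated FDR stays below the desired level.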