Physics-informed neural networks (PINNs) have been shown to be effective tools for solving both forward and inverse problems of partial differential equations (PDEs). PINNs embed the PDEs into the loss of the neural network using automatic differentiation, and this PDE loss is evaluated at a set of scattered spatio-temporal points (called residual points). The location and distribution of these residual points are highly important to the performance of PINNs. However, existing studies on PINNs have mainly used only a few simple residual point sampling methods. Here, we present a comprehensive study of two categories of sampling for PINNs: non-adaptive uniform sampling and adaptive nonuniform sampling. We consider six uniform sampling methods: (1) equispaced uniform grid, (2) uniformly random sampling, (3) Latin hypercube sampling, (4) Halton sequence, (5) Hammersley sequence, and (6) Sobol sequence. We also consider a resampling strategy for uniform sampling. To improve the sampling efficiency and the accuracy of PINNs, we propose two new residual-based adaptive sampling methods: residual-based adaptive distribution (RAD) and residual-based adaptive refinement with distribution (RAR-D), which dynamically improve the distribution of residual points based on the PDE residuals during training. Hence, we consider a total of 10 different sampling methods: six non-adaptive uniform sampling methods, uniform sampling with resampling, the two proposed adaptive sampling methods, and an existing adaptive sampling method. We extensively tested the performance of these sampling methods on four forward problems and two inverse problems in many setups. The numerical results presented in this study are summarized from more than 6000 PINN simulations. We show that the proposed adaptive sampling methods, RAD and RAR-D, significantly improve the accuracy of PINNs with fewer residual points for both forward and inverse problems. The results can also serve as a practical guideline for choosing sampling methods.
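The RAD idea described in this abstract can be sketched in a few lines: draw a dense set of uniform candidate points, weight each candidate by a power of its PDE residual, and resample the residual points from the induced distribution. The sketch below is a minimal illustration under assumed simplifications (a scalar residual function on a box domain, and hyperparameters `k` and `c` whose roles are modeled only loosely on the abstract's description); it is not the authors' reference implementation.

```python
import numpy as np

def rad_sample(residual_fn, n_points, domain, k=1.0, c=1.0,
               n_candidates=10_000, rng=None):
    """Residual-based adaptive distribution (RAD) sketch: draw dense
    uniform candidates, weight them by |PDE residual|^k, and resample
    the residual points from the induced probability distribution."""
    rng = np.random.default_rng(rng)
    lo, hi = domain
    candidates = rng.uniform(lo, hi, size=(n_candidates, len(lo)))
    eps = np.abs(residual_fn(candidates)) ** k
    weights = eps / eps.mean() + c      # emphasize high-residual regions
    p = weights / weights.sum()
    idx = rng.choice(n_candidates, size=n_points, replace=False, p=p)
    return candidates[idx]

# toy "residual" peaked at x = 0.5; sampled points should cluster there
pts = rad_sample(lambda x: np.exp(-100.0 * (x[:, 0] - 0.5) ** 2),
                 n_points=200, domain=([0.0], [1.0]), k=2.0, c=0.0, rng=0)
```

In an actual PINN training loop, `residual_fn` would evaluate the PDE residual of the current network via automatic differentiation, and this resampling step would be repeated periodically during training.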
Respondent-driven sampling is a form of link-tracing network sampling, which is widely used to study hard-to-reach populations, often to estimate population proportions. Previous treatments of this process have used a with-replacement approximation, which we show induces bias in estimates for large sample fractions and differential network connectedness by characteristic of interest. We present a treatment of respondent-driven sampling as a successive sampling process. Unlike existing representations, our approach respects the essential without-replacement feature of the process, while converging to an existing with-replacement representation for small sample fractions, and to the sample mean for a full-population sample. We present a successive-sampling-based estimator for population means based on respondent-driven sampling data, and demonstrate its superior performance when the size of the hidden population is known. We present sensitivity analyses for unknown population sizes. In addition, we note that, like other existing estimators, our new estimator is subject to bias induced by the selection of the initial sample. Using data collected among three populations in two countries, we illustrate the application of this approach to populations with varying characteristics. We conclude that the successive sampling estimator improves on existing estimators, and can also be used as a diagnostic tool when population size is not known. This article has supplementary material online.
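The successive-sampling process this abstract describes (without-replacement draws, each made with probability proportional to degree among the units still remaining) can be simulated directly. The sketch below is a toy illustration on assumed data; the final lines compute a simple inverse-degree-weighted (Hajek-style) mean, which is a stand-in for, not a reproduction of, the authors' successive-sampling estimator.

```python
import numpy as np

def successive_sample(degrees, n, rng=None):
    """Successive sampling: draw n units without replacement, each draw
    made proportional to degree among the units still remaining."""
    rng = np.random.default_rng(rng)
    remaining = list(range(len(degrees)))
    sample = []
    for _ in range(n):
        w = np.asarray([degrees[i] for i in remaining], dtype=float)
        pick = rng.choice(len(remaining), p=w / w.sum())
        sample.append(remaining.pop(pick))
    return sample

rng = np.random.default_rng(1)
degrees = rng.integers(1, 20, size=500)   # assumed network degrees
values = rng.normal(size=500)             # assumed trait of interest
s = successive_sample(degrees, 100, rng=rng)

# inverse-degree-weighted mean over the sample (illustrative estimator)
w = 1.0 / degrees[s]
est = np.sum(w * values[s]) / np.sum(w)
```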
Sampling is central to the practice of qualitative methods, but compared with data collection and analysis its processes have been discussed relatively little. A four-point approach to sampling in qualitative interview-based research is presented and critically discussed in this article, which integrates theory and process for the following: (1) defining a sample universe, by way of specifying inclusion and exclusion criteria for potential participation; (2) deciding upon a sample size, through the conjoint consideration of epistemological and practical concerns; (3) selecting a sampling strategy, such as random sampling, convenience sampling, stratified sampling, cell sampling, quota sampling or a single-case selection strategy; and (4) sample sourcing, which includes matters of advertising, incentivising, avoidance of bias, and ethical concerns pertaining to informed consent. The extent to which these four concerns are met and made explicit in a qualitative study has implications for its coherence, transparency, impact and trustworthiness.
Rejoinder. McKeague, Ian W.; Qian, Min.
Journal of the American Statistical Association, 12/2015, Volume 110, Issue 512.
Journal Article. Peer-reviewed.
Sampling from the limit distribution under the null (using sample covariances to replace population covariances) provides a simple and computationally efficient way to estimate critical values, and the size of the resulting test converges to the nominal size as the sample size increases. This can be used instead of the bootstrap procedure given in Theorem 2 and also instead of the double bootstrap method used in the simulations, both of which are computationally more expensive. (More generally, for any fixed ... and ..., the finite-sample distribution along sequences of parameters of the form ... as in Theorem 1 can be estimated by sampling from the corresponding limit distribution given in that theorem after replacing population covariances by estimates.) (ProQuest: ... denotes formulae/symbols omitted.)
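As an illustration of the plug-in recipe described above, the sketch below estimates the critical value of a max-type statistic whose null limit is Gaussian: the population covariance is replaced by the sample covariance, and the critical value is read off as a quantile of draws from the resulting limit distribution. The max-|Z| statistic is an assumed example, not the statistic of the article under discussion.

```python
import numpy as np

def critical_value(x, alpha=0.05, n_draws=20000, rng=None):
    """Plug-in critical value: sample from the Gaussian limit of a
    max-|Z| statistic, with the population covariance replaced by the
    sample covariance (no bootstrap needed)."""
    rng = np.random.default_rng(rng)
    sigma = np.cov(x, rowvar=False)               # sample covariance
    draws = rng.multivariate_normal(np.zeros(x.shape[1]), sigma,
                                    size=n_draws)
    stat = np.abs(draws).max(axis=1)              # assumed limit statistic
    return np.quantile(stat, 1.0 - alpha)

rng = np.random.default_rng(0)
x = rng.normal(size=(400, 3))    # iid data generated under an assumed null
cv = critical_value(x, rng=rng)
```

For three nearly independent standard normal coordinates, the level-0.05 critical value of max-|Z| is close to 2.4, and the plug-in estimate concentrates around that value as the sample size and the number of draws grow.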
Comment. Shah, Rajen D.; Samworth, Richard J.
Journal of the American Statistical Association, 12/2015, Volume 110, Issue 512.
Journal Article. Peer-reviewed.
This article presents two potential screening procedures constructed by modifying the adaptive resampling test (ART): (1) a "parametric bootstrap" analog of ART; and (2) an ART-inspired adaptive testing procedure designed to be more powerful against dense, weak alternatives. The parametric bootstrap procedure avoids the tuning parameter used in ART and thus eliminates potentially computationally burdensome tuning; it also has a desirable invariance property under local alternatives. However, both ART and the proposed parametric bootstrap analog can have poor power against dense, weak alternatives. A class of adaptive procedures is therefore proposed that reduces to the parametric bootstrap version of ART under strong, sparse signals and to a sum-of-squares criterion under weak, dense signals.
Wideband analog signals push contemporary analog-to-digital conversion (ADC) systems to their performance limits. In many applications, however, sampling at the Nyquist rate is inefficient because the signals of interest contain only a small number of significant frequencies relative to the band limit, although the locations of the frequencies may not be known a priori. For this type of sparse signal, other sampling strategies are possible. This paper describes a new type of data acquisition system, called a random demodulator, that is constructed from robust, readily available components. Let K denote the total number of frequencies in the signal, and let W denote its band limit in hertz. Simulations suggest that the random demodulator requires just O(K log(W/K)) samples per second to stably reconstruct the signal. This sampling rate is exponentially lower than the Nyquist rate of W hertz. In contrast to Nyquist sampling, one must use nonlinear methods, such as convex programming, to recover the signal from the samples taken by the random demodulator. This paper provides a detailed theoretical analysis of the system's performance that supports the empirical observations.
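A minimal simulation of the random-demodulator pipeline described above: the Nyquist-rate signal is multiplied by a random ±1 chipping sequence, integrated and dumped down to R low-rate samples, and the sparse spectrum is then recovered from those samples by a nonlinear method. Here orthogonal matching pursuit stands in for the convex program used in the paper, and all parameters (W = 128, R = 64, K = 2) are assumed for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
W, R, K = 128, 64, 2          # Nyquist length, measurement count, sparsity

# sparse spectrum: K randomly placed tones with random complex amplitudes
support = rng.choice(W, size=K, replace=False)
s = np.zeros(W, dtype=complex)
s[support] = rng.normal(size=K) + 1j * rng.normal(size=K)

d = rng.choice([-1.0, 1.0], size=W)        # random +/-1 chipping sequence
H = np.kron(np.eye(R), np.ones(W // R))    # integrate-and-dump to rate R
Psi = np.fft.ifft(np.eye(W), axis=0)       # Fourier dictionary: x = Psi @ s
A = H @ (d[:, None] * Psi)                 # overall measurement matrix
y = A @ s                                  # the R low-rate samples

# orthogonal matching pursuit (stand-in for convex-programming recovery)
resid, sel = y.copy(), []
for _ in range(K):
    sel.append(int(np.argmax(np.abs(A.conj().T @ resid))))
    coef, *_ = np.linalg.lstsq(A[:, sel], y, rcond=None)
    resid = y - A[:, sel] @ coef
s_hat = np.zeros(W, dtype=complex)
s_hat[sel] = coef
```

With R well above the K log(W/K) threshold, the two tones are typically recovered exactly even though only half the Nyquist-rate samples are retained.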
Modern methods construct a matched sample by minimizing the total cost of a flow in a network, finding a pairing of treated and control individuals that minimizes the sum of within-pair covariate distances subject to constraints that ensure distributions of covariates are balanced. In aggregate, these methods work well; however, they can exhibit a lack of interest in a small number of pairs with large covariate distances. Here, a new method is proposed for imposing a minimax constraint on a minimum total distance matching. Such a match minimizes the total within-pair distance subject to various constraints including the constraint that the maximum pair difference is as small as possible. In an example with 1391 matched pairs, this constraint eliminates dozens of pairs with moderately large differences in age, but otherwise exhibits the same excellent covariate balance found without this additional constraint. A minimax constraint eliminates edges in the network, and can improve the worst-case time bound for the performance of the minimum cost flow algorithm, that is, a better match from a practical perspective may take less time to construct. The technique adapts ideas for a different problem, the bottleneck assignment problem, whose sole objective is to minimize the maximum within-pair difference; however, here, that objective becomes a constraint on the minimum cost flow problem. The method generalizes. Rather than constrain the maximum distance, it can constrain an order statistic. Alternatively, the method can minimize the maximum difference in propensity scores, and subject to doing that, minimize the maximum robust Mahalanobis distance. An example from labor economics is used to illustrate.
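The two-stage idea in this abstract (first find the bottleneck value, i.e. the smallest achievable bound on the maximum within-pair distance, then minimize total distance subject to that bound) can be sketched with an off-the-shelf assignment solver standing in for the paper's minimum-cost-flow formulation. The large finite penalty `BIG` is an assumed device for forbidding edges above the threshold.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

BIG = 1e9   # penalty standing in for a forbidden (deleted) edge

def minimax_match(D):
    """Minimum total-distance pairing subject to the smallest
    achievable bound on the maximum within-pair distance."""
    def assign(t):
        C = np.where(D <= t, D, BIG)
        rows, cols = linear_sum_assignment(C)
        return rows, cols, C[rows, cols].max() < BIG   # feasible at t?
    vals = np.unique(D)
    lo, hi = 0, len(vals) - 1
    while lo < hi:                # binary search for the bottleneck value
        mid = (lo + hi) // 2
        if assign(vals[mid])[2]:
            hi = mid
        else:
            lo = mid + 1
    t_star = vals[lo]
    rows, cols, _ = assign(t_star)  # min total cost at the bottleneck bound
    return rows, cols, t_star

D = np.array([[1.0, 9.0, 4.0],    # assumed treated-by-control distances
              [2.0, 1.0, 8.0],
              [7.0, 3.0, 1.0]])
rows, cols, t = minimax_match(D)
```

On this toy distance matrix the bottleneck value is 1.0 and the diagonal pairing attains it; in a real matching problem the assignment step would be replaced by the paper's constrained minimum-cost-flow solve.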
Diagnostics for respondent-driven sampling. Gile, Krista J.; Johnston, Lisa G.; Salganik, Matthew J.
Journal of the Royal Statistical Society, Series A (Statistics in Society), January 2015, Volume 178, Issue 1.
Journal Article. Peer-reviewed. Open access.
Respondent-driven sampling (RDS) is a widely used method for sampling from hard-to-reach human populations, especially populations at higher risk for human immunodeficiency virus or acquired immune deficiency syndrome. Data are collected through a peer referral process over social networks. RDS has proven practical for data collection in many difficult settings and has been adopted by leading public health organizations around the world. Unfortunately, inference from RDS data requires many strong assumptions because the sampling design is partially beyond the control of the researcher and not fully observable. We introduce diagnostic tools for most of these assumptions and apply them in 12 high risk populations. These diagnostics empower researchers to understand their RDS data better and encourage future statistical research on RDS sampling and inference.
Arable land quality has been evaluated through a weighted average of indicators related to soil properties and tillage techniques to represent the most basic function of cropland: food production potential. In this paper, a hybrid sampling method, spatial coverage sampling and random sampling conditioned Latin hypercube sampling (SPCOSA–CLHS), was designed for an arable land quality observation network. SPCOSA–CLHS integrates the uniform spatial partitioning results generated by spatial coverage sampling and random sampling (SPCOSA) into the conditioned Latin hypercube sampling (CLHS) method, along with other arable land quality indicators such as field slope, soil bulk density, organic matter content, thickness of the plough layer, and irrigation. SPCOSA, CLHS, SPCOSA–CLHS, CLHS with x and y coordinates as covariates (XY-CLHS), and the random sampling method (RSM) were then compared using the example of Heilongjiang Province. The sample population covers 12,147,008 grid cells and 17 arable land quality indicators. Five parameters (information entropy, Kullback–Leibler divergence, similarity distance, ability to express local spatial heterogeneity of arable land quality, and distribution homogeneity of the sampling results) were used to compare the applicability of these methods for overall arable land quality. SPCOSA–CLHS best trades off the representativeness of the sampling results against the spatial heterogeneity of arable land quality, and its samples have the advantage of a spatially uniform distribution. When the sample size is between 5,000 and 20,000, all methods show good applicability. When the sample size is below 5,000, however, the differences among the methods become significant, with SPCOSA and the random sampling method falling off most dramatically. Based on this detailed comparison of the five sampling strategies, we strongly recommend using SPCOSA–CLHS to design arable land quality observation networks.
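The conditioned Latin hypercube component of the hybrid method can be sketched as follows: each covariate is cut into n population quantile strata, and a random-swap descent selects n units so that each stratum of each covariate holds roughly one sample. This is a simplified stand-in for CLHS (which uses simulated annealing and additional objective terms), with all data and parameters assumed.

```python
import numpy as np

def clhs(X, n, n_iter=2000, rng=None):
    """Conditioned Latin hypercube sketch: choose n of the N rows of X
    so that each covariate places roughly one sample in each of its n
    population quantile strata, via random-swap descent."""
    rng = np.random.default_rng(rng)
    N, p = X.shape
    edges = np.quantile(X, np.linspace(0.0, 1.0, n + 1), axis=0)
    # stratum index (0..n-1) of every unit, for every covariate
    strata = np.stack([np.searchsorted(edges[1:-1, j], X[:, j], side='right')
                       for j in range(p)], axis=1)
    def cost(idx):
        # distance of each covariate's sample counts from one-per-stratum
        return sum(np.abs(np.bincount(strata[idx, j], minlength=n) - 1).sum()
                   for j in range(p))
    idx = rng.choice(N, size=n, replace=False)
    best = cost(idx)
    for _ in range(n_iter):
        cand = rng.integers(N)
        if cand in idx:
            continue                 # keep the sample without replacement
        trial = idx.copy()
        trial[rng.integers(n)] = cand
        c = cost(trial)
        if c <= best:
            idx, best = trial, c
    return idx, best

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))       # assumed covariate table
idx, final_cost = clhs(X, 20, rng=rng)
```

A full SPCOSA–CLHS workflow would additionally feed the spatial-partition membership from SPCOSA into the objective, which is omitted here.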