This paper introduces the pseudo-calibration estimators, a novel method that integrates a large non-probability sample with a probability sample, assuming both samples contain relevant information for estimating the population parameter. The proposed estimators share a structural similarity with the adjusted projection estimators and the difference estimators, but they adopt a different inferential approach and informative setup. The pseudo-calibration estimators can be employed when the target variable is observed in the probability sample and, in the non-probability sample, is observed correctly, observed with error, or predicted. This paper also introduces an original application of a jackknife-type method for variance estimation. A simulation study shows that the proposed estimators are robust and efficient compared with the regression data integration estimators that use the same informative setup. Finally, the estimators are further evaluated on real data.
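The abstract does not give the formula of the pseudo-calibration estimators, so the sketch below does not implement them. As a reference point for the informative setup only (a big non-probability sample B plus a probability sample A), it shows a basic difference-type estimator of a population total. It assumes the target is observed without error in B and that membership in B can be recorded for every unit of A; the function and argument names are hypothetical.

```python
from typing import Sequence


def difference_type_total(
    y_big: Sequence[float],
    y_prob: Sequence[float],
    d_prob: Sequence[float],
    in_big: Sequence[bool],
) -> float:
    """Difference-type estimator of a population total (illustrative only).

    y_big  : target values for every unit of the big non-probability sample B,
             assumed observed without error
    y_prob : target values for the units of the probability sample A
    d_prob : design weights (inverse inclusion probabilities) for A
    in_big : for each unit of A, whether it also belongs to B
    """
    # The part of the total covered by B is known exactly.
    total_big = sum(y_big)
    # The part not covered by B is estimated from A, using only A-units outside B.
    total_rest = sum(d * y for y, d, b in zip(y_prob, d_prob, in_big) if not b)
    return total_big + total_rest


# Toy check: a population of 10 units, B = first 6 units, and A taken as a
# census (every weight equal to 1), so the estimate must equal the true total.
population = [float(v) for v in range(1, 11)]
estimate = difference_type_total(
    y_big=population[:6],
    y_prob=population,
    d_prob=[1.0] * len(population),
    in_big=[i < 6 for i in range(len(population))],
)
print(estimate)  # 55.0
```

Under a proper sampling design for A, the second term is design-unbiased for the total outside B. The sketch ignores the cases where the target is observed with error or predicted in B, which are exactly the settings the pseudo-calibration estimators are meant to cover.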
Self-Rated Health (SRH) is becoming one of the most popular indicators of population health. Nevertheless, there is still limited understanding of the elements individuals refer to when evaluating their health, and of how those elements act and interact in the evaluation process. In this study we use a structural equation model with latent variables to identify the direct and indirect influences of various health dimensions (chronic morbidity, functional abilities and emotional health) and socio-demographic covariates (age, gender and education) on poor SRH. The sample consists of 25,183 Italian elderly people aged 65 years and over, interviewed in the 2005 National Health Interview Survey. The results point to the strongest direct effect coming from psychological and emotional health, while the largest total effect comes from chronic morbidity, which influences SRH both directly and by altering functional and emotional health. Growing older, being a woman and having a low level of education negatively affect SRH. However, this is almost entirely the result of the indirect effects exerted by these covariates, while their direct effect is not significant (gender), negative (age) or very modest (education).
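The distinction between direct, indirect and total effects can be made concrete with the standard decomposition for a recursive linear path model, a simplification of the latent-variable SEM used in the study: the effect along a path is the product of its coefficients, and the total effect sums all paths. The sketch below assumes that decomposition; the variable names and coefficients are hypothetical, not estimates from the study.

```python
from typing import Dict, Tuple

# Path coefficients of a recursive (acyclic) linear path model:
# (cause, effect) -> coefficient.
PathCoefs = Dict[Tuple[str, str], float]


def total_effect(coefs: PathCoefs, source: str, target: str) -> float:
    """Sum, over every directed path from source to target, of the product of
    the coefficients along the path (the direct edge is a path of length one)."""
    if source == target:
        return 1.0
    return sum(
        c * total_effect(coefs, child, target)
        for (parent, child), c in coefs.items()
        if parent == source
    )


def decompose(coefs: PathCoefs, source: str, target: str) -> Tuple[float, float, float]:
    """Return (direct, indirect, total) effects of source on target."""
    direct = coefs.get((source, target), 0.0)
    total = total_effect(coefs, source, target)
    return direct, total - direct, total


# Hypothetical coefficients, NOT estimates from the study.
coefs = {
    ("morbidity", "srh"): 0.20,
    ("morbidity", "functional"): 0.50,
    ("morbidity", "emotional"): 0.40,
    ("functional", "emotional"): 0.20,
    ("functional", "srh"): 0.15,
    ("emotional", "srh"): 0.30,
}
direct, indirect, total = decompose(coefs, "morbidity", "srh")
print(round(direct, 3), round(indirect, 3), round(total, 3))  # 0.2 0.225 0.425
```

With these made-up numbers the indirect paths through functional and emotional health outweigh the direct path, which is the kind of pattern the abstract describes for chronic morbidity.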
The air in the Lombardy region, Italy, is one of the most polluted in Europe because of limited air circulation and high emission levels. There is broad scientific consensus that the agricultural sector has a significant impact on air quality. To support studies quantifying the role of the agricultural and livestock sectors in Lombardy's air quality, this paper presents a harmonised dataset containing daily values of air quality, weather, emissions, livestock, and land and soil use for the Lombardy region in the years 2016-2021. The daily scale is obtained by averaging hourly data and interpolating other variables. The pollutant data come from the European Environment Agency and the Lombardy Regional Environment Protection Agency, weather and emissions data from the European Copernicus programme, livestock data from the Italian zootechnical registry, and land and soil use data from the CORINE Land Cover project. The resulting dataset is designed to be used as-is by researchers working with air quality data.
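The averaging step that produces the daily scale can be sketched as follows, assuming hourly readings arrive as (timestamp, value) pairs with None for missing hours and that timestamps already use the intended day boundary (UTC or local time). The function name daily_means and the 18-valid-hours completeness threshold are hypothetical, not taken from the dataset's documentation.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple


def daily_means(
    hourly: Iterable[Tuple[datetime, Optional[float]]],
    min_hours: int = 18,
) -> Dict[date, Optional[float]]:
    """Average hourly readings into calendar-day values.

    Missing readings are passed as None and skipped. A day with fewer than
    `min_hours` valid readings is returned as None (an assumed completeness
    rule, not necessarily the one used for the dataset).
    """
    by_day: Dict[date, List[float]] = defaultdict(list)
    for ts, value in hourly:
        readings = by_day[ts.date()]  # registers the day even if the value is missing
        if value is not None:
            readings.append(value)
    return {
        day: (mean(vals) if len(vals) >= min_hours else None)
        for day, vals in sorted(by_day.items())
    }


# Toy series: one complete day at a constant 10.0, then a day with only 5 valid hours.
start = datetime(2016, 1, 1)
series = [(start + timedelta(hours=h), 10.0) for h in range(24)]
series += [(start + timedelta(days=1, hours=h), 20.0 if h < 5 else None) for h in range(24)]
result = daily_means(series)
print(result)  # {datetime.date(2016, 1, 1): 10.0, datetime.date(2016, 1, 2): None}
```

Flagging incomplete days as missing, rather than averaging whatever hours exist, keeps a day with only a few (possibly unrepresentative) readings from being treated as a full daily mean.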
Introduction
The identification of dementia cases through routinely collected health data is an easily accessible and inexpensive way to estimate the prevalence of dementia. In Italy, a project was conducted to validate an algorithm for this purpose.
Methods
The project included cases (patients with dementia or mild cognitive impairment, MCI) recruited in centers for cognitive disorders and dementias, and controls recruited in outpatient geriatrics and neurology units. The algorithm, based on pharmaceutical prescriptions, hospital discharge records, residential long-term care records, and information on exemption from health-care co-payment, was applied to the validation population.
Results
The main analysis was conducted on 1110 cases and 1114 controls. The sensitivity, specificity, and positive and negative predictive values in discerning cases of dementia were 74.5%, 96.0%, 94.9%, and 79.1%, respectively, whereas in detecting cases of MCI these values were 29.7%, 97.5%, 92.2%, and 58.1%, respectively. The variables associated with misclassification of cases were also identified.
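The four accuracy measures reported above follow from a 2x2 table of algorithm result against reference diagnosis. A minimal sketch, using hypothetical counts rather than the study's data; the function name is illustrative.

```python
from typing import Dict


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy measures of a case-finding algorithm against a reference standard.

    tp: true cases flagged by the algorithm     fn: true cases it missed
    tn: controls correctly not flagged          fp: controls wrongly flagged
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of controls correctly excluded
        "ppv": tp / (tp + fp),          # share of flagged subjects who are cases
        "npv": tn / (tn + fn),          # share of non-flagged subjects who are controls
    }


# Hypothetical counts, for illustration only (not the study's data).
metrics = diagnostic_accuracy(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity do not depend on how many cases and controls were enrolled, whereas the predictive values do: in a roughly balanced case-control validation sample they will differ from the values expected in a general population where dementia is less prevalent.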
Discussion
This study provided a validated algorithm, based on administrative data, that can be used to identify cases of dementia and, with lower sensitivity, early-onset dementia, but not cases of MCI.
Spatial mapping of biodiversity is crucial to investigate spatial variations in natural communities. Several indices have been proposed in the literature to represent biodiversity as a single statistic. However, these indices only provide information on individual dimensions of biodiversity, thus failing to grasp its complexity comprehensively. Consequently, relying solely on these single indices can lead to misleading conclusions about the actual state of biodiversity. In this work, we focus on biodiversity profiles, which provide a more flexible framework to express biodiversity through nonnegative and convex curves that can be analyzed by means of functional data analysis. By treating the whole curves as single entities, we propose to achieve a functional zoning of the region of interest by means of a penalized model-based clustering procedure. This provides a spatial clustering of the biodiversity profiles, which is useful to policy-makers both for conserving and managing natural resources and for revealing patterns of interest. Our approach is evaluated using a simulation study and discussed through the analysis of the Harvard Forest Data, which provides information on the spatial distribution of woody stems within a plot of the Harvard Forest.
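The abstract does not say which family of biodiversity profiles is used. As an assumption, the sketch below uses Hill numbers (the effective number of species of order q), a standard way of turning a species-abundance vector into a curve: q = 0 gives species richness, q = 1 the exponential of Shannon entropy, and q = 2 the inverse Simpson index. Each spatial unit yields one such curve, and curves of this kind are the objects a functional clustering would group. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def hill_number(p: Sequence[float], q: float) -> float:
    """Hill number (effective number of species) of order q >= 0 for
    relative abundances p that sum to 1."""
    p = [x for x in p if x > 0]  # absent species do not contribute
    if math.isclose(q, 1.0):
        # Limit as q -> 1: exponential of the Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


def hill_profile(counts: Sequence[int], qs: Sequence[float]) -> List[float]:
    """Evaluate the diversity profile of one site on a grid of orders q."""
    total = sum(counts)
    p = [c / total for c in counts]
    return [hill_number(p, q) for q in qs]


qs = [0.0, 0.5, 1.0, 2.0, 3.0]
even = hill_profile([5, 5, 5, 5], qs)   # perfectly even community: flat profile at 4
uneven = hill_profile([10, 5, 1], qs)   # uneven community: profile falls as q grows
print([round(v, 3) for v in even])
print([round(v, 3) for v in uneven])
```

Comparing whole curves rather than a single order q is what lets two sites with equal richness but different evenness end up in different clusters.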
This study presents a comparative analysis of three predictive models with an increasing degree of flexibility: hidden dynamic geostatistical models (HDGM), generalised additive mixed models (GAMM), and random forest spatiotemporal kriging models (RFSTK). These models are evaluated for their effectiveness in predicting PM2.5 concentrations in Lombardy (North Italy) from 2016 to 2020. Despite their differing methodologies, all models capture the spatiotemporal patterns in the air pollution data well and achieve similar out-of-sample performance. Furthermore, the study examines individual stations, revealing that model performance varies with local conditions. Model interpretation, based on parametric coefficients and partial dependence plots, reveals consistent associations between the predictor variables and PM2.5 concentrations. Despite nuanced differences in how they model spatiotemporal correlation, all models effectively account for the underlying dependence. In summary, this study underscores the efficacy of conventional techniques for modelling correlated spatiotemporal data, while highlighting the complementary potential of machine learning and classical statistical approaches.
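The station-specific comparison rests on computing an error metric separately for each monitoring station. A minimal sketch, assuming out-of-sample predictions are already available as (station, observed, predicted) triples and using RMSE as the metric; the abstract does not state which validation scheme or metrics were used, and the station names and values below are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rmse_by_station(records: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Root mean squared error of out-of-sample predictions, per station.

    records: (station_id, observed, predicted) triples, where each prediction
    was produced without using that observation to fit the model.
    """
    sq_errors: Dict[str, List[float]] = defaultdict(list)
    for station, observed, predicted in records:
        sq_errors[station].append((observed - predicted) ** 2)
    return {s: math.sqrt(sum(e) / len(e)) for s, e in sq_errors.items()}


# Hypothetical PM2.5 values (µg/m³) at two stations, for illustration only.
records = [
    ("station_A", 10.0, 11.0),
    ("station_A", 12.0, 11.0),
    ("station_B", 20.0, 23.0),
]
print(rmse_by_station(records))  # {'station_A': 1.0, 'station_B': 3.0}
```

Running the same function on each model's predictions gives per-station error tables that can be compared side by side, which is where differences tied to local conditions become visible even when the overall scores are similar.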
Bayesian logistic regression for presence-only data
Divino, Fabio; Golini, Natalia; Jona Lasinio, Giovanna ...
Stochastic Environmental Research and Risk Assessment, 08/2015, Volume 29, Issue 6
Journal Article
Peer reviewed
Open access
Presence-only data refer to situations in which a censoring mechanism acts on a binary response that can be partially observed only with respect to one outcome, usually denoting the presence of an attribute of interest. A typical example is the recording of species presence in ecological surveys. In this work, a Bayesian approach to the analysis of presence-only data based on a two-level scheme is presented. A probability law and a case-control design are combined to handle the double source of uncertainty: one due to censoring and the other due to sampling. Using a stratified sampling design with non-overlapping strata, the paper proposes a new formulation of the logistic model for presence-only data. In particular, logistic regression with a linear predictor is considered. Estimation is carried out with a new Markov chain Monte Carlo algorithm with data augmentation, which does not require a priori knowledge of the population prevalence. The performance of the new algorithm is validated by extensive simulation experiments under three scenarios and by comparison with optimal benchmarks. An application to data from the literature is reported to discuss the model's behaviour in real-world situations, together with the results of an original study on termite occurrence data.
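The paper's algorithm (data augmentation for the censored, case-control presence-only setting) is not described in enough detail here to reproduce. As a simplified baseline only, the sketch below fits an ordinary Bayesian logistic regression with a linear predictor by random-walk Metropolis, assuming fully observed 0/1 responses (no censoring) and independent N(0, 10²) priors. The prior, step size, function names and simulated data are all our assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Beta = Tuple[float, float]  # (intercept, slope)


def log_posterior(beta: Beta, x: Sequence[float], y: Sequence[int], prior_sd: float = 10.0) -> float:
    """Logistic log-likelihood with linear predictor b0 + b1*x, plus independent
    N(0, prior_sd^2) log-priors on b0 and b1 (up to an additive constant)."""
    b0, b1 = beta
    loglik = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        # log(1 + exp(eta)), written to avoid overflow for large |eta|
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        loglik += yi * eta - softplus
    logprior = -(b0 ** 2 + b1 ** 2) / (2.0 * prior_sd ** 2)
    return loglik + logprior


def metropolis_logistic(x, y, n_iter: int = 4000, step: float = 0.15, seed: int = 1) -> List[Beta]:
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    beta: Beta = (0.0, 0.0)
    current = log_posterior(beta, x, y)
    draws: List[Beta] = []
    for _ in range(n_iter):
        proposal = (beta[0] + rng.gauss(0.0, step), beta[1] + rng.gauss(0.0, step))
        candidate = log_posterior(proposal, x, y)
        # Accept with probability min(1, exp(candidate - current));
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
        draws.append(beta)
    return draws


# Simulated, fully observed data with true intercept -0.5 and slope 2.0.
data_rng = random.Random(0)
x = [data_rng.uniform(-2.0, 2.0) for _ in range(300)]
y = [int(data_rng.random() < 1.0 / (1.0 + math.exp(0.5 - 2.0 * xi))) for xi in x]

kept = metropolis_logistic(x, y)[1000:]  # discard burn-in
b0_mean = sum(b0 for b0, _ in kept) / len(kept)
b1_mean = sum(b1 for _, b1 in kept) / len(kept)
print(f"posterior means: intercept {b0_mean:.2f}, slope {b1_mean:.2f}")
```

In the presence-only setting the zeros are not observed, so this likelihood cannot be used directly; handling that missing information is the role of the data-augmentation step described in the abstract.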
Random Forest (RF) is a well-known data-driven algorithm applied in several fields thanks to its flexibility in modeling the relationship between the response variable and the predictors, even in the presence of strong non-linearities. In environmental applications, the phenomenon of interest often presents spatial and/or temporal dependence, which RF in its standard version does not explicitly take into account. In this work, we propose a taxonomy that classifies strategies according to when (Pre-, In- and/or Post-processing) they include spatial information in regression RF. Moreover, we provide a systematic review and classify the most recent strategies adopted to "adjust" regression RF to spatially dependent data, following the criteria of the Preferred Reporting Items for Systematic reviews and Meta-Analysis (PRISMA). PRISMA is a reproducible methodology for collecting and processing the existing literature on a specified topic from different sources. It starts with a query and ends with a set of scientific documents to review: we performed an online query on 25 October 2022 and, in the end, 32 documents were considered for review. The methodological strategies employed and the application fields considered in the 32 scientific documents are described and discussed. This work is part of the Agriculture Impact On Italian Air (AgrImOnIA) project.
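To illustrate what a Post-processing strategy in this taxonomy looks like: an RF is fitted as usual, its residuals at the training sites are interpolated spatially, and the interpolated residual is added to the RF prediction at a new location. Kriging is the usual interpolator for this step (RF residual kriging); to stay self-contained, the sketch below swaps in inverse distance weighting, a simpler deterministic interpolator. The RF fit is assumed and not shown, and the function names and residual values are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def idw_residual(target: Point, sites: Sequence[Point], residuals: Sequence[float], power: float = 2.0) -> float:
    """Inverse-distance-weighted interpolation of model residuals at `target`."""
    weights = []
    for (sx, sy), r in zip(sites, residuals):
        d = math.hypot(target[0] - sx, target[1] - sy)
        if d == 0.0:
            return r  # target coincides with a training site: reuse its residual
        weights.append(1.0 / d ** power)
    return sum(w * r for w, r in zip(weights, residuals)) / sum(weights)


def post_processed_prediction(rf_prediction: float, target: Point,
                              sites: Sequence[Point], residuals: Sequence[float]) -> float:
    """Post-processing step: RF prediction plus spatially interpolated residual."""
    return rf_prediction + idw_residual(target, sites, residuals)


# Hypothetical residuals (observed minus RF fit) at two training sites.
sites = [(0.0, 0.0), (2.0, 0.0)]
residuals = [1.0, 3.0]
print(post_processed_prediction(10.0, (1.0, 0.0), sites, residuals))  # 12.0
```

Because the correction is applied after fitting, the RF itself is unchanged; Pre- and In-processing strategies instead alter the inputs or the tree-building step.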