Modeling bivariate (or multivariate) count data has received increasing interest in recent years. The aim is to model several different but correlated counts while taking covariate information into account. Bivariate Poisson regression models based on the shock model approach are widely used because of their simple form and interpretation. However, these models do not allow for overdispersion or negative correlation, and other models have therefore been proposed in the literature to avoid these limitations. The present paper proposes copula-based bivariate finite mixture of regression models. These models offer some advantages: they have all the benefits of a finite mixture, allowing for unobserved heterogeneity and clustering effects, while the copula-based derivation can produce more flexible structures, including negative correlations and regressors. In this paper, the new approach is defined, estimation through an EM algorithm is presented, and different models are then applied to a Spanish insurance claim count database.
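The limitation attributed to shock-model bivariate Poisson regression can be checked numerically. The sketch below builds the pmf by trivariate reduction, with X1 = Y1 + Y3 and X2 = Y2 + Y3 for independent Poisson Y's, and then recovers the moments from a truncated grid. It is only an illustration. In a regression version the rates would typically be linked to covariates (for example through log links), but fixed rates are used here and the function names are hypothetical. The covariance comes out as λ3 ≥ 0 and each margin has variance equal to its mean, so this construction allows neither negative correlation nor overdispersion.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    return math.exp(-lam) * lam ** x / math.factorial(x)


def bivariate_poisson_pmf(x1: int, x2: int, lam1: float, lam2: float, lam3: float) -> float:
    """P(X1 = x1, X2 = x2) for X1 = Y1 + Y3, X2 = Y2 + Y3 with independent Poisson Y's."""
    # Sum over k, the value taken by the common shock Y3.
    return sum(
        poisson_pmf(x1 - k, lam1) * poisson_pmf(x2 - k, lam2) * poisson_pmf(k, lam3)
        for k in range(min(x1, x2) + 1)
    )


def moments(lam1: float, lam2: float, lam3: float, max_count: int = 60):
    """E[X1], Var(X1) and Cov(X1, X2), summing the pmf over a truncated grid."""
    e1 = e2 = e11 = e12 = 0.0
    for x1 in range(max_count):
        for x2 in range(max_count):
            p = bivariate_poisson_pmf(x1, x2, lam1, lam2, lam3)
            e1 += x1 * p
            e2 += x2 * p
            e11 += x1 * x1 * p
            e12 += x1 * x2 * p
    return e1, e11 - e1 ** 2, e12 - e1 * e2


mean1, var1, cov12 = moments(1.0, 0.5, 0.3)
print(round(mean1, 6), round(var1, 6), round(cov12, 6))  # → 1.3 1.3 0.3
```

Setting lam3 to zero gives independence. Because no choice of the rates can make the covariance negative, copula-based alternatives become attractive.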
A multivariate INAR(1) regression model based on the Sarmanov distribution is proposed for modelling claim counts from an automobile insurance contract with different types of coverage. The correlation between claims from different coverage types is considered jointly with the serial correlation between observations of the same policyholder over time. Several models based on the multivariate Sarmanov distribution are analyzed. The new models retain all the advantages of the MINAR(1) regression model while allowing a more flexible dependence structure through the Sarmanov distribution. Using a real panel data set, these models are fitted to the data, and their goodness of fit and computational efficiency are discussed.
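The following is a minimal sketch of the bivariate building block of a Sarmanov distribution with Poisson margins. It assumes an exponential kernel, phi_i(x) = exp(-x) - E[exp(-X_i)], which is one common choice; the abstract does not state which kernels the paper uses. Because each kernel has mean zero, the margins stay Poisson. The single parameter omega then sets the dependence, and its sign is the sign of the covariance. The function names and the omega range computed below are illustrative assumptions.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    return math.exp(-lam) * lam ** x / math.factorial(x)


def kernel_mean(lam: float) -> float:
    """E[exp(-X)] for X ~ Poisson(lam); centres the exponential kernel at zero mean."""
    return math.exp(lam * (math.exp(-1.0) - 1.0))


def sarmanov_pmf(x1: int, x2: int, lam1: float, lam2: float, omega: float) -> float:
    """f1(x1) f2(x2) [1 + omega * phi1(x1) * phi2(x2)] with phi_i(x) = exp(-x) - E[exp(-X_i)]."""
    phi1 = math.exp(-x1) - kernel_mean(lam1)
    phi2 = math.exp(-x2) - kernel_mean(lam2)
    return poisson_pmf(x1, lam1) * poisson_pmf(x2, lam2) * (1.0 + omega * phi1 * phi2)


def omega_bounds(lam1: float, lam2: float):
    """Interval of omega keeping 1 + omega * phi1 * phi2 >= 0 for every pair of counts.

    Each phi_i lies in (-L_i, 1 - L_i], so the product is bounded by the
    extreme products of those interval endpoints."""
    l1, l2 = kernel_mean(lam1), kernel_mean(lam2)
    largest_product = max(l1 * l2, (1 - l1) * (1 - l2))
    most_negative_product = max(l1 * (1 - l2), (1 - l1) * l2)  # absolute value
    return -1.0 / largest_product, 1.0 / most_negative_product


def covariance(lam1: float, lam2: float, omega: float, max_count: int = 60) -> float:
    """Cov(X1, X2) by summing the joint pmf over a truncated grid (margins stay Poisson)."""
    e12 = sum(
        x1 * x2 * sarmanov_pmf(x1, x2, lam1, lam2, omega)
        for x1 in range(max_count)
        for x2 in range(max_count)
    )
    return e12 - lam1 * lam2


lo, hi = omega_bounds(1.0, 2.0)
print(lo < -2.0 < hi, covariance(1.0, 2.0, -2.0) < 0)  # → True True
```

For this kernel the covariance also has a closed form: omega · λ1 λ2 L1 L2 (e^{-1} - 1)^2, where L_i = E[exp(-X_i)]. The numerical sum above should reproduce it.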
In previous research, significant effects of weather conditions on car crashes have been found. However, most studies use monthly or yearly data, and only a few studies analyze the impact of weather conditions on daily car crash counts. Furthermore, the studies available at the daily level do not explicitly model the data in a time-series context, thereby ignoring any temporal serial correlation that may be present in the data. In this paper, we introduce an integer autoregressive model for count data with time interdependencies. The model is applied to daily car crash data, meteorological data and traffic exposure data from the Netherlands to examine the risk impact of weather conditions on the observed counts. The results show that several hypothesized effects of weather conditions on crash counts are significant in the data, and that failing to account for serial temporal correlation in the model may produce biased results.
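The serial dependence that an integer autoregressive model captures can be illustrated by simulation. The sketch below uses the standard Poisson INAR(1) with binomial thinning: X_t = α ∘ X_{t-1} + ε_t, with ε_t ~ Poisson(λ). Its stationary mean is λ/(1 - α) and its lag-1 autocorrelation is α. This is an assumed textbook form, not necessarily the paper's exact specification. In a regression version, λ would vary over time with covariates such as weather and traffic exposure, which is not shown here.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def binomial_thinning(alpha: float, x: int, rng: random.Random) -> int:
    """alpha o x: each of the x counts from the previous period survives with probability alpha."""
    return sum(1 for _ in range(x) if rng.random() < alpha)


def simulate_inar1(alpha: float, lam: float, n: int, seed: int = 0) -> list[int]:
    """X_t = alpha o X_{t-1} + eps_t with eps_t ~ Poisson(lam) (constant lam assumed)."""
    rng = random.Random(seed)
    x = poisson_draw(lam / (1 - alpha), rng)  # stationary marginal is Poisson(lam / (1 - alpha))
    path = []
    for _ in range(n):
        x = binomial_thinning(alpha, x, rng) + poisson_draw(lam, rng)
        path.append(x)
    return path


def lag1_autocorrelation(xs: list[int]) -> float:
    mean = sum(xs) / len(xs)
    num = sum((xs[t] - mean) * (xs[t - 1] - mean) for t in range(1, len(xs)))
    den = sum((v - mean) ** 2 for v in xs)
    return num / den


path = simulate_inar1(alpha=0.5, lam=2.0, n=5000, seed=1)
print(sum(path) / len(path), lag1_autocorrelation(path))  # theoretical values: 4.0 and 0.5
```

A model that treats these daily counts as independent would ignore the α-driven correlation between consecutive days. The abstract argues that this omission can bias the estimated weather effects.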
Mixtures of multivariate normal inverse Gaussian (MNIG) distributions can be used to cluster data that exhibit features such as skewness and heavy tails. For cluster analysis in a traditional finite mixture model framework, the number of components either needs to be known a priori or needs to be estimated a posteriori, using a model selection criterion after deriving results for a range of possible numbers of components. However, different model selection criteria can sometimes suggest different numbers of components, which creates uncertainty. Here, an infinite mixture model framework, also known as a Dirichlet process mixture model, is proposed for mixtures of MNIG distributions. This Dirichlet process mixture model approach allows the number of components to grow or decay freely from 1 to ∞ (in practice from 1 to N), and the number of components is inferred along with the parameter estimates in a Bayesian framework, thus alleviating the need for model selection criteria. We run our algorithm on simulated as well as real benchmark datasets and compare it with other clustering approaches. The proposed method provides competitive results for both simulations and real data.
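Truncated stick-breaking is one standard way to represent a Dirichlet process prior that lets the number of components range "from 1 to N" in practice. The sketch below covers only the prior over mixing weights; in the paper each weight would attach to an MNIG component. It is not necessarily the authors' sampler. The truncation level, concentration value and function names are assumptions for illustration. The number of occupied components is not fixed in advance; it emerges from the allocation.

```python
import random


def truncated_stick_breaking(concentration: float, n_max: int, seed: int = 0) -> list[float]:
    """Dirichlet process mixing weights truncated at n_max components.

    V_k ~ Beta(1, concentration) and w_k = V_k * prod_{j<k} (1 - V_j); the last
    component takes whatever stick remains, so the weights sum to one."""
    rng = random.Random(seed)
    weights, remaining = [], 1.0
    for _ in range(n_max - 1):
        v = rng.betavariate(1.0, concentration)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def occupied_components(weights: list[float], n_obs: int, seed: int = 0) -> int:
    """Number of distinct components used when n_obs observations are allocated by the weights."""
    rng = random.Random(seed)
    labels = rng.choices(range(len(weights)), weights=weights, k=n_obs)
    return len(set(labels))


weights = truncated_stick_breaking(concentration=1.0, n_max=200, seed=3)
print(sum(weights), occupied_components(weights, n_obs=200, seed=4))
```

Smaller concentration values tend to put most of the mass on a few sticks, and therefore on fewer occupied clusters.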
Clustering discrete-valued time series
Roick, Tyler; Karlis, Dimitris; McNicholas, Paul D.
Advances in Data Analysis and Classification, 03/2021, Volume 15, Issue 1
Journal article, peer-reviewed, open access
There is a need for models that can account for the discreteness of data along with its time-series properties and correlation. Our focus is on INteger-valued AutoRegressive (INAR) type models. INAR type models can be used in conjunction with existing model-based clustering techniques to cluster discrete-valued time series data. Because a finite mixture model is used, several existing techniques, such as selection of the number of clusters, estimation using expectation-maximization and model selection, are applicable. The proposed model is then demonstrated on real data to illustrate its clustering applications.
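The link between INAR models and mixture-based clustering can be made concrete. A Poisson INAR(1) has a closed-form transition probability: the convolution of Binomial(y, α) survivors and Poisson(λ) innovations. That gives each whole series a likelihood under each cluster, and hence an E-step responsibility. The sketch below assumes Poisson innovations and one cluster label per series, conditioning on the first observation. The parameter values and function names are illustrative, not the paper's.

```python
import math


def inar1_transition(x: int, y: int, alpha: float, lam: float) -> float:
    """P(X_t = x | X_{t-1} = y) for a Poisson INAR(1): Binomial(y, alpha) survivors
    plus Poisson(lam) innovations, i.e. a convolution of the two pmfs."""
    total = 0.0
    for k in range(min(x, y) + 1):  # k survivors, x - k new arrivals
        survivors = math.comb(y, k) * alpha ** k * (1 - alpha) ** (y - k)
        arrivals = math.exp(-lam) * lam ** (x - k) / math.factorial(x - k)
        total += survivors * arrivals
    return total


def series_loglik(series: list[int], alpha: float, lam: float) -> float:
    """Conditional log-likelihood of a series given its first observation."""
    return sum(math.log(inar1_transition(x, y, alpha, lam)) for y, x in zip(series, series[1:]))


def responsibilities(series, weights, params):
    """E-step of a finite mixture of INAR(1) components: posterior probability that
    the whole series belongs to each cluster, computed in log space for stability."""
    logs = [math.log(w) + series_loglik(series, a, l) for w, (a, l) in zip(weights, params)]
    top = max(logs)
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


params = [(0.3, 0.5), (0.5, 4.0)]  # hypothetical (alpha, lam) for a low- and a high-count cluster
print(responsibilities([8, 9, 7, 10, 9, 8], [0.5, 0.5], params))
```

Within EM, the M-step would update each cluster's (α, λ) using these responsibilities as weights. Since the INAR(1) likelihood has no closed-form maximizer, that update is usually done by numerical optimization, and it is not shown here.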
When actuaries face the problem of pricing an insurance contract that contains different types of coverage, such as a motor insurance or a homeowner’s insurance policy, they usually assume that types of claim are independent. However, this assumption may not be realistic: several studies have shown that there is a positive correlation between types of claim. Here we introduce different multivariate Poisson regression models in order to relax the independence assumption, including zero-inflated models to account for excess of zeros and overdispersion. These models have been largely ignored to date, mainly because of their computational difficulties. Bayesian inference based on MCMC helps to resolve this problem (and also allows us to derive, for several quantities of interest, posterior summaries to account for uncertainty). Finally, these models are applied to an automobile insurance claims database with three different types of claim. We analyse the consequences for pure and loaded premiums when the independence assumption is relaxed by using different multivariate Poisson regression models together with their zero-inflated versions.
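To see why relaxing independence matters for loaded premiums but not for pure premiums, consider a simple case. Assume a trivariate Poisson model with one common shock, N_j = Y_j + Y_0, and a variance-principle loading E[N] + c·Var(N) on the total claim count. Under these assumptions the expected total is unchanged, but the variance gains a term in the shared shock. The loading principle, the unit claim cost and all numbers below are hypothetical, chosen for illustration only. The paper's premiums come from fitted regression models and may also involve severities.

```python
def common_shock_moments(lams: list[float], lam0: float) -> tuple[float, float]:
    """Mean and variance of N1 + ... + Nm when N_j = Y_j + Y_0, with independent
    Y_j ~ Poisson(lams[j]) and one shock Y_0 ~ Poisson(lam0) shared by all claim types."""
    m = len(lams)
    mean = sum(lams) + m * lam0
    variance = sum(lams) + m * m * lam0  # Y_0 enters the total m times
    return mean, variance


def independent_moments(lams: list[float], lam0: float) -> tuple[float, float]:
    """Same Poisson(lams[j] + lam0) margins, but treating claim types as independent."""
    mean = sum(lams) + len(lams) * lam0
    return mean, mean  # a sum of independent Poisson counts has variance equal to its mean


def variance_principle_premium(mean: float, variance: float, loading: float) -> float:
    """Hypothetical loaded premium on claim counts: E[N] + loading * Var(N)."""
    return mean + loading * variance


rates = [0.10, 0.05, 0.02]  # illustrative per-type rates, not taken from the paper
for label, (m, v) in [("common shock", common_shock_moments(rates, 0.03)),
                      ("independent", independent_moments(rates, 0.03))]:
    print(label, round(m, 4), round(variance_principle_premium(m, v, loading=0.1), 4))
```

With positive correlation through Y_0, both versions share the same expected count (pure premium). The loaded premium is higher once the dependence is modelled.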
Cardiac performance depends on optimal ventriculoarterial coupling, which is impaired in patients with heart failure (HF). Galectin-3 is a mediator of myocardial fibrosis and remodeling and is associated with clinical status in patients with chronic HF. We examined the association of arterial stiffness with galectin-3 levels in patients with HF of ischemic etiology.
We consecutively enrolled 40 patients with stable ischemic HF and reduced ejection fraction. Central aortic stiffness was evaluated non-invasively by measuring carotid-femoral pulse wave velocity (PWV). Among other variables, serum levels of galectin-3 and B-type natriuretic peptide (BNP) were measured.
The median galectin-3 level in our study population was 12.9 (10.8-18.7) ng/ml, and the mean PWV was 9.31±2.79 m/sec. Galectin-3 levels were significantly associated with age (r=0.48, p=0.003), creatinine clearance (r=-0.66, p<0.001) and BNP levels (r=0.36, p=0.05). Galectin-3 levels were also significantly associated with PWV (r=0.37, p=0.03), and patients with PWV above the median had significantly higher galectin-3 levels than patients with lower PWV: 16.1 (11.8-25.2) vs. 12.1 (10.5-14) ng/ml, p=0.03.
We found an association of arterial stiffness, as measured by PWV, with galectin-3 levels in patients with chronic HF of ischemic etiology. These findings suggest a pathway driving arterial stiffening and myocardial remodeling in HF, which may provide insight into the mechanisms determining the prognosis and clinical status of patients with HF.
When modelling insurance claim count data, the actuary often observes overdispersion and an excess of zeros that may be caused by unobserved heterogeneity. A common approach to accounting for overdispersion is to consider models with some overdispersed distribution as opposed to Poisson models. Zero-inflated, hurdle and compound frequency models are typically applied to insurance data to account for such features. However, a natural way to deal with unobserved heterogeneity is to consider mixtures of simpler models. In this paper, we consider k-finite mixtures of some typical regression models. This approach has interesting features: first, it allows for overdispersion, with the zero-inflated model as a special case; second, it allows for an elegant interpretation based on the typical clustering application of finite mixture models. k-finite mixture models are applied to a car insurance claim dataset in order to analyse whether the problem of unobserved heterogeneity requires a richer structure for risk classification. Our results show that the data consist of two subpopulations with different regression structures.
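Two claims in the paragraph above can be checked directly: a k-finite Poisson mixture is overdispersed, and the zero-inflated model is a special case. The sketch below omits covariates. In the regression version, each component's rate would depend on the policyholder's covariates through that component's own coefficients, which is an assumption about the setup rather than the paper's exact specification. The mixture variance equals the mean plus the spread of the component rates, and fixing one rate at zero reproduces the zero-inflated Poisson (ZIP) pmf.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    if lam == 0.0:
        return 1.0 if x == 0 else 0.0  # degenerate component: all mass at zero
    return math.exp(-lam) * lam ** x / math.factorial(x)


def mixture_pmf(x: int, weights: list[float], rates: list[float]) -> float:
    """k-finite Poisson mixture: sum_j w_j * Poisson(x; rate_j)."""
    return sum(w * poisson_pmf(x, r) for w, r in zip(weights, rates))


def mixture_mean_var(weights: list[float], rates: list[float]) -> tuple[float, float]:
    mean = sum(w * r for w, r in zip(weights, rates))
    second_moment = sum(w * (r + r * r) for w, r in zip(weights, rates))  # E[X^2] = r + r^2 per component
    return mean, second_moment - mean ** 2  # equals mean + sum_j w_j (r_j - mean)^2 >= mean


def zip_pmf(x: int, pi0: float, lam: float) -> float:
    """Zero-inflated Poisson with extra probability pi0 at zero."""
    return pi0 * (x == 0) + (1 - pi0) * poisson_pmf(x, lam)


# A two-component mixture with one rate fixed at zero reproduces the ZIP model.
print(all(abs(mixture_pmf(x, [0.3, 0.7], [0.0, 1.5]) - zip_pmf(x, 0.3, 1.5)) < 1e-12 for x in range(20)))
print(mixture_mean_var([0.6, 0.4], [0.2, 2.0]))  # variance exceeds the mean
```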
The EM algorithm is the standard tool for maximum likelihood estimation in finite mixture models. Its main drawbacks are slow convergence and the dependence of the solution on both the stopping criterion and the initial values used. Slow convergence and the choice of a stopping criterion have been addressed in the literature; the present paper deals with the initial value problem for the EM algorithm. The aim of this paper is to compare several methods for choosing initial values for the EM algorithm in the case of finite mixtures, and to propose some new methods based on modifications of existing ones. Finite normal mixtures with common variance and finite Poisson mixtures are examined through a simulation study.
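The following is a minimal sketch of where initial values and the stopping rule enter EM for a finite Poisson mixture. It runs plain EM from several random starts and keeps the run with the highest log-likelihood. This is one simple strategy among those typically compared; it is not one of the paper's proposed methods. The initializer, tolerance, iteration cap and function names are hypothetical choices. The illustrative data are expected frequencies from a known two-component mixture.

```python
import math
import random


def log_poisson(x: int, lam: float) -> float:
    return -lam + x * math.log(lam) - math.lgamma(x + 1)


def em_poisson_mixture(data, weights, rates, tol=1e-6, max_iter=500):
    """Plain EM for a k-component Poisson mixture from the given initial values.

    Stops when the log-likelihood gain is below tol; the returned loglik is
    the value from the last E-step."""
    weights, rates = list(weights), list(rates)
    k, n = len(weights), len(data)
    previous = -math.inf
    for _ in range(max_iter):
        # E-step: responsibilities via log-sum-exp, which also yields the log-likelihood.
        resp, loglik = [], 0.0
        for x in data:
            logs = [math.log(weights[j]) + log_poisson(x, rates[j]) for j in range(k)]
            top = max(logs)
            scale = sum(math.exp(v - top) for v in logs)
            loglik += top + math.log(scale)
            resp.append([math.exp(v - top) / scale for v in logs])
        if loglik - previous < tol:  # stopping criterion on the log-likelihood change
            break
        previous = loglik
        # M-step: closed-form updates; tiny floors avoid log(0) for empty components.
        totals = [max(sum(r[j] for r in resp), 1e-12) for j in range(k)]
        weights = [t / n for t in totals]
        rates = [max(sum(r[j] * x for r, x in zip(resp, data)) / totals[j], 1e-12) for j in range(k)]
    return weights, rates, loglik


def random_start(data, k, rng):
    """Hypothetical initializer: equal weights, rates at k distinct observed values."""
    picks = rng.sample(sorted(set(data)), k)
    return [1.0 / k] * k, [x + 0.5 for x in picks]  # +0.5 keeps every rate positive


def best_of_random_starts(data, k, n_starts=5, seed=0):
    """Run EM from several random starts and keep the highest log-likelihood."""
    rng = random.Random(seed)
    fits = [em_poisson_mixture(data, *random_start(data, k, rng)) for _ in range(n_starts)]
    return max(fits, key=lambda fit: fit[2])


# Illustrative data: expected counts from 0.5 * Poisson(1) + 0.5 * Poisson(8), n = 1000.
freq = {x: round(1000 * (0.5 * math.exp(log_poisson(x, 1.0)) + 0.5 * math.exp(log_poisson(x, 8.0))))
        for x in range(25)}
data = [x for x, c in freq.items() for _ in range(c)]
weights, rates, loglik = best_of_random_starts(data, k=2)
print([round(r, 2) for r in sorted(rates)], round(loglik, 2))
```

Different starts can stop at different log-likelihoods, especially with a loose tolerance. This sensitivity is exactly the motivation for comparing initialization strategies.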
The majority of model-based clustering techniques are based on multivariate normal models and their variants. In this paper, copulas are used to construct flexible families of models for clustering applications. The use of copulas in model-based clustering offers two direct advantages over current methods: (i) the appropriate choice of copulas provides the ability to obtain a range of exotic shapes for the clusters, and (ii) the explicit choice of marginal distributions for the clusters allows the modelling of multivariate data of various modes (either discrete or continuous) in a natural way. This paper introduces and studies the framework of copula-based finite mixture models for clustering applications. Estimation in the general case can be performed using standard EM, and, depending on the mode of the data, more efficient procedures are provided that can fully exploit the copula structure. The closure properties of the mixture models under marginalization are discussed, and for continuous, real-valued data parametric rotations in the sample space are introduced, with a parallel discussion on parameter identifiability depending on the choice of copulas for the components. The exposition of the methodology is accompanied and motivated by the analysis of real and artificial data.
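A copula-based mixture component combines chosen margins with a copula via Sklar's theorem, and the E-step needs each component's density at every observation. The sketch below assumes continuous exponential margins and a Frank copula, whose parameter can be negative to give negatively dependent clusters. These are illustrative choices, not the paper's, and the component parameters and function names are hypothetical. For discrete margins, the density would be replaced by finite differences of the copula CDF, which is not shown here.

```python
import math


def frank_density(u: float, v: float, theta: float) -> float:
    """Frank copula density; theta != 0, and theta < 0 gives negative dependence."""
    a = 1.0 - math.exp(-theta)
    num = theta * a * math.exp(-theta * (u + v))
    den = (a - (1.0 - math.exp(-theta * u)) * (1.0 - math.exp(-theta * v))) ** 2
    return num / den


def exp_pdf(x: float, rate: float) -> float:
    return rate * math.exp(-rate * x)


def exp_cdf(x: float, rate: float) -> float:
    return 1.0 - math.exp(-rate * x)


def component_density(x1: float, x2: float, comp: tuple[float, float, float]) -> float:
    """Sklar's theorem for continuous margins: c(F1(x1), F2(x2)) * f1(x1) * f2(x2)."""
    rate1, rate2, theta = comp
    u, v = exp_cdf(x1, rate1), exp_cdf(x2, rate2)
    return frank_density(u, v, theta) * exp_pdf(x1, rate1) * exp_pdf(x2, rate2)


def mixture_responsibilities(x1, x2, weights, comps):
    """E-step quantities: posterior probability of each cluster for one observation."""
    dens = [w * component_density(x1, x2, c) for w, c in zip(weights, comps)]
    total = sum(dens)
    return [d / total for d in dens]


# Two hypothetical clusters: negatively dependent vs positively dependent.
comps = [(1.0, 0.5, -5.0), (2.0, 2.0, 5.0)]
print(mixture_responsibilities(0.4, 2.5, [0.5, 0.5], comps))
```

Because the margins and the copula are specified separately, swapping the exponential margins for another family changes only exp_pdf and exp_cdf. The dependence part stays the same.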