Assessing the risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are techniques traditionally used in credit scoring to determine the likelihood of default from consumer application data and credit reference agency data. We test support vector machines (SVMs) against these traditional methods on a large credit card database. We find that SVMs are competitive and can serve as the basis of a feature selection method that identifies the features most significant in determining risk of default.
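The idea of SVM-based feature selection can be illustrated with a minimal sketch. It assumes a linear SVM trained by full-batch subgradient descent on the primal hinge-loss objective, with features ranked by the magnitude of their weights. This is not necessarily the selection procedure used in the study. The function names (`train_linear_svm`, `rank_features`) and the synthetic data are hypothetical.

```python
import random

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the primal linear SVM objective
    (lam / 2) * ||w||^2 + mean(max(0, 1 - y_i * (w . x_i + b))).
    X: list of standardised feature vectors; y: labels in {-1, +1}."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0                     # the bias is not penalised
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active: add its subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def rank_features(w):
    """Rank feature indices by |w_j|, most influential first
    (only meaningful when the features share a common scale)."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)

# Synthetic example: only feature 0 drives the (hypothetical) default label.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1 if x[0] > 0 else -1 for x in X]
w, b = train_linear_svm(X, y)
ranking = rank_features(w)  # feature 0 is expected to rank first
```

Ranking by absolute weight assumes standardised inputs. On raw application data, a feature measured in large units would otherwise receive a small weight regardless of its importance.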
Predicting the time of default in a credit risk setting via survival analysis must take a high censoring rate into account. The rate is high because default does not occur for the majority of debtors. Mixture cure models allow the part of the loan population that is not susceptible to default to be modeled separately from the time of default for the susceptible population. In this article, we extend the mixture cure model to include time-varying covariates. We illustrate the method via simulations and by incorporating macroeconomic factors as predictors for an actual bank dataset.
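The structure of a mixture cure model can be sketched as follows: population survival equals the cured fraction plus the susceptible fraction times the susceptible survival curve, S(t) = 1 − π + π·S_u(t). This minimal sketch rests on several assumptions. The incidence π is logistic in static covariates. The latency part is a discrete-time logistic hazard driven by time-varying covariates. The article's actual latency specification may differ, and all names and coefficients here are hypothetical.

```python
import math

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def cure_model_survival(x, z_path, b, g, alpha):
    """Population survival under a mixture cure model (illustrative sketch).

    x      : static covariates for the incidence part (who is susceptible)
    z_path : covariate vectors for periods t = 1..T (time-varying, e.g. macro data)
    b      : incidence coefficients, pi = P(susceptible) = logistic(b . x)
    g      : latency coefficients on the time-varying covariates
    alpha  : per-period baseline log-odds of default for susceptible accounts
    Returns [S(1), ..., S(T)] with S(t) = 1 - pi + pi * S_u(t).
    """
    pi = logistic(sum(bj * xj for bj, xj in zip(b, x)))
    s_u = 1.0  # survival of the susceptible subpopulation
    out = []
    for a_t, z_t in zip(alpha, z_path):
        # discrete hazard for susceptible accounts in period t
        h = logistic(a_t + sum(gj * zj for gj, zj in zip(g, z_t)))
        s_u *= 1.0 - h
        out.append(1.0 - pi + pi * s_u)
    return out

# Neutral vs stressed covariate path over 12 periods (hypothetical values).
S = cure_model_survival([1.0], [[0.0]] * 12, [0.0], [0.5], [-2.0] * 12)
S_stress = cure_model_survival([1.0], [[2.0]] * 12, [0.0], [0.5], [-2.0] * 12)
```

Because the cured fraction never defaults, S(t) levels off at 1 − π instead of falling to zero. This is the feature that accommodates the high censoring rate.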
We have previously identified a unique subtype of acute lymphoblastic leukemia (ALL) associated with a poor outcome and characterized by intrachromosomal amplification of chromosome 21 including the RUNX1 gene (iAMP21). In this study, array-based comparative genomic hybridization (aCGH) (n = 10) detected a common region of amplification (CRA) between 33.192 and 39.796 Mb and a common region of deletion (CRD) between 43.7 and 47 Mb in 100% and 70% of iAMP21 patients, respectively. High-resolution genotypic analysis (n = 3) identified allelic imbalances in the CRA. Supervised gene expression analysis showed a distinct signature for eight patients with iAMP21, with 10% of overexpressed genes located within the CRA. The mean expression of these genes was significantly higher in iAMP21 than in other ALL samples (n = 45). Although genomic copy number correlated with overall gene expression levels within areas of loss or gain, there was considerable individual variation. A unique subset of differentially expressed genes outside the CRA and CRD was identified when the gene expression signatures of iAMP21 were compared with those of ALL samples with ETV6-RUNX1 fusion (n = 21) or high hyperdiploidy with additional chromosomes 21 (n = 23). From this analysis, LGMN was shown to be overexpressed in patients with iAMP21 (P = 0.0012). The genomic and expression data have further characterized this ALL subtype, demonstrating high levels of 21q instability in these patients. These findings led to proposed mechanisms underlying this clinical phenotype and to plausible alternative treatments.
Retail credit models are implemented using discrete survival analysis, which allows macroeconomic conditions to be included as time-varying covariates. Consequently, these models can be used to estimate changes in probability of default under downturn economic scenarios. Compared with traditional models, we offer improved methodologies for generating scenarios and for using them to predict default rates. Monte Carlo simulation is used to generate a distribution of estimated default rates, from which Value at Risk and Expected Shortfall are computed as a means of stress testing. Several macroeconomic variables are considered, and factor analysis in particular is employed to model the structure among these variables. Two large UK data sets are used to test this approach, resulting in plausible dynamic models and stress test outcomes.
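The final step, turning simulated scenarios into Value at Risk and Expected Shortfall, can be sketched minimally. The approach described above uses account-level discrete survival models. This sketch makes a simplifying assumption: a single portfolio-level logit link driven by two correlated macro shocks, both generated from one common factor as a crude stand-in for the factor structure. The variable names, loadings and coefficients are hypothetical.

```python
import math
import random

def simulate_default_rates(n_sims=10000, seed=42):
    """Monte Carlo distribution of a portfolio default rate under random macro scenarios.
    All loadings and coefficients below are hypothetical illustration values."""
    rng = random.Random(seed)
    loadings = {"unemployment": 0.8, "interest_rate": 0.6}  # exposure to one common factor
    betas = {"unemployment": 0.4, "interest_rate": 0.2}     # log-odds change per 1 s.d. shock
    intercept = math.log(0.02 / 0.98)                       # 2% default rate in neutral conditions
    rates = []
    for _ in range(n_sims):
        common = rng.gauss(0, 1)  # shared macroeconomic factor for this scenario
        lin = intercept
        for name, load in loadings.items():
            # standardised shock = common part + idiosyncratic part (unit variance overall)
            shock = load * common + math.sqrt(1 - load ** 2) * rng.gauss(0, 1)
            lin += betas[name] * shock
        rates.append(1 / (1 + math.exp(-lin)))
    return rates

def var_es(samples, level=0.99):
    """Empirical Value at Risk (upper quantile) and Expected Shortfall (mean at or beyond it)."""
    xs = sorted(samples)
    k = min(int(level * len(xs)), len(xs) - 1)
    tail = xs[k:]
    return xs[k], sum(tail) / len(tail)

rates = simulate_default_rates()
var99, es99 = var_es(rates)
```

Expected Shortfall averages the whole tail beyond the VaR quantile. It is therefore always at least as large as VaR and is sensitive to how severe the worst scenarios are, not just how frequent.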
Assessing risk levels for existing credit accounts is important for implementing bank policies and offering financial products. This article uses cluster analysis of credit card account behaviour to help assess credit risk level. Account behaviour is modelled parametrically, and we then implement the behavioural cluster analysis using a recently proposed dissimilarity measure between statistical model parameters. The advantage of this new measure is that it explicitly exploits the uncertainty associated with parameters estimated from statistical models. Interesting clusters are obtained from real credit card behaviour data, together with superior prediction and forecasting of account default based on the clustering outcomes.
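The value of exploiting parameter uncertainty can be illustrated with a sketch. The article's dissimilarity measure is not specified here, so this sketch substitutes a different one: the symmetric Kullback-Leibler divergence between independent normal approximations N(estimate, standard error²) of two accounts' parameters. The function name `param_dissimilarity` is hypothetical.

```python
def param_dissimilarity(est_a, se_a, est_b, se_b):
    """Symmetric KL divergence between N(est, se^2) approximations of two
    accounts' parameter estimates, summed over (assumed independent) parameters."""
    total = 0.0
    for m1, s1, m2, s2 in zip(est_a, se_a, est_b, se_b):
        v1, v2 = s1 * s1, s2 * s2
        diff2 = (m1 - m2) ** 2
        # KL(P||Q) + KL(Q||P) for univariate normals; the log-variance terms cancel
        total += (v1 + diff2) / (2 * v2) + (v2 + diff2) / (2 * v1) - 1.0
    return total

# Same gap between point estimates, different precision (hypothetical numbers).
precise = param_dissimilarity([0.0], [0.5], [1.0], [0.5])
vague = param_dissimilarity([0.0], [1.0], [1.0], [1.0])
```

Two accounts whose estimates differ by the same amount are judged less dissimilar when those estimates are noisy. A plain Euclidean distance on point estimates cannot express this. The resulting pairwise matrix could then feed any dissimilarity-based clustering method, such as hierarchical clustering or k-medoids.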
Bankruptcy prediction has been a popular and challenging research area for decades. Most prediction models are built using financial figures, stock market data and firm-specific variables. We complement such traditional low-dimensional data with high-dimensional data on companies' directors and managers. This information is used to build a network between small and medium-sized enterprises (SMEs), in which two companies are related if they share a director or high-level manager. A smoothed version of the weighted-vote relational neighbour classifier is applied to the network and transforms the relationships between companies into bankruptcy prediction scores. It assumes that a company is more likely to file for bankruptcy if one of the related companies in its network has already failed. An ensemble model that combines the relational model's output scores with structured data is built and applied to two data sets of Belgian and UK SMEs. We find that the relational model gives better predictions than a simple financial model when detecting the riskiest firms. The largest performance increase is found when the relational and financial data are combined, confirming the complementary nature of both data types.
• A new SME bankruptcy prediction model that includes relational data is proposed.
• The model links two companies through shared directors and managers.
• A relational classifier is applied to the resulting network.
• Relational data helps detect the riskiest firms.
• Relational and financial data have complementary predictive power.
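A minimal sketch of the relational step follows. It makes three assumptions. First, edge weights count shared directors or managers. Second, neighbours that have already failed carry label 1 and all others label 0. Third, the smoothing is a pseudo-count toward a prior bankruptcy rate. The article's exact smoothing may differ, and `build_network`, `smoothed_wvrn`, `prior` and `m` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations

def build_network(directors_by_company):
    """Link two companies whenever they share a director or manager;
    the edge weight counts how many people they share."""
    by_person = defaultdict(set)
    for company, people in directors_by_company.items():
        for person in people:
            by_person[person].add(company)
    edges = {c: defaultdict(float) for c in directors_by_company}
    for companies in by_person.values():
        for a, b in combinations(sorted(companies), 2):
            edges[a][b] += 1.0
            edges[b][a] += 1.0
    return edges

def smoothed_wvrn(edges, failed, prior, m=1.0):
    """Weighted-vote relational neighbour score: the weighted share of failed
    neighbours, shrunk toward `prior` with pseudo-count weight `m`."""
    scores = {}
    for company, nbrs in edges.items():
        num = sum(w for n, w in nbrs.items() if n in failed)
        den = sum(nbrs.values())
        scores[company] = (num + m * prior) / (den + m)
    return scores

# Hypothetical toy network: A shares d1 with B and d2 with C; D is isolated.
edges = build_network({"A": {"d1", "d2"}, "B": {"d1"}, "C": {"d2"}, "D": {"d3"}})
scores = smoothed_wvrn(edges, failed={"B"}, prior=0.05)
```

The smoothing keeps isolated companies, which have no relational evidence, at the prior rate rather than at zero. The resulting scores can then enter an ensemble alongside financial variables.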
We present discrete-time survival models of borrower default for credit cards that include behavioural data about credit card holders and macroeconomic conditions across the credit card lifetime. We find that dynamic models that include these behavioural and macroeconomic variables provide statistically significant improvements in model fit. These improvements translate into better forecasts of default at both account and portfolio levels when the models are applied to an out-of-sample data set. By simulating extreme economic conditions, we show how these models can be used to stress test credit card portfolios.
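A typical discrete-time hazard specification of this kind can be written as follows. The notation is assumed rather than taken from the article. α(t) is a baseline function of account age, a_i holds static application variables, b_i(t − l) holds behavioural variables and z(t − l) holds macroeconomic variables, both lagged by l periods. Conditional on account i surviving to t₀, its probability of default within a horizon H and the portfolio default rate over N accounts follow from the hazards.

```latex
h_i(t) = \Pr(T_i = t \mid T_i \ge t)
       = \frac{1}{1 + \exp\!\left\{-\left[\alpha(t) + \boldsymbol{\beta}^\top \mathbf{a}_i
         + \boldsymbol{\gamma}^\top \mathbf{b}_i(t - l)
         + \boldsymbol{\delta}^\top \mathbf{z}(t - l)\right]\right\}}

\mathrm{PD}_i(t_0, H) = 1 - \prod_{t = t_0 + 1}^{t_0 + H} \bigl(1 - h_i(t)\bigr),
\qquad
\widehat{\mathrm{DR}}(t_0, H) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{PD}_i(t_0, H)
```

Under this form, a stress test amounts to replacing the path z(t − l) with an extreme economic scenario and recomputing the hazards, the account-level PDs and the portfolio rate.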
Based on UK data for major retail credit cards, we build several models of Loss Given Default (LGD) from account-level data, including Tobit, a decision tree model, and Beta and fractional logit transformations. We find that Ordinary Least Squares models with macroeconomic variables perform best for forecasting LGD at the account and portfolio levels on independent hold-out data sets. Including macroeconomic conditions in the model is important, since it provides a means to model LGD in downturn conditions, as required by Basel II, and enables stress testing. We find that bank interest rates and the unemployment level significantly affect LGD.
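The fractional logit is one of the models listed above, though not the best performer reported. A minimal sketch shows how it handles an LGD bounded in [0, 1]: it maximises a Bernoulli quasi-likelihood whose targets may be fractions, so predictions stay inside the unit interval. The single standardised "unemployment" covariate, the synthetic data and the function name are hypothetical.

```python
import math
import random

def fit_fractional_logit(x, lgd, lr=0.5, iters=5000):
    """Fit E[LGD | x] = logistic(a + b * x) by gradient ascent on the mean
    Bernoulli quasi-log-likelihood sum(lgd*log(p) + (1 - lgd)*log(1 - p))."""
    a = b = 0.0
    n = len(x)
    for _ in range(iters):
        ga = gb = 0.0
        for xi, yi in zip(x, lgd):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            ga += yi - p          # score for the intercept
            gb += (yi - p) * xi   # score for the slope
        a += lr * ga / n
        b += lr * gb / n
    return a, b

# Synthetic accounts: LGD rises with (standardised) unemployment.
rng = random.Random(0)
unemp = [-2 + 4 * i / 49 for i in range(50)]
lgd = [min(1.0, max(0.0, 0.4 + 0.1 * u + rng.uniform(-0.1, 0.1))) for u in unemp]
a, b = fit_fractional_logit(unemp, lgd)
pred = [1.0 / (1.0 + math.exp(-(a + b * u))) for u in unemp]
```

At the optimum, the intercept's score equation forces the mean prediction to equal the mean observed LGD. An OLS model, by contrast, can predict values outside [0, 1].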
Logistic regression is a standard procedure for real-world classification problems. The challenge of class imbalance arises in two-class classification problems when the minority class is observed much less frequently than the majority class. This characteristic is endemic in many domains. Work by Owen has shown that cluster structure among the minority class may be a specific problem in highly imbalanced logistic regression. In this article, we propose a novel relabeling approach to handle the class imbalance problem when using logistic regression, which essentially assigns new labels to the minority class observations. An expectation-maximization algorithm is formalized as a tool for efficiently computing this relabeling. Modeling such relabeled data can lead to improved predictive performance. We demonstrate the effectiveness of this approach with detailed experiments on real datasets. Supplemental materials for the article are available online.
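The general flavour of EM-based relabeling can be illustrated with a sketch. The article's exact relabeling rule is not described here, so this sketch assumes the minority observations form two Gaussian subclusters along a single feature. It fits a two-component mixture to them by EM and relabels each observation with its most responsible component. The function name and data are hypothetical.

```python
import math

def em_relabel_minority(xs, iters=100):
    """Fit a two-component 1-D Gaussian mixture to minority-class values by EM
    and return a new sub-label (0 or 1) for each observation."""
    n = len(xs)
    srt = sorted(xs)
    mu = [srt[n // 4], srt[(3 * n) // 4]]  # initialise means at the quartiles
    mean = sum(xs) / n
    v0 = sum((x - mean) ** 2 for x in xs) / n
    var = [v0, v0]
    w = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: responsibilities, computed with log-sum-exp for stability
        resp = []
        for x in xs:
            logd = [math.log(w[k]) - 0.5 * math.log(2 * math.pi * var[k])
                    - (x - mu[k]) ** 2 / (2 * var[k]) for k in range(2)]
            top = max(logd)
            e = [math.exp(l - top) for l in logd]
            tot = sum(e)
            resp.append([ek / tot for ek in e])
        # M-step: update weights, means and variances (with small floors)
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6)
    return [0 if r[0] >= r[1] else 1 for r in resp]

# Hypothetical minority observations with two well-separated subclusters.
minority = [-3 + 0.1 * i for i in range(5)] + [3 + 0.1 * i for i in range(5)]
labels = em_relabel_minority(minority)
```

One plausible use of such sub-labels is to fit a multinomial logistic model that gives each minority subcluster its own boundary and then merge them back into a single minority prediction. This is an assumption about how relabeled data might be modeled, not a description of the article's method.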