The assessment of calibration performance of risk prediction models based on regression or more flexible machine learning algorithms receives little attention.
Herein, we argue that this needs to change immediately because poorly calibrated algorithms can be misleading and potentially harmful for clinical decision-making. We summarize how to avoid poor calibration at algorithm development and how to assess calibration at algorithm validation, emphasizing balance between model complexity and the available sample size. At external validation, calibration curves require sufficiently large samples. Algorithm updating should be considered for appropriate support of clinical practice.
Efforts are required to avoid poor calibration when developing prediction models, to evaluate calibration when validating models, and to update models when indicated. The ultimate aim is to optimize the utility of predictive analytics for shared decision-making and patient counseling.
Abstract
STUDY QUESTION
What is the chance of a live birth following one or more linked complete cycles of IVF (including ICSI)?
SUMMARY ANSWER
The chance of a live birth after three complete cycles of IVF was 42.3% for treatment commencing from 1999 to 2007.
WHAT IS KNOWN ALREADY
IVF success has generally been reported on the basis of live birth rates after a single episode of treatment resulting in the transfer of a fresh embryo. This fails to capture the real chance of having a baby after a number of complete cycles—each involving the replacement of fresh as well as frozen-thawed embryos.
STUDY DESIGN, SIZE AND DURATION
Population-based observational cohort study of 178 898 women between 1992 and 2007.
PARTICIPANTS/MATERIALS, SETTING, METHODS
Participants included all women who commenced IVF treatment at a licenced clinic in the UK as recorded in the Human Fertilisation and Embryology Authority (HFEA) national database. Exclusion criteria included women whose treatment involved donor insemination, egg donation, surrogacy and the transfer of more than three embryos. Cumulative rates of live birth, term (>37 weeks) singleton live birth, and multiple pregnancy were estimated for two time-periods, 1992–1998 and 1999–2007. Conservative estimates assumed that women who did not return for IVF would not have the outcome of interest while optimal estimates assumed that these women would have similar outcome rates to those who continued IVF.
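The two estimation assumptions described above can be illustrated with a short sketch. It is not the authors' analysis code: the function name and the per-cycle counts are hypothetical, and the calculation is a simplified version of the idea. The conservative estimate divides cumulative live births by everyone who started treatment, so women who do not return are counted as having no live birth. The optimal estimate multiplies the per-cycle failure probabilities among women who did return, which assumes those who stopped would have had the same per-cycle rates.

```python
def cumulative_live_birth_rates(starters, births):
    """Return (conservative, optimal) cumulative rates after each complete cycle.

    starters[k]: women who began complete cycle k+1 (hypothetical input)
    births[k]:   women with a live birth from that cycle (hypothetical input)
    """
    cohort = starters[0]      # everyone who commenced treatment
    total_births = 0
    no_birth_yet = 1.0        # chance of no live birth so far if nobody stopped
    conservative, optimal = [], []
    for n_k, b_k in zip(starters, births):
        total_births += b_k
        # Conservative: women who did not return are assumed to have no birth.
        conservative.append(total_births / cohort)
        # Optimal: women who did not return are assumed to have the same
        # per-cycle rate as those who continued.
        no_birth_yet *= 1.0 - b_k / n_k
        optimal.append(1.0 - no_birth_yet)
    return conservative, optimal


# Hypothetical counts for three complete cycles (not data from the study).
starters = [1000, 500, 200]
births = [250, 100, 30]
conservative, optimal = cumulative_live_birth_rates(starters, births)
for cycle, (c, o) in enumerate(zip(conservative, optimal), start=1):
    print(f"cycle {cycle}: conservative {c:.1%}, optimal {o:.1%}")
```

With these made-up counts the two estimates agree after the first cycle and diverge afterwards, because discontinuation only affects later cycles.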
MAIN RESULTS AND THE ROLE OF CHANCE
A total of 71 551 women commenced IVF treatment during 1992–1998 and an additional 107 347 during 1999–2007. After the third complete IVF cycle (defined as three fresh IVF treatments—including replacement of any surplus frozen-thawed embryos), the conservative CLBR in women who commenced IVF during 1992–1998 was 30.8% increasing to 42.3% during 1999–2007. The optimal CLBRs were 44.6 and 57.1%, respectively. After eight complete cycles the optimal CLBR was 82.4% in the latter time period. The conservative rate for multiple pregnancy per pregnant woman fell from 31.9% during the earlier time period to 26.2% during the latter.
LIMITATIONS AND REASON FOR CAUTION
Linkage of all IVF treatments to individual women was conducted. However, it was not possible to identify with certainty in all cases the episode of ovarian stimulation which generated some of the frozen embryos. Cumulative live birth rates could not be calculated for women who started treatment beyond 2007 as follow-up data were incomplete in some of them. Following a change in legislation in 2008, linked data were only made available for research in women who gave formal consent for this purpose. BMI and ethnicity could not be reported: these demographics are not recorded in the HFEA database.
WIDER IMPLICATIONS OF THE FINDINGS
Our results demonstrate, at a national level, the chances of live birth in couples undergoing a number of complete (fresh and frozen) IVF cycles. They reflect improvements in reproductive technology and a more conservative embryo transfer policy. Although most couples in the UK still do not receive three complete IVF cycles, assuming no barriers to continuation of IVF treatment, around 83% of women receiving IVF would achieve a live birth by the eighth complete cycle, similar to the natural live birth rate in a non-contraception practising population. Our results support the call from NICE to develop consistent IVF policies based on three complete cycles.
STUDY FUNDING/COMPETING INTEREST(S)
This work was funded by a Chief Scientist Office Postdoctoral Training Fellowship in Health Services Research and Health of the Public Research (Ref PDF/12/06). The views expressed here are those of the authors and not necessarily those of the Chief Scientist Office. S.B. reports grants from Chief Scientist Office Scotland during the conduct of the study. His institution has received support from pharmaceutical companies (for educational seminars), which is not related to the submitted work. D.J.M., A.M. and A.J.L. have no conflicts of interest to declare.
Clinical prediction models are useful in estimating a patient's risk of having a certain disease or experiencing an event in the future based on their current characteristics. Defining an appropriate risk threshold to recommend intervention is a key challenge in bringing a risk prediction model to clinical application; such risk thresholds are often defined in an ad hoc way. This is problematic because tacitly assumed costs of false positive and false negative classifications may not be clinically sensible. For example, when choosing the risk threshold that maximizes the proportion of patients correctly classified, false positives and false negatives are assumed equally costly. Furthermore, small to moderate sample sizes may lead to unstable optimal thresholds, which requires a particularly cautious interpretation of results.
We discuss how three common myths about risk thresholds often lead to inappropriate risk stratification of patients. First, we point out the contexts of counseling and shared decision-making in which a continuous risk estimate is more useful than risk stratification. Second, we argue that threshold selection should reflect the consequences of the decisions made following risk stratification. Third, we emphasize that there is usually no universally optimal threshold but rather that a plausible risk threshold depends on the clinical context. Consequently, we recommend presenting results for multiple risk thresholds when developing or validating a prediction model.
Bearing in mind these three considerations can avoid inappropriate allocation (and non-allocation) of interventions. Using discriminating and well-calibrated models will generate better clinical outcomes if context-dependent thresholds are used.
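The link between a threshold and the costs it tacitly assumes can be made concrete with a small sketch. It relies on a standard expected-cost argument that is not spelled out in the abstract: if correct decisions carry no cost, treating a patient with risk p costs (1 - p) × cost_fp on average and not treating costs p × cost_fn, so treatment is preferable when p exceeds cost_fp / (cost_fp + cost_fn). The function names, cost values, and patient data below are hypothetical and serve only as an illustration.

```python
def implied_threshold(cost_fp, cost_fn):
    """Risk above which treating has lower expected cost than not treating.

    Assumes correct classifications cost nothing (a simplifying assumption).
    """
    return cost_fp / (cost_fp + cost_fn)


def confusion_counts(risks, outcomes, threshold):
    """Count (TP, FP, FN, TN) when patients with risk >= threshold are treated."""
    tp = fp = fn = tn = 0
    for p, y in zip(risks, outcomes):
        if p >= threshold:
            if y == 1:
                tp += 1
            else:
                fp += 1
        else:
            if y == 1:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


# Equal costs reproduce the 0.5 cut-off; a false negative that is three
# times as costly as a false positive lowers the threshold to 0.25.
print(implied_threshold(1, 1), implied_threshold(1, 3))

# Hypothetical predicted risks and observed outcomes (1 = event).
risks = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
outcomes = [0, 0, 0, 1, 0, 1, 1, 1]

# Report results at several plausible thresholds rather than a single one.
for t in (0.1, 0.25, 0.5):
    tp, fp, fn, tn = confusion_counts(risks, outcomes, t)
    print(f"threshold {t}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

Tabulating several thresholds side by side lets readers with different views on the relative harms of false positives and false negatives find the row that matches their clinical context.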
Objective To develop a prediction model to estimate the chances of a live birth over multiple complete cycles of in vitro fertilisation (IVF) based on a couple’s specific characteristics and treatment information.
Design Population based cohort study.
Setting All licensed IVF clinics in the UK. National data from the Human Fertilisation and Embryology Authority register.
Participants All 253 417 women who started IVF (including intracytoplasmic sperm injection) treatment in the UK from 1999 to 2008 using their own eggs and partner’s sperm.
Main outcome measure Two clinical prediction models were developed to estimate the individualised cumulative chance of a first live birth over a maximum of six complete cycles of IVF: one model using information available before starting treatment and the other based on additional information collected during the first IVF attempt. A complete cycle is defined as all fresh and frozen-thawed embryo transfers arising from one episode of ovarian stimulation.
Results After exclusions, 113 873 women with 184 269 complete cycles were included, of whom 33 154 (29.1%) had a live birth after their first complete cycle and 48 925 (43.0%) after six complete cycles. Key pretreatment predictors of live birth were the woman’s age (31 v 37 years; adjusted odds ratio 1.66, 95% confidence interval 1.62 to 1.71) and duration of infertility (3 v 6 years; 1.09, 1.08 to 1.10). Post-treatment predictors included number of eggs collected (13 v 5 eggs; 1.29, 1.27 to 1.32), cryopreservation of embryos (1.91, 1.86 to 1.96), the woman’s age (1.53, 1.49 to 1.58), and stage of embryos transferred (eg, double blastocyst v double cleavage; 1.79, 1.67 to 1.91). Pretreatment, a 30 year old woman with two years of unexplained primary infertility has a 46% chance of having a live birth from the first complete cycle of IVF and a 79% chance over three complete cycles. If she then has five eggs collected in her first complete cycle followed by a single cleavage stage embryo transfer (with no embryos left for freezing) her chances change to 28% and 56%, respectively.
Conclusions This study provides an individualised estimate of a couple’s cumulative chances of having a baby over a complete package of IVF both before treatment and after the first fresh embryo transfer. This novel resource may help couples plan their treatment and prepare emotionally and financially for their IVF journey.
IMPORTANCE: Planned cesarean delivery comprises a significant proportion of births globally, with combined rates of planned and unscheduled cesarean delivery in a number of regions approaching 50%. Observational studies have shown that offspring born by cesarean delivery are at increased risk of ill health in childhood, but these studies have been unable to adjust for some key confounding variables. Additionally, risk of death beyond the neonatal period has not yet been reported for offspring born by planned cesarean delivery. OBJECTIVE: To investigate the relationship between planned cesarean delivery and offspring health problems or death in childhood. DESIGN, SETTING, AND PARTICIPANTS: Population-based data-linkage study of 321 287 term singleton first-born offspring born in Scotland, United Kingdom, between 1993 and 2007, with follow-up until February 2015. EXPOSURES: Offspring born by planned cesarean delivery in a first pregnancy were compared with offspring born by unscheduled cesarean delivery and with offspring delivered vaginally. MAIN OUTCOMES AND MEASURES: The primary outcome was asthma requiring hospital admission; secondary outcomes were salbutamol inhaler prescription at age 5 years, obesity at age 5 years, inflammatory bowel disease, type 1 diabetes, cancer, and death. RESULTS: Compared with offspring born by unscheduled cesarean delivery (n = 56 015 [17.4%]), those born by planned cesarean delivery (12 355 [3.8%]) were at no significantly different risk of asthma requiring hospital admission, salbutamol inhaler prescription at age 5 years, obesity at age 5 years, inflammatory bowel disease, cancer, or death but were at increased risk of type 1 diabetes (0.66% vs 0.44%; difference, 0.22% [95% CI, 0.13%-0.31%]; adjusted hazard ratio [HR], 1.35 [95% CI, 1.05-1.75]).
In comparison with children born vaginally (n = 252 917 [78.7%]), offspring born by planned cesarean delivery were at increased risk of asthma requiring hospital admission (3.73% vs 3.41%; difference, 0.32% [95% CI, 0.21%-0.42%]; adjusted HR, 1.22 [95% CI, 1.11-1.34]), salbutamol inhaler prescription at age 5 years (10.34% vs 9.62%; difference, 0.72% [95% CI, 0.36%-1.07%]; adjusted HR, 1.13 [95% CI, 1.01-1.26]), and death (0.40% vs 0.32%; difference, 0.08% [95% CI, 0.02%-1.00%]; adjusted HR, 1.41 [95% CI, 1.05-1.90]), whereas there were no significant differences in risk of obesity at age 5 years, inflammatory bowel disease, type 1 diabetes, or cancer. CONCLUSIONS AND RELEVANCE: Among offspring of women with first births in Scotland between 1993 and 2007, planned cesarean delivery compared with vaginal delivery (but not compared with unscheduled cesarean delivery) was associated with a small absolute increased risk of asthma requiring hospital admission, salbutamol inhaler prescription at age 5 years, and all-cause death by age 21 years. Further investigation is needed to understand whether the observed associations are causal.
To determine whether perinatal outcomes following frozen vs. fresh embryo transfer (ET) differ within singletons, within sets of twins, and between siblings.
Population-based retrospective cohort study.
Academic Medical School
200,075 live births in 151,561 women who underwent in vitro fertilization with frozen or fresh ET between 1992 and 2017.
Gestational age at birth, birthweight, congenital anomaly, and healthy baby (≥37 weeks of gestation, birthweight 2,500–4,000 g, no congenital malformations).
There were 200,075 live births in 151,561 women including 132,679 singletons, 33,698 sets of twins, and 5,723 pairs of singleton siblings. In singletons, frozen ET was associated with a lower risk of very preterm birth (adjusted relative risk [aRR], 0.83; 95% confidence interval [CI], 0.73, 0.94), preterm birth (aRR, 0.93; 95% CI, 0.88, 0.97), low birthweight (<2,500 g) (aRR, 0.72; 95% CI, 0.68, 0.77), small for gestational age (aRR, 0.66; 95% CI, 0.62, 0.70) and congenital anomaly (aRR, 0.85; 95% CI, 0.78, 0.94), but higher risk of high birthweight (>4,000 g) (aRR, 1.64; 95% CI, 1.58, 1.72) and large for gestational age (aRR, 1.62; 95% CI, 1.55, 1.70) in comparison with fresh ET. In twins, frozen ET was associated with lower risk of very preterm birth (aRR, 0.84; 95% CI, 0.73, 0.97), and low birthweight (aRR, 0.72; 95% CI, 0.68, 0.77), but with a higher chance of a healthy baby (aRR, 1.11; 95% CI, 1.06, 1.16) compared to fresh ET. Singletons conceived following frozen ET had a lower risk of low birthweight (aRR, 0.56; 95% CI, 0.44, 0.74) and being small for gestational age (aRR, 0.54; 95% CI, 0.42, 0.68) than a singleton sibling born after a fresh ET. They also had a higher risk of high birthweight (aRR, 1.85; 95% CI, 1.54, 2.24) and of being large for gestational age (aRR, 1.81; 95% CI, 1.50, 2.20), and were less likely to be preterm (aRR, 0.81; 95% CI, 0.67, 0.99).
Our key finding is that singletons born following a frozen ET are less likely to be small for gestational age than a singleton sibling born following fresh ET but are more likely to be large for gestational age.
The last few decades have witnessed a rise in the global uptake of in vitro fertilization (IVF) treatment. To ensure optimal use of this technology, it is important for patients and clinicians to have access to tools that can provide accurate estimates of treatment success and understand the contribution of key clinical and laboratory parameters that influence the chance of conception after IVF treatment. The focus of this review was to identify key predictors of IVF treatment success and assess their impact in terms of live birth rates. We have identified 11 predictors that consistently feature in currently available prediction models, including age, duration of infertility, ethnicity, body mass index, antral follicle count, previous pregnancy history, cause of infertility, sperm parameters, number of oocytes collected, morphology of transferred embryos, and day of embryo transfer.
The improvement in IVF cryopreservation techniques over the last 20 years has led to an increase in elective single embryo transfer, thus reducing multiple pregnancy rates. This strategy of successive transfers of fresh followed by frozen embryos has resulted in the acceptance of using cumulative live birth over complete cycles of IVF as a critical measure of success. Clinical prediction models are a useful way of estimating the cumulative chances of success for couples tailored to their individual clinical factors, which help them prepare for and plan future treatment. In this review, we describe several models that predict cumulative live birth and recommend which should be used by couples and/or their clinicians and when they should be used. We also discuss the most relevant predictors to consider when either developing new IVF prediction models or updating existing models.
• IVF models from the UK and the US recommended for predicting cumulative live birth.
• Predictions should be recalculated before the second cycle using new patient and treatment information.
• IVF prediction models need constant validation to prevent decline in accuracy.
• Patient predictors: female age, duration of infertility, BMI and ovarian reserve.
• Treatment predictors should include the number of eggs and embryo information.
Abstract
STUDY QUESTION
Can two prediction models developed using data from 1999 to 2009 accurately predict the cumulative probability of live birth per woman over multiple complete cycles of IVF in an updated UK cohort?
SUMMARY ANSWER
After being updated, the models were able to estimate individualized chances of cumulative live birth over multiple complete cycles of IVF with greater accuracy.
WHAT IS KNOWN ALREADY
The McLernon models were the first to predict cumulative live birth over multiple complete cycles of IVF. They were converted into an online calculator called OPIS (Outcome Prediction In Subfertility) which has 3000 users per month on average. A previous study externally validated the McLernon models using a Dutch prospective cohort containing data from 2011 to 2014. With changes in IVF practice over time, it is important that the McLernon models are externally validated on a more recent cohort of patients to ensure that predictions remain accurate.
STUDY DESIGN, SIZE, DURATION
A population-based cohort of 91 035 women undergoing IVF in the UK between January 2010 and December 2016 was used for external validation. Data on frozen embryo transfers associated with these complete IVF cycles conducted from 1 January 2017 to 31 December 2017 were also collected.
PARTICIPANTS/MATERIALS, SETTING, METHODS
Data on IVF treatments were obtained from the Human Fertilisation and Embryology Authority (HFEA). The predictive performances of the McLernon models were evaluated in terms of discrimination and calibration. Discrimination was assessed using the c-statistic and calibration was assessed using calibration-in-the-large, calibration slope, and calibration plots. Where any model demonstrated poor calibration in the validation cohort, the models were updated using intercept recalibration, logistic recalibration, or model revision to improve model performance.
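As an illustration of the calibration measures and the logistic recalibration step named above, the following is a minimal sketch under stated assumptions. It is not the code used in the study: the function names, the Newton-Raphson fitting routine, and the example predictions and outcomes are all hypothetical. Calibration-in-the-large is taken as the intercept a in logit(P) = a + logit(prediction) with the slope fixed at 1, and the calibration slope as b in logit(P) = a + b × logit(prediction). Logistic recalibration then replaces each prediction with expit(a + b × logit(prediction)).

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def calibration_in_the_large(preds, outcomes, iters=50):
    """Intercept a in logit(P(y=1)) = a + logit(pred), slope fixed at 1."""
    lp = [logit(p) for p in preds]
    a = 0.0
    for _ in range(iters):  # one-parameter Newton-Raphson
        mu = [expit(a + x) for x in lp]
        grad = sum(y - m for y, m in zip(outcomes, mu))
        hess = sum(m * (1.0 - m) for m in mu)
        a += grad / hess
    return a


def logistic_recalibration(preds, outcomes, iters=50):
    """Fit logit(P(y=1)) = a + b * logit(pred); b is the calibration slope."""
    lp = [logit(p) for p in preds]
    a, b = 0.0, 1.0  # start from "perfectly calibrated"
    for _ in range(iters):  # two-parameter Newton-Raphson
        mu = [expit(a + b * x) for x in lp]
        g0 = sum(y - m for y, m in zip(outcomes, mu))
        g1 = sum((y - m) * x for y, m, x in zip(outcomes, mu, lp))
        w = [m * (1.0 - m) for m in mu]
        h00 = sum(w)
        h01 = sum(wi * x for wi, x in zip(w, lp))
        h11 = sum(wi * x * x for wi, x in zip(w, lp))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical predicted probabilities and observed outcomes (1 = live birth).
preds = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]

citl = calibration_in_the_large(preds, outcomes)
a, b = logistic_recalibration(preds, outcomes)
updated = [expit(a + b * logit(p)) for p in preds]
print(f"calibration-in-the-large: {citl:.3f}, calibration slope: {b:.3f}")
print(f"mean updated prediction: {sum(updated) / len(updated):.3f}")
```

An intercept near 0 and a slope near 1 indicate good calibration. After logistic recalibration the mean updated prediction matches the observed event rate in the data used for updating, which is why updated models should ideally be checked again on further data.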
MAIN RESULTS AND THE ROLE OF CHANCE
Following exclusions, 91 035 women who underwent 144 734 complete cycles were included. The validation cohort had a similar age distribution to women in the development cohort. Live birth rates over all complete cycles of IVF per woman were higher in the validation cohort. After calibration assessment, both models required updating. The coefficients of the pre-treatment model were revised, and the updated model showed reasonable discrimination (c-statistic: 0.67, 95% CI: 0.66 to 0.68). After logistic recalibration, the post-treatment model showed good discrimination (c-statistic: 0.75, 95% CI: 0.74 to 0.76). As an example, in the updated pre-treatment model, a 32-year-old woman with 2 years of primary infertility has a 42% chance of having a live birth in the first complete ICSI cycle and a 77% chance over three complete cycles. In a couple with 2 years of primary male factor infertility where a 30-year-old woman has 15 oocytes collected in the first cycle, a single fresh blastocyst embryo transferred in the first cycle and spare embryos cryopreserved, the estimated chance of live birth provided by the post-treatment model is 46% in the first complete ICSI cycle and 81% over three complete cycles.
LIMITATIONS, REASONS FOR CAUTION
Two predictors from the original models, duration of infertility and previous pregnancy, which were not available in the recent HFEA dataset, were imputed using data from the older cohort used to develop the models. The HFEA dataset does not contain some other potentially important predictors, e.g. BMI, ethnicity, race, smoking and alcohol intake in women, as well as measures of ovarian reserve such as antral follicle count.
WIDER IMPLICATIONS OF THE FINDINGS
Both updated models show improved predictive ability and provide estimates which are more reflective of current practice and patient case mix. The updated OPIS tool can be used by clinicians to help shape couples’ expectations by informing them of their individualized chances of live birth over a sequence of multiple complete cycles of IVF.
STUDY FUNDING/COMPETING INTEREST(S)
This study was supported by an Elphinstone scholarship scheme at the University of Aberdeen and Aberdeen Fertility Centre, University of Aberdeen. S.B. has a commitment of research funding from Merck. D.J.M. and M.B.R. declare support for the present manuscript from Elphinstone scholarship scheme at the University of Aberdeen and Assisted Reproduction Unit at Aberdeen Fertility Centre, University of Aberdeen. D.J.M. declares grants received by University of Aberdeen from NHS Grampian, The Meikle Foundation, and Chief Scientist Office in the past 3 years. D.J.M. declares receiving an honorarium for lectures from Merck. D.J.M. is Associate Editor of Human Reproduction Open and Statistical Advisor for Reproductive BioMed Online. S.B. declares royalties from Cambridge University Press for a book. S.B. declares receiving an honorarium for lectures from Merck, Organon, Ferring, Obstetric and Gynaecological Society of Singapore, and Taiwanese Society for Reproductive Medicine. S.B. has received support from Merck, ESHRE, and Ferring for attending meetings as speaker and is on the METAFOR and CAPRE Trials Data Monitoring Committee.
TRIAL REGISTRATION NUMBER
N/A.
Global cesarean section (CS) rates range from 1% to 52%, with a previous CS being the commonest indication. Labour following a previous CS carries risk of scar rupture, with potential for offspring hypoxic brain injury, leading to high rates of repeat elective CS. However, the effect of delivery by CS on long-term outcomes in children is unclear. Increasing evidence suggests that in avoiding exposure to maternal bowel flora during labour or vaginal birth, offspring delivered by CS may be adversely affected in terms of energy uptake from the gut and immune development, increasing obesity and asthma risks, respectively. This study aimed to address the evidence gap on long-term childhood outcomes following repeat CS by comparing adverse childhood health outcomes after (1) planned repeat CS and (2) unscheduled repeat CS with those that follow vaginal birth after CS (VBAC).
A data-linkage cohort study was performed. All second-born, term, singleton offspring delivered between 1 January 1993 and 31 December 2007 in Scotland, UK, to women with a history of CS (n = 40,145) were followed up until 31 January 2015. Outcomes assessed included obesity at age 5 y, hospitalisation with asthma, learning disability, cerebral palsy, and death. Cox regression and binary logistic regression were used as appropriate to compare outcomes following planned repeat CS (n = 17,919) and unscheduled repeat CS (n = 8,847) with those following VBAC (n = 13,379). Risk of hospitalisation with asthma was greater following both unscheduled repeat CS (3.7% versus 3.3%, adjusted hazard ratio [HR] 1.18, 95% CI 1.05-1.33) and planned repeat CS (3.6% versus 3.3%, adjusted HR 1.24, 95% CI 1.09-1.42) compared with VBAC. Learning disability and death were more common following unscheduled repeat CS compared with VBAC (3.7% versus 2.3%, adjusted odds ratio 1.64, 95% CI 1.17-2.29, and 0.5% versus 0.4%, adjusted HR 1.50, 95% CI 1.00-2.25, respectively). Risk of obesity at age 5 y and risk of cerebral palsy were similar between planned repeat CS or unscheduled repeat CS and VBAC. Study limitations include the risk that women undergoing an unscheduled CS had intended to have a planned CS, and lack of data on indication for CS, which may confound the findings.
Birth by repeat CS, whether planned or unscheduled, was associated with an increased risk of hospitalisation with asthma but no difference in risk of obesity at age 5 y. Greater risk of death and learning disability following unscheduled repeat CS compared to VBAC may reflect complications during labour. Further research, including meta-analyses of studies of rarer outcomes (e.g., cerebral palsy), is needed to confirm whether such risks are similar between delivery groups.