Peer reviewed Open access
  • Supervised machine learning...
    Sun, Xiaoran

    Journal of marriage and family, 02/2024
    Journal Article

    Abstract

    Objective: This article introduces supervised machine learning (ML) for conducting exploratory, discovery‐oriented family research in a transparent and systematic way.

    Background: Supervised ML can examine large numbers of variables simultaneously, identify key predictors, and explore patterns among predictors. This approach may help address concerns in family research about a lack of theoretical specificity and the prevalence of unguided exploratory analysis.

    Method: Following an overview of supervised ML, example analyses drew on the National Longitudinal Study of Adolescent Health (Add Health) dataset across Waves I–IV (N = 5114 adolescents, 50.53% female, mean age = 15.94, SD = 1.77 at Wave I). From 143 articles using Add Health data from Waves I through IV, 62 adolescent family variables from eight domains (e.g., socioeconomics, parenting, health) were identified as predictors of young adult (ages 24–32) educational attainment. Following benchmark regression models, ML models were trained using Lasso regression, decision tree, random forest, and extreme gradient boosting; these were evaluated on test data held separate from the training data and interpreted through SHapley Additive exPlanations (SHAP).

    Results: The random forest model performed best (R² = .382 for the model with all the predictors), and 14 variables were identified as the key predictors of educational attainment. Patterns among these predictors emerged, including directionality, nonlinearity, and interactions.

    Conclusions: Supervised ML research can be used to inform further confirmatory analyses and advance theory.
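
The abstract names SHapley Additive exPlanations as the interpretation step. As a rough illustration of the underlying idea only, the sketch below computes exact Shapley values for a single observation of a toy prediction function. It is not the article's code or model: the function `toy_model`, its coefficients, the feature names, and the all-zero baseline are hypothetical, and real analyses typically rely on a dedicated library that approximates these values for tree ensembles.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation `x` relative to `baseline`.

    Features outside a coalition are set to their baseline value, which is
    a simplifying assumption made for this sketch.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without = {
                    f: (x[f] if f in coalition else baseline[f]) for f in names
                }
                with_feature = dict(without)
                with_feature[name] = x[name]
                # Marginal contribution of `name` when added to this coalition
                total += weight * (predict(with_feature) - predict(without))
        phi[name] = total
    return phi


def toy_model(row):
    # Hypothetical prediction function with one interaction term;
    # `household_size` is deliberately unused by the model.
    return (
        12.0
        + 0.5 * row["parent_education"]
        + 0.3 * row["gpa"]
        + 0.2 * row["parent_education"] * row["gpa"]
    )


# Hypothetical standardized feature values for one respondent
baseline = {"parent_education": 0.0, "gpa": 0.0, "household_size": 0.0}
x = {"parent_education": 2.0, "gpa": 1.0, "household_size": 3.0}

phi = shapley_values(toy_model, x, baseline)
for feature, value in phi.items():
    print(f"{feature}: {value:+.3f}")
```

In this toy case the contributions sum to the difference between the prediction for `x` and the prediction for the baseline, the interaction term is split evenly between the two interacting features, and the unused feature receives zero. These properties are what make such values useful for reading off directionality and interactions among predictors.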