Machine learning methods hold promise for personalized care in psychiatry, demonstrating the potential to tailor treatment decisions and stratify patients into clinically meaningful taxonomies. Subsequently, publication counts applying machine learning methods have risen, with different data modalities, mathematically distinct models, and samples of varying size being used to train and test models with the promise of clinical translation. Consequently, and in part due to the preliminary nature of such works, many studies have reported widely varying degrees of accuracy, raising concerns over systematic overestimation and methodological inconsistencies. Furthermore, a lack of procedural evaluation guidelines for non-expert medical professionals and funding bodies leaves many in the field with no means to systematically evaluate the claims, maturity, and clinical readiness of a project. Given the potential of machine learning methods to transform patient care, albeit contingent on the rigor of the methods employed and their dissemination, we deem it necessary to provide a review of current methods, recommendations, and future directions for applied machine learning in psychiatry. In this review, we cover issues of best practice for model training and evaluation, sources of systematic error and overestimation, model explainability vs. trust, the clinical implementation of AI systems, and finally, future directions for our field.
Despite the large number of studies that have investigated the use of wearable sensors to detect gait disturbances such as freezing of gait (FOG) and falls, there is little consensus regarding appropriate methodologies for how to optimally apply such devices. Here, an overview of the use of wearable systems to assess FOG and falls in Parkinson’s disease (PD) and validation performance is presented. A systematic search in the PubMed and Web of Science databases was performed using a group of concept keywords. The final search was performed in January 2017, and articles were selected based upon a set of eligibility criteria. In total, 27 articles were selected. Of those, 23 related to FOG and 4 to falls. FOG studies were performed in either laboratory or home settings, with sample sizes ranging from 1 to 48 PD patients with Hoehn and Yahr stages from 2 to 4. The shin was the most common sensor location and the accelerometer was the most frequently used sensor type. Validity measures ranged from 73–100% for sensitivity and 67–100% for specificity. Falls and fall risk studies were all home-based, including sample sizes of 1 to 107 PD patients, mostly using one sensor containing accelerometers, worn at various body locations. Despite the promising validation initiatives reported in these studies, they were all performed with relatively small samples, and there was significant variability in the outcomes measured and the results reported. Given these limitations, the validation of sensor-derived assessments of PD features would benefit from more focused research efforts, increased collaboration among researchers, aligning data collection protocols, and sharing data sets.
The field of neuroimaging has embraced methods from machine learning in a variety of ways. Although an increasing number of initiatives have published open-access neuroimaging datasets, specifically designed benchmarks are rare in the field. In this article, we first describe how benchmarks in computer science and biomedical imaging have fostered methodological progress in machine learning. Second, we identify the special characteristics of neuroimaging data and outline what researchers have to ensure when establishing a neuroimaging benchmark, how datasets should be composed, and how adequate evaluation criteria can be chosen. Based on lessons learned from machine learning benchmarks, we argue for an extended evaluation procedure that, next to applying suitable performance metrics, focuses on scientifically relevant aspects such as explainability, robustness, uncertainty, computational efficiency and code quality. Lastly, we envision a collaborative neuroimaging benchmarking platform that combines the discussed aspects in a collaborative and agile framework, allowing researchers across disciplines to work together on the key predictive problems of the field of neuroimaging and psychiatry.
We currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from Major Depressive Disorder (MDD) and healthy controls based on neuroimaging data. Drawing upon structural MRI data from a balanced sample of N = 1868 MDD patients and healthy controls from our recent international Predictive Analytics Competition (PAC), we first trained and tested a classification model on the full dataset, which yielded an accuracy of 61%. Next, we mimicked the process by which researchers would draw samples of various sizes (N = 4 to N = 150) from the population and showed a strong risk of misestimation. Specifically, for small sample sizes (N = 20), we observed accuracies of up to 95%. For medium sample sizes (N = 100), accuracies of up to 75% were found. Importantly, further investigation showed that sufficiently large test sets effectively protect against performance misestimation whereas larger datasets per se do not. While these results question the validity of a substantial part of the current literature, we outline the relatively low-cost remedy of larger test sets, which is readily available in most cases.
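The link between test-set size and misestimation can be illustrated with a minimal simulation sketch. It assumes a hypothetical classifier whose true accuracy is fixed at the 61% reported for the full sample and models only binomial sampling noise in the test set. This is a simplification, not the resampling procedure used in the study, and the function name and the number of simulated studies are illustrative choices.

```python
import random


def simulate_observed_accuracies(true_accuracy, n_test, n_studies, seed=0):
    """Simulate the accuracies that n_studies independent test sets would report.

    Assumption: every prediction is correct with probability true_accuracy,
    independently of all others (pure binomial test-set noise).
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(n_studies):
        n_correct = sum(rng.random() < true_accuracy for _ in range(n_test))
        observed.append(n_correct / n_test)
    return observed


# Hypothetical setting: true accuracy fixed at 0.61, 1000 simulated studies per size.
for n_test in (20, 100, 1000):
    accs = simulate_observed_accuracies(0.61, n_test, n_studies=1000)
    print(f"n_test={n_test:5d}  min={min(accs):.2f}  max={max(accs):.2f}")
```

Under these assumptions, the range of reported accuracies narrows as the test set grows, even though the underlying classifier never changes.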
Wearable devices can capture objective day-to-day data about Parkinson's Disease (PD). This study aims to assess the feasibility of implementing wearable technology to collect data from multiple sensors during the daily lives of PD patients. The Parkinson@home study is an observational, two-cohort (North America, NAM; The Netherlands, NL) study. To recruit participants, different strategies were used between sites. Main enrolment criteria were self-reported diagnosis of PD, possession of a smartphone and age ≥ 18 years. Participants used the Fox Wearable Companion app on a smartwatch and smartphone for a minimum of 6 weeks (NAM) or 13 weeks (NL). Sensor-derived measures estimated information about movement. Additionally, medication intake and symptoms were collected via self-reports in the app. A total of 953 participants were included (NL: 304, NAM: 649). Enrolment rate was 88% in the NL (n = 304) and 51% (n = 649) in NAM. Overall, 84% (n = 805) of participants contributed sensor data. Participants were compliant for 68% (16.3 hours/participant/day) of the study period in NL and for 62% (14.8 hours/participant/day) in NAM. Daily accelerometer data collection decreased 23% in the NL after 13 weeks, and 27% in NAM after 6 weeks. Data contribution was not affected by demographics, clinical characteristics or attitude towards technology, but was affected by the platform usability score in the NL (χ²(2) = 32.014, p < 0.001) and by self-reported depression in NAM (χ²(2) = 6.397, p = 0.04). The Parkinson@home study shows that it is feasible to collect objective data using multiple wearable sensors in PD during daily life in a large cohort.
Functional near-infrared spectroscopy (fNIRS) is an established optical neuroimaging method for measuring functional hemodynamic responses to infer neural activation. However, the impact of individual anatomy on the sensitivity of fNIRS measuring hemodynamics within cortical gray matter is still unknown. By means of Monte Carlo simulations and structural MRI of 23 healthy subjects (mean age: 25.0±2.8 years), we characterized the individual distribution of tissue-specific NIR-light absorption underneath 24 prefrontal fNIRS channels. We thereby investigated the impact of scalp-cortex distance (SCD), frontal sinus volume, and sulcal morphology on gray matter volumes (V(gray)) traversed by NIR-light, i.e. anatomy-dependent fNIRS sensitivity. The NIR-light absorption between optodes was distributed describing a rotational ellipsoid with a mean penetration depth of (23.6±0.7) mm considering the deepest 5% of light. Of the detected photon packages, scalp and bone absorbed (96.4±9.7)% and V(gray) absorbed (3.1±1.8)% of the energy. The mean V(gray) volume of (1.1±0.4) cm³ was negatively correlated with the SCD (r=-.76) and with frontal sinus volume (r=-.57) and was reduced by 41.5% in subjects with a relatively large compared with a small frontal sinus. Head circumference was significantly positively correlated with the mean SCD (r=.46) and the traversed frontal sinus volume (r=.43). Sulcal morphology had no significant impact on V(gray). Our findings suggest considering individual SCD and frontal sinus volume as anatomical factors impacting fNIRS sensitivity. Head circumference may represent a practical measure to partly control for these sources of error variance.
Psychiatric disorders show heterogeneous symptoms and trajectories, with current nosology not accurately reflecting their molecular etiology and the variability and symptomatic overlap within and between diagnostic classes. This heterogeneity impedes timely and targeted treatment. Our study aimed to identify psychiatric patient clusters that share clinical and genetic features and may profit from similar therapies. We used high-dimensional data clustering on deep clinical data to identify transdiagnostic groups in a discovery sample (N = 1250) of healthy controls and patients diagnosed with depression, bipolar disorder, schizophrenia, schizoaffective disorder, and other psychiatric disorders. We observed five diagnostically mixed clusters and ordered them based on severity. The least impaired cluster 0, containing most healthy controls, showed general well-being. Clusters 1-3 differed predominantly regarding levels of maltreatment, depression, daily functioning, and parental bonding. Cluster 4 contained most patients diagnosed with psychotic disorders and exhibited the highest severity in many dimensions, including medication load. Depressed patients were present in all clusters, indicating that we captured different disease stages or subtypes. We replicated all but the smallest cluster 1 in an independent sample (N = 622). Next, we analyzed genetic differences between clusters using polygenic scores (PGS) and the psychiatric family history. These genetic variables differed mainly between clusters 0 and 4 (prediction area under the receiver operating characteristic curve (AUC) = 81%; significant PGS: cross-disorder psychiatric risk, schizophrenia, and educational attainment). Our results confirm that psychiatric disorders consist of heterogeneous subtypes sharing molecular factors and symptoms.
The identification of transdiagnostic clusters advances our understanding of the heterogeneity of psychiatric disorders and may support the development of personalized treatments.
Despite the growing deployment of network representation to comprehend psychological phenomena, the question of whether and how networks can effectively describe the effects of psychological interventions remains elusive. Network control theory, the engineering study of networked interventions, has recently emerged as a viable methodology to characterize and guide interventions. However, there is a scarcity of empirical studies testing the extent to which it can be useful within a psychological context. In this paper, we investigate a representative psychological intervention experiment and use network control theory to model the intervention and predict its effect. Using these data, we show that: (1) the observed psychological effect, in terms of sensitivity and specificity, relates to the regional network control theoretic metrics (average and modal controllability), (2) the size of change following intervention negatively correlates with a whole-network topology that quantifies the “ease” of change as described by control theory (control energy), and (3) responses after intervention can be predicted based on formal results from control theory. These insights indicate that network control theory has significant potential as a tool for investigating psychological interventions. Drawing on this specific example and the overarching framework of network control theory, we further elaborate on the conceptualization of psychological interventions, methodological considerations, and future directions in this burgeoning field.
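For orientation, the control-theoretic quantities named above are commonly defined for a discrete-time linear network model as sketched here. The abstract does not state which exact variant was used, so the formulation and all symbols are assumptions: A is the weighted adjacency matrix of the network (taken to be symmetric), B is the input matrix selecting the controlled nodes, T is the control horizon, x_0 and x_T are the initial and target states, and λ_j and v_ij are the eigenvalues and eigenvector entries of A.

```latex
\begin{align}
x(t+1) &= A\,x(t) + B\,u(t), \\
W_T &= \sum_{\tau=0}^{T-1} A^{\tau} B B^{\top} \bigl(A^{\top}\bigr)^{\tau}, \\
\text{average controllability:}\quad & \operatorname{Tr}(W_T), \\
\text{modal controllability of node } i\text{:}\quad & \phi_i = \sum_{j=1}^{N} \bigl(1 - \lambda_j^{2}(A)\bigr)\, v_{ij}^{2}, \\
\text{minimum control energy:}\quad & E^{*} = \bigl(x_T - A^{T} x_0\bigr)^{\top} W_T^{-1} \bigl(x_T - A^{T} x_0\bigr).
\end{align}
```

Here A^{T} denotes the T-th matrix power while the superscript ⊤ denotes transposition, and the last expression requires the Gramian W_T to be invertible. A larger E* corresponds to a state change that is harder to achieve, which matches the reading of control energy as the "ease" of change.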
PHOTONAI is a high-level Python API designed to simplify and accelerate machine learning model development. It functions as a unifying framework allowing the user to easily access and combine algorithms from different toolboxes into custom algorithm sequences. It is especially designed to support the iterative model development process and automates the repetitive training, hyperparameter optimization and evaluation tasks. Importantly, the workflow ensures unbiased performance estimates while still allowing the user to fully customize the machine learning analysis. PHOTONAI extends existing solutions with a novel pipeline implementation supporting more complex data streams, feature combinations, and algorithm selection. Metrics and results can be conveniently visualized using the PHOTONAI Explorer, and predictive models are shareable in a standardized format for further external validation or application. A growing add-on ecosystem allows researchers to offer data-modality-specific algorithms to the community and enhance machine learning in the areas of the life sciences. Its practical utility is demonstrated on an exemplary medical machine learning problem, achieving a state-of-the-art solution in a few lines of code. Source code is publicly available on GitHub, while examples and documentation can be found at www.photon-ai.com.
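One common way to obtain unbiased performance estimates while still tuning hyperparameters is nested cross-validation, sketched below in plain Python. This does not use PHOTONAI's interface, and whether it mirrors the library's internals is an assumption. The k-nearest-neighbour classifier, the synthetic two-class data and all function names are hypothetical choices for illustration only.

```python
import random


def make_data(n=120, seed=0):
    """Synthetic two-class data: 2-D Gaussian blobs centred at (-1, -1) and (1, 1)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        center = 1.0 if label else -1.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def make_folds(indices, n_folds):
    """Split a list of indices into n_folds disjoint parts of near-equal size."""
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train, point, k):
    """Majority vote among the k nearest training samples (k is assumed odd)."""
    def sq_dist(sample):
        (x0, x1), _ = sample
        return (x0 - point[0]) ** 2 + (x1 - point[1]) ** 2
    nearest = sorted(train, key=sq_dist)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, test, k):
    """Fraction of test samples classified correctly by k-NN fitted on train."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def select_k(data, train_idx, k_grid, n_inner):
    """Inner loop: pick k by cross-validation on the training indices only."""
    folds = make_folds(train_idx, n_inner)

    def mean_inner_accuracy(k):
        scores = []
        for val_idx in folds:
            held_out = set(val_idx)
            fit = [data[i] for i in train_idx if i not in held_out]
            val = [data[i] for i in val_idx]
            scores.append(accuracy(fit, val, k))
        return sum(scores) / len(scores)

    return max(k_grid, key=mean_inner_accuracy)


def nested_cv(data, k_grid=(1, 5, 15), n_outer=5, n_inner=3, seed=0):
    """Outer loop: each test fold is never seen during hyperparameter selection."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    outer_scores = []
    for test_idx in make_folds(idx, n_outer):
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        best_k = select_k(data, train_idx, k_grid, n_inner)
        train = [data[i] for i in train_idx]
        test = [data[i] for i in test_idx]
        outer_scores.append(accuracy(train, test, best_k))
    return outer_scores


scores = nested_cv(make_data())
print("outer-fold accuracies:", [round(s, 2) for s in scores])
```

The design point is that the hyperparameter is chosen using only the training portion of each outer split, so the outer-fold accuracies are not inflated by the selection step.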