General large language models (LLMs), such as ChatGPT (GPT-3.5), have demonstrated the capability to pass multiple-choice medical board examinations. However, the comparative accuracy of different LLMs and LLM performance on assessments of predominantly higher-order management questions are poorly understood. We aimed to assess the performance of 3 LLMs (GPT-3.5, GPT-4, and Google Bard) on a question bank designed specifically for neurosurgery oral boards examination preparation.
The 149-question Self-Assessment Neurosurgery Examination Indications Examination was used to query LLM accuracy. Questions were entered in a single best answer, multiple-choice format. χ², Fisher exact, and univariable logistic regression tests assessed differences in performance by question characteristics.
On a question bank with predominantly higher-order questions (85.2%), ChatGPT (GPT-3.5) and GPT-4 answered 62.4% (95% CI: 54.1%-70.1%) and 82.6% (95% CI: 75.2%-88.1%) of questions correctly, respectively. By contrast, Bard scored 44.2% (66/149, 95% CI: 36.2%-52.6%). GPT-3.5 and GPT-4 demonstrated significantly higher scores than Bard (both P < .01), and GPT-4 outperformed GPT-3.5 (P = .023). Among 6 subspecialties, GPT-4 had significantly higher accuracy in the Spine category relative to GPT-3.5 and in 4 categories relative to Bard (all P < .01). Incorporation of higher-order problem solving was associated with lower question accuracy for GPT-3.5 (odds ratio [OR] = 0.80, P = .042) and Bard (OR = 0.76, P = .014), but not GPT-4 (OR = 0.86, P = .085). GPT-4's performance on imaging-related questions surpassed GPT-3.5's (68.6% vs 47.1%, P = .044) and was comparable with Bard's (68.6% vs 66.7%, P = 1.000). However, GPT-4 demonstrated significantly lower rates of "hallucination" on imaging-related questions than both GPT-3.5 (2.3% vs 57.1%, P < .001) and Bard (2.3% vs 27.3%, P = .002). Lack of a question text description predicted significantly higher odds of hallucination for GPT-3.5 (OR = 1.45, P = .012) and Bard (OR = 2.09, P < .001).
On a question bank of predominantly higher-order management case scenarios for neurosurgery oral boards preparation, GPT-4 achieved a score of 82.6%, outperforming ChatGPT and Google Bard.
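The accuracy figures above are reported with 95% confidence intervals. As a minimal sketch of how such an interval can be computed from a count, the following uses the Wilson score method. This choice is an assumption: the abstract does not state which interval method was used, so the output may differ slightly from the reported bounds. The function name `wilson_interval` is illustrative only.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%).

    Assumption: the source does not name its interval method; Wilson is
    used here purely as an illustration.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z**2 / total
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


# Bard's count as stated in the abstract: 66 correct out of 149.
low, high = wilson_interval(66, 149)
print(f"Bard accuracy: {66 / 149:.1%} (95% CI: {low:.1%}-{high:.1%})")
```

Other methods, such as the exact (Clopper-Pearson) interval, tend to give slightly wider bounds at this sample size.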
Interest surrounding generative large language models (LLMs) has rapidly grown. Although ChatGPT (GPT-3.5), a general LLM, has shown near-passing performance on medical student board examinations, the performance of ChatGPT or its successor GPT-4 on specialized examinations and the factors affecting accuracy remain unclear. This study aims to assess the performance of ChatGPT and GPT-4 on a 500-question mock neurosurgical written board examination.
The Self-Assessment Neurosurgery Examinations (SANS) American Board of Neurological Surgery Self-Assessment Examination 1 was used to evaluate ChatGPT and GPT-4. Questions were in single best answer, multiple-choice format. χ², Fisher exact, and univariable logistic regression tests were used to assess performance differences in relation to question characteristics.
ChatGPT (GPT-3.5) and GPT-4 achieved scores of 73.4% (95% CI: 69.3%-77.2%) and 83.4% (95% CI: 79.8%-86.5%), respectively, relative to the user average of 72.8% (95% CI: 68.6%-76.6%). Both LLMs exceeded last year's passing threshold of 69%. Although scores between ChatGPT and question bank users were equivalent (P = .963), GPT-4 outperformed both (both P < .001). GPT-4 answered every question answered correctly by ChatGPT and 37.6% (50/133) of remaining incorrect questions correctly. Among 12 question categories, GPT-4 significantly outperformed users in each but performed comparably with ChatGPT in 3 (functional, other general, and spine) and outperformed both users and ChatGPT for tumor questions. Increased word count (odds ratio = 0.89 of answering a question correctly per +10 words) and higher-order problem-solving (odds ratio = 0.40, P = .009) were associated with lower accuracy for ChatGPT, but not for GPT-4 (both P > .005). Multimodal input was not available at the time of this study; hence, on questions with image content, ChatGPT and GPT-4 answered 49.5% and 56.8% of questions correctly based on contextual clues alone.
LLMs achieved passing scores on a mock 500-question neurosurgical written board examination, with GPT-4 significantly outperforming ChatGPT.
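To illustrate how the per-increment odds ratio reported above (0.89 per +10 words for ChatGPT) can be read, here is a small sketch. It assumes the usual logistic-regression interpretation, in which odds ratios multiply across increments. The helper names and the baseline probability are hypothetical and are not taken from the study.

```python
def scale_odds_ratio(or_per_step: float, step: float, delta: float) -> float:
    """Odds ratio for a change of `delta`, given the odds ratio per `step`.

    Assumes a log-linear effect, as in a standard logistic regression.
    """
    return or_per_step ** (delta / step)


def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Convert a baseline probability to odds, scale it, and convert back."""
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


# Odds ratio implied for a question that is 50 words longer.
or_50_words = scale_odds_ratio(0.89, step=10, delta=50)

# Hypothetical baseline accuracy of 75%, for illustration only.
print(f"OR for +50 words: {or_50_words:.3f}")
print(f"Implied accuracy: {apply_odds_ratio(0.75, or_50_words):.1%}")
```

Because odds ratios act on odds rather than probabilities, the implied change in accuracy depends on the baseline chosen.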
Cell-free DNA shed by cancer cells has been shown to be a rich source of putative tumor-specific biomarkers. Because cell-free DNA from brain and spinal cord tumors cannot usually be detected in the blood, we studied whether the cerebrospinal fluid (CSF) that bathes the CNS is enriched for tumor DNA, here termed CSF-tDNA. We analyzed 35 primary CNS malignancies and found at least one mutation in each tumor using targeted or genome-wide sequencing. Using these patient-specific mutations as biomarkers, we identified detectable levels of CSF-tDNA in 74% (95% confidence interval [95% CI] = 57−88%) of cases. All medulloblastomas, ependymomas, and high-grade gliomas that abutted a CSF space were detectable (100% of 21 cases; 95% CI = 88−100%), whereas no CSF-tDNA was detected in patients whose tumors were not directly adjacent to a CSF reservoir (P < 0.0001, Fisher’s exact test). These results suggest that CSF-tDNA could be useful for the management of patients with primary tumors of the brain or spinal cord.
In patients with spinal instability, cord compression, or neurologic deficits, the standard of care is surgery followed by radiation therapy (RT). Recurrence rates after conventional RT remain high. The purpose of this study is to prospectively examine the efficacy of postoperative stereotactic body RT (SBRT) in patients who have undergone surgical intervention for spine metastases. We hypothesize that postoperative SBRT to the spine would be associated with higher local control than historical rates after conventional RT.
Thirty-five adult patients with a Karnofsky Performance Status score ≥40 and spine metastases from solid tumors with no prior overlapping RT and target volumes ≤3 consecutive vertebral levels were enrolled. Thirty-three patients were treated. Two patients underwent treatment to 2 target volumes for a total of 35 target volumes. All patients received SBRT 30 Gy in 5 fractions. Patients were followed with neurological examinations and computed tomography and/or magnetic resonance imaging every 3 months. Neurologic function was assessed at the same time points using the American Spinal Injury Association (ASIA) impairment score. Pain was rated according to the 10-point visual analogue scale and MD Anderson Cancer Center brief pain index. Toxicity was recorded according to National Cancer Institute Common Toxicity Criteria for Adverse Events Version 4. The primary objective was the rate of radiographic local recurrence at 12 months after completion of SBRT.
Patient characteristics were as follows: 34.3% had radioresistant primaries; 71.4% were ASIA E and the remainder ASIA D; and the median baseline Karnofsky Performance Status score was 70 (range, 50-100). Radiographic and symptomatic local control at 1 year were 90% (95% confidence interval, 76%-98%). Among the 3 patients with recurrence, the median time to recurrence was 3.5 months (range, 3.4-5.8 months), all had radiosensitive tumors, and all recurrences were epidural. No patients experienced wound dehiscence, hardware failure, or spinal cord myelopathy. The median time to return to systemic therapy was 0.5 months (range, 0-9.4 months).
This prospective study of postoperative spine SBRT demonstrates excellent local control with low toxicity. These data suggest superior rates of local control compared with conventional RT; however, a formal comparative study is warranted.
Neurofibromatosis 1 is a hereditary syndrome characterized by the development of numerous benign neurofibromas, a small subset of which progress to malignant peripheral nerve sheath tumors (MPNSTs). To better understand the genetic basis for MPNSTs, we performed genome-wide or targeted sequencing on 50 cases. Sixteen MPNSTs but none of the neurofibromas tested were found to have somatic mutations in SUZ12, implicating it as having a central role in malignant transformation.
A series of epidemiological studies have shown the limited life expectancy of patients suffering from idiopathic normal pressure hydrocephalus (iNPH). In most cases, comorbid medical conditions are the cause of death, rather than iNPH. However, shunting has also been shown to improve both quality of life and life expectancy. We sought to investigate the utility of the Charlson comorbidity index (CCI) for improved preoperative risk-benefit assessment of shunt surgery in individual iNPH cases. 208 shunted iNPH cases were prospectively investigated. Two in-person follow-up visits at 3 and 12 months assessed postoperative clinical status. The correlation of the age-adjusted CCI with survival was investigated over the median observation time of 2.37 years (IQR 1.16-4.15). Kaplan-Meier statistics revealed that patients with a CCI score of 0-5 have a 5-year survival rate of 87%, compared to only 55% in patients with CCI > 5. Cox multivariate statistics revealed that the CCI was an independent predictor of survival, while common preoperative iNPH scores (modified Rankin Scale (mRS), gait score, and continence score) were not. As expected, mRS, gait, and continence scores improved during the postoperative follow-up period, though relative improvement on any of these was not predicted by baseline CCI. The CCI is an easily applicable preoperative predictor of survival time in shunted iNPH patients. The lack of a correlation between the CCI and functional outcome means that even patients with multiple comorbidities and limited remaining lifetime may benefit from shunt surgery.
Retrospective cohort study.
To compare short- and long-term outcomes in obese versus nonobese patients undergoing instrumented posterolateral fusion of the lumbar spine.
Obesity is an important public health issue due to the negative effects on quality of life. Some studies have shown an association between obesity and higher rates of complications and unfavorable outcomes after spine surgery.
We retrospectively reviewed medical records for all adult patients undergoing 1- to 3-level posterolateral fusion for degenerative spine disease between 1992 and 2012 at a single institution. Patients were divided into obese (body mass index > 30 kg/m²) and nonobese cohorts to compare complications, reoperation rates, and symptom resolution at the last follow-up. A regression model was used to estimate relative risk ratios.
During the study period, 732 patients underwent lumbar fusion, with 662 (90.44%) nonobese patients and 70 (9.56%) obese patients in the cohort. Obese patients had significantly higher blood loss intraoperatively (P = 0.002) and a longer average length of stay (P = 0.022). Moreover, obesity was independently associated with a significantly increased risk of developing a postoperative complication (risk ratio 2.14; 95% confidence interval, 1.10-4.16) and surgical site infection (risk ratio 3.11; 95% confidence interval, 1.48-6.52). At the last follow-up, a higher proportion of obese patients had radiculopathy (P = 0.018), motor deficits (P = 0.006), sensory deficits (P = 0.008), and bowel or bladder dysfunction (P = 0.006) than nonobese patients.
In this study, obese patients undergoing lumbar fusion had higher blood loss, longer lengths of stay, higher complication rates, and worse functional outcomes at the last follow-up than nonobese patients. These findings suggest that both surgeons and patients should acknowledge the significantly increased morbidity profile of obese patients after lumbar fusion.
Background Acute spinal cord injury (ASCI) is a catastrophic event that can profoundly affect the trajectory of a patient's life. Debate continues over the pharmacologic management of ASCI, specifically, the widespread but controversial use of the steroid methylprednisolone (MP). Treatment efforts are impeded because of limitations in understanding of the pathobiology of ASCI and the difficulty in proving the efficacy of therapies. Methods This review presents the pathophysiology of ASCI and the laboratory and clinical findings on the use of MP. Results The use of MP remains a contentious issue in part because of the catastrophic nature of ASCI, the paucity of treatment options, and the legal ramifications. Although historical data on the use of MP in ASCI have been challenged, more recent studies have been used both to support and to oppose treatment of ASCI with steroids. Conclusions ASCI is a devastating event with a complex aftermath of secondary damaging processes that worsen the initial injury. Although the results of NASCIS (National Acute Spinal Cord Injury Study) II and III trials led to the widespread adoption of a high-dose MP regimen for patients treated within 8 hours of injury, subsequent studies have called into question the validity of NASCIS conclusions. Further evidence of the ineffectiveness of the MP protocol has led to declining confidence in the treatment over the last decade. At the present time, high-dose MP cannot be recommended as a standard of care, but it remains an option until supplanted by future evidence-based therapies.
The relationship between intervertebral disc degeneration and chronic infection by Propionibacterium acnes is controversial, with contradictory evidence available in the literature. Previous studies investigating these relationships were under-powered and fraught with methodological differences; moreover, they have not taken into consideration P. acnes' ability to form biofilms or attempted to quantitate the bioburden with regard to determining bacterial counts/genome equivalents as criteria to differentiate true infection from contamination. The aim of this prospective cross-sectional study was to determine the prevalence of P. acnes in patients undergoing lumbar disc microdiscectomy.
The sample consisted of 290 adult patients undergoing lumbar microdiscectomy for symptomatic lumbar disc herniation. An intraoperative biopsy and pre-operative clinical data were taken in all cases. One biopsy fragment was homogenized and used for quantitative anaerobic culture and a second was frozen and used for real-time PCR-based quantification of P. acnes genomes. P. acnes was identified in 115 cases (40%), coagulase-negative staphylococci in 31 cases (11%) and alpha-hemolytic streptococci in 8 cases (3%). P. acnes counts ranged from 100 to 9000 CFU/ml with a median of 400 CFU/ml. The prevalence of intervertebral discs with abundant P. acnes (≥ 1 × 10³ CFU/ml) was 11% (39 cases). There was significant correlation between the bacterial counts obtained by culture and the number of P. acnes genomes detected by real-time PCR (r = 0.4363, p < 0.0001).
In a large series of patients, the prevalence of discs with abundant P. acnes was 11%. We believe that disc tissue homogenization releases P. acnes from the biofilm so that they can then potentially be cultured, reducing the rate of false-negative cultures. Further, a quantification study revealing significant bioburden based on both culture and real-time PCR minimizes the likelihood that the observed findings are due to contamination and supports the hypothesis that P. acnes acts as a pathogen in these cases of degenerative disc disease.
Frailty is associated with adverse outcomes in traumatically injured geriatric patients but has not been well-studied in geriatric Traumatic Brain Injury (TBI). To assess relationships between frailty and outcomes after TBI, the records of all patients aged 70 or older admitted from home to the neurosurgical service of a single institution for non-operative TBI between January 2020 and July 2021 were retrospectively reviewed. The primary outcome was adverse discharge disposition (either in-hospital expiration or discharge to skilled nursing facility (SNF), hospice, or home with hospice). Secondary outcomes included major inpatient complication, 30-day readmission, and length of stay (LOS). 100 patients were included, 90% of whom presented with Glasgow Coma Score (GCS) 14-15. The mean length of stay was 3.78 days. 7% had an in-hospital complication, and 44% had an unfavorable discharge destination. 49% of patients attended follow-up within 3 months. The rate of readmission within 30 days was 13%. Patients were characterized as low frailty (FRAIL score 0-1, n = 35, 35%) or high frailty (FRAIL score 2-5, n = 65, 65%). In multivariate analysis controlling for age and other factors, frailty category (aOR 2.63, 95% CI 1.02-7.14, p = 0.005) was significantly associated with unfavorable discharge. Frailty was not associated with increased readmission rate, LOS, or rate of complications on uncontrolled univariate analyses. Frailty is associated with increased odds of unfavorable discharge disposition for geriatric patients admitted with TBI.