In broad engineering fields, missing data is a common issue that often causes undesired bias and sparseness, impeding rigorous data analysis. To tackle this problem, many imputation theories have been proposed and widely used. However, prior methods often require distributional assumptions and prior knowledge about the data, which can be difficult to satisfy in engineering research. In contrast, fractional hot-deck imputation (FHDI) is an assumption-free imputation method with broad applicability across engineering domains. FHDI's internal parameters and its impact on statistical and machine learning methods, however, remain poorly understood. Thus, this study investigates the behavior and impact of FHDI on prediction methods including the generalized additive model, support vector machine, extremely randomized trees, and artificial neural network, using four practical datasets (appliance energy, air quality, phenotypes, and weather). Results show that FHDI improves prediction accuracy more than a simple naive method that fills in missing data with the mean value of each attribute, and that FHDI has an asymptotically positive effect on prediction accuracy with decreasing response rates. Regarding an optimal setting, 30 to 35 is recommended for FHDI's internal categorization number and 5 for the number of FHDI donors, which is aligned with Rubin's recommendation.
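The naive baseline mentioned above, which fills each missing value with the mean of its attribute, can be sketched as follows. This is only an illustration of mean imputation, not of FHDI itself; the function name `mean_impute` and the use of NaN as the missing-value marker are assumptions made for the example.

```python
import math


def mean_impute(rows):
    """Fill NaN entries with the mean of the observed values in the same column.

    `rows` is a list of equal-length lists of floats; NaN marks a missing value
    (an assumption of this sketch, not something stated in the abstract).
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # A column with no observed values cannot be imputed; keep NaN there.
        col_means.append(sum(observed) / len(observed) if observed else math.nan)
    # Build a new table so the input is left unchanged.
    return [
        [col_means[j] if math.isnan(v) else v for j, v in enumerate(r)]
        for r in rows
    ]
```

Because every missing entry in a column receives the same value, this baseline shrinks the variance of the imputed attribute, which is one reason a donor-based method may be preferred.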
The papers in this special section focus on early prediction and the support of learning performance. Predicting students' learning performance in traditional face-to-face learning, online learning (LMS, MOOCs, etc.), and blended learning is a challenging but essential task in education [1]. On the one hand, it is a difficult challenge because of the high number of factors that can influence a student's final status. On the other hand, it is a critical issue in education because it concerns many students at all levels (primary education, secondary education, and tertiary or higher education) and institutions all over the world. Moreover, an increase in the number of low-performing students can lower the graduation rate and damage the institution's reputation in the eyes of all involved, and it usually results in an overall financial loss. The task of predicting students' performance is one of the oldest and most studied tasks in Educational Data Mining (EDM) and Learning Analytics (LA), and a wide range of classification and regression approaches have been successfully applied.
The role of prophylactic inguinal irradiation (PII) in the treatment of anal cancer patients is controversial. We developed an innovative algorithm based on machine learning (ML) that allows the prescription of PII to be tailored.
We classified 194 anal cancer patients with Logistic Regression (LR) and three other ML techniques based on decision trees (J48, Random Tree and Random Forest), using a large set of clinical and therapeutic variables. We tested the resulting ML models on an independent testing set of 65 anal cancer patients. The TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) methodology was used for the development, the quality assurance and the description of the experimental procedures.
Once verified on the independent testing set, J48 showed the best performance, with specificity, sensitivity, and accuracy rates in predicting relapsing patients of 86.4%, 50.0% and 83.1%, respectively (vs 36.5%, 90.4% and 80.25%, respectively, for LR).
Within an internationally approved quality assurance framework, ML seems promising in predicting which patients would or would not benefit from PII. Once confirmed in larger and/or multi-centric databases, ML could support the physician in tailoring the treatment and in deciding whether or not to deliver PII.
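The specificity, sensitivity, and accuracy figures reported for the classifiers above are all derived from a binary confusion matrix. A minimal sketch of that computation is given below; the function name `binary_metrics` and the label coding (1 = relapse, 0 = no relapse) are assumptions for illustration and are not taken from the study.

```python
def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, and accuracy for binary labels.

    Labels are assumed to be coded 1 (positive, e.g. relapse) and 0 (negative).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    nan = float("nan")
    return {
        # Share of actual positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Share of actual negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
        # Share of all cases that were predicted correctly.
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
    }
```

When relapses are rare, accuracy is dominated by the negative class, so a model can score well on accuracy and specificity while still missing many relapsing patients; reporting all three figures together makes that trade-off visible.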
A practical application of the Food MicroModel (FMM) predictive software is presented. A case study on meat-based pâté is used to illustrate the requirements for assuring the safety of this type of foodstuff when pH is reduced. Hazards were identified from a literature review and confirmed by epidemiological links between the product and foodborne disease outbreaks. For risk assessment, four zones (safe, caution, dangerous and critical) of the level of the variable under study (pH) were defined, each zone corresponding to a particular level of risk. Once the hazards, associated risks and intrinsic parameters of the pâté have been identified, a Hazard Analysis Critical Control Point (HACCP) system can be more readily established using predicted outcomes from FMM. General guidance on generic uses is also discussed.
Aim
To externally validate previously published predictive models of the risk of developing metachronous peritoneal carcinomatosis (PC) after resection of nonmetastatic colon or rectal cancer and to update the predictive model for colon cancer by adding new prognostic predictors.
Method
Data from all patients with Stage I–III colorectal cancer identified from a population‐based database in Stockholm between 2008 and 2010 were used. We assessed the concordance between the predicted and observed probabilities of PC and utilized proportional‐hazard regression to update the predictive model for colon cancer.
Results
When applied to the new validation dataset (n = 2011), the colon and rectal cancer risk‐score models predicted metachronous PC with a concordance index of 79% and 67%, respectively. After adding the subclasses of pT3 and pT4 stage and mucinous tumour to the colon cancer model, the concordance index increased to 82%.
Conclusion
In validation of external and recent cohorts, the predictive accuracy was strong in colon cancer and moderate in rectal cancer patients. The model can be used to identify high‐risk patients for planned second‐look laparoscopy/laparotomy for possible subsequent cytoreductive surgery and hyperthermic intraperitoneal chemotherapy.
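The concordance index reported above measures how often the model ranks the patient who develops PC earlier as the higher-risk patient. The abstract does not state which estimator was used, so the sketch below shows one common choice, Harrell's C for right-censored data; the function name and argument layout are assumptions for illustration.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored follow-up data (illustrative sketch).

    times       : follow-up time for each patient
    events      : 1 if the event (e.g. PC) was observed, 0 if censored
    risk_scores : model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if patient i had the event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk score
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable if comparable else float("nan")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 79%, 67%, and 82% figures can be read.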
Purpose
The clinical diagnosis of pulmonary sarcoidosis is based on the presence of noncaseating granulomas in an appropriate clinical setting with bilateral hilar adenopathy and/or parenchymal infiltrates. Lymphocytosis with an increased CD4/CD8 T cell ratio in bronchoalveolar lavage fluid is supportive. We evaluated the diagnostic accuracy of a predictive binary logistic regression model in sarcoidosis based on sex, age, and bronchoalveolar lavage fluid cell profile with and without the inclusion of HLA-DR+ CD8+ T cells and natural killer T-cell fractions.
Methods
A retrospective analysis of differential cell counts and lymphocyte phenotypes by flow cytometry in bronchoalveolar lavage was performed in 183 patients investigated for possible diffuse parenchymal lung disease. A logistic regression model with age, sex, lymphocyte fraction, eosinophils, and CD4/CD8 ratio in bronchoalveolar lavage fluid (basic model) was compared with a final model, which also included fractions of HLA-DR+ CD8+ T cells and natural killer T cells. Diagnostic accuracy of the two models was assessed by receiver operating characteristic (ROC) curves.
Results
The area under the ROC curve for the basic and final models was 0.898 (95% confidence interval [CI] 0.852–0.945) and 0.937 (95% CI 0.902–0.972), respectively (p = 0.008).
Conclusions
Assessment of HLA-DR+ CD8+ T cell and natural killer T-cell fractions may improve diagnostic accuracy and further strengthen the importance of bronchoalveolar lavage in the diagnostic workup of sarcoidosis.
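The area under the ROC curve used above to compare the basic and final models equals the probability that a randomly chosen patient with the disease receives a higher predicted score than a randomly chosen patient without it. A minimal sketch of that pairwise computation follows; the function name and the label coding (1 = sarcoidosis, 0 = other diagnosis) are assumptions for illustration, and the sketch does not reproduce the confidence intervals or the p-value reported in the abstract.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels : 1 for diseased, 0 for non-diseased (assumed coding)
    scores : predicted probability or score from the model
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        return float("nan")  # AUC is undefined without both classes
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # diseased patient ranked above non-diseased
            elif p == q:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))
```

The quadratic loop is adequate for a cohort of a few hundred patients; a rank-based formulation would be preferable for much larger datasets.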
In Melbourne, a southern hemisphere city with a cool temperate climate, the grass pollen season has been monitored using a Burkard spore trap for 12 years (11 pollen seasons, which extend from October through January). The onset of the grass pollen season (OGPS) has been defined in various ways using both arbitrary cumulative scores (Sum 75, Sum 100) and percentages (10% Pollen Fly). OGPS, based on the forecast model of pollen season devised by Lejoly-Gabriel (Acta Geogr. Lovan., 13 (1978) 1–260), has been most widely used in efforts to forecast the beginning of the pollen season. OGPS occurred in Melbourne between 20 October and 24 November (average 6 November), a difference of 35 days. Duration of the pollen season ranged from 46 to 81 days, with a mean of 55 days, one of the longest reported. The relationships between onset and various weather parameters for July have enabled us to modify a model, using linear regression analysis, to predict onset. The prediction model is based on a negative correlation between date of onset and the sum of rainfall for July (a winter month). The error of prediction (Ep) is 24%; the day of OGPS was predicted exactly on 2 occasions, and on the other occasions to within 3 to 14 days.
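The prediction model above regresses the date of onset on total July rainfall. A minimal sketch of such a one-predictor least-squares fit is shown below. The function names, the rainfall unit (mm), and the coding of onset as a day count from a fixed reference date are all assumptions for illustration; the fitted coefficients of the published model are not given in the abstract and are not reproduced here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx  # requires at least two distinct x values
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_onset(july_rain_mm, slope, intercept):
    """Predicted onset day (days after the assumed reference date)."""
    return intercept + slope * july_rain_mm
```

With a negative correlation between onset date and July rainfall, the fitted slope would be negative, so a wetter July corresponds to an earlier predicted onset.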