Artificial intelligence technology is becoming increasingly essential to education. The outbreak of COVID-19 in recent years has led many schools to launch online education. Automated online assessments have become a hot topic of interest, and an increasing number of researchers are studying Automated Essay Scoring (AES). This work seeks to summarise the characteristics of current AES systems used in English writing assessment, identify their strengths and weaknesses, and finally, analyse the limitations of recent studies and research trends. Search strings were used to retrieve papers on AES systems published from 2018 to 2023 from four databases; after study selection and quality evaluation, 104 of these were judged able to address the posed research aims. It is concluded that the existing AES systems, although achieving good accuracy in specific contexts, are unable to meet the needs of teachers and students in real teaching scenarios. The improvements these systems require concern the scalability of a system to assess essays on different topics or in different styles, the accuracy of the model's predicted scores, and the reliability of outcomes: improving the robustness of AES models against adversarial inputs, the richness of AES system functionality, and the development of AES assist tools.
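For readers unfamiliar with what such scoring systems do internally, a minimal baseline sketch is given below: TF-IDF text features feeding a ridge regressor, evaluated with quadratic weighted kappa, a metric commonly reported in AES work. The essays and scores are invented placeholders; this is a generic baseline, not any specific reviewed system.

```python
# Minimal baseline AES sketch: TF-IDF features + ridge regression,
# evaluated with quadratic weighted kappa (QWK), a common AES metric.
# The essays and scores below are invented placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import cohen_kappa_score
from sklearn.pipeline import make_pipeline

essays = [
    "The author presents a clear thesis supported by strong evidence.",
    "This essay good because it say many thing.",
    "A well-organized argument with varied sentence structure and detail.",
    "Bad essay short.",
    "The claim is developed logically across several coherent paragraphs.",
    "Some ideas here but no structure or support is given at all.",
]
scores = np.array([4, 2, 4, 1, 3, 2])  # hypothetical human scores (1-4)

model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
model.fit(essays, scores)

# Round and clip the continuous predictions back onto the score scale.
pred = np.clip(np.rint(model.predict(essays)), 1, 4).astype(int)
print("QWK:", cohen_kappa_score(scores, pred, weights="quadratic"))
```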
The objective of the study is to investigate the effect of Nuchal Fold (NF) in predicting Fetal Growth Restriction (FGR) using machine learning (ML), to explain the model's results using model-agnostic interpretable techniques, and to compare the results with clinical guidelines. Second-trimester ultrasound biometry and Doppler velocimetry were used to construct six FGR (birthweight < 3rd centile) ML models. Interpretability analysis was conducted using Accumulated Local Effects (ALE) and Shapley Additive Explanations (SHAP). The results of the best-performing model were compared with clinical guidelines. The Support Vector Machine (SVM) exhibited the most consistent performance in FGR prediction. SHAP showed that the top contributors to identifying FGR were Abdominal Circumference (AC), NF, Uterine RI (Ut RI), and Uterine PI (Ut PI). ALE showed that the cutoff values of Ut RI, Ut PI, and AC in differentiating FGR from normal were comparable with clinical guidelines (errors between model and clinical guidelines: Ut RI, 15%; Ut PI, 8%; and AC, 11%). The cutoff value for NF to differentiate between healthy and FGR fetuses is 5.4 mm, where a low NF may indicate FGR. The SVM model is the most stable in FGR prediction, and ALE can be a potential tool to identify cutoff values for novel parameters that differentiate between healthy and FGR fetuses.
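A minimal sketch of the SHAP interpretability step is shown below, assuming the shap library and synthetic stand-in data; the feature names are taken from the abstract, but the data, model settings, and sample sizes are illustrative assumptions, not the study's pipeline.

```python
# Sketch: rank feature contributions of an SVM classifier with SHAP.
# Feature names come from the abstract; the synthetic data are
# illustrative stand-ins for the ultrasound/Doppler measurements.
import numpy as np
import shap
from sklearn.svm import SVC

rng = np.random.default_rng(0)
feature_names = ["AC", "NF", "Ut_RI", "Ut_PI"]
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = SVC(probability=True).fit(X, y)

# Model-agnostic KernelExplainer; a background sample keeps it tractable.
explainer = shap.KernelExplainer(lambda d: model.predict_proba(d)[:, 1],
                                 shap.sample(X, 50))
shap_values = explainer.shap_values(X[:30])

# Global importance: mean absolute SHAP value per feature.
importance = np.abs(shap_values).mean(axis=0)
for name, imp in sorted(zip(feature_names, importance), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```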
Bandit algorithms have been widely used in many application areas, including information retrieval evaluation and ranking, largely because of their exceptional performance. The aim of this study is to examine the published studies for trends that shape the use of bandit algorithms in the evaluation and ranking of information retrieval systems, and to classify the bandit algorithms used in this research domain. The evaluation metrics, datasets, and contribution facets of the primary studies, as well as the bandit categories, are all discussed.
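For readers unfamiliar with the family, here is a minimal sketch of one classic bandit algorithm, UCB1, which variants in the surveyed literature build on; the arm reward probabilities are invented for illustration and do not come from any surveyed study.

```python
# Minimal UCB1 bandit sketch: pull the arm with the highest upper
# confidence bound, observe a Bernoulli reward, update the estimate.
# The arm reward probabilities are invented for illustration.
import math
import random

true_probs = [0.2, 0.5, 0.7]          # hypothetical arm reward rates
counts = [0] * len(true_probs)        # pulls per arm
values = [0.0] * len(true_probs)      # running mean reward per arm

for t in range(1, 2001):
    if 0 in counts:                   # play each arm once first
        arm = counts.index(0)
    else:                             # then use the UCB1 index
        arm = max(range(len(true_probs)),
                  key=lambda a: values[a]
                  + math.sqrt(2 * math.log(t) / counts[a]))
    reward = 1.0 if random.random() < true_probs[arm] else 0.0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print("pulls per arm:", counts)       # the best arm should dominate
```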
Object recognition systems usually require complete, manually labeled training data to train a classifier. In this paper, we study the problem of object recognition where training samples are missing during the classifier learning stage, a task also known as zero-shot learning. We propose a novel zero-shot learning strategy that utilizes the topic model and a hierarchical class concept. Our proposed method has the advantage that the cumbersome human annotation stage (i.e., attribute-based classification) is eliminated. We achieve performance comparable with state-of-the-art algorithms on four public datasets when unseen classes exist in the classification task: PubFig (67.09%), Cifar-100 (54.85%), Caltech-256 (52.14%), and Animals with Attributes (49.65%).
•To review free-text clinical text classification approaches from six aspects.
•In the selected studies, mostly content-based and concept-based features were used.
•The datasets used in the selected studies were categorized into four distinct types.
•The selected studies used either supervised machine learning or rule-based approaches.
•Ten open research challenges in the clinical text classification domain are presented.
The pervasive use of electronic health databases has increased the accessibility of free-text clinical reports for supplementary use. Several text classification approaches, such as supervised machine learning (SML) or rule-based approaches, have been utilized to obtain beneficial information from free-text clinical reports. In recent years, many researchers have worked in the clinical text classification field and published their results in academic journals. However, to the best of our knowledge, no comprehensive systematic literature review (SLR) has recapitulated the existing primary studies on clinical text classification in the last five years. Thus, the current study aims to present an SLR of academic articles on clinical text classification published from January 2013 to January 2018. Accordingly, we intend to maximize the procedural decision analysis in six aspects, namely, types of clinical reports, data sets and their characteristics, pre-processing and sampling techniques, feature engineering, machine learning algorithms, and performance metrics. To achieve our objective, 72 primary studies from 8 bibliographic databases were systematically selected and rigorously reviewed from the perspective of these six aspects. This review identified nine types of clinical reports, four types of data sets (i.e., homogeneous–homogeneous, homogeneous–heterogeneous, heterogeneous–homogeneous, and heterogeneous–heterogeneous), two sampling techniques (i.e., over-sampling and under-sampling), and nine pre-processing techniques. Moreover, this review found that bag-of-words, bag-of-phrases, and bag-of-concepts features, when represented by either term frequency or term frequency with inverse document frequency, showed improved classification results. SML-based or rule-based approaches were generally employed to classify the clinical reports. To measure the performance of these classification approaches, precision, recall, F-measure, accuracy, AUC, and specificity were used in binary-class problems; in multi-class problems, micro- or macro-averaged precision, recall, or F-measure were primarily used. Lastly, open research issues and challenges are presented for future scholars interested in clinical text classification. This SLR will be a beneficial resource for researchers engaged in clinical text classification.
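The dominant SML recipe the review identifies (bag-of-words features with TF-IDF weighting feeding a supervised classifier, scored with macro-averaged F-measure) can be sketched as follows; the "reports" and labels below are invented placeholders, not real clinical text.

```python
# Sketch of the common SML recipe from the review: bag-of-words with
# TF-IDF weighting feeding a linear classifier, scored with macro-F1.
# The "reports" and labels are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

reports = [
    "patient reports acute chest pain radiating to the left arm",
    "no acute findings, routine follow up advised",
    "severe chest pain with shortness of breath on exertion",
    "unremarkable study, patient asymptomatic",
    "crushing substernal chest pain and diaphoresis noted",
    "normal exam, no complaints at this visit",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = cardiac alert, 0 = routine (toy labels)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(reports, labels)
pred = clf.predict(reports)
print("macro-F1:", f1_score(labels, pred, average="macro"))
```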
Introduction. Information retrieval systems are vital to meeting the daily information needs of users. The effectiveness of these systems has often been evaluated using the test collections approach, despite its high evaluation costs. Recent methods have been proposed that reduce evaluation costs by predicting information retrieval performance measures at higher cut-off depths from other measures computed at lower cut-off depths. The purpose of this paper is to propose two methods that address the challenge of accurately predicting the normalised discounted cumulative gain (nDCG) measure. Method. Data from selected test collections of the Text REtrieval Conference (TREC) was used. The proposed methods employ gradient boosting and linear regression models trained with topic scores of measures partitioned by TREC Tracks. Analysis. To evaluate the proposed methods, the coefficient of determination and the Kendall's tau and Spearman correlations were used. Results. The proposed methods provide better predictions of the nDCG measure at higher cut-off depths while using other measures computed at lower cut-off depths. Conclusions. The proposed methods improve the predictions of the nDCG measure while reducing evaluation costs.
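A minimal sketch of this prediction setup is given below, with synthetic per-topic scores standing in for the TREC data: a gradient boosting model regresses a deep-cut-off measure on measures computed at a shallow cut-off, and Kendall's tau checks how well the predicted ordering agrees with the actual one. The feature choices and data-generating process are assumptions for illustration.

```python
# Sketch: predict a deep-cut-off measure (e.g., nDCG@100) from shallow
# cut-off measures, then check rank agreement with Kendall's tau.
# The per-topic scores here are synthetic stand-ins for TREC data.
import numpy as np
from scipy.stats import kendalltau
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
shallow = rng.uniform(size=(500, 3))  # e.g., nDCG@10, P@10, RBP@10
deep = shallow @ np.array([0.6, 0.25, 0.15]) \
       + rng.normal(scale=0.05, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(shallow, deep, random_state=1)
model = GradientBoostingRegressor().fit(X_tr, y_tr)
pred = model.predict(X_te)

tau, _ = kendalltau(y_te, pred)
print(f"Kendall's tau, actual vs. predicted deep scores: {tau:.3f}")
```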
•This paper tackles the zero-shot learning problem in the object recognition domain.
•Unknown objects that have no training images are related to known objects.
•A model that combines the benefits of attributes and image hierarchy is proposed.
•The proposed method achieves state-of-the-art accuracy on the AwA dataset.
Generally, training images are essential for a computer vision model to classify a specific object class accurately. Unfortunately, a countless number of different object classes exist in the real world, and it is almost impossible for a computer vision model to obtain complete training images for each of them. To overcome this problem, zero-shot learning algorithms emerged that learn unknown object classes from information about a set of known object classes. Among these algorithms, attributes and image hierarchy are the most widely used methods. In this paper, we combine the strengths of both attributes and image hierarchy by proposing the Attributes Relationship Model (ARM) to perform zero-shot learning. We tested the efficiency of the proposed algorithm on the Animals with Attributes (AwA) dataset and achieved state-of-the-art accuracy (50.61%) compared with other recent methods.
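The ARM details are beyond the abstract, but the attribute side of the idea can be sketched generically: train per-attribute classifiers on seen classes, then assign an unseen-class image to the class whose attribute signature best matches the predicted attributes. This is a DAP-style scheme, not the authors' exact model, and all data below are synthetic.

```python
# Generic attribute-based zero-shot sketch (DAP-style), NOT the ARM model:
# 1) train per-attribute classifiers on seen-class images,
# 2) label unseen-class images with the nearest class attribute signature.
# All data are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_attr = 5
# Binary attribute signatures for two *unseen* classes (hypothetical).
unseen_signatures = {"zebra": np.array([1, 1, 0, 1, 0]),
                     "whale": np.array([0, 0, 1, 0, 1])}

# Seen-class training data: image features and per-image attribute labels.
X_seen = rng.normal(size=(300, 8))
A_seen = (X_seen[:, :n_attr] > 0).astype(int)  # toy attribute ground truth

attr_clfs = [LogisticRegression().fit(X_seen, A_seen[:, j])
             for j in range(n_attr)]

def classify_unseen(x):
    # Predict attribute probabilities, then pick the closest signature.
    probs = np.array([clf.predict_proba(x[None, :])[0, 1]
                      for clf in attr_clfs])
    return min(unseen_signatures,
               key=lambda c: np.abs(unseen_signatures[c] - probs).sum())

print(classify_unseen(rng.normal(size=8)))
```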
•A good denoising method is vital, as it can enhance the performance of subsequent processes.
•This paper extends the work of Benoit et al. from the monocular image to the color video domain.
•A VESC (Video Epitome & Sparse Coding) framework is proposed for the video denoising task.
•We show results comparable to conventional methods in both the spatial and transform domains.
•We also demonstrate the strength of the proposed method on the visual tracking problem.
Denoising is a process that removes noise from a signal. In this paper, we present a unified framework that deals with video denoising problems by adopting a two-step process, namely video epitome and sparse coding. First, the video epitome summarizes the video contents and removes redundant information to generate a single compact representation describing the video content. Second, taking this compact representation as input, sparse coding generates a visual dictionary for the video sequence by estimating the most representative basis elements. The fusion of these two methods results in an enhanced, compact representation for the denoising task. Experiments on publicly available datasets show the effectiveness of our proposed system in comparison with state-of-the-art algorithms on the video denoising task.
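The sparse-coding half of the pipeline can be sketched on a single frame with scikit-learn's dictionary learning; the epitome stage and the extension to video are omitted, so this mirrors the standard patch-based denoising recipe rather than the VESC system itself, and the "frame" is synthetic.

```python
# Sketch of the sparse-coding step on one noisy frame: learn a patch
# dictionary, sparse-code noisy patches, reconstruct. The epitome stage
# and the video extension of the actual framework are omitted here.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import (extract_patches_2d,
                                              reconstruct_from_patches_2d)

rng = np.random.default_rng(0)
frame = rng.random((64, 64))            # synthetic stand-in for a frame
noisy = frame + rng.normal(scale=0.1, size=frame.shape)

patches = extract_patches_2d(noisy, (6, 6)).reshape(-1, 36)
mean = patches.mean(axis=1, keepdims=True)
patches = patches - mean                # per-patch DC removal

dico = MiniBatchDictionaryLearning(n_components=50, alpha=1.0,
                                   max_iter=200, random_state=0)
codes = dico.fit(patches).transform(patches)   # sparse codes
denoised_patches = (codes @ dico.components_ + mean).reshape(-1, 6, 6)
denoised = reconstruct_from_patches_2d(denoised_patches, noisy.shape)

print("MSE noisy:", float(((noisy - frame) ** 2).mean()),
      "MSE denoised:", float(((denoised - frame) ** 2).mean()))
```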
Introduction. To reduce the cost of evaluating information retrieval systems, this study proposes a method that employs deep learning to predict the precision evaluation metric. It also aims to show why some existing evaluation metrics correlate with each other when the varying distributions of relevance assessments are taken into account, and to ensure the reproducibility of all the presented experiments. Method. Using data from several test collections of the Text REtrieval Conference (TREC), we show through mathematical intuitions why some evaluation metrics correlate with each other. In addition, regression models were used to investigate how the predictions of the evaluation metrics are affected by queries or topics with varying relevance assessments. Lastly, the proposed prediction method employs deep learning. Analysis. We use the coefficient of determination and the Kendall's tau, Spearman, and Pearson correlations. Results. The proposed method produced better predictions than other recently proposed methods in retrieval research. The study also shows why a correlation exists between the precision and rank-biased precision metrics, and why the recall and average precision metrics become less correlated as the cut-off depth increases. Conclusions. The proposed method and the justifications for the correlations between some pairs of retrieval metrics will be valuable to researchers predicting the evaluation metrics of information retrieval systems.
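The precision/RBP link can be made concrete from the metric definitions: precision at depth k is $P@k = \frac{1}{k}\sum_{i=1}^{k} r_i$, while rank-biased precision is $RBP = (1-p)\sum_{i \ge 1} r_i\, p^{i-1}$, where $r_i$ is the binary relevance at rank $i$ and $p$ is the persistence parameter; both are weighted sums of the same top-rank relevance values, so they tend to move together. A small sketch with synthetic rankings illustrates this (the data and p value are invented):

```python
# Compute P@k and RBP over synthetic binary relevance lists and show
# their correlation; r_i is relevance at rank i, p is RBP persistence.
# Data are random, for illustration only.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
k, p = 10, 0.8
runs = rng.integers(0, 2, size=(1000, k))   # binary relevance lists

precision_at_k = runs.mean(axis=1)          # P@k = (1/k) * sum r_i
weights = (1 - p) * p ** np.arange(k)       # RBP rank weights
rbp = runs @ weights                        # RBP truncated at depth k

r, _ = pearsonr(precision_at_k, rbp)
print(f"Pearson correlation between P@{k} and RBP(p={p}): {r:.3f}")
```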