Many evolutionary computation (EC) methods have been used to solve feature selection problems, and they perform well on most small-scale problems. However, as the dimensionality of feature selection problems increases, the solution space grows exponentially. Meanwhile, datasets contain more irrelevant features than relevant ones, which leads to many local optima in the huge solution space. Therefore, existing EC methods still suffer from stagnation in local optima on large-scale feature selection problems. Furthermore, large-scale feature selection problems on different datasets may have different properties, so an existing EC method with only one candidate solution generation strategy (CSGS) may perform poorly across them. In addition, finding a suitable EC method and corresponding parameter values for a given large-scale feature selection problem is time-consuming if we want to solve it effectively and efficiently. In this article, we propose a self-adaptive particle swarm optimization (SaPSO) algorithm for feature selection, particularly for large-scale feature selection. First, an encoding scheme for the feature selection problem is employed in the SaPSO. Second, three important issues related to self-adaptive algorithms are investigated. After that, the SaPSO algorithm with a typical self-adaptive mechanism is proposed. The experimental results on 12 datasets show that the solution size obtained by the SaPSO algorithm is smaller than that of its EC counterparts on all datasets. The SaPSO algorithm performs better than its non-EC and EC counterparts in terms of classification accuracy, not only on most training sets but also on most test sets. Furthermore, as the dimensionality of the feature selection problem increases, the advantages of SaPSO become more prominent.
This highlights that the SaPSO algorithm is suitable for solving feature selection problems, particularly large-scale feature selection problems.
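To make the encoding concrete: binary PSO variants for feature selection typically represent a candidate subset as a bit string and map velocities to bit-flip probabilities through a sigmoid. The sketch below is a generic binary PSO, not the authors' SaPSO (which adaptively selects among multiple CSGSs); the function name, toy parameters, and fitness are illustrative assumptions.

```python
import math
import random

def binary_pso_feature_select(fitness, n_features, n_particles=20, iters=50, seed=0):
    """Generic binary PSO sketch for feature selection.

    Each particle is a bit mask over features; the sigmoid of the velocity
    gives the probability that a bit is set. NOT the SaPSO algorithm itself.
    """
    rng = random.Random(seed)
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    # Random initial bit masks and zero velocities.
    X = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(n_particles)]
    V = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_f = [fitness(x) for x in X]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients (illustrative)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                V[i][d] = (w * V[i][d]
                           + c1 * rng.random() * (pbest[i][d] - X[i][d])
                           + c2 * rng.random() * (gbest[d] - X[i][d]))
                # Re-sample the bit with probability sigmoid(velocity).
                X[i][d] = rng.random() < sigmoid(V[i][d])
            f = fitness(X[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = X[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = X[i][:], f
    return gbest, gbest_f
```

A fitness favoring accuracy while penalizing subset size (as SaPSO's small solution sizes suggest) would be plugged in as `fitness`; the selected subset is the set of indices where the returned mask is true.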
Globally, tuberculosis (TB) remains one of the deadliest diseases. Although several effective diagnostic methods exist, clinics in lower-income countries may not be in a position to afford expensive equipment and employ the trained experts needed to interpret results. In these situations, symptoms including cough are commonly used to identify patients for testing. However, self-reported cough has suboptimal sensitivity and specificity, which may be improved by digital detection.
This study investigates a simple and easily applied method for TB screening based on the automatic analysis of coughing sounds. A database of cough audio recordings was collected and used to develop statistical classifiers.
These classifiers use short-term spectral information to automatically distinguish between the coughs of TB positive patients and healthy controls with an accuracy of 78% and an AUC of 0.95. When a set of five clinical measurements is available in addition to the audio, this accuracy improves to 82%. By choosing an appropriate decision threshold, the system can achieve a sensitivity of 95% at a specificity of approximately 72%. The experiments suggest that the classifiers are using some spectral information that is not perceivable by the human auditory system, and that certain frequencies are more useful for classification than others.
We conclude that automatic classification of coughing sounds may represent a viable low-cost and low-complexity screening method for TB.
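To illustrate what "short-term spectral information" means in practice, the sketch below frames a signal, applies a Hamming window, and computes a one-sided magnitude spectrum per frame with a naive DFT. This is a generic illustration, not the study's actual feature pipeline; the function name and frame parameters are assumptions.

```python
import math

def frame_spectra(signal, frame_len=256, hop=128):
    """Short-term magnitude spectra of a 1-D signal (naive O(N^2) DFT, illustrative)."""
    spectra = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Hamming window reduces spectral leakage at frame edges.
        frame = [signal[start + n] * (0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)))
                 for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):  # one-sided spectrum (real input)
            re = sum(frame[n] * math.cos(2 * math.pi * k * n / frame_len)
                     for n in range(frame_len))
            im = -sum(frame[n] * math.sin(2 * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(math.hypot(re, im))
        spectra.append(mags)
    return spectra
```

A classifier would consume these per-frame spectra (or features derived from them, such as log filterbank energies); in production one would use an FFT rather than this O(N²) DFT.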
The 2007 World Health Organization (WHO) classification of brain tumors did not use molecular abnormalities as diagnostic criteria. Studies have shown that genotyping allows a better prognostic classification of diffuse glioma with improved treatment selection. This has resulted in a major revision of the WHO classification, which, for adult diffuse glioma, is now centered on isocitrate dehydrogenase (IDH) and 1p/19q diagnostics. This revised classification is reviewed with a focus on adult brain tumors, and includes a recommendation of genes for which routine testing is clinically useful. Apart from assessment of IDH mutational status, including sequencing of R132H-immunohistochemistry-negative cases, and testing for 1p/19q, several other markers can be considered for routine testing, including assessment of copy number alterations of chromosomes 7 and 10 and of TERT promoter, BRAF, and H3F3A mutations. For "glioblastoma, IDH mutated," the term "astrocytoma grade IV" could be considered. Treating IDH wild-type grade II and III diffuse glioma with polysomy of chromosome 7 and loss of 10q as glioblastoma should be considered. New developments must be translated more quickly into further revised diagnostic categories. Quality control, rapid integration of molecular findings into the final diagnosis, and communication of the final diagnosis to clinicians require systematic attention.
The purposes of this study were to assess whether CT texture analysis and CT features are predictive of pancreatic neuroendocrine tumor (PNET) grade based on the World Health Organization (WHO) classification and to identify features related to disease progression after surgery.
Preoperative contrast-enhanced CT images of 101 patients with PNETs were assessed. The images were evaluated for tumor location, tumor size, tumor pattern, predominantly solid or cystic composition, presence of calcification, presence of heterogeneous enhancement on contrast-enhanced images, presence of pancreatic duct dilatation, presence of pancreatic atrophy, presence of vascular involvement by the tumor, and presence of lymphadenopathy. Texture features were also extracted from CT images. Surgically verified tumors were graded according to the WHO classification, and patients underwent CT or MRI follow-up after surgical resection. Data were analyzed with chi-square tests, kappa statistics, logistic regression analysis, and Kaplan-Meier curves.
The CT features predictive of a more aggressive tumor (grades 2 and 3) were size larger than 2.0 cm (odds ratio [OR], 3.3; p = 0.014), presence of vascular involvement (OR, 25.2; p = 0.003), presence of pancreatic ductal dilatation (OR, 6.0; p = 0.002), and presence of lymphadenopathy (OR, 6.8; p = 0.002). The texture parameter entropy (OR, 3.7; p = 0.008) was also predictive of more aggressive tumors. Differences in progression-free survival distribution were found for grade 1 versus grades 2 and 3 tumors (χ²(df = 1) = 21.6; p < 0.001); for PNETs with vascular involvement (χ²(df = 1) = 20.8; p < 0.001); and for tumors with entropy (spatial scale filter 2) values greater than 4.65 (χ²(df = 1) = 4.4; p = 0.037).
CT texture analysis and CT features are predictive of PNET aggressiveness and can be used to identify patients at risk of early disease progression after surgical resection.
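In its first-order form, the entropy texture parameter is the Shannon entropy of the image's grey-level histogram. The sketch below computes this from a flat list of pixel values; it omits the spatial scale filtering used in the study, and the function name and binning are illustrative assumptions.

```python
import math
from collections import Counter

def image_entropy(pixels, bins=32, lo=0, hi=256):
    """First-order Shannon entropy (bits) of a grey-level histogram.

    Illustrative sketch: histograms the pixel values into `bins` equal-width
    bins over [lo, hi) and returns -sum(p * log2 p) over non-empty bins.
    """
    width = (hi - lo) / bins
    counts = Counter(min(int((p - lo) / width), bins - 1) for p in pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Homogeneous regions give entropy near 0, while heterogeneous texture (as in the more aggressive tumors here) gives higher values.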
The main purpose of this study is to identify the trends in predatory publishing and to compile a core reading list of documents on the topic of 'predatory journals.' The study examined 541 documents on the topic of 'predatory journals' indexed in the Web of Science database published between 2012 and 2021. The data set was analyzed quantitatively (bibliometric study) and qualitatively (document classification). For bibliometric analysis, parameters like year, disciplines, number of citations, countries, document types, and journals were used. The documents were classified into four groups, namely, General (326), Empirical Studies (89), Technical Specifics (71), and Cautionary Texts (55). The results of the analysis and the correlation between quantitative and qualitative parameters reveal that publications in medical sciences (221) form the majority in almost all groups. There is a steady growth in publications in all groups during 2018 and 2019. Research papers and editorial materials are the most numerous document types. The largest number of documents are from the United States (163 documents). A large number of papers have been published in the journals Scientometrics (22) and Learned Publishing (28). The most highly cited (17) papers have been published in Nature. The core reading list of forty documents on predatory journals is the outcome of the study after examining the correlation between the two methods. The core reading list may assist new researchers in comprehending the various aspects of predatory journals. The article concludes with suggestions for further research.
The fight against offensive speech on the Internet necessitates increased efforts from linguistic analysis and artificial intelligence perspectives to develop countermeasures and preventive methods. Reliable predictions can only be obtained if these methods are exposed to a representative sample of the domain or environment under consideration. Datasets serve as the foundation for significant developments in this field because they are the main means of obtaining appropriate instances that reveal the multiple and varied faces of the offensive speech phenomenon. In this sense, we present Ar-PuFi, a dataset of offensive speech towards Public Figures in the Arabian community. With 24,071 comments collected from TV interviews with Egyptian celebrities belonging to six domains of public interest, Ar-PuFi is currently the largest Arabic dataset in terms of its category and size. The examples were annotated by three native speakers over the course of two months and are provided with two-class and six-class annotations based on the presence or absence of explicit and implicit offensive content. We evaluated the performance of a diverse set of classification models employing several text representations of actual examples (e.g., N-gram, TF/IDF, AraVec, and fastText), and AraBERT achieved the baseline for the new dataset in both offensive detection and group classification. Additionally, we apply the Pointwise Mutual Information (PMI) technique to comments within the target domain in order to derive a lexicon of offensive terms associated with each domain of Ar-PuFi. We further explored whether active learning (AL) or meta-learning (ML) frameworks could be used to reduce the labeling effort required for our dataset without affecting prediction quality and found that, though AL can reduce the amount of data annotation by 10% over the ML approach, neither approach requires less than about 70% of the full dataset to achieve baseline performance.
Finally, we took advantage of the availability of relevant datasets and conducted a cross-domain experiment to support our claims, not only about the uniqueness of our dataset but also about the difficulty of adapting models across Arabic dialects.
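The PMI step for deriving per-domain term lexicons can be sketched as follows. This is a minimal, smoothing-free estimate assuming whitespace tokenization; the function name, cutoff, and log base are illustrative, not the authors' exact procedure.

```python
import math
from collections import Counter

def domain_pmi(comments_by_domain, min_count=2):
    """PMI(term, domain) = log2( P(term, domain) / (P(term) * P(domain)) ).

    Estimated from raw token counts; a high PMI means the term occurs in the
    domain far more often than its overall frequency would predict.
    """
    term_domain = Counter()  # joint (term, domain) counts
    term = Counter()         # marginal term counts
    domain = Counter()       # marginal domain token counts
    total = 0
    for dom, comments in comments_by_domain.items():
        for comment in comments:
            for tok in comment.split():
                term_domain[(tok, dom)] += 1
                term[tok] += 1
                domain[dom] += 1
                total += 1
    pmi = {}
    for (t, d), c in term_domain.items():
        if c < min_count:  # drop rare terms, whose PMI estimates are unreliable
            continue
        pmi[(t, d)] = math.log2((c / total) / ((term[t] / total) * (domain[d] / total)))
    return pmi
```

Ranking terms by PMI within each domain then yields a candidate lexicon of domain-associated (here, offensive) terms for human review.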
We consider the problem of finding an accurate representation of neuron shapes, extracting sub-cellular features, and classifying neurons based on neuron shapes. In neuroscience research, the skeleton representation is often used as a compact and abstract representation of neuron shapes. However, existing methods are limited to extracting and analyzing "curve" skeletons, which apply only to tubular shapes. This paper presents a 3D neuron morphology analysis method for more general and complex neuron shapes. First, we introduce the concept of a skeleton mesh to represent general neuron shapes and propose a novel method for computing mesh representations from 3D surface point clouds. A skeleton graph is then obtained from the skeleton mesh and is used to extract sub-cellular features. Finally, an unsupervised learning method is used to embed the skeleton graph for neuron classification. Extensive experimental results are provided and demonstrate the robustness of our method for analyzing neuron morphology.
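As an illustration of the kind of sub-cellular features a skeleton graph exposes, the toy sketch below counts tips, branch points, and total cable length from an edge list. The representation and function name are hypothetical simplifications; the paper's skeleton mesh pipeline is considerably richer.

```python
from collections import defaultdict

def skeleton_graph_features(edges):
    """Toy morphology features from a skeleton graph.

    `edges` is a list of (node, node, length) tuples. Tips are degree-1 nodes,
    branch points are nodes of degree >= 3, and total length sums edge lengths.
    """
    degree = defaultdict(int)
    total_len = 0.0
    for u, v, length in edges:
        degree[u] += 1
        degree[v] += 1
        total_len += length
    tips = sum(1 for d in degree.values() if d == 1)
    branch_points = sum(1 for d in degree.values() if d >= 3)
    return {"tips": tips, "branch_points": branch_points, "total_length": total_len}
```

Feature vectors like this (or learned graph embeddings, as in the paper) can then feed an unsupervised classifier such as k-means.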