Measuring the similarity between documents is an important operation in the text processing field. In this paper, a new similarity measure is proposed. To compute the similarity between two documents with respect to a feature, the proposed measure takes the following three cases into account: (a) the feature appears in both documents, (b) the feature appears in only one document, and (c) the feature appears in neither document. For the first case, the similarity increases as the difference between the two involved feature values decreases; furthermore, the contribution of the difference is normally scaled. For the second case, a fixed value is contributed to the similarity. For the last case, the feature contributes nothing to the similarity. The proposed measure is extended to gauge the similarity between two sets of documents. The effectiveness of our measure is evaluated on several real-world data sets for text classification and clustering problems. The results show that the performance obtained by the proposed measure is better than that achieved by other measures.
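A minimal sketch of the three-case scheme described above, in Python. The Gaussian scaling, the fixed contribution `lam` for one-sided features, and the normalization over non-absent features are illustrative assumptions, not the paper's exact definitions:

```python
import numpy as np

def three_case_similarity(x, y, sigma, lam=1.0):
    """Per-feature three-case similarity between two documents.

    x, y   : 1-D term-weight vectors of the two documents
    sigma  : per-feature scale (e.g. std. dev. of each feature over the corpus)
    lam    : fixed contribution when a feature appears in only one document
             (lam and the Gaussian scaling below are assumptions for illustration)
    """
    both = (x > 0) & (y > 0)          # case (a): feature appears in both documents
    one = (x > 0) ^ (y > 0)           # case (b): feature appears in exactly one
    # case (c): features absent from both documents contribute nothing
    contrib = np.zeros_like(x, dtype=float)
    # (a) similarity grows as the value difference shrinks, scaled by sigma
    contrib[both] = 0.5 * (1.0 + np.exp(-((x[both] - y[both]) / sigma[both]) ** 2))
    # (b) a fixed (here negative) value for one-sided features
    contrib[one] = -lam
    denom = both.sum() + one.sum()    # normalize over non-absent features only
    return contrib.sum() / denom if denom else 0.0
```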
The Consequences of Reading Inaccurate Information
Rapp, David N.
Current Directions in Psychological Science, 08/2016, Volume 25, Issue 4
Journal Article, Peer-reviewed
We are regularly confronted with statements that are inaccurate, sometimes obviously so. Unfortunately, people can be influenced by and rely upon inaccurate information, engaging in less critical evaluation than might be hoped. Empirical studies have consistently demonstrated that even when people should know better, reading inaccurate information can affect their performance on subsequent tasks. What encourages people's encoding and use of false statements? The current article outlines how reliance on inaccurate information is a predictable consequence of the routine cognitive processes associated with memory, problem solving, and comprehension. This view helps identify conditions under which inaccurate information is more or less likely to influence subsequent decisions. These conditions are informative in the consideration of information-design approaches and instructional methods intended to support critical thinking.
• This work presents the methodologies and evaluation results for the WHS algorithms selected from the submissions to the Multi-Modality Whole Heart Segmentation (MM-WHS) challenge, held in conjunction with MICCAI 2017.
• It introduces the challenge, discusses the results of both conventional and deep learning-based algorithms, and offers insights for future research.
• The challenge provides a fair and intuitive comparison framework for WHS methods, both existing and in development.
• The challenge provides training datasets with manually delineated ground truths and evaluation for the ongoing development of MM-WHS algorithms.
This manuscript presents the methodologies and evaluation results for the WHS algorithms selected from the submissions to the Multi-Modality Whole Heart Segmentation (MM-WHS) challenge, held in conjunction with MICCAI-STACOM 2017. The challenge provides 120 three-dimensional cardiac images covering the whole heart, including 60 CT and 60 MRI volumes, all acquired in clinical environments and manually delineated. Ten algorithms for CT data and eleven algorithms for MRI data, submitted from twelve groups, have been evaluated. The results show that many of the deep learning (DL)-based methods achieved high accuracy even though the number of training datasets was limited. Several of them nevertheless reported poor results in the blinded evaluation, probably due to overfitting during training. The conventional algorithms, mainly based on multi-atlas segmentation, demonstrated robust and stable performance, even though their accuracy is not as good as that of the best DL method in CT segmentation. The challenge, including provision of the annotated training data and blinded evaluation of submitted algorithms on the test data, continues as an ongoing benchmarking resource.
Knowledge of whole heart anatomy is a prerequisite for many clinical applications. Whole heart segmentation (WHS), which delineates substructures of the heart, can be very valuable for modeling and analysis of the anatomy and functions of the heart. However, automating this segmentation can be challenging due to the large variation in heart shape and the differing image quality of clinical data. To achieve this goal, an initial set of training data is generally needed for constructing priors or for training. Furthermore, it is difficult to perform comparisons between different methods, largely due to differences in the datasets and evaluation metrics used. This manuscript presents the methodologies and evaluation results for the WHS algorithms selected from the submissions to the Multi-Modality Whole Heart Segmentation (MM-WHS) challenge, held in conjunction with MICCAI 2017. The challenge provided 120 three-dimensional cardiac images covering the whole heart, including 60 CT and 60 MRI volumes, all acquired in clinical environments with manual delineation. Ten algorithms for CT data and eleven algorithms for MRI data, submitted from twelve groups, have been evaluated. The results showed that the performance of CT WHS was generally better than that of MRI WHS. The segmentation of the substructures for different categories of patients could present different levels of challenge due to differences in imaging and variations in heart shape. The deep learning (DL)-based methods demonstrated great potential, though several of them reported poor results in the blinded evaluation. Their performance could vary greatly across different network structures and training strategies. The conventional algorithms, mainly based on multi-atlas segmentation, demonstrated good performance, though their accuracy and computational efficiency could be limited. The challenge, including provision of the annotated training data and blinded evaluation of submitted algorithms on the test data, continues as an ongoing benchmarking resource via its homepage (www.sdspeople.fudan.edu.cn/zhuangxiahai/0/mmwhs/).
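The abstract centers on blinded evaluation of segmentation accuracy. A per-substructure Dice overlap, the standard metric for such benchmarks, can be sketched as follows; the exact metric set used by MM-WHS is defined in the paper itself:

```python
import numpy as np

def dice_per_label(seg, gt, labels):
    """Dice overlap per heart substructure label.

    seg, gt : integer label volumes of identical shape (prediction, ground truth)
    labels  : iterable of substructure label values to evaluate
    """
    scores = {}
    for lab in labels:
        p = (seg == lab)
        g = (gt == lab)
        denom = p.sum() + g.sum()
        # Dice = 2|P ∩ G| / (|P| + |G|); define as 1.0 when both masks are empty
        scores[lab] = 2.0 * np.logical_and(p, g).sum() / denom if denom else 1.0
    return scores
```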
This paper aims to reconstruct the evolution of all available COVID-19 vaccine trials extracted from the COVID-NMA database by applying the phylomemy reconstruction process. We visualize the textual contents of 1,794 trial descriptions and explore their collective structure along with their semantic dynamics. We map the continuous progress of the main COVID-19 vaccine platforms from their early-stage trials in February 2020 to their most recent combinations driven by the rise of variants of concern, third-dose issues, and heterologous vaccinations. This paper brings insights for global coordination between research teams, especially in crisis situations such as the COVID-19 pandemic.
Today, extreme amounts of data are produced, commonly referred to as Big Data. A significant portion of big data is textual, and text processing has correspondingly increased in importance. This is especially true given the development of word embeddings and other groundbreaking advances in the field. However, when studies on text processing and word embedding are examined, it can be seen that while there have been many studies oriented toward world languages, especially English, the Turkish language has received insufficient attention. As a result, Turkish was chosen as the target language for the current study. Two Turkish datasets were created for this study. Word vectors were trained using the Word2Vec method on an unlabeled corpus of approximately 11 billion words. Using these word vectors, text classification was applied with deep neural networks on a second dataset of 1.5 million examples and 10 classes. The current study employed the Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) architectures, including the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) variants, as deep neural network architectures. The performance of the word embedding methods used in this study, their effects on accuracy, and the success of the deep neural network architectures were then analyzed in detail. The experimental results showed that the GRU and LSTM methods were more successful than the other deep neural network models used in this study. The results also showed that pre-trained word vectors (PWVs) improved accuracy on deep neural networks by approximately 5% and 7%. The datasets and word vectors of the current study will be shared in order to contribute to the Turkish-language literature in this field.
• The largest unlabeled Turkish dataset and word vectors were created.
• Text classification was applied with deep neural networks on another Turkish multiclass dataset.
• The effect of using pre-trained word embeddings with deep neural networks was investigated.
• The performance of the deep neural network and word embedding methods was compared and analyzed.
• The accuracy rate was improved using Turkish pre-trained word vectors with transfer learning.
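The pipeline the abstract describes, pre-training Word2Vec vectors and feeding them into a recurrent classifier, can be sketched as follows. The toy corpus, vocabulary handling, sequence length, and layer sizes are illustrative stand-ins, not the paper's configuration:

```python
import numpy as np
import tensorflow as tf
from gensim.models import Word2Vec

# Train word vectors on an unlabeled corpus (a toy stand-in here; the paper
# uses a corpus of roughly 11 billion Turkish words).
sentences = [["örnek", "cümle"], ["ikinci", "örnek", "cümle"]]
w2v = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=2)

# Build an embedding matrix indexed consistently with the classifier's vocab.
vocab = {w: i + 1 for i, w in enumerate(w2v.wv.key_to_index)}  # 0 = padding
emb = np.zeros((len(vocab) + 1, 100))
for w, i in vocab.items():
    emb[i] = w2v.wv[w]

# A GRU classifier seeded with the pre-trained vectors (illustrative sizes).
emb_layer = tf.keras.layers.Embedding(len(vocab) + 1, 100, trainable=False)
model = tf.keras.Sequential([
    emb_layer,
    tf.keras.layers.GRU(64),
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 classes, as in the paper
])
model.build(input_shape=(None, 50))   # e.g. sequences padded to length 50
emb_layer.set_weights([emb])          # seed with the pre-trained vectors
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```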
Emerging flexible and stretchable devices open up novel and attractive applications beyond traditional rigid wearable devices. Since the small and flexible form factor severely limits battery capacity, energy harvesting (EH) stands out as a critical enabler of new devices. Despite increasing interest in recent years, the capacity of wearable energy harvesting remains unknown. Prior work analyzes the power generated by a single and typically rigid transducer. This choice limits the EH potential and undermines physical flexibility. Moreover, current results do not translate to total harvested energy over a given period, which is crucial from a developer perspective. In contrast, this paper explores the daily energy harvesting potential of combining flexible light and motion energy harvesters. It first presents a multi-modal energy harvesting system design whose inputs are flexible photovoltaic cells and piezoelectric patches. We measure the generated power under various light intensities and gait speeds. Finally, we construct daily energy harvesting patterns of 9593 users by integrating our measurements with activity data from the American Time Use Survey. Our results show that the proposed system can harvest on average 0.6 mAh @ 3.6 V per day.
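The paper's headline number comes from integrating measured power over users' daily activity profiles. A back-of-the-envelope version of that integration, with placeholder power values rather than the paper's measurements, looks like this:

```python
# Integrate harvested power over a daily activity profile, in the spirit of
# the paper's ATUS-based analysis. Hours and mW values below are illustrative
# placeholders, not the paper's measurements.
activities_hours = {"outdoor_light": 1.0, "indoor_light": 9.0, "walking": 1.5}
harvested_mw = {"outdoor_light": 5.0, "indoor_light": 0.2, "walking": 0.15}

energy_mwh = sum(activities_hours[a] * harvested_mw[a] for a in activities_hours)
# Convert energy to charge at the 3.6 V reference used in the abstract.
print(f"{energy_mwh:.2f} mWh/day = {energy_mwh / 3.6:.2f} mAh @ 3.6 V")
```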
Sentiment analysis (SA) is a continuing field of research that lies at the intersection of many fields such as data mining, natural language processing, and machine learning. It is concerned with the automatic extraction of opinions conveyed in a given text. Due to its vast applications, many studies have been conducted in the area of SA, especially on English texts, while other languages such as Arabic have received less attention. This survey presents a comprehensive overview of the work done so far on Arabic SA (ASA). The survey groups published papers based on the SA-related problems they address and tries to identify the gaps in the current literature, laying the foundation for future studies in this field.
This paper gives a general overview of hidden Markov model (HMM)-based speech synthesis, which has recently been demonstrated to be very effective in synthesizing speech. The main advantage of this approach is its flexibility in changing speaker identities, emotions, and speaking styles. The paper also discusses the relation between the HMM-based approach and the more conventional unit-selection approach that has dominated the field over the last decades. Finally, advanced techniques for future developments are described.
Robust Text Detection in Natural Scene Images
Yin, Xu-Cheng; Yin, Xuwang; Huang, Kaizhu, et al.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 05/2014, Volume 36, Issue 5
Journal Article, Peer-reviewed, Open access
Text detection in natural scene images is an important prerequisite for many content-based image analysis tasks. In this paper, we propose an accurate and robust method for detecting texts in natural scene images. A fast and effective pruning algorithm is designed to extract Maximally Stable Extremal Regions (MSERs) as character candidates using the strategy of minimizing regularized variations. Character candidates are grouped into text candidates by the single-link clustering algorithm, where distance weights and clustering threshold are learned automatically by a novel self-training distance metric learning algorithm. The posterior probabilities of text candidates corresponding to non-text are estimated with a character classifier; text candidates with high non-text probabilities are eliminated and texts are identified with a text classifier. The proposed system is evaluated on the ICDAR 2011 Robust Reading Competition database; the f-measure is over 76%, much better than the state-of-the-art performance of 71%. Experiments on multilingual, street view, multi-orientation and even born-digital databases also demonstrate the effectiveness of the proposed method.
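The first stage of the described pipeline, extracting MSERs as character candidates, can be sketched with OpenCV as below. The regularized-variation pruning, self-training metric learning, and the two classifiers are not reproduced; a crude size filter stands in for the learned pruning, and the input path is hypothetical:

```python
import cv2

img = cv2.imread("scene.jpg")            # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Extract Maximally Stable Extremal Regions as character candidates.
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)

# Crude size/aspect filter standing in for the paper's learned pruning step.
candidates = [(int(x), int(y), int(w), int(h)) for (x, y, w, h) in bboxes
              if 8 <= h <= img.shape[0] // 2 and w < 2 * h]
for (x, y, w, h) in candidates:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 1)
cv2.imwrite("candidates.jpg", img)
```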
Methods for the automatic extraction and identification of properties, quantities, and units of measurement from texts are considered. The developed ontology of properties and units of measurement includes both fundamental connections between properties and units of measurement and connections that show the formation of concepts. The property identification technology is based on the fact that semantically significant text elements, extracted within the boundaries of one or several sentences forming the semantic neighborhood of a property, are correlated with the corresponding components of the ontology, which makes it possible to restore missing semantic fragments or identify discrepancies in designations. The results of experimental studies of the effectiveness of the developed tools are presented.
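A minimal stand-in for the ontology-driven extraction the abstract describes: a regex finds number-unit pairs, and a small dictionary plays the role of the property/unit ontology. The real system correlates matches with ontology components to recover missing semantics; the unit list here is an illustrative assumption:

```python
import re

# Toy "ontology": maps a unit symbol to the property it measures.
UNITS = {"kg": "mass", "m": "length", "s": "time", "V": "voltage", "Pa": "pressure"}
PATTERN = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z]+)")

def extract_quantities(text):
    """Return (value, unit, property) triples for known units found in text."""
    out = []
    for m in PATTERN.finditer(text):
        unit = m.group("unit")
        if unit in UNITS:  # keep only units known to the "ontology"
            out.append((float(m.group("value")), unit, UNITS[unit]))
    return out

print(extract_quantities("The rod is 2.5 m long and weighs 3 kg."))
# -> [(2.5, 'm', 'length'), (3.0, 'kg', 'mass')]
```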