•We propose a specialized method for external plagiarism detection.•It integrates semantic and syntactic information to capture the meaning of sentences.•It detects various kinds of plagiarism: copied text, paraphrasing and sentence transformation.•Results show that it is to be preferred over the PAN-11 systems and other methods.
Plagiarism is the reuse of someone else's ideas, work or even words without sufficient attribution to the source. This paper presents a method to detect external plagiarism by integrating semantic relations between words with their syntactic composition. The problem with the available methods is that they fail to capture the meaning when comparing a source-document sentence with a suspicious-document sentence that has the same surface text (the words are identical) or is a paraphrase of it, causing inaccurate or unnecessary matches. The proposed method improves the performance of plagiarism detection because it avoids selecting a source sentence whose similarity to the suspicious sentence is high but whose meaning is different. It does so by computing sentence-to-sentence semantic and syntactic similarity. In addition, the proposed method expands the words in sentences to tackle the problem of limited information: it bridges the lexical gap between semantically similar contexts that are expressed in different wording. The method is also capable of identifying various kinds of plagiarism, such as exact copied text, paraphrasing, transformation of sentences and changes of word structure within sentences. The experimental results show that the proposed method improves on the performance of the systems that participated in PAN-PC-11, and that it performs better than other existing techniques on the PAN-PC-10 and PAN-PC-11 datasets.
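The combined sentence-to-sentence measure described above can be sketched as follows. This is a minimal illustration, not the paper's method: the tiny synonym table, the word-order formula and the 0.8/0.2 weighting are all illustrative assumptions standing in for the paper's richer semantic and syntactic components.

```python
# Toy sketch: combine a semantic score (synonym-aware word overlap) with a
# syntactic score (word-order similarity). All resources here are assumptions.

SYNONYMS = {"buy": {"purchase"}, "car": {"automobile"}}  # toy semantic relations

def words_match(w1, w2):
    """Two words match if equal or listed as synonyms of each other."""
    return w1 == w2 or w2 in SYNONYMS.get(w1, set()) or w1 in SYNONYMS.get(w2, set())

def semantic_sim(s1, s2):
    """Average fraction of words in each sentence with a semantic match in the other."""
    def coverage(a, b):
        return sum(any(words_match(w, v) for v in b) for w in a) / len(a)
    return (coverage(s1, s2) + coverage(s2, s1)) / 2

def syntactic_sim(s1, s2):
    """Word-order similarity: compare positions of shared words in each sentence."""
    common = [w for w in s1 if w in s2]
    if len(common) < 2:
        return 1.0 if common else 0.0
    r1 = [s1.index(w) for w in common]
    r2 = [s2.index(w) for w in common]
    num = sum((a - b) ** 2 for a, b in zip(r1, r2)) ** 0.5
    den = sum((a + b) ** 2 for a, b in zip(r1, r2)) ** 0.5
    return 1 - num / den if den else 1.0

def sentence_sim(s1, s2, alpha=0.8):
    """Weighted combination of semantic and syntactic similarity (alpha is a toy value)."""
    s1, s2 = s1.lower().split(), s2.lower().split()
    return alpha * semantic_sim(s1, s2) + (1 - alpha) * syntactic_sim(s1, s2)

# Same meaning in different words scores higher than same words in a different order.
print(sentence_sim("he will buy a car", "he will purchase a car"))
print(sentence_sim("the dog bit the man", "the man bit the dog"))
```

The point of the syntactic term is visible in the second call: the two sentences share every word, so a purely lexical measure would rate them identical, yet the word-order component lowers the score.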
An adaptive learning environment provides personalised information to the learner through self-directed study. An adaptive learning environment model can be subdivided into a learner model, domain model, instructional model and adaptive engine. Personal traits comprise part of the components in a learner model and can be identified either explicitly or implicitly in an adaptive learning environment. In such an environment, the e-learning system should adapt to a learner's needs. However, even though academic research on adaptive learning environments has increased, the field lacks a comprehensive literature analysis of learners' personal traits in these environments. This study conducts a systematic literature review to identify the most commonly used personal traits in modelling the learner and the existing techniques suitable for identifying personal traits in an adaptive learning environment. A total of 140 articles spanning the years 2010–2017 are initially reviewed, from which 78 are selected based on the inclusion and exclusion criteria relevant to this study. This study provides an overview of learners' personal traits and the techniques used to identify them to provide a basis for improving adaptive learning environments. The findings indicate that most of the previous works used a learning style from the cognition learning domain category to model individual personal traits, while the computer-based detection technique was commonly applied to identify a learner's personal traits in adaptive learning environments. This study reveals the common learner characteristics used to develop learner models and the techniques for implementing such models. The findings of this paper can guide other researchers to recognise various personal traits and the identification technique for further studies, as well as assist developers in the development of the adaptive learning system.
•Comprehensive literature review on the identification of learners' characteristics.•The correlation between specific personal traits and learning objects.•Learner model implementation and enhancement techniques.•Impact of integrating personal traits in progressive learning.
The tremendous development of information technology has led to an explosion of data and motivated the need for powerful yet efficient strategies for knowledge discovery. Question answering (QA) systems make it possible to ask questions and retrieve answers using natural language queries. In an ontology-based QA system, the knowledge-base data in which answers are sought have a structured organization, and question-answer retrieval over an ontology knowledge base provides a convenient way to obtain knowledge. This paper presents QAPD, an ontology-based QA system for the physics domain that integrates natural language processing, ontologies and information retrieval technologies to provide informative answers to users. The system allows users to retrieve information from formal ontologies using input queries formulated in natural language. We propose an inferring-schema-mapping method, which uses a combination of semantic and syntactic information together with attribute-based inference to transform users' questions into queries over the ontological knowledge base. In addition, a novel domain ontology for the physics domain, called EAEONT, is presented; relevant standards and regulations were utilized extensively during the ontology-building process. The original characteristic of the system is the strategy used to fill the gap between users' expressiveness and formal knowledge representation. The system has been developed and tested on the English language using an ontology modeling the physics domain, and the performance level achieved enables its use in real environments.
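The idea of mapping a natural language question onto a structured query over an ontology can be illustrated with a toy lookup. The tiny ontology, the question pattern and the attribute names below are illustrative assumptions; the paper's EAEONT ontology and inferring-schema-mapping method are far richer.

```python
# Hedged sketch: map "what is the <attribute> of <concept>" onto a lookup in a
# toy physics ontology. Ontology contents and the pattern are assumptions.
import re

# toy physics ontology: concept -> {attribute: value}
ONTOLOGY = {
    "electron": {"charge": "-1.602e-19 C", "mass": "9.109e-31 kg"},
    "proton": {"charge": "+1.602e-19 C", "mass": "1.673e-27 kg"},
}

def answer(question):
    """Extract attribute and concept from the question, then query the ontology."""
    m = re.match(r"what is the (\w+) of (?:an? |the )?(\w+)\??", question.lower())
    if not m:
        return None  # question does not fit the supported pattern
    attribute, concept = m.groups()
    return ONTOLOGY.get(concept, {}).get(attribute)

print(answer("What is the mass of an electron?"))  # -> 9.109e-31 kg
```

A real system would, as the abstract describes, use semantic and syntactic analysis rather than a single surface pattern, but the pipeline shape (parse the question, infer the schema mapping, query the knowledge base) is the same.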
Opinion summarization is the process of producing concise summaries from a large number of opinionated texts. In this paper, we present a novel deep-learning-based method for generic opinion-oriented extractive summarization of multiple documents (referred to as RDLS). The method comprises a sentiment analysis embedding space (SAS), text summarization embedding spaces (TSS) and an opinion summarizer module (OSM). SAS employs a recurrent neural network (RNN) composed of long short-term memory (LSTM) units to take advantage of sequential processing and to overcome several flaws of traditional methods, in which word order and word context are lost. Furthermore, it uses sentiment knowledge, sentiment shifter rules and multiple strategies to overcome the existing drawbacks. TSS exploits multiple sources of statistical and linguistic knowledge features to augment word-level embeddings and extract a proper set of sentences from multiple documents. TSS also uses the restricted Boltzmann machine algorithm to enhance and optimize those features and improve the resulting accuracy without losing important information. OSM consists of two phases, sentence classification and sentence selection, which work together to produce a useful summary. Experimental results show that RDLS outperforms other existing methods. Moreover, the ensemble of statistical and linguistic knowledge, sentiment knowledge, sentiment shifter rules and a word-embedding model allows RDLS to achieve significant accuracy.
•A novel deep-learning-based method for opinion-oriented multi-document summarization.•Pre-trained deep-learning-based methods for opinion summaries.•The method combines word embeddings with sentiment, statistical and linguistic knowledge.•It integrates sentence type, contextual polarity, word sense and sentiment shifter rules.•Results show that the method achieves significant accuracy.
Text summarization is the process of extracting salient information from a source text and presenting it to the user in condensed form while preserving the main content. Two of the most difficult problems in text summarization are providing wide topic coverage and diversity in a summary. Research on clustering, optimization and evolutionary algorithms for text summarization has recently shown good results, making this a promising area. In this paper, a two-stage sentence selection model for text summarization based on clustering and optimization techniques, called COSUM, is proposed. At the first stage, to discover all topics in a text, the sentence set is clustered using the k-means method. At the second stage, an optimization model is proposed to select salient sentences from the clusters. This model optimizes an objective function expressed as the harmonic mean of objective functions enforcing the coverage and the diversity of the sentences selected for the summary. To provide readability of a summary, the model also controls the length of the sentences selected for the candidate summary. To solve the optimization problem, an adaptive differential evolution algorithm with a novel mutation strategy is developed. COSUM was compared with 14 state-of-the-art methods (DPSO-EDASum; LexRank; CollabSum; UnifiedRank; 0-1 non-linear; query, cluster, summarize; support vector machine; fuzzy evolutionary optimization model; conditional random fields; MA-SingleDocSum; NetSum; manifold ranking; ESDS-GHS-GLO; and differential evolution) using the ROUGE toolkit on the DUC2001 and DUC2002 datasets. Experimental results demonstrate that COSUM outperforms the state-of-the-art methods in terms of ROUGE-1 and ROUGE-2 measures.
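The second-stage objective described above, a harmonic mean of coverage and diversity terms, can be sketched in a few lines. The Jaccard similarity and the exact definitions of coverage and diversity below are illustrative assumptions, not COSUM's formulation.

```python
# Hedged sketch of a harmonic-mean objective over coverage and diversity.
# Similarity measure and term definitions are toy assumptions.

def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def coverage(summary, document):
    """How well the selected sentences cover each sentence of the document."""
    return sum(max(jaccard(s, d) for s in summary) for d in document) / len(document)

def diversity(summary):
    """One minus the average pairwise similarity among selected sentences."""
    if len(summary) < 2:
        return 1.0
    pairs = [(i, j) for i in range(len(summary)) for j in range(i + 1, len(summary))]
    return 1 - sum(jaccard(summary[i], summary[j]) for i, j in pairs) / len(pairs)

def objective(summary, document):
    """Harmonic mean balances the two terms: both must be high for a high score."""
    c, d = coverage(summary, document), diversity(summary)
    return 2 * c * d / (c + d) if c + d else 0.0

doc = ["cats are small animals", "dogs are loyal animals", "stocks fell sharply today"]
# A summary spanning both topics beats one drawn from a single topic cluster.
print(objective(["cats are small animals", "stocks fell sharply today"], doc))
print(objective(["cats are small animals", "dogs are loyal animals"], doc))
```

The harmonic mean is the natural combiner here: unlike an arithmetic mean, it stays low if either coverage or diversity is low, so a summary cannot compensate for redundancy with coverage alone.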
An architecture for Malay Tweet normalization. Saloot, Mohammad Arshi; Idris, Norisma; Mahmud, Rohana.
Information Processing & Management, 09/2014, Volume 50, Issue 5. Journal article, peer reviewed.
•To observe the features of Malay Tweets, three distinct corpus-based analyses are performed.•A rule-based architecture is developed based on the results of the analyses.•The architecture consists of seven distinct modules in a pipeline structure.•Experimental results indicate high accuracy in terms of BLEU score.•The architecture outperforms an SMT-like normalization approach.
Research in natural language processing has increasingly focused on normalizing Twitter messages. While several well-defined approaches have been proposed for English, the problem remains far from solved for other languages, such as Malay. In this paper, we therefore propose an approach to normalizing Malay Twitter messages based on corpus-driven analysis. An architecture for Malay Tweet normalization is presented, which comprises seven main modules: (1) enhanced tokenization, (2) In-Vocabulary (IV) detection, (3) specialized dictionary query, (4) repeated-letter elimination, (5) abbreviation adjusting, (6) English word translation, and (7) de-tokenization. A parallel Tweet dataset consisting of 9000 Malay Tweets is used in the development and testing stages. To measure the performance of the system, an evaluation is carried out; the result is promising, with a score of 0.83 in BLEU against a baseline of 0.46. To compare the accuracy of the architecture with that of statistical approaches, an SMT-like normalization system is implemented, trained, and evaluated on an identical parallel dataset. The experimental results demonstrate that the normalization system, designed around the features of Malay Tweets, achieves higher accuracy than the SMT-like system.
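Two of the seven modules above, repeated-letter elimination and abbreviation adjusting, lend themselves to a short pipeline sketch. The dictionary entries and the simplified tokenizer are illustrative assumptions, not the paper's resources.

```python
# Hedged sketch of a normalization pipeline with two of the seven modules.
# The abbreviation dictionary below is a toy assumption.
import re

ABBREVIATIONS = {"yg": "yang", "sgt": "sangat", "x": "tidak"}  # illustrative entries

def eliminate_repeated_letters(token):
    """Collapse runs of 3+ identical characters to one (module 4)."""
    return re.sub(r"(.)\1{2,}", r"\1", token)

def adjust_abbreviation(token):
    """Replace a known texting abbreviation with its full form (module 5)."""
    return ABBREVIATIONS.get(token, token)

def normalize(tweet):
    tokens = tweet.lower().split()                            # simplified tokenization
    tokens = [eliminate_repeated_letters(t) for t in tokens]  # module 4
    tokens = [adjust_abbreviation(t) for t in tokens]         # module 5
    return " ".join(tokens)                                   # de-tokenization

print(normalize("filem ni sgt bestttt"))  # -> "filem ni sangat best"
```

The pipeline shape matters: repeated-letter elimination runs before the dictionary lookup so that an elongated form first collapses to a token the dictionary can recognize.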
•We propose a specialized method for a question answering system.•It integrates semantic and syntactic information to capture the meaning of sentences.•It expands words in the query and sentences to tackle the problem of limited information.•It uses a greedy algorithm to impose a diversity penalty on the sentences.•Results show that it is to be preferred over other methods.
A question answering system aims to retrieve precise information from a large collection of documents. This work presents a question answering method applied to the Hadith in order to provide an informative answer corresponding to the user's query. The Hadith comprises stories about and descriptions of the prophet Muhammad (PBSL), as well as the sayings of his companions and their disciples.
The problem with current methods is that they fail to capture the meaning when comparing a sentence and a user's query; hence there is often a conflict between the extracted sentences and the user's requirements. Our proposed method tackles this problem by: (1) avoiding the extraction of a passage whose similarity to the query is high but whose meaning is different; (2) computing the semantic and syntactic similarity of sentence-to-sentence and sentence-to-query pairs; and (3) expanding the words in both the query and the sentences to tackle the fundamental problem of term mismatch between sentences and the user's query. Furthermore, to reduce redundant Hadith texts, the proposed method uses a greedy algorithm to impose a diversity penalty on the sentences. The experimental results show that the proposed method improves performance compared with existing methods on Hadith datasets.
[Figure omitted: The architecture of the ASHLK]
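The greedy diversity-penalized selection described above resembles a maximal-marginal-relevance rule: repeatedly pick the sentence with the best query relevance minus a penalty for similarity to sentences already selected. The overlap-based similarity, the weighting and the example sentences below are illustrative assumptions, not the paper's formulation or data.

```python
# Hedged sketch of greedy selection with a diversity penalty (MMR-style rule).
# Similarity measure, lambda weighting and sentences are toy assumptions.

def overlap_sim(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def greedy_select(query, sentences, k=2, lam=0.7):
    """Pick k sentences, trading query relevance against redundancy."""
    selected = []
    candidates = list(sentences)
    while candidates and len(selected) < k:
        def score(s):
            relevance = overlap_sim(s, query)
            # diversity penalty: similarity to the closest already-selected sentence
            penalty = max((overlap_sim(s, t) for t in selected), default=0.0)
            return lam * relevance - (1 - lam) * penalty
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

sentences = [
    "charity purifies wealth",
    "charity purifies wealth and the soul",
    "prayer is a pillar of faith",
]
print(greedy_select("charity and prayer", sentences))
```

Note how the penalty changes the outcome: after the longer charity sentence is chosen, the near-duplicate shorter one is penalized heavily, so the second pick covers the query's other topic instead.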
Adaptive support within a learning environment is useful because learners differ in personal characteristics such as prior knowledge, learning progress, and learning preferences. This study reviews various implementations of adaptive feedback, based on four adaptation characteristics: means, target, goal, and strategy. The review covers 20 different implementations of feedback in computer-based learning environments, including multimedia web-based intelligent tutoring systems, dialog-based intelligent tutoring systems, web-based intelligent e-learning systems, adaptive hypermedia systems, and adaptive learning environments. The main objective of the review is to compare computer-based learning environments according to their implementation of feedback and to identify open research questions in adaptive feedback implementations. The review categorizes these feedback implementations by the student information used to provide feedback, the aspect of domain or pedagogical knowledge that is adapted to the students' characteristics, the pedagogical reason for providing feedback, and the steps taken to provide feedback with or without student participation. The review reveals a distinct relationship between the characteristics of feedback, the features of adaptive feedback, and computer-based learning models. Other information, such as common adaptive feedback means, goals, implementation techniques, and open research questions, is also identified.
•We identify different knowledge base modelling and manipulation techniques based on four categories.•We compare knowledge base modelling and manipulation technologies based on their underlying theories, knowledge representation techniques, knowledge acquisition techniques, challenges, applications, development tools and development languages.•We discuss the relevance of knowledge-based business.•We propose a promising technique for knowledge-based business management and other knowledge-related applications.
A system that represents knowledge is normally referred to as a knowledge-based system (KBS). This article surveys publications on knowledge base modelling and manipulation technologies published between 2000 and 2015. A total of 185 articles, excluding the subject-descriptive articles mentioned in the introductory parts, were evaluated in this survey. The main aim of this study is to identify different knowledge base modelling and manipulation techniques based on four categories: (1) linguistic knowledge base; (2) expert knowledge base; (3) ontology; and (4) cognitive knowledge base. This led to eight research questions, focusing on the different categories of knowledge base modelling technologies, their underlying theories, knowledge representation techniques, knowledge acquisition techniques, challenges, applications, development tools and development languages. One finding of this survey is the high dependence of linguistic knowledge bases, expert knowledge bases and ontologies on volatile expert knowledge. A promising technique for knowledge-based business management and other knowledge-related applications is also discussed.
In this paper, a query-based summarization method that uses a combination of semantic relations between words and their syntactic composition to extract meaningful sentences from document sets is introduced. The problem with current statistical methods is that they fail to capture the meaning when comparing a sentence and a user query; hence there is often a conflict between the extracted sentences and the user's requirements. The proposed method improves the quality of document summaries because it avoids extracting a sentence whose similarity to the query is high but whose meaning is different. The method computes the semantic and syntactic similarity of sentence-to-sentence and sentence-to-query pairs. To reduce redundancy in the summary, it uses a greedy algorithm to impose a diversity penalty on the sentences. In addition, the proposed method expands the words in both the query and the sentences to tackle the problem of limited information: it bridges the lexical gap between semantically similar contexts expressed in different wording. The experimental results show that the proposed method improves performance compared with the systems that participated in DUC 2006, and that it performs better than other existing techniques on the DUC 2005 and DUC 2006 datasets.
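The word-expansion step described above can be illustrated with a toy relatedness table: each word in the query and in a sentence is expanded with related terms before matching, so texts with similar meaning but different wording can still overlap. The table below is an illustrative assumption standing in for a real lexical resource.

```python
# Hedged sketch of word expansion for bridging lexical gaps.
# The relatedness table is a toy assumption.

RELATED = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car", "vehicle"},
    "purchase": {"buy"},
    "buy": {"purchase"},
}

def expand(words):
    """Return the word set enlarged with every related term."""
    expanded = set(words)
    for w in words:
        expanded |= RELATED.get(w, set())
    return expanded

def expanded_overlap(query, sentence):
    """Jaccard overlap computed on the expanded word sets."""
    q = expand(query.lower().split())
    s = expand(sentence.lower().split())
    return len(q & s) / len(q | s)

# Without expansion these share almost nothing; with expansion they match fully.
print(expanded_overlap("buy a car", "purchase a automobile"))
```

This is exactly the term-mismatch problem the abstract names: the raw word sets of the two phrases share only "a", yet after expansion their sets coincide.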