eXtensible Markup Language (XML) is one of the most widely used standards for sharing information between applications and devices, both on the internet and on local networks. However, the relational database (RDB) has long served many enterprises as their data management system, and switching entirely to XML technology would incur substantial cost. Thus, a mapping scheme is required to bridge XML technologies and RDBs seamlessly. In this paper, an efficient model-based mapping scheme named XML-REG is proposed. The XML document is first read and parsed by the Streaming API for XML (StAX) parser. Each node is then assigned a unique identification label that records its exact position in the document. Subsequently, the proposed algorithm transforms the data into tables in RDB storage. As a result, two tables are created: (i) a value table storing the information carried by the text nodes of the document, and (ii) a path table storing the hierarchical structure of the document. Experimental evaluations demonstrated that XML-REG outperformed existing approaches, such as Mini-XML, XAncestor, XMap and XRecursive, in terms of data storage size, mapping time and query retrieval time. In addition, a scalability test was conducted to show how well these approaches support large datasets, by scaling the DBLP dataset by factors of 5, 10 and 15. The results showed that XML-REG scales closest to linearly compared to the other approaches. On average, XML-REG showed the best performance in terms of query retrieval time and database storage size.
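The general idea of this style of model-based mapping can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the labels here are a simple pre-order counter and the two table layouts are hypothetical, whereas XML-REG's labels encode position more precisely. Python's streaming `iterparse` stands in for a StAX parser.

```python
import io
import xml.etree.ElementTree as ET

def map_xml(xml_text):
    """Shred an XML document into a path table (structure) and a
    value table (text content), in the spirit of model-based mapping."""
    path_table = []   # rows: (label, parent_label, tag)
    value_table = []  # rows: (label, text)
    stack, counter = [], 0
    for event, elem in ET.iterparse(io.StringIO(xml_text),
                                    events=("start", "end")):
        if event == "start":
            counter += 1                      # pre-order label
            parent = stack[-1] if stack else None
            path_table.append((counter, parent, elem.tag))
            stack.append(counter)
        else:  # "end": the element's text is now complete
            label = stack.pop()
            if elem.text and elem.text.strip():
                value_table.append((label, elem.text.strip()))
    return path_table, value_table

doc = "<dblp><article><title>XML-REG</title></article></dblp>"
paths, values = map_xml(doc)
# paths  -> [(1, None, 'dblp'), (2, 1, 'article'), (3, 2, 'title')]
# values -> [(3, 'XML-REG')]
```

Each row of the two tables could then be bulk-inserted into the corresponding RDB tables.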
eXtensible Markup Language (XML) is used widely to transfer data among a wide variety of systems. Due to increasing query workloads and ever-larger datasets, centralized processing is no longer feasible for XML query processing. To address this issue, we propose a technique that improves XML query processing through query workload distribution. Effective distributed XML query processing is affected by several criteria, such as indexing, fragmentation, the distribution strategy, as well as query handling at the distributed servers. However, we believe that an efficient labeling mechanism and an inexpensive centralized query processor, or a pruning method at dedicated servers, contribute greatly to the overall performance of a distributed query processor. In this paper, we present an effective centralized pruning technique that is adopted into our proposed distributed XML query processing technique to process XML queries robustly. Experimental evaluations showed that the proposed distributed query processor outperformed a centralized query processor.
Background: Customer churn prediction (CCP) refers to detecting which customers are likely to cancel the services provided by a service provider, for example, internet services. The class imbalance problem (CIP) in machine learning occurs when there is a huge difference between the number of samples in the positive class and in the negative class. It is one of the major obstacles in CCP, as it deteriorates performance in the classification process. Utilizing data sampling techniques (DSTs) helps to resolve the CIP to some extent.
Methods: In this paper, we review the effect of using DSTs on algorithmic fairness, i.e., we investigate whether the results pose any discrimination between male and female groups, and compare the results before and after applying DSTs. Three real-world datasets with unequal balancing rates were prepared, and four widely used DSTs were applied to them. Six popular classification techniques were utilized in the classification process. Both classifier performance and algorithmic fairness were evaluated with established metrics.
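The core idea behind oversampling DSTs such as SMOTE can be illustrated with a short sketch. This is a simplified, pure-Python version of SMOTE's interpolation step under assumed toy data; real studies typically use a library implementation (e.g., imbalanced-learn), and ADASYN additionally weights generation toward harder examples.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Generate synthetic minority samples by interpolating between a
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours by squared Euclidean distance (excluding x)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment between x and nb
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

churners = [(1.0, 0.2), (1.2, 0.1), (0.9, 0.3)]  # hypothetical minority class
new_pts = smote_like(churners, n_new=3)
# every synthetic point lies on a segment between two real minority points
```

Because each synthetic sample is an interpolation, it stays inside the convex region spanned by the existing minority points rather than duplicating them exactly.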
Results: The results indicated that the Random Forest classifier outperformed the other classifiers on all three datasets, and that using the SMOTE and ADASYN techniques causes more discrimination against the female group. The rate of unintentional discrimination appears higher on the original data of extremely unbalanced datasets under the following classifiers: Logistic Regression, LightGBM, and XGBoost.
Conclusions: Algorithmic fairness has become a broadly studied area in recent years, yet there is very little systematic study of the effect of using DSTs on algorithmic fairness. This study presents important findings to further the use of algorithmic fairness in CCP research.
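A common way to quantify the kind of group discrimination studied here is the demographic (statistical) parity difference: the gap in positive-prediction rates between the two groups. The sketch below illustrates the metric on made-up predictions; the study's actual fairness metrics may differ.

```python
def demographic_parity_diff(y_pred, group):
    """Difference in positive-prediction rates between the 'male' and
    'female' groups (0 = parity; larger magnitude = more disparity)."""
    rate = {}
    for g in set(group):
        preds = [p for p, gg in zip(y_pred, group) if gg == g]
        rate[g] = sum(preds) / len(preds)
    return rate["male"] - rate["female"]

# hypothetical churn predictions (1 = predicted churner) and groups
preds  = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["male", "male", "male", "male",
          "female", "female", "female", "female"]
diff = demographic_parity_diff(preds, groups)
# male rate 3/4, female rate 1/4 -> difference 0.5
```

Computing such a metric before and after resampling makes it possible to attribute a change in discrimination to the DST rather than to the classifier.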
In recent years, Recommender System (RS) research has covered a wide variety of Artificial Intelligence techniques, ranging from traditional Matrix Factorization (MF) to complex Deep Neural Networks (DNN). Traditional Collaborative Filtering (CF) recommendation methods such as MF have limited learning capability, as they only consider the linear combination of user and item vectors. To learn non-linear relationships, methods like Neural Collaborative Filtering (NCF) incorporate DNNs into CF. However, CF methods still suffer from cold start and data sparsity. This paper proposes an improved hybrid RS, namely Neural Matrix Factorization++ (NeuMF++), for effectively learning user and item features to improve recommendation accuracy and alleviate cold start and data sparsity. NeuMF++ incorporates effective latent representations into NeuMF via Stacked Denoising Autoencoders (SDAE), and can also be seen as the fusion of GMF++ and MLP++. NeuMF is an NCF framework that combines GMF (Generalized Matrix Factorization) and MLP (Multilayer Perceptrons); it achieves state-of-the-art results by integrating the linearity of GMF with the non-linearity of MLP. Concurrently, incorporating latent representations has shown tremendous improvement in GMF and MLP, resulting in GMF++ and MLP++. The latent representation obtained through the SDAEs' latent space allows NeuMF++ to learn user and item features effectively, significantly enhancing its learning capability. However, sharing feature extraction between GMF++ and MLP++ in NeuMF++ might hinder performance; allowing GMF++ and MLP++ to learn separate features provides more flexibility and greatly improves performance. Experiments on a real-world dataset demonstrated that NeuMF++ achieves an outstanding test root-mean-square error of 0.8681.
In future work, we can extend NeuMF++ by introducing other auxiliary information like text or images. Different neural network building blocks can also be integrated into NeuMF++ to form a more robust recommendation model.
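For intuition, the GMF branch referred to above scores a user-item pair from the element-wise product of their embeddings. The sketch below is a minimal forward pass with made-up embedding vectors and weights, not the trained NeuMF++ model; in full NeuMF the product vector is combined with the MLP branch before the final prediction layer.

```python
import math

def gmf_score(user_vec, item_vec, weights, bias=0.0):
    """GMF branch: element-wise product of user and item embeddings,
    weighted sum, then a sigmoid to yield a score in (0, 1)."""
    z = sum(w * u * v for w, u, v in zip(weights, user_vec, item_vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))

u = [0.5, -0.1, 0.3]   # hypothetical user embedding
i = [0.4, 0.2, -0.6]   # hypothetical item embedding
w = [1.0, 1.0, 1.0]    # hypothetical output-layer weights
score = gmf_score(u, i, w)
```

With unit weights this reduces to a sigmoid over the dot product, which is exactly the linear interaction that the MLP branch is meant to complement.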
To create a personalized E-CRM recommendation system, the electronic customer relationship management system must address low accuracy and a lack of personalization by applying hybrid recommendation techniques such as fuzzy logic and AHP. The main purpose of this research is to enhance the accuracy and deepen the understanding of common recommendation techniques in E-CRM. The fuzzy and AHP techniques are applied in the current study to the available object information and to extend the recommendation areas. The findings indicate that each of these strategies is appropriate for a recommendation system in a technological environment. The present study makes several noteworthy contributions to the fuzzy Analytic Hierarchy Process (AHP), which achieves the highest accuracy of the approaches considered, at 66.67%. However, AHP outperforms all others in terms of time complexity. We advocate the concept and implementation of an intelligent business recommendation system based on a hybrid approval algorithm that serves as a model for E-CRM recommendation systems. The whole design of this recommendation system revolves around the hybrid recommendation approach. The system additionally incorporates the recommendation modules and the recommendation measurement updating framework. The recommendation modules include the formulation and development of material recommendation algorithms, element collaborative filtering recommendation algorithms, and demography-based recommendation algorithms.
Service-Oriented Computing (SOC) has been a cornerstone of development in today's fast-paced world, covering the lifecycle of services and contributing to service delivery through distributed applications. SOC strives to integrate cloud computing with complex mobile apps. Dynamic and adaptable web service composition is the tip of the iceberg for SOA adoption. Dynamic service binding is vital for mobile computing due to the need for distributed mobile internet consumption at runtime. This study addresses SOC difficulties associated with web service composition, whose growth creates a paradigm shift in identifying data-type-matching solutions. This enables data-type-level matching research to ensure high-quality web service creation. In this work, the composition process is divided into three phases: web service discovery, web service selection, and web service composition, in which web service personalization and workflow reliability are emphasized. The final result is a complex mobile app that runs without depleting device resources, together with an adaptable, reusable web service composition workflow. This improves matching at the data type level, an SOC pain point.
Background: As XML is the standard for the exchange of data over the World Wide Web, it is important to ensure that the eXtensible Markup Language (XML) database is capable not only of supporting efficient query processing but also of enduring frequent data update operations over the dynamic changes of Web content. Most existing XML annotation is based on a labeling scheme that identifies the hierarchical position of each XML node. This computation is costly, as any update causes the whole XML tree to be re-labelled, an impact that is especially pronounced on large datasets. Therefore, a robust labeling scheme that avoids re-labeling is crucial.
Methods: Here, we present ORD-GAP (named after Order Gap), a robust and persistent XML labeling scheme that supports dynamic updates. ORD-GAP assigns unique identifiers with gaps in between XML nodes, from which the level, Parent-Child (P-C), Ancestor-Descendant (A-D) and sibling relationships can easily be identified. ORD-GAP adopts the OrdPath labeling scheme for any future insertion.
Results: We demonstrate that ORD-GAP is robust under dynamic updates, and have implemented it for three use cases: (i) left-most, (ii) in-between and (iii) right-most insertion. Experimental evaluations on the DBLP dataset demonstrated that ORD-GAP outperformed existing approaches such as ORDPath and ME Labeling in terms of database storage size, data loading time and query retrieval. On average, ORD-GAP has the best storage and query retrieval times.
Conclusion: The main contributions of this paper are: (i) a robust labeling scheme named ORD-GAP that leaves a gap between adjacent node labels to support future insertions, and (ii) an efficient mapping scheme, built upon the ORD-GAP labeling scheme, to transform XML into an RDB effectively.
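The essence of gap-based labeling can be sketched in a few lines. The gap size and the midpoint rule below are illustrative assumptions, not ORD-GAP's actual parameters; the point is that an in-between insertion gets a fresh label without relabeling its neighbours, falling back to an OrdPath-style label only when a gap is exhausted.

```python
GAP = 100  # assumed gap left between consecutive sibling labels

def initial_labels(n):
    """Assign ordinal labels to n siblings with a fixed gap in between."""
    return [GAP * (i + 1) for i in range(n)]

def insert_between(left, right):
    """Label a node inserted between two existing siblings without
    relabelling them: take the midpoint of the surrounding labels."""
    if right - left < 2:
        raise ValueError("gap exhausted; fall back to an OrdPath-style label")
    return (left + right) // 2

labels = initial_labels(3)      # [100, 200, 300]
new = insert_between(100, 200)  # 150 -- the neighbours keep their labels
```

Document order is preserved because label comparison still orders the nodes, and left-most or right-most insertions simply extend below the first or above the last label.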
Background: A recommender system captures user preferences and behaviour to provide relevant recommendations to the user. A hybrid model-based recommender system requires a pre-trained data model to generate recommendations for a user. Ontology helps to represent semantic information and relationships, modeling the expressivity and linkage among the data.
Methods: We enhanced the accuracy of the matrix factorization model by utilizing an ontology to enrich the information of the user-item matrix, integrating item-based and user-based collaborative filtering techniques. In particular, the enriched data, which combine semantic similarity with rating patterns, help to reduce the cold start problem in the model-based recommender system. When a new user or item first enters the system, its user demographics or item profile are linked to our ontology, so semantic similarity can be calculated during the item-based and user-based collaborative filtering process. The item-based and user-based filtering processes are used to predict the unknown ratings of the original matrix.
Results: Experimental evaluations were carried out on the MovieLens 100k dataset to compare the accuracy of our proposed approach against the baseline method using (i) Singular Value Decomposition (SVD) and (ii) a combination of the item-based collaborative filtering technique with SVD. Experimental results demonstrated that our proposed method reduced the data sparsity from 0.9542% to 0.8435%. In addition, our proposed method achieved better accuracy, with a Root Mean Square Error (RMSE) of 0.9298, compared to the baseline method (RMSE: 0.9642) and the existing method (RMSE: 0.9492).
Conclusions: Our proposed method enriched the dataset information by integrating user-based and item-based collaborative filtering techniques. The experimental results show that our system reduced the data sparsity and achieved better accuracy compared to the baseline and existing methods.
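The prediction step used by item-based (and, symmetrically, user-based) collaborative filtering, together with the RMSE metric reported above, can be sketched as follows. The ratings and similarity values are hypothetical; in the proposed method the similarities would combine rating-pattern similarity with ontology-derived semantic similarity.

```python
import math

def predict_rating(neighbour_ratings, similarities):
    """Similarity-weighted average of the user's ratings on neighbouring
    items -- the core of the item-based prediction step."""
    num = sum(s * r for s, r in zip(similarities, neighbour_ratings))
    den = sum(abs(s) for s in similarities)
    return num / den

def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2
                         for p, a in zip(predicted, actual)) / len(actual))

# hypothetical case: the user rated two similar items 4.0 and 3.0,
# with similarities 0.8 and 0.2 to the target item
pred = predict_rating([4.0, 3.0], [0.8, 0.2])  # 3.8
err  = rmse([pred], [4.0])                     # 0.2
```

Filling unknown cells of the user-item matrix with such predictions is what reduces its sparsity before the matrix factorization model is trained.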