Data analysis is a widely researched field, with innumerable applications that allow the discovery of domain-specific insights of particular value. In this paper, we introduce the data analysis process that we applied to two different systems storing statements and testimonies about crimes against humanity. We describe the activities, design decisions, and lessons learned from implementing a specific goal: transforming text data into georeferenced information.
Nowadays, increasing technological improvements and growing data demand have pushed businesses and organizations to adopt these improvements in order to continue operating. The Know Your Customer (KYC) assessment is still performed manually; combined with the growing amount of data, this results in data accumulation and increases the time organizations spend analyzing prospective applicants' data. The unavailability of tools to help review documents and data leads to further data accumulation and time-consuming KYC assessments. This paper aims to create a Business Intelligence (BI) system to help financial organizations analyze and process applicant data and determine the eligibility of applicants seeking credit card services. The approach may be utilized by enterprises in the financial sector that follow the KYC procedure to identify applicants who require credit card services. The developed BI system is expected to reduce the time spent validating application data, particularly in the banking or financial industry. The study designed three dimension tables and one fact table for the data warehouse using Microsoft SQL Server 18. Pentaho Data Integration is used for the ETL process, and Tableau is used to create the dashboard, which contains general information and the loan repayment status of an applicant. Two pivot tables were created in Microsoft Excel to summarize applicants' loan repayment status.
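The abstract above summarizes loan repayment status with Excel pivot tables. A minimal sketch of that kind of aggregation in plain Python, assuming hypothetical record fields such as `applicant_id`, `region`, and `repayment_status` (these names are illustrative, not taken from the paper), could look like:

```python
from collections import Counter

# Hypothetical applicant records, mimicking rows of the fact table
# (field names are assumptions for illustration).
applications = [
    {"applicant_id": 1, "region": "North", "repayment_status": "on_time"},
    {"applicant_id": 2, "region": "North", "repayment_status": "late"},
    {"applicant_id": 3, "region": "South", "repayment_status": "on_time"},
    {"applicant_id": 4, "region": "South", "repayment_status": "default"},
]

def pivot_counts(rows, row_key, col_key):
    """Count occurrences of col_key values per row_key value,
    analogous to a simple Excel pivot table."""
    table = {}
    for row in rows:
        table.setdefault(row[row_key], Counter())[row[col_key]] += 1
    return table

pivot = pivot_counts(applications, "region", "repayment_status")
# pivot["North"] now holds the repayment-status counts for that region
```

In a real BI pipeline the rows would be read from the warehouse rather than hard-coded, but the grouping logic is the same.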
Today, during COVID-19, people are more inclined towards online shopping. In general practice, analysis of browsing history and customers' micro-behaviour with respect to online shopping habits has been used for future suggestions. As a result, the predictions made suffered from an over-similarity problem, and users were unable to find any novelty in the recommended items. Observing these issues, e-shopping quality can be enhanced by adding a factor other than similarity. The current research suggests and advertises products that belong to a person's region. For this work, the data has been collected area-wise, i.e., with country-based segregation. The considered dataset belongs to the country of India, its culture, its handicrafts, and its citizens. Datasets and their combinations based on multiple attributes serve as input to the proposed predictive system. Existing data is also used to collect customers' demographic details, which are then mapped to the area-wise dataset. In addition, a framework is proposed that takes the database and the user query as input to its predictive system in order to generate default suggestions for the user beyond the submitted query.
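The core idea above, augmenting query matches with region-based default suggestions, can be sketched as follows. All product names, fields, and the matching rule are assumptions for illustration, not the paper's actual system:

```python
# Hypothetical product catalogue; fields are illustrative assumptions.
products = [
    {"name": "Madhubani painting", "region": "Bihar", "category": "handicraft"},
    {"name": "Pashmina shawl", "region": "Kashmir", "category": "textile"},
    {"name": "Blue pottery", "region": "Rajasthan", "category": "handicraft"},
]

def suggest(query, user_region, catalogue):
    """Return query matches plus region-based default suggestions,
    so results are not driven by query similarity alone."""
    matches = [p for p in catalogue if query.lower() in p["name"].lower()]
    regional = [p for p in catalogue
                if p["region"] == user_region and p not in matches]
    return matches + regional

result = suggest("pottery", "Bihar", products)
# query match ("Blue pottery") followed by the user's regional default
```

A production recommender would rank rather than concatenate, but this shows how regional items enter the result set even when they do not match the query.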
Increasing the speed and reducing the resource consumption of data integration have always been goals of developers and researchers. The purpose of this study is to provide a solution that uses metadata as well as web browsing to speed up the process and to reduce the use of resources such as memory. The proposed solution is implemented using a three-layer architecture comprising a business logic layer, a software layer, and a data access layer. After implementation, the proposed strategy was tested on 5000 database records in a case study (Shahsavand Tea Company). In addition to being compared with several similar tools, the solution was also evaluated by experts through a survey. The results show that the proposed solution increases the data transfer speed and improves memory usage at the given data volume. Furthermore, according to the experts' questionnaire responses, the user-friendly software design facilitates the use of the tool.
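One common way metadata accelerates integration, as alluded to above, is to drive the column mapping from a declared schema instead of inspecting each record. A minimal sketch, with all table and column names hypothetical (the paper does not publish its schema):

```python
# Metadata describing how source columns map to target columns
# (names are illustrative assumptions, not from the case study).
column_metadata = {
    "cust_name": "customer_name",
    "cust_tel":  "phone",
    "ord_total": "order_amount",
}

def transform(record, metadata):
    """Rename and keep only the columns declared in the metadata,
    dropping everything else without per-record schema inspection."""
    return {target: record[source]
            for source, target in metadata.items()
            if source in record}

source_row = {"cust_name": "Shahsavand", "cust_tel": "021-555", "extra": 1}
target_row = transform(source_row, column_metadata)
# only the mapped columns survive, under their target names
```

Because the mapping is data rather than code, it can be updated without touching the data access layer, which is one motivation for metadata-driven designs.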
The extract, transform, and load (ETL) process is at the core of data warehousing architectures. As such, the success of data warehouse (DW) projects essentially depends on the proper modeling of the ETL process. As there is no standard model for the representation and design of this process, several researchers have proposed modeling methods based on different formalisms, such as the Unified Modeling Language (UML), ontologies, Model-Driven Architecture (MDA), Model-Driven Development (MDD), and graphical flows, which include Business Process Model and Notation (BPMN), colored Petri nets (CPN), Yet Another Workflow Language (YAWL), CommonCube, entity modeling diagrams (EMD), and so on. With the emergence of Big Data, despite the multitude of relevant approaches proposed for modeling the ETL process in classical environments, part of the community has been motivated to provide new data warehousing methods that support Big Data specifications. In this paper, we present a summary of relevant works on the modeling of data warehousing approaches, from classical ETL processes to ELT design approaches. A systematic literature review is conducted, and a detailed set of comparison criteria is defined to allow the reader to better understand the evolution of these processes. Our study paints a complete picture of ETL modeling approaches, from their advent to the era of Big Data, while comparing their main characteristics. It allows for the identification of the main challenges and issues in the design of Big Data warehousing systems, chiefly the lack of a generic design model for data collection, storage, processing, querying, and analysis.
A data warehouse (DW) is a vast repository of data that facilitates decision-making for businesses and companies. The concept dates back to the 1980s and has been widely accepted. One of the keys to the success of the data warehousing process lies in defining the warehouse model according to the data sources and analysis needs. Once the data warehouse is designed, the content and structure of the data sources, as well as the analysis requirements, may evolve; therefore, the model (schema and data) must evolve accordingly. In this context, several approaches have been developed to design and implement data warehouses. Nevertheless, there is no standard process that covers the design of all data warehouse layers, nor is there software that addresses the whole problem. In general, most of these approaches focus on a particular aspect of the data warehouse, such as data storage, the ETL process, OLAP, or reporting, and do not cover its entire lifecycle. Model-Driven Architecture (MDA) is a standard approach that aims to support all phases of software manufacturing by promoting the use of models and the transformations between them. Moreover, it aims to automate the software engineering process, thereby decreasing the cost of software development and enhancing productivity. In this study, we present a systematic review of works on data warehouse design methods. We compare and discuss these works according to the criteria that seem relevant to this issue. We then present a new design approach for constructing multidimensional schemas from relational models using MDA techniques, and we outline the resulting research perspectives.
Extraction, Transformation and Loading (ETL) is one of the notable subjects in the optimization, management, improvement, and acceleration of processes and operations in databases and data warehouses. The creation of ETL processes is potentially one of the greatest tasks in building data warehouses, and it is a time-consuming and complicated procedure. Without optimization of these processes, implementing data warehouse projects is costly, complicated, and time-consuming. The present paper combines parallelization methods with shared cache memory in distributed systems based on a data warehouse. According to the conducted assessment, the proposed method improved ETL execution time by 7.1% compared with the Kettle optimization tool and by 7.9% compared with the Talend tool. Parallelization could therefore notably improve the ETL process, ultimately allowing big data management and integration processes to be implemented simply and at acceptable speed.
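The combination described above, partitioning the transform step across workers while sharing a cache for repeated lookups, can be illustrated in miniature. The dimension table, field names, and thread-based parallelism are assumptions for the sketch; the paper's actual implementation runs on a distributed system:

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

# Hypothetical dimension table; a real ETL job would query a database.
DIMENSION = {"IR": "Iran", "FR": "France", "DE": "Germany"}

@lru_cache(maxsize=None)          # shared cache: each code is resolved once
def lookup_country(code):
    return DIMENSION.get(code, "unknown")

def transform_partition(rows):
    """Transform one partition of extracted rows into load-ready records."""
    return [{"id": r["id"], "country": lookup_country(r["cc"])} for r in rows]

rows = [{"id": i, "cc": code} for i, code in enumerate(["IR", "FR", "IR", "DE"])]
partitions = [rows[:2], rows[2:]]  # split the extract across workers

with ThreadPoolExecutor(max_workers=2) as pool:
    loaded = [r for part in pool.map(transform_partition, partitions)
              for r in part]
```

The cache means the repeated "IR" lookup is resolved once across all partitions, which is the kind of saving that grows with data volume.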