•This research addresses the rare item detection problem in association rule mining.•A new assessment metric, called adjusted_support, is proposed for rare item detection.•A large dataset with the data of about 600,000 patients is used to test the proposed metric.
•Adjusted_support is applied to discover rare association rules for diabetes complications.•Comorbidity index of diabetic patients in various demographic groups is analyzed.
Diabetes, one of the most serious and fastest-growing chronic health conditions, often leads to other serious complications such as neurological, renal, ophthalmic, and heart diseases. Research has shown that more than 85% of diabetic patients develop at least one of these complications. Therefore, studying comorbidities among diabetic patients using association analysis is a worthy research endeavor. Association analysis is a well-known data mining method that aims to reveal the association/affinity patterns/rules among various items (objects or events) that occur together. One of the most critical problems in association analysis is the identification of rare items/patterns. In ordinary association analysis, specifying a large minimum support prevents the discovery of rare rules, while setting a small minimum support over-generates rules that may be neither strong nor beneficial. In this study, we propose a new assessment metric, called adjusted_support, to address this problem. Applying this new metric can retrieve rare patterns without over-generating association rules. To test the proposed metric, we extracted data from a large and feature-rich electronic medical records data warehouse and performed association analysis on the resultant data set, which included 492,025 unique patients diagnosed with diabetes and related complications. By applying adjusted_support, we discovered interesting associations among diabetes complications, such as neurological manifestations with diabetic arthropathy and gastroparesis; renal manifestations with retinopathy; gastroparesis with ketoacidosis and retinopathy; and skin complications with hyperglycemia, peripheral circulatory disorder, heart disease, and neurological manifestations. We also performed association analysis in various demographic groups at more granular levels. Besides association analysis, we also analyzed the comorbidity situation among different demographic groups of diabetic patients.
Finally, we studied and compared the prevalence of diabetes complications in every demographic group of patients.
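The minimum-support tradeoff described above can be made concrete with a small sketch. The abstract does not give the formula for adjusted_support, so the adjustment below (itemset support relative to the support of its rarest member) is purely illustrative, as are the complication codes:

```python
# Toy diagnosis "transactions": one set of complication codes per patient.
# Codes and the adjustment formula are illustrative; the paper's exact
# definition of adjusted_support may differ.
transactions = [
    {"neuro", "gastroparesis"},
    {"neuro", "gastroparesis", "retinopathy"},
    {"renal", "retinopathy"},
    {"renal", "retinopathy"},
    {"skin", "hyperglycemia"},
    {"neuro", "arthropathy"},
]
n = len(transactions)

def support(itemset):
    """Fraction of patients whose record contains every item in the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / n

def adjusted_support(itemset):
    """Support of the itemset relative to the support of its rarest member,
    so co-occurrence among rare complications is not filtered out by a
    single global minimum-support threshold."""
    min_item_sup = min(support({item}) for item in itemset)
    return support(itemset) / min_item_sup if min_item_sup else 0.0

print(support({"neuro", "gastroparesis"}))           # low under plain support
print(adjusted_support({"neuro", "gastroparesis"}))  # high after adjustment
```

With a global minimum support of, say, 0.4, the {neuro, gastroparesis} pattern (support 0.33) would be missed, even though every patient with gastroparesis also has neurological manifestations; the relative measure surfaces exactly such rare-but-tight associations.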
Manufacturing industries have recently promoted smart manufacturing (SM) to achieve intelligence, connectedness, and responsiveness of manufacturing objects consisting of man, machine, and material. Traditional manufacturing platforms, which identify generic frameworks where common functionalities are shareable and diverse applications are workable, have mainly focused on remote collaboration, distributed control, and data integration; however, they fall short of incorporating these SM characteristics. The present work introduces a manufacturing platform oriented toward SM. The proposed platform incorporates the capabilities of (1) virtualization of manufacturing objects for their autonomy and cooperation, (2) processing of real and various manufacturing data for mediating physical and virtual objects, and (3) data-driven decision-making for predictive planning on those objects. To deliver these capabilities, the proposed platform advances the framework of Holonic Manufacturing Systems with the use of agent technology. It integrates a distributed data warehouse to encompass data specification, storage, processing, and retrieval. It applies a data analytics approach to create empirical decision-making models based on real and historical data. Furthermore, it uses open and standardized data interfaces to enable interoperable data exchange across shop floors and manufacturing applications. We present the architecture and technical methods for implementing the proposed platform. We also present a prototype implementation to demonstrate the feasibility and effectiveness of the platform in energy-efficient machining.
Coffee, the second-largest global soft commodity, can take advantage of comprehensive mining of daily and historical market data for more effective, informed trading decisions. Advanced ICT and data mining technologies can change how the trading market operates. Existing systems are confronted with certain constraints, including incomplete data, insufficient documentation for storage, and a requirement for a scalable infrastructure for big data analytics, such as a data warehouse or data lakehouse. To address these issues, this paper presents the design and implementation of a coffee commodity trading big data warehouse capable of analyzing various essential parameters to support informed decision-making. First, the designed system automatically collects coffee trading data for New York Arabica coffee futures prices from selected worldwide reports and financial data portals. Next, an extract, transform, and load (ETL) process ingests the crawled coffee futures trading data into a three-layer data warehouse. Finally, the analytical system extracts and visualizes selected key dimensions that influence coffee futures prices within different observation windows and from different perspectives. As a result, we implement a prototype of a coffee trading data warehouse on data crawled from January 2000 to October 2022 and visualize trends in coffee futures prices based on the collected data for informed decision-making. The constructed system is capable of stably operating and processing large volumes of transaction data. This paper provides valuable reference documentation and decision support for coffee commodity trading enterprises and contributes to the development of future forecasting algorithms.
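A three-layer ETL flow of the kind described can be sketched in miniature with SQLite. The layer names, field names, and sample rows below are hypothetical, not the paper's actual schema; the point is the staging → cleansed → mart progression:

```python
import sqlite3

# Hypothetical crawled rows of daily Arabica futures (date, close price);
# layering and table names are illustrative only.
raw_rows = [
    ("2022-10-03", "221.5"),
    ("2022-10-04", "219.8"),
    ("2022-10-04", "219.8"),   # duplicate from a second crawl
    ("2022-10-05", "bad"),     # malformed value to be filtered out
]

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Layer 1: staging -- land the crawled data as-is (everything as text).
cur.execute("CREATE TABLE staging_prices (trade_date TEXT, close TEXT)")
cur.executemany("INSERT INTO staging_prices VALUES (?, ?)", raw_rows)

# Layer 2: cleansed -- typed, deduplicated, invalid rows dropped.
cur.execute("CREATE TABLE clean_prices (trade_date TEXT PRIMARY KEY, close REAL)")
for trade_date, close in cur.execute(
        "SELECT DISTINCT trade_date, close FROM staging_prices").fetchall():
    try:
        cur.execute("INSERT OR IGNORE INTO clean_prices VALUES (?, ?)",
                    (trade_date, float(close)))
    except ValueError:
        pass  # skip rows whose close price does not parse

# Layer 3: mart -- monthly aggregate ready for visualization.
cur.execute("""CREATE TABLE mart_monthly AS
               SELECT substr(trade_date, 1, 7) AS month,
                      ROUND(AVG(close), 2) AS avg_close,
                      COUNT(*) AS trading_days
               FROM clean_prices GROUP BY month""")
print(cur.execute("SELECT * FROM mart_monthly").fetchall())
```

Each layer is queryable on its own, so data-quality problems can be traced back from the mart to the raw crawl, which is the usual motivation for keeping the staging layer around.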
Building on an analysis of the existing problems of power enterprise informatization at this stage, this paper introduces the application of data warehouse technology in power systems, proposes using an enterprise-level data warehouse as the data centre, and describes the overall design of a power enterprise information system model based on a data warehouse strategy. Finally, specific solutions and application examples are given for the design of the dimensional model of the power enterprise data warehouse.
This paper surveys the most relevant research on combining Data Warehouses (DWs) and Web data. It studies the XML technologies currently being used to integrate, store, query, and retrieve web data, and their application to DWs. The paper reviews different distributed DW architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semi-structured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources, and the XML extensions of On-Line Analytical Processing techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to uncover the main limitations and opportunities offered by the combination of the DW and Web fields, as well as to identify open research lines.
Key Performance Indicators (KPIs) measure the performance of an enterprise relative to its objectives, thereby enabling corrective action where there are deviations. In current practice, KPIs are manually integrated within dashboards and scorecards used by decision makers. This practice entails various shortcomings. First, KPIs are not related to their business objectives and strategy. Consequently, decision makers often obtain a scattered view of the business status and business concerns. Second, while KPIs are defined by decision makers, their implementation is performed by IT specialists. This often results in discrepancies that are difficult to identify. In this paper, we propose an approach that provides decision makers with an integrated view of strategic business objectives and conceptual data warehouse KPIs. The main benefit of our proposal is that it links strategic business models to the data for monitoring and assessing them. In our proposal, KPIs are defined using a modeling language in which decision makers specify KPIs using business terminology, but can also perform quick modifications and even navigate data while maintaining a strategic view. This enables monitoring and what-if analysis, thereby helping analysts compare expectations with reported results.
•Novel approach for conceptualizing and specifying Key Performance Indicators.•Transforms strategic models into analytic tools to aid in decision making.•Enables the user to analyze data subspaces from a strategic point of view.•Based on the Semantics for Business Vocabulary and Rules specification.•Implemented to support the whole process from definition to data extraction.
Organizations need to manage their operations, process data electronically, and find a platform that helps them support their strategic decisions. The success of the human resource department can reflect overall organizational success. Human resource department professionals try to ensure finding the right person at the right time for the job that fits that person's skills and qualifications. This task needs a platform that supports making the right decision based on historical managerial information. A data mart is a department-based decision support system that uses departmental data to help decision-makers support short-term decisions. A human resource (HR) data mart is a foundation stone for building an enterprise data warehouse. This paper presents the implementation process of an HR data mart, from implementing the data mart schema to online analytical processing (OLAP) reports. The data mart is implemented on 15 years of retired-employee data from the Basra Oil Company. A human resource data mart can provide a base platform for performing different analysis operations to support the right decisions. Different OLAP reports are implemented to help analysts and decision-makers get answers to their questions as OLAP queries. Two categories of reports are implemented: offline reports using Microsoft Excel Pivot Table 2010 and web OLAP reports using SQL Server Reporting Services 2014 (SSRS). The tools used to implement the data mart include SQL Server Management Studio (SSMS) 2014, SQL Server Integration Services 2014 (SSIS), SQL Server Analysis Services 2014 (SSAS), SQL Server Reporting Services 2014 (SSRS), SQL Server Data Tools 2013 (SSDT), and Microsoft Excel Pivot Table 2010.
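The schema-to-report path described above can be illustrated with a tiny star schema and a roll-up query. The dimension and fact tables below are hypothetical stand-ins (SQLite instead of SQL Server, invented names and rows), not the schema actually built for the Basra Oil Company data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hypothetical star schema: two dimensions and one retirement fact table.
cur.executescript("""
CREATE TABLE dim_department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_retirement (
    emp_id INTEGER, dept_id INTEGER, date_id INTEGER,
    years_of_service INTEGER,
    FOREIGN KEY (dept_id) REFERENCES dim_department(dept_id),
    FOREIGN KEY (date_id) REFERENCES dim_date(date_id));
INSERT INTO dim_department VALUES (1, 'Drilling'), (2, 'Finance');
INSERT INTO dim_date VALUES (1, 2018), (2, 2019);
INSERT INTO fact_retirement VALUES
    (101, 1, 1, 30), (102, 1, 2, 25), (103, 2, 2, 20);
""")

# OLAP-style roll-up: retirements and average service per department per year,
# the kind of aggregate an SSRS report or Excel pivot table would render.
report = cur.execute("""
    SELECT d.dept_name, t.year, COUNT(*) AS retirements,
           AVG(f.years_of_service) AS avg_service
    FROM fact_retirement f
    JOIN dim_department d ON f.dept_id = d.dept_id
    JOIN dim_date t ON f.date_id = t.date_id
    GROUP BY d.dept_name, t.year
    ORDER BY d.dept_name, t.year""").fetchall()
print(report)
```

Because every fact row joins to small, descriptive dimension tables, the same fact table answers many report questions (by department, by year, by both) without restructuring the data, which is the appeal of the star schema for departmental data marts.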
Data warehousing refers to the process of organizing data from different sources to support decision making in organizations as well as data analytics. Data warehousing assimilates and systematizes data from different departments across an organization to give a single view of the full information. This data is then used to make decisions in the organization. However, data warehousing has been underestimated in educational institutions, partly because they are non-profit organizations. With the increasing number of educational institutions, colleges and universities should consider integrating such data-driven support systems to make better decisions and to better organize academic processes, especially in institutions with multiple branches. The intricacy of managing these institutions requires methods for relaying consistent information to an institution's decision-makers. This paper explains the need for data warehouses in educational institutions and how teaching institutions can implement data warehousing to help in their management. The paper also explores and reviews various methods that institutions can use to mine data. Various stages for promoting and actualizing the system are shown.
A warehouse covers a wide spectrum of operations for the distribution of goods in a supply chain network. The advancement of technology and the changing global business environment have compelled the transformation of the warehouse. The present study revisits warehouse transformation from 1990 to 2019 through an evolutionary lens. A systematic literature review is conducted to answer a few basic research questions: what issues did warehouses face during this period, how did the academic world approach them, and what would be the research agenda for warehousing in the era of Industry 4.0? The analysis of the literature shows that warehousing research has shifted from the traditional storeroom to a more automated and integrated warehousing system characterised by better efficiency and effectiveness. This study contributes to the development of warehousing research by discussing development trends, addressing research gaps, and recommending future research directions. The study also reflects the dominance of developed countries in warehousing research and alludes to more opportunities for practitioners and academicians in developing countries. Based on the decade-wise analysis of the literature, an evolutionary framework for warehouse research is proposed, which is expected to proactively support supply chain resilience.
Users of a business intelligence (BI) system employ an approach referred to as online analytical processing (OLAP) to view multidimensional data from different perspectives. Query languages, e.g., SQL or MDX, allow for flexible querying of multidimensional data, but query formulation is often time-consuming and cognitively challenging for many users. Alternatives to using a query language, e.g., graphical OLAP clients, parameterized reports, or dashboards, often fall short of a query language's expressiveness. Experience in cooperative research projects with industry led to the following observations regarding the use of OLAP queries in practice. First, within the same organization, similar OLAP queries are repeatedly composed from scratch to satisfy similar information needs. Second, across different organizations and even domains, OLAP queries with similar structures are repeatedly composed from scratch. Finally, vague requirements regarding frequently composed OLAP queries in the early stages of a project potentially lead to rushed development in later stages, which can be alleviated by following best practices for OLAP query composition. In engineering, knowledge about best-practice solutions to frequently arising challenges is often documented and represented using patterns. In that spirit, an OLAP pattern describes a generic solution for composing a query that allows a BI user to satisfy a certain type of information need given fragments of a conceptual model. This paper introduces a formal definition of OLAP patterns as well as an expressive, flexible, and generally applicable definition language.
•Individual OLAP queries are insufficient for documenting and sharing best-practice solutions for analytical problems.•OLAP patterns document best-practice solutions for satisfying generic information needs.•Executable queries can be automatically obtained through instantiation of OLAP patterns.•OLAP patterns can be shared within organizations as well as across organizations and even across domains.•Applicability of OLAP patterns in a specific context is checked via parameters and constraints.
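The pattern-instantiation idea in the highlights above can be sketched as a parameterized query template with per-parameter constraints checked against a conceptual model fragment. Everything here (the "top-n" pattern, the sales model, the constraint encoding) is a hypothetical illustration, not the paper's actual definition language:

```python
# One way an OLAP pattern might be represented: a query template plus
# parameter constraints; instantiation checks the constraints against a
# conceptual model fragment and fills the template. Names and structure
# are illustrative only.
TOP_N_PATTERN = {
    "name": "top-n by measure",
    "template": ("SELECT {dim}, SUM({measure}) AS total "
                 "FROM {fact} GROUP BY {dim} "
                 "ORDER BY total DESC LIMIT {n}"),
    "constraints": {
        "dim": lambda v, model: v in model["dimensions"],
        "measure": lambda v, model: v in model["measures"],
        "fact": lambda v, model: v == model["fact"],
        "n": lambda v, model: isinstance(v, int) and v > 0,
    },
}

def instantiate(pattern, model, **params):
    """Validate each parameter against the pattern's constraints for the
    given model fragment, then emit an executable query string."""
    for name, check in pattern["constraints"].items():
        if name not in params or not check(params[name], model):
            raise ValueError(f"parameter {name!r} violates pattern constraints")
    return pattern["template"].format(**params)

# Hypothetical conceptual model fragment for a sales cube.
sales_model = {"fact": "fact_sales",
               "dimensions": {"product", "region"},
               "measures": {"revenue", "quantity"}}

query = instantiate(TOP_N_PATTERN, sales_model,
                    dim="product", measure="revenue", fact="fact_sales", n=5)
print(query)
```

The same pattern instantiated against a different model fragment (say, a logistics cube) yields a structurally identical query over different names, which is precisely the cross-organization, cross-domain reuse the highlights claim.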