Purpose: Despite the increasing role of the data warehouse as a decision-support tool in today's business world, academic research on measuring its effectiveness has been lacking. This paucity of academic interest stimulated us to evaluate data warehousing effectiveness in the organizational context of Jordanian banks. Design/methodology/approach: This paper develops a theoretical model specific to the data warehouse system domain that builds on the DeLone and McLean model. The model is empirically tested by means of structural equation modelling, applying the partial least squares approach to data collected via a survey questionnaire from 127 respondents at Jordanian banks. Findings: Empirical data analysis supported that data quality, system quality, user satisfaction, individual benefits and organizational benefits make strong contributions to data warehousing effectiveness in our organizational context. Practical implications: The results provide a better understanding of data warehouse effectiveness and its importance in enabling Jordanian banks to be competitive. Originality/value: This study is one of the first empirical attempts to measure data warehouse system effectiveness and the first of its kind in an emerging country such as Jordan.
Much of the research on Business Intelligence (BI) has examined the ability of BI systems to help organizations address challenges and opportunities. However, the literature is fragmented and lacks an overarching framework to integrate findings and systematically guide research. Moreover, researchers and practitioners continue to question the value of BI systems. This study reviews and synthesizes empirical Information Systems (IS) studies to learn what we know, how well we know it, and what we need to know about how organizations obtain business value from BI systems. The study aims to identify which parts of the BI business value process have been studied, which are still most in need of research, and to propose specific research questions for the future. The findings show that organizations appear to obtain value from BI systems according to the process suggested by Soh and Markus (1995), as a chain of necessary conditions from BI investments to BI assets to BI impacts to organizational performance; however, researchers have not sufficiently studied the probabilistic processes that link these necessary conditions together. Moreover, the research has not sufficiently covered all relevant levels of analysis, nor examined how the levels link up. Overall, the paper identifies many opportunities for researchers to provide a more complete picture of how organizations can and do obtain value from BI.
•How do organizations obtain value from BI systems?
•Comprehensive review of BI literature from 1/2000 to 8/2015.
•Mapped literature findings to an integrated framework of BI value.
•Results show the field's knowledge of the necessary conditions for obtaining value.
•Results show the field's lack of knowledge of the processes for obtaining value.
•A bibliometric analysis of Big Data and Business Intelligence from 1990 to 2016.
•Big Data papers grow much faster than Business Intelligence papers.
•Computer Science and information systems are the two core disciplines.
•The most influential papers are identified and a research framework is proposed.
Business Intelligence, which applies data analytics to generate key information to support business decision making, has been an important area for more than two decades. In the last five years, the trend of “Big Data” has emerged and become a core element of Business Intelligence research. In this article, we review the academic literature associated with “Big Data” and “Business Intelligence” to explore the development and research trends. We use bibliometric methods to analyze publications from 1990 to 2017 in journals indexed in the Science Citation Index Expanded (SCIE), Social Science Citation Index (SSCI) and Arts & Humanities Citation Index (AHCI). We map the time trend, disciplinary distribution and high-frequency keywords to show emerging topics. The findings indicate that Computer Science and management information systems are the two core disciplines that drive research associated with Big Data and Business Intelligence. “Data mining”, “social media” and “information system” are high-frequency keywords, but “cloud computing”, “data warehouse” and “knowledge management” are more emphasized after 2016.
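The core move in a keyword-frequency bibliometric study like this one is to tally author keywords per publication year and watch the ranking shift. The following minimal sketch illustrates that step; the CSV layout, column names, and semicolon-separated keyword convention are assumptions (common in Web of Science exports), not the authors' actual pipeline, and the file name is hypothetical.

```python
# Illustrative sketch: counting high-frequency author keywords per year from a
# bibliographic export, in the spirit of the bibliometric method described above.
import csv
from collections import Counter, defaultdict

def keyword_trends(path: str) -> dict[int, Counter]:
    """Return a Counter of keywords for each publication year."""
    per_year: dict[int, Counter] = defaultdict(Counter)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            year = int(row["year"])
            # Keywords are assumed to be semicolon-separated, as in many
            # bibliographic database exports.
            for kw in row["keywords"].split(";"):
                kw = kw.strip().lower()
                if kw:
                    per_year[year][kw] += 1
    return per_year

trends = keyword_trends("wos_export.csv")  # hypothetical file name
for year in sorted(trends):
    print(year, trends[year].most_common(5))
```

Comparing the `most_common` lists before and after a cut-off year is what surfaces shifts such as the post-2016 rise of “cloud computing” and “data warehouse” reported above.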
This paper proposes and experimentally assesses a rewrite/merge approach for supporting real-time data warehousing via lightweight data integration. Real-time data warehouses are becoming increasingly relevant due to emerging research challenges such as Big Data and Cloud Computing. Our contribution addresses the limitations of current data warehousing architectures, which are not suitable for performing classical operations (e.g., loading, aggregation, indexing, OLAP query answering, and so forth) under real-time constraints. The proposed approach is based on intelligent manipulation of the SQL statements of input queries, which are decomposed into suitable sub-queries (the rewrite phase) that are then submitted as (final) input queries to an ad hoc component responsible for cooperative query answering via a method inspired by parallel query processing (the merge phase). This method results in a novel data warehousing framework in which the static phase is separated from the dynamic phase in order to achieve real-time processing. We complete our analytical contributions with an extensive experimental campaign in which we stress the performance of our proposed real-time data warehousing framework against a popular data warehouse benchmark and in comparison with traditional architectures, which confirms the benefits deriving from our proposal.
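To make the two phases concrete, here is a toy sketch of the rewrite/merge idea: an aggregate query is rewritten into per-partition sub-queries (rewrite phase) whose partial results are combined cooperatively (merge phase). The partition naming scheme, the restriction to SUM/COUNT aggregates, and the stubbed executor are simplifying assumptions; the paper's actual decomposition rules and query-answering component are richer.

```python
# Minimal rewrite/merge sketch: decompose an aggregate query per partition,
# run the sub-queries concurrently, then merge the partial aggregates.
from concurrent.futures import ThreadPoolExecutor

PARTITIONS = ["2023_q1", "2023_q2", "2023_q3", "2023_q4"]  # hypothetical

def rewrite(query_template: str) -> list[str]:
    """Rewrite phase: one sub-query per static partition."""
    return [query_template.format(partition=p) for p in PARTITIONS]

def execute(sub_query: str) -> tuple[float, int]:
    """Stand-in for real execution; returns (partial_sum, partial_count)."""
    print("running:", sub_query)
    return (0.0, 0)  # a real engine would return actual partial results

def merge(partials: list[tuple[float, int]]) -> float:
    """Merge phase: combine partial SUM/COUNT pairs into a final average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else 0.0

template = "SELECT SUM(amount), COUNT(*) FROM sales_{partition}"
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(execute, rewrite(template)))
print("merged AVG:", merge(partials))
```

The design point mirrored here is that only decomposable aggregates (SUM, COUNT, and combinations of them such as AVG) can be merged from partials, which is why the rewrite phase must know the algebraic shape of the input query.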
This paper proposed two data warehouse schema models for the fire department of DKI Jakarta. The first model contains six tables, consisting of three fact tables and three dimension tables; the second model contains only three fact tables. The second model denormalises the first: with fewer tables, it reduces join processing and thereby improves SQL query performance. Both models are fact constellation schemas, with more than one fact table sharing dimension and sub-dimension tables. The database resources were collected from http://data.jakarta.go.id under the Fire and Rescue Service Agency. The two data warehouse schema models were developed based on a report sector list, a hydrant list report, and vehicle register reports. This paper aims to support Automatic Identification Systems (AIS) research, particularly by implementing the data warehouse concept.
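The defining trait of a fact constellation schema is multiple fact tables sharing dimension tables. The hypothetical DDL below shows that shape in miniature; the table and column names are invented for illustration and do not reproduce the paper's actual schemas.

```python
# Hypothetical fact constellation: two fact tables sharing one dimension.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_station (          -- shared dimension table
    station_id INTEGER PRIMARY KEY,
    district   TEXT
);
CREATE TABLE fact_fire_report (     -- first fact table
    report_id  INTEGER PRIMARY KEY,
    station_id INTEGER REFERENCES dim_station(station_id),
    losses     REAL
);
CREATE TABLE fact_hydrant_check (   -- second fact table, same dimension
    check_id   INTEGER PRIMARY KEY,
    station_id INTEGER REFERENCES dim_station(station_id),
    pressure   REAL
);
""")
print("fact constellation created")
```

Denormalising in the direction the paper describes would fold dimension attributes (such as `district`) directly into the fact tables, trading storage and redundancy for fewer joins at query time.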
With the enormous flow of information generated by organizations, it is important that they take care to store their data in a single system, a data warehouse. Beyond extracting, cleaning, transforming and delivering the data into a single database, this system supports organizations in management and in strategic decision making. The goal of this article is to present the most important topics to consider when implementing such a system in an organization, namely the choice of architecture, the internal organizational factors that ensure a successful data warehouse implementation, and the security issues, which must not be forgotten. Finally, a future perspective is briefly discussed that combines 5G technologies with the data warehouse, making the system more efficient, more secure and cheaper.
Presto: SQL on Everything. Sethi, Raghav; Traverso, Martin; Sundstrom, Dain ... 2019 IEEE 35th International Conference on Data Engineering (ICDE), April 2019. Conference proceeding.
Presto is an open source distributed query engine that supports much of the SQL analytics workload at Facebook. Presto is designed to be adaptive, flexible, and extensible. It supports a wide variety of use cases with diverse characteristics, ranging from user-facing reporting applications with sub-second latency requirements to multi-hour ETL jobs that aggregate or join terabytes of data. Presto's Connector API allows plugins to provide a high-performance I/O interface to dozens of data sources, including Hadoop data warehouses, RDBMSs, NoSQL systems, and stream processing systems. In this paper, we outline a selection of use cases that Presto supports at Facebook. We then describe its architecture and implementation, and call out features and performance optimizations that enable it to support these use cases. Finally, we present performance results that demonstrate the impact of our main design decisions.
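From a client's point of view, the Connector API means any catalog, whether backed by Hive, an RDBMS, or a NoSQL store, is queried through the same SQL interface. A short sketch using the community presto-python-client package (a DB-API 2.0 driver) follows; the host, catalog, schema, and table names are placeholders, not anything from the paper.

```python
# Querying a Presto coordinator via the community presto-python-client
# package; any Connector-backed catalog is addressed the same way.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto.example.com",  # hypothetical coordinator address
    port=8080,
    user="analyst",
    catalog="hive",             # could equally be an RDBMS or NoSQL catalog
    schema="default",
)
cur = conn.cursor()
cur.execute("SELECT region, COUNT(*) FROM events GROUP BY region")
for region, n in cur.fetchall():
    print(region, n)
```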
With global climate change and rapid urbanization, the risk of urban floods is increasing. Therefore, undertaking risk studies of urban floods, and especially predicting urban flood depth, is very important for urban flood control. In this study, an urban flood data warehouse was established from the available structured and unstructured urban flood data. Based on this, a regression model to predict the depth of urban flooded areas was constructed with the Gradient Boosting Decision Tree (GBDT) machine learning algorithm. The flood conditioning factors used in modelling were rainfall, rainfall duration, peak rainfall, evaporation, land use (the proportions of roads, woodlands, grasslands, water bodies and buildings), permeability, catchment area, and slope. Based on rainfall data for different rainfall return periods, flood condition maps were produced using GIS. In addition, the feature importance of these conditioning factors was determined from the regression model. The results demonstrated that the growth rate of the number and depth of water accumulation points increased significantly beyond the 'once in every two years' rainfall return period in Zhengzhou City, and that the flooded areas occurred mainly in the old urban areas and parts of southern Zhengzhou. The relative error of the prediction results was 11.52%, which verifies the applicability and validity of the method for predicting urban flood depth. The results of this study can provide a scientific basis for urban flood control and drainage. (A code sketch of the modelling step follows the highlights below.)
•A data warehouse and a machine learning algorithm (GBDT) were used to assess urban flood risk.
•The GBDT model shows 88.48% accuracy in predicting the depth of water accumulation.
•Flood risk in Zhengzhou is mapped using the GBDT model.
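As referenced above, here is an illustrative reconstruction of the modelling step: fitting a gradient-boosted regression tree on the flood conditioning factors and ranking them by feature importance. The feature names follow the abstract, but the data is random placeholder input, not the Zhengzhou dataset, and scikit-learn is an assumed stand-in for whatever library the authors used.

```python
# Sketch of GBDT regression for flood depth, with feature importance ranking.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

FEATURES = [
    "rainfall", "rainfall_duration", "peak_rainfall", "evaporation",
    "road_pct", "woodland_pct", "grassland_pct", "water_pct",
    "building_pct", "permeability", "catchment_area", "slope",
]
rng = np.random.default_rng(0)
X = rng.random((500, len(FEATURES)))  # placeholder conditioning factors
y = rng.random(500)                   # placeholder flood depths (m)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

# Rank conditioning factors by importance, as the study does.
for name, imp in sorted(zip(FEATURES, model.feature_importances_),
                        key=lambda t: -t[1])[:5]:
    print(f"{name}: {imp:.3f}")
```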
Nowadays, the growing importance of modelling in software engineering is reinforced by the blossoming of model-driven architecture (MDA). In this trend, MDA can be considered the most convenient approach to integrating the modelling process into data warehousing projects. On the other hand, decision-makers are usually unable to express their business needs concisely enough to yield a valid data warehouse (DW), mainly due to the lack of standard methodologies and tools supporting this task. This widens the gap between the business world and the IT world and makes DW requirements difficult to interpret and model. Moreover, applying MDA to this kind of project requires new tools to avoid this drawback. In this paper, we provide an MDA framework to design DW requirements and then generate the multidimensional schema. The framework is based on UML profiles and offers decision-makers a graphical tool for modelling their strategic visions in order to build the system-to-be. In addition, the proposal handles data historization and metadata in the generated multidimensional model so that the extract-transform-load process can be performed properly.
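The essence of such an MDA framework is a model-to-model transformation: a requirements-level model of what decision-makers want to measure and slice is mapped into a multidimensional schema. The toy sketch below shows that idea under heavy simplification; the class names and one-to-one mapping rules are invented for illustration and are not the paper's UML profiles or transformation rules.

```python
# Toy model-to-model transformation: business requirements -> star schema.
from dataclasses import dataclass

@dataclass
class Requirement:          # requirements-level (CIM-like) business need
    goal: str
    measures: list[str]     # what to quantify
    axes: list[str]         # how to slice it

@dataclass
class Schema:               # platform-independent multidimensional model
    fact: str
    measures: list[str]
    dimensions: list[str]

def transform(req: Requirement) -> Schema:
    """Naive mapping: measures become fact measures, axes become dimensions."""
    return Schema(fact=f"Fact_{req.goal.replace(' ', '')}",
                  measures=req.measures,
                  dimensions=[f"Dim_{a}" for a in req.axes])

req = Requirement("monitor sales", ["amount", "quantity"], ["Time", "Store"])
print(transform(req))
```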
Data warehouse quality can be assessed during the initial phases of data warehouse development by quantifying the structural complexity of multidimensional models using metrics. The structural complexity of a multidimensional model is driven by its elements, their types, and the relationships among those elements. So far, most researchers have dealt with metrics based on the various elements (facts, dimensions, dimensional hierarchies, and hierarchy levels) present in these models. However, little consideration has been given to the different types of dimensions based on hierarchy types, or to the different relationships among those elements. Therefore, this work proposes a comprehensive complexity metric for measuring multidimensional model complexity that takes into account the various elements, their types, and the relationships among the elements at various levels of granularity. The theoretical validation of the proposed metric using the property-based framework of Briand et al. characterises it as a complexity measure. Furthermore, an empirical study employing statistical techniques (correlation and multinomial regression) on 26 multidimensional models with 20 subjects showed that the proposed metric is strongly correlated with multidimensional model understandability. Hence, the metric can be considered a good predictor of data warehouse multidimensional model understandability.
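To make the idea of such a metric concrete, here is a hedged sketch of a structural-complexity measure of the same general kind: a weighted count over facts, dimensions, and hierarchy levels. The weights and the aggregation formula are illustrative assumptions only; the paper defines its own metric, which additionally distinguishes dimension types and element relationships, and validates it formally.

```python
# Illustrative structural-complexity metric for a multidimensional model:
# weighted sum over element counts, with deeper hierarchies costing more.
from dataclasses import dataclass, field

@dataclass
class Dimension:
    name: str
    hierarchy_levels: int = 1   # depth of its dimensional hierarchy

@dataclass
class MultidimModel:
    facts: list[str]
    dimensions: list[Dimension] = field(default_factory=list)

def complexity(m: MultidimModel,
               w_fact=1.0, w_dim=0.5, w_level=0.25) -> float:
    """Weighted element count; weights are arbitrary illustrative values."""
    return (w_fact * len(m.facts)
            + w_dim * len(m.dimensions)
            + w_level * sum(d.hierarchy_levels for d in m.dimensions))

model = MultidimModel(facts=["Sales"],
                      dimensions=[Dimension("Time", 3), Dimension("Store", 2)])
print(f"complexity = {complexity(model):.2f}")
```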