Modernizing the data warehouse ecosystem is now a key challenge in decision support systems. This modernization is crucial for ensuring scalability and meeting evolving business requirements, especially with the advent of big data. A promising solution is to implement data warehouses on contemporary data stores such as NoSQL. In this context, this paper introduces a framework that leverages Model-Driven Architecture (MDA) to design and implement modern data warehouses across NoSQL data stores. Our MDA approach aims to offer a collaborative, dynamic, and reusable process for developing NoSQL-oriented data warehouses tailored to specific project requirements. It enables the automatic and dynamic generation of a hybrid data warehouse model from its conceptual model, which encompasses structural, domain, and access parameters. Moreover, our framework generates the implementation code for the data warehouse, along with a set of files to validate, document, and illustrate the data warehouse schema on a target platform. Finally, we present a detailed case study that highlights the effectiveness of our MDA framework.
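To make the idea of a conceptual-to-NoSQL model transformation concrete, the following Python sketch maps a toy conceptual model (a fact with measures and dimensions) onto a document-store model. It is not the authors' framework: the `Fact` and `Dimension` classes, the `embed` flag standing in for an access parameter, and the output format are all hypothetical, and real MDA tooling would typically express such rules in a dedicated transformation language rather than plain Python.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    name: str
    attributes: list[str]
    embed: bool = True  # hypothetical access parameter: embed or reference


@dataclass
class Fact:
    name: str
    measures: list[str]
    dimensions: list[Dimension] = field(default_factory=list)


def to_document_model(fact: Fact) -> dict[str, dict]:
    """Transform a conceptual fact into a toy document-store model.

    Embedded dimensions become nested sub-documents of the fact collection;
    referenced dimensions get their own collection plus a key field in the fact.
    """
    collections: dict[str, dict] = {}
    fact_doc: dict = {m: "number" for m in fact.measures}
    for dim in fact.dimensions:
        attrs = {a: "string" for a in dim.attributes}
        if dim.embed:
            fact_doc[dim.name] = attrs
        else:
            collections[dim.name] = {"_id": "string", **attrs}
            fact_doc[f"{dim.name}_id"] = "string"
    collections[fact.name] = fact_doc
    return collections


if __name__ == "__main__":
    sales = Fact(
        "sales",
        ["amount", "quantity"],
        [Dimension("date", ["day", "month", "year"]),
         Dimension("product", ["name", "category"], embed=False)],
    )
    print(to_document_model(sales))
```

In this example the `date` dimension is embedded in each `sales` document while `product` is kept as a separate collection referenced by `product_id`, which is one way a single generated model could mix embedding and referencing.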
• Approach to support performance comparisons in a federation of data marts.
• Multidimensional queries over a global model integrating local sources.
• An extension to the multidimensional model with mathematical formulas for indicators.
• A query reformulation approach exploiting aggregation and indicator decomposition.
• Computational analysis of the reformulation algorithm and proof of correctness.
Measuring and comparing performance in networked organisations is particularly critical because the data are heterogeneous and sparse. In particular, each organisation autonomously defines which measures to use and their calculation formulas, i.e. the mathematical expressions stating how a measure is calculated from others. Hence, full integration of data marts requires reconciling these heterogeneous definitions in order to support the evaluation of cross-organisation performance and to produce meaningful comparisons.
To address this issue, this paper proposes (1) an extension of the traditional multidimensional model that explicitly represents the semantics of measure formulas and, on top of this model, (2) a novel query reformulation approach for a scenario of federated data warehouses. Unlike traditional approaches, the approach exploits not only aggregation but also measure decomposition through the calculation of measure formulas. This extends the usual features of view-based query rewriting, making it possible to overcome measure-level heterogeneities among data mart schemas and enabling meaningful comparisons among values from different autonomous data marts. A formalization of the rewriting algorithm is proposed, together with a computational analysis, proofs of correctness and termination, and an evaluation of effectiveness showing that the approach can significantly increase the capability of integrating indicators to answer queries in a federated scenario.
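The difference between plain aggregation and measure decomposition can be illustrated with a small Python sketch. It assumes a hypothetical global model in which `profit` is defined as `revenue - cost` and `margin` as `profit / revenue`, and a local data mart that stores only `revenue` and `cost`. This is a simplified illustration, not the paper's rewriting algorithm: a requested measure is summed directly when the mart stores it and it is additive; otherwise it is decomposed through its formula and computed from the aggregated operands.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical global model: derived measures and their calculation formulas.
FORMULAS: dict[str, tuple[list[str], Callable[..., float]]] = {
    "profit": (["revenue", "cost"], lambda revenue, cost: revenue - cost),
    "margin": (["profit", "revenue"], lambda profit, revenue: profit / revenue),
}
# Measures that can safely be combined with SUM across rows.
ADDITIVE = {"revenue", "cost", "profit"}


def answer(rows: list[dict], group_by: str, measure: str) -> dict:
    """Return `measure` per value of `group_by` over one local data mart.

    A stored additive measure is summed directly (aggregation). Otherwise the
    measure is decomposed into its formula operands, each operand is answered
    recursively, and the formula is applied to the aggregated operands.
    """
    if not rows:
        return {}
    if measure in rows[0] and measure in ADDITIVE:
        totals: dict = defaultdict(float)
        for row in rows:
            totals[row[group_by]] += row[measure]
        return dict(totals)
    if measure not in FORMULAS:
        raise ValueError(f"measure {measure!r} cannot be rewritten for this mart")
    operands, formula = FORMULAS[measure]
    parts = [answer(rows, group_by, op) for op in operands]
    return {key: formula(*(part[key] for part in parts)) for key in parts[0]}


if __name__ == "__main__":
    mart = [  # the mart stores revenue and cost, but neither profit nor margin
        {"region": "north", "revenue": 100.0, "cost": 60.0},
        {"region": "north", "revenue": 50.0, "cost": 20.0},
        {"region": "south", "revenue": 80.0, "cost": 80.0},
    ]
    print(answer(mart, "region", "profit"))
    print(answer(mart, "region", "margin"))
```

Here `profit` is 70.0 for north and 0.0 for south. Computing `margin` from aggregated `profit` and `revenue` yields a ratio of sums per region rather than a sum of per-row ratios, which is why a non-additive measure is decomposed instead of being aggregated directly.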
Abstract
As an environmentally sensitive sector, the forestry industry requires fast and accurate data processing for decision-making, and data warehouses can be used to meet those needs. Decision-makers need aggregate, historical, and multidimensional data, and a data warehouse that presents aggregate data can improve the efficiency of data presentation. In this study, the data warehouse was built using a fact constellation schema, and the ETL process was implemented with MySQL queries. The level of efficiency was measured by comparing the number of tables, record length, number of records, and total bytes needed for each report. The efficiency of report presentation was also compared between the OLTP data and the data warehouse. The test results for data management showed an efficiency level of 56% for the number of tables, 145% for record length, 15,833% for the number of records, and 48,846% for total bytes. On average, the efficiency level in data management was 15,720%, while the efficiency of report presentation speed was 852%. This study showed that using a data warehouse was very efficient for managing aggregate data in the forestry industry.
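The effect of pre-aggregation on record counts can be sketched with Python's built-in `sqlite3` module, used here as a stand-in for the MySQL ETL queries described in the study. The table names, columns, and sample rows are hypothetical, and the sketch only compares row counts; it does not reproduce the study's efficiency formula, which the abstract does not state.

```python
import sqlite3


def build_summary(transactions):
    """Load transactional (OLTP-style) rows and derive a pre-aggregated fact table.

    Returns (OLTP row count, aggregate row count, aggregate rows).
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE harvest_tx ("
            "region TEXT, year INTEGER, species TEXT, volume_m3 REAL)"
        )
        conn.executemany("INSERT INTO harvest_tx VALUES (?, ?, ?, ?)", transactions)
        # ETL step: aggregate once so that reports read summary rows
        # instead of scanning every transaction.
        conn.execute(
            "CREATE TABLE fact_harvest_summary AS "
            "SELECT region, year, SUM(volume_m3) AS total_volume_m3, COUNT(*) AS n_tx "
            "FROM harvest_tx GROUP BY region, year"
        )
        n_oltp = conn.execute("SELECT COUNT(*) FROM harvest_tx").fetchone()[0]
        summary = conn.execute(
            "SELECT region, year, total_volume_m3, n_tx "
            "FROM fact_harvest_summary ORDER BY region, year"
        ).fetchall()
    finally:
        conn.close()
    return n_oltp, len(summary), summary


if __name__ == "__main__":
    tx = [  # hypothetical harvest transactions
        ("north", 2020, "teak", 10.0),
        ("north", 2020, "pine", 5.0),
        ("south", 2020, "pine", 4.0),
    ]
    n_oltp, n_agg, rows = build_summary(tx)
    print(f"{n_oltp} OLTP rows -> {n_agg} summary rows: {rows}")
```

A report that reads `fact_harvest_summary` touches one row per region and year rather than one row per transaction, which is the kind of reduction in records and bytes the study measures.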
The MESSAGE Integrated Assessment Model (IAM) developed by IIASA has been a central tool of energy-environment-economy systems analysis in the global scientific and policy arena. It played a major role in the Assessment Reports of the Intergovernmental Panel on Climate Change (IPCC), provided marker scenarios of the Representative Concentration Pathways (RCPs) and the Shared Socio-Economic Pathways (SSPs), and underpinned the analysis of the Global Energy Assessment (GEA). However, to provide relevant analysis for current and future challenges, numerical models of human and earth systems need to support higher spatial and temporal resolution, facilitate the integration of data sources and methodologies across disciplines, and become open and transparent about the underlying data, methods, and scientific workflow.
In this manuscript, we present the building blocks of a new framework for an integrated assessment modeling platform. This “ecosystem” comprises: i) an open-source GAMS implementation of the MESSAGE energy++ system model integrated with the MACRO economic model; ii) a Java/database back-end for version-controlled data management; iii) interfaces for the scientific programming languages Python and R for efficient input data and results processing workflows; and iv) a web-browser-based user interface for model/scenario management and intuitive “drag-and-drop” visualization of results.
The framework aims to achieve the highest level of openness for scientific analysis, bridging the need for transparency with efficient data processing and powerful numerical solvers. The platform is geared towards easy integration of data sources and models across disciplines, spatial scales, and temporal disaggregation levels. All tools apply best practices in collaborative software development, and comprehensive documentation of all building blocks and scripts is generated directly from the GAMS equations and the Java/Python/R source code.
• We present an open-source implementation of the MESSAGEix integrated assessment model.
• MESSAGEix is fully integrated with the powerful ix modeling platform (ixmp).
• The framework has interfaces to the scientific programming languages Python and R.
• A powerful database backend supports data version control and scientific workflows.
• All modules apply best-practice collaborative development standards.
Research addressing value in healthcare requires a measure of cost. While there are many sources and types of cost data, each has strengths and weaknesses. Many researchers appear to create study-specific cost datasets, but the explanations of their costing methodologies are not always clear, making their results difficult to interpret. Our solution, described in this paper, was to use widely accepted costing methodologies to create a service-level, standardized healthcare cost data warehouse from an institutional perspective that includes all professional and hospital-billed services for our patients.
The warehouse is based on a National Institutes of Research-funded research infrastructure containing the linked health records and medical care administrative data of two healthcare providers and their affiliated hospitals. Since all patients are identified in the data warehouse, their costs can be linked to other systems and databases, such as electronic health records, tumor registries, and disease or treatment registries.
We describe the two institutions' administrative source data; the reference files, which include Medicare fee schedules and cost reports; the process of creating standardized costs; and the warehouse structure. The costing algorithm can create inflation-adjusted standardized costs at the service line level for defined study cohorts on request.
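Inflation-adjusted standardization can be sketched as pricing each service line at a reference fee and rescaling it by a price-index ratio: cost in target year = reference fee × units × (index in target year / index in service year). The Python below is a minimal sketch under that assumption; the `ServiceLine` type, the fee-schedule and index dictionaries, and the sample values in the usage are hypothetical and reflect neither the authors' costing algorithm nor actual Medicare fee schedules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLine:
    code: str   # hypothetical billing code of the service
    units: int
    year: int   # year in which the service was delivered


def standardized_costs(lines, fee_schedule, price_index, target_year):
    """Price each service line at a reference fee, in target-year dollars.

    fee_schedule: {(code, year): reference payment per unit}
    price_index:  {year: index level}; each cost is scaled by
                  price_index[target_year] / price_index[line.year].
    Returns one standardized cost per service line.
    """
    costs = []
    for line in lines:
        fee = fee_schedule[(line.code, line.year)]
        inflator = price_index[target_year] / price_index[line.year]
        costs.append(fee * line.units * inflator)
    return costs


if __name__ == "__main__":
    lines = [ServiceLine("A", 1, 2019), ServiceLine("B", 2, 2020)]
    fees = {("A", 2019): 100.0, ("B", 2020): 50.0}    # illustrative values only
    index = {2019: 100.0, 2020: 102.0, 2021: 110.0}  # illustrative values only
    print(standardized_costs(lines, fees, index, target_year=2021))
```

Keeping one cost per service line, rather than a single total, preserves the service-line granularity so that costs can later be summed over whatever cohort, procedure, or care cycle a study defines.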
The resulting standardized costs contained in the data warehouse can be used to create detailed, bottom-up analyses of professional and facility costs of procedures, medical conditions, and patient care cycles without revealing business-sensitive information. After its creation, a standardized cost data warehouse is relatively easy to maintain and can be expanded to include data from other providers. Individual investigators who may not have sufficient knowledge about administrative data do not have to try to create their own standardized costs on a project-by-project basis because our data warehouse generates standardized costs for defined cohorts upon request.
Abstract
Objective
The objective was to develop and operate a cloud-based federated system for managing, analyzing, and sharing patient data for research purposes, while allowing each resource sharing patient data to operate its component according to its own governance rules. The federated system is called the Biomedical Research Hub (BRH).
Materials and Methods
The BRH is a cloud-based federated system built over a core set of software services called framework services. BRH framework services include authentication and authorization, services for generating and assessing findable, accessible, interoperable, and reusable (FAIR) data, and services for importing and exporting bulk clinical data. The BRH includes data resources operated by different entities, as well as workspaces that can access and analyze data from one or more of the data resources in the BRH.
Results
The BRH contains multiple data commons that in aggregate provide access to over 6 PB of research data from over 400 000 research participants.
Discussion and conclusion
With the growing acceptance of using public cloud computing platforms for biomedical research, and the growing use of opaque persistent digital identifiers for datasets, data objects, and other entities, there is now a foundation for systems that federate data from multiple independently operated data resources that expose FAIR application programming interfaces, each using a separate data model. Applications can be built that access data from one or more of the data resources.
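Federation through persistent identifiers can be illustrated with a small Python sketch that routes dataset identifiers to the data resources that serve them, so that an application could then query each resource's API. The identifier format `<prefix>/<local-id>`, the registry, and the example URLs are hypothetical assumptions for illustration; they are not the BRH's actual identifier scheme or interfaces.

```python
from collections import defaultdict


def route_identifiers(guids, registry):
    """Group persistent identifiers by the data resource that serves them.

    Assumes (hypothetically) identifiers of the form "<prefix>/<local-id>" and
    a registry mapping each prefix to a data resource's API base URL.
    """
    routed = defaultdict(list)
    for guid in guids:
        prefix, sep, local_id = guid.partition("/")
        if not sep or prefix not in registry:
            raise KeyError(f"no data resource registered for {guid!r}")
        routed[registry[prefix]].append(local_id)
    return dict(routed)


if __name__ == "__main__":
    registry = {  # hypothetical prefixes and endpoints
        "resA": "https://resource-a.example.org/api",
        "resB": "https://resource-b.example.org/api",
    }
    print(route_identifiers(["resA/123", "resB/456", "resA/789"], registry))
```

Each resulting group could then be sent to the corresponding resource, which keeps every resource's own data model and governance behind its API while letting one application draw on several resources.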
Because data warehouses (DWs) play a principal role in strategic decision-making, DW quality is crucial for organizations. Therefore, we should use methods, models, techniques, and tools that help us design and maintain high-quality DWs. In recent years, several approaches have been proposed to design DWs from the conceptual, logical, and physical perspectives. However, from our point of view, none of them provides a set of empirically validated metrics (objective indicators) to help designers build an outstanding model that guarantees the quality of the DW. In this paper, we first summarise the set of metrics we have defined to measure the understandability (a quality subcharacteristic) of conceptual models for DWs, and present their theoretical validation to ensure they are correctly defined. We then describe in detail the empirical validation process we carried out through a family of experiments performed by students, professionals, and experts in DWs. This family of experiments is a very important aspect of validating metrics, as it is widely accepted that only after performing a family of experiments is it possible to build up the cumulative knowledge needed to draw useful measurement conclusions that can be applied in practice. Our empirical process as a whole showed that several of the proposed metrics seem to be practical indicators of the understandability of conceptual models for DWs.
Data warehousing (DW) is a widespread and essential practice in business organizations that supports data analytics and decision-making. Despite the importance of DW in complex organizations, the adoption of data warehouses (DWHs) in education appears to be lower than in other industries. To clarify this situation, this paper presents a systematic mapping study of empirical research papers on DW in education published from 2008 to 2018. We applied a qualitative and quantitative approach based on a four-stage research method, with the objective of obtaining a holistic view of DWHs in education. After filtering and applying the proposed method, 34 relevant papers were identified and studied in detail. The study revealed interesting facts; for example, Kimball's approach is the most widely applied methodology for DWH design in education. In addition, we mapped this comprehensive collection of research papers on educational DW against six dimensions of analysis (schema proposal, analysis of the user requirements, analysis of the business requirements, effectiveness, implementation, and data analysis). From this analysis, we found that the star schema is the most frequently implemented approach. The purpose of the mapping was to explore and identify priority research areas and research gaps within the academic community. These gaps are a source of opportunities to start new lines of research.
Abstract
Driven by the needs of economic development, many colleges and universities offer German courses. However, because German courses have only recently been introduced in domestic universities, their teaching largely applies the ordinary German teaching mode without much consideration of the characteristics of higher education, and the teaching results are unsatisfactory. Therefore, this paper conducts innovative research on the German teaching mode in colleges and universities from the perspective of big data (BD). First, it discusses the current state of German teaching mode construction and teaching resources in Chinese universities and summarizes the main problems of the contemporary German teaching mode in Chinese universities. The new German teaching mode puts college students' feelings at the center, guides students to learn actively through behavioral guidance, and pays more attention to students' communicative competence. This teaching mode also uses a data warehouse storage architecture, combined with BD and related technologies, to store and update various German teaching resources, meeting universities' need for innovation in German teaching. Finally, the paper compares the innovative mode of teaching German in universities with traditional teaching modes. Experimental results show that the innovative mode is recognized and well liked by students, and that their fondness for German increased by about 30%, which significantly improves students' enthusiasm for learning German and provides an important reference for innovative research on German teaching modes in colleges and universities.