In this paper, we introduce a novel approach for identifying and testing relationships and patterns in the types of sequential data that are broadly present in a number of different real-world scenarios and environments. The proposed two-phase framework combines data preparation, data visualization, and clustering techniques in an innovative way. The first phase of the framework explores the large amount of sequential data in stages that can be undertaken iteratively. Those stages include data preparation, counting and value-based ordering, distribution visualization, and subsequence length determination, confirmation, and re-visualization. The second phase of the framework explores motif-based sequence differences between data cohorts that are created using descriptive attributes, and visualizes the changes over time and across different attribute values. To illustrate the analytical power of the proposed framework, we present a comprehensive example that applies the framework to a large, formally maintained research data set collected and managed by the US Census Bureau. The framework, and the presented example, utilize visualization as an analytics tool and not just a presentation accessory.
•A framework is introduced for identifying patterns in the types of sequential data that are broadly present in real-world scenarios.
•The purpose and details of the framework are described in a formal way, and also via a series of illustrative examples.
•The framework is tested and validated through a comprehensive example using a large, formally maintained data set.
•The broad applicability of the introduced framework is discussed.
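As a rough illustration of the second phase, the following minimal sketch (ours, not taken from the paper; the cohort data, motif length k, and attribute split are all hypothetical) counts fixed-length subsequences (motifs) per cohort and surfaces the motifs whose frequencies differ most between two cohorts:

    from collections import Counter

    def motif_counts(sequence, k):
        """Count every length-k subsequence (motif) in one sequence."""
        return Counter(tuple(sequence[i:i + k])
                       for i in range(len(sequence) - k + 1))

    def cohort_profile(sequences, k):
        """Aggregate motif counts across all sequences in a cohort."""
        total = Counter()
        for seq in sequences:
            total.update(motif_counts(seq, k))
        return total

    # Two hypothetical cohorts, split on some descriptive attribute.
    cohort_a = [["A", "B", "A", "C"], ["A", "B", "C"]]
    cohort_b = [["C", "B", "A"], ["B", "A", "B", "A"]]

    profile_a = cohort_profile(cohort_a, k=2)
    profile_b = cohort_profile(cohort_b, k=2)

    # Motifs whose frequencies differ most between the cohorts.
    all_motifs = set(profile_a) | set(profile_b)
    diffs = {m: profile_a[m] - profile_b[m] for m in all_motifs}
    for motif, delta in sorted(diffs.items(), key=lambda kv: -abs(kv[1]))[:3]:
        print(motif, delta)

In the framework itself, the motif length would come out of the subsequence length determination and confirmation stages rather than being fixed up front as it is here.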
Augmenting Data Warehouses with Big Data
Jukić, Nenad; Sharma, Abhishek; Nestorov, Svetlozar
Information Systems Management, 01/2015, Volume 32, Issue 3
Journal Article, Peer Reviewed
In the past decade, corporations have increasingly engaged in efforts aimed at the analysis and wide-ranging use of big data. The majority of academic big data articles have focused on methods, approaches, opportunities, and the organizational impact of big data analytics. In this article, the focus is on the ability of big data (while acting as a direct source for impactful analysis) to also augment and enrich the analytical power of data warehouses.
We propose a methodology for providing clear and consistent integration of the process and data logic in the analysis stage of the information systems development lifecycle. While our proposed approach is applicable across a variety of data and process modeling schemas, in this paper we discuss it in the context of UML use cases for process modeling and ER diagrams for data modeling. We illustrate our approach through an example of modeling the execution of a retail transaction. In our example we integrate a step-by-step process model and the corresponding data model at the attribute level of detail. We discuss the potential benefits of this approach by illustrating how this methodology, by providing a critical link between process and data models, can result in better conceptual testing early in the analysis process, ensuring better semantic quality of both process and data models.
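To make the idea concrete, here is a minimal sketch of our own (the step names, entities, and attributes are hypothetical, loosely echoing the retail-transaction example): each use-case step declares the entity attributes it reads and writes, which lets a simple conceptual test flag attributes the process touches but the ER model never defines:

    # Each use-case step declares the entity attributes it reads/writes.
    use_case = [
        {"step": "Scan item",
         "reads":  [("Product", "UPC"), ("Product", "Price")],
         "writes": [("LineItem", "Quantity")]},
        {"step": "Tender payment",
         "reads":  [("Sale", "Total")],
         "writes": [("Payment", "Amount"), ("Payment", "Method")]},
    ]

    # Entities and attributes defined in the ER diagram.
    er_model = {
        "Product":  {"UPC", "Price", "Name"},
        "LineItem": {"Quantity"},
        "Sale":     {"Total"},
        # "Payment" is deliberately missing, so the check below fires.
    }

    # Conceptual test: every attribute a step touches must exist in the ER model.
    for step in use_case:
        for entity, attr in step["reads"] + step["writes"]:
            if attr not in er_model.get(entity, set()):
                print(f"Step '{step['step']}' references missing {entity}.{attr}")

Running the check reports that the "Tender payment" step references Payment.Amount and Payment.Method, neither of which exists in the data model; surfacing exactly this kind of gap early in analysis is the point of linking the two models.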
Polyinstantiation is the situation in which multiple records sharing the same identifier value occur in one table. Multi-level secure (MLS) data models manage and utilize polyinstantiation to provide a secure way of handling classified information. Customer Relationship Management (CRM) systems for e-business can leverage the strategy of managed polyinstantiation by implementing MLS technology to coordinate B2C interactions in order to build long-term loyalty. This approach can be used to address some of the challenges faced by providers and adopters of e-business CRM technology solutions. A pilot study evaluated a polyinstantiated information-presentation strategy as a means of enhancing relationships between e-businesses and their customers. The results support the idea that customers perceive the benefits of their special customer status as a function of how the relevant data are presented.
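A minimal sketch of the mechanism, not drawn from the pilot study (the level names, record contents, and filtering rule are all assumptions for illustration): the same identifier appears in several rows of one table, each tagged with a security level, and each reader sees only the instance matching their clearance:

    # Clearance hierarchy assumed for this sketch.
    LEVELS = {"public": 0, "confidential": 1, "secret": 2}

    # One logical customer record, polyinstantiated across three levels:
    # the same identifier (101) occurs in several rows of one table.
    rows = [
        {"id": 101, "level": "public",       "status": "standard"},
        {"id": 101, "level": "confidential", "status": "preferred"},
        {"id": 101, "level": "secret",       "status": "vip"},
    ]

    def visible_row(rows, record_id, clearance):
        """Return the highest-level instance the reader is cleared to see."""
        cleared = [r for r in rows
                   if r["id"] == record_id
                   and LEVELS[r["level"]] <= LEVELS[clearance]]
        return max(cleared, key=lambda r: LEVELS[r["level"]], default=None)

    print(visible_row(rows, 101, "public"))        # sees the standard view
    print(visible_row(rows, 101, "confidential"))  # sees the preferred view

In the CRM setting described above, the rows would correspond to the different presentations of a customer's status, so how the "relevant data are presented" is controlled by the level each interaction is allowed to read.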
The approaches and discussions given in this paper offer applicable solutions for a number of contemporary real-world scenarios involving performance issues in the development and use of analytical databases for the support of both tactical and strategic decision making. The paper introduces a novel method for expediting the development and use of analytical databases that combines columnar database technology with an approach based on denormalizing data tables for analysis and decision support. This method improves the feasibility and quality of tactical decision making by making critical information more readily available. It also improves the quality of longer-term strategic decision making by widening the range of feasible queries against the vast amounts of available information. The advantages include improvements in the performance of the ETL process (the most common time-consuming bottleneck in most implementations of data warehousing for quality decision support) and in the performance of individual analytical queries. These improvements in the critical decision support infrastructure are achieved without insurmountable increases in storage-size requirements. The efficiencies and advantages of the introduced approach are illustrated by showing its application in two relevant real-world cases.
•A novel method for the development and use of analytical databases is introduced.
•The method combines columnar database technology with denormalizing data tables for analysis and decision support.
•The method makes critical information more readily available and widens the range of feasible queries.
•The efficiencies and advantages of the introduced approach are illustrated by showing two real-world implementations.
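As a toy illustration of the denormalization side of the method described above (the tables and attributes are hypothetical, and the columnar-storage aspect is only noted in the comments), the sketch below inlines dimension attributes into each fact row so that an analytical query scans one wide table with no joins:

    # Normalized dimension tables keyed by surrogate IDs.
    products = {1: {"product": "bolt", "category": "hardware"}}
    stores   = {10: {"store": "north", "region": "midwest"}}

    # Fact table referencing the dimensions.
    sales_fact = [
        {"product_id": 1, "store_id": 10, "qty": 5, "amount": 12.50},
        {"product_id": 1, "store_id": 10, "qty": 2, "amount": 5.00},
    ]

    # Denormalize once, during ETL: inline dimension attributes into each
    # fact row. A columnar engine then stores each column separately and
    # reads only the columns a query touches, which offsets the extra width.
    sales_wide = [
        {**row, **products[row["product_id"]], **stores[row["store_id"]]}
        for row in sales_fact
    ]

    # An analytical query now needs no joins: total amount per region.
    totals = {}
    for row in sales_wide:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    print(totals)  # {'midwest': 17.5}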
Although numerous academic and industry studies have identified a variety of connections among strategies, processes, and systems, the phenomenon of information system adoption and post-implementation failure is still common. One factor in system adoption that has not yet been thoroughly considered is the conceptualization of processes and systems at multiple levels of complexity. This paper uses a case study to illustrate a framework that outlines the flow of conceptualization efforts during system adoption and implementation.
The Internet is making a significant transition from primarily a network of desktop computers to a network of a variety of connected information devices such as personal digital assistants and global positioning system-based devices. At the same time, new paradigms such as overlay networks are defining a service-based logical architecture for network services that makes locating content and routing more efficient. Along with Internet2's proposed service-based routing, overlay networks will create a new set of challenges in the provision and management of content over the network. However, a lack of proper infrastructure investment incentives may lead to an environment where network growth does not keep pace with service requirements. In this paper, we present an analysis of investment incentives for network infrastructure owners under two different pricing strategies: congestion-based negative-externality pricing and the prevalent flat-rate pricing. We develop a theoretically motivated gradient-based heuristic to compute the maximum capacity that a network provider will be willing to invest in under different pricing schemes. The heuristic apportions different capacities to different network components based on the demand for these components. We then use a simulation model to compare the impact of dynamic congestion-based pricing with flat-rate pricing on the choice of capacity level by the infrastructure provider. The simulation model implements the heuristic and ensures that a near-optimal level of capacity is allocated to each network component by checking theoretical optimality conditions. We investigate the impact of a variety of factors, including the per-unit cost of capacity of a network resource, the average value of users' requests, the average level of users' tolerance for delay, and the level of exogenous demand for services on the network. Our results indicate that the relationships between these factors are crucial in determining which of the two pricing schemes results in a higher level of socially optimal network capacity. The simulation results provide a possible explanation for the evolution of Internet pricing from time-based to flat-rate pricing. The results also indicate that, regardless of how these factors are related, the average stream of net benefits realized under congestion-based pricing tends to be higher than the average net benefits realized under flat-rate pricing. These central results point to the fallacy of arguments presented by supporters of net neutrality that do not consider the incentives for private investment in network capacity.
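The following conveys only the flavor of such a gradient-based capacity search; it is not the paper's actual heuristic, and the objective function, congestion penalty, and parameter values are all invented for illustration. A single component's capacity is raised while the marginal net benefit of one more unit remains positive:

    def net_benefit(capacity, demand, value_per_request, unit_cost):
        """Toy objective: request value discounted by congestion, minus
        the cost of provisioned capacity (functional forms assumed)."""
        utilization = min(demand / capacity, 0.999)
        congestion = utilization / (1.0 - utilization)  # blows up near saturation
        return demand * value_per_request / (1.0 + congestion) - unit_cost * capacity

    def best_capacity(demand, value_per_request, unit_cost, step=1.0):
        """Greedy gradient-style search: add capacity while the marginal
        net benefit of one more unit is still positive."""
        c = demand + step  # start just above demand to avoid saturation
        while (net_benefit(c + step, demand, value_per_request, unit_cost)
               > net_benefit(c, demand, value_per_request, unit_cost)):
            c += step
        return c

    print(best_capacity(demand=100.0, value_per_request=3.0, unit_cost=0.5))

In the paper's setting, the pricing scheme would change the revenue term (congestion-based versus flat-rate), the search would run per network component, and the simulation would verify the resulting allocation against theoretical optimality conditions.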
Though data warehousing is widely recognized in the industry as the principal decision support system architecture and an integral part of the corporate information system, the majority of academic institutions in the US and worldwide have been slow in developing curriculums that reflect this. The authors examine the issues that have contributed to the lag in the coverage of data warehousing topics at universities and introduce methods, concepts, and resources that can enable business educators to deal with these issues and conduct comprehensive, detailed, and meaningful coverage of data warehouse-related topics.
Data warehousing is widely recognized in the industry as the principal decision support system architecture and an integral part of the corporate information system. However, the majority of academic institutions in the US and worldwide have been slow in developing curriculums that reflect this reality. This paper examines the issues that have contributed to the lag in the coverage of data warehousing topics at universities.
In this paper we propose a novel framework for providing clear and consistent integration of the process and data logic in the analysis stage of the information systems development lifecycle. While our proposed approach is generally applicable across a variety of modeling schemas, we discuss it in the concrete context of UML use cases for process modeling and Entity Relationship diagrams for data modeling. We illustrate our approach through an example of integrating a step-by-step process model and the corresponding data model at the attribute level of detail. We discuss the potential benefits of this approach by illustrating its potential for providing better conceptual testing early in the analysis process, ensuring better semantic quality of both process and data models.