This article evaluates the potential gains a workflow-aware storage system can bring. Two observations make us believe such a storage system is crucial to efficiently support workflow-based applications. First, workflows generate irregular and application-dependent data access patterns. These patterns render existing generic storage systems unable to harness all optimization opportunities, as this often requires enabling conflicting optimizations or even conflicting design decisions at the storage system level. Second, most workflow runtime engines make suboptimal scheduling decisions as they lack the detailed data location information that is generally hidden by the storage system. This article presents a limit study that evaluates the potential gains from building a workflow-aware storage system that supports per-file access optimizations and exposes data location. Our evaluation using synthetic benchmarks and real applications shows that a workflow-aware storage system can bring significant performance gains: up to 3x compared to a vanilla distributed storage system deployed on the same resources yet unaware of the possible file-level optimizations.
Distributed computing systems employ replication to improve overall system robustness, scalability, and performance. A replica location service (RLS) offers a mechanism to maintain and provide information about physical locations of replicas. This paper defines a design framework for RLSs that supports a variety of deployment options. We describe the RLS implementation that is distributed with the Globus toolkit and is in production use in several grid deployments. Features of our modular implementation include the use of soft-state protocols to populate a distributed index and Bloom filter compression to reduce overheads for distribution of index information. Our performance evaluation demonstrates that the RLS implementation scales well for individual servers with millions of entries and up to 100 clients. We describe the characteristics of existing RLS deployments and discuss how RLS has been integrated with higher-level data management services.
Cooperative Secondary Authorization Recycling Qiang Wei; Ripeanu, M.; Beznosov, K.
IEEE transactions on parallel and distributed systems,
02/2009, Volume 20, Issue 2
Journal Article
Peer reviewed
Open access
As enterprise systems, Grids, and other distributed applications scale up and become increasingly complex, their authorization infrastructures, based predominantly on the request-response paradigm, are facing the challenges of fragility and poor scalability. We propose an approach where each application server recycles previously received authorizations and shares them with other application servers to mask authorization server failures and network delays. This paper presents the design of our cooperative secondary authorization recycling system and its evaluation using simulation and prototype implementation. The results demonstrate that our approach improves the availability and performance of authorization infrastructures. Specifically, by sharing authorizations, the cache hit rate (an indirect metric of availability) can reach 70 percent, even when only 10 percent of authorizations are cached. Depending on the deployment scenario, the average time for authorizing an application request can be reduced by up to a factor of two compared with systems that do not employ cooperation.
Mapping the Gnutella network Ripeanu, M.; Iamnitchi, A.; Foster, I.
IEEE internet computing,
Jan.-Feb. 2002, Volume 6, Issue 1
Journal Article
Peer reviewed
We studied the topology and protocols of the public Gnutella network. Its substantial user base and open architecture make it a good large-scale, if uncontrolled, testbed. We captured the network's topology, generated traffic, and dynamic behavior to determine its connectivity structure and how well (if at all) Gnutella's overlay network topology maps to the physical Internet infrastructure. Our analysis of the network allowed us to evaluate costs and benefits of the peer-to-peer (P2P) approach and to investigate possible improvements that would allow better scaling and increased reliability in Gnutella and similar networks. A mismatch between Gnutella's overlay network topology and the Internet infrastructure has critical performance implications.
A decentralized, adaptive replica location mechanism Ripeanu, M.; Foster, I.
High Performance Distributed Computing: Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing; 24-26 July 2002,
2002
Conference Proceeding
We describe a decentralized, adaptive mechanism for replica location in wide-area distributed systems. Unlike traditional, hierarchical (e.g., DNS) and more recent (e.g., CAN, Chord, Gnutella) distributed search and indexing schemes, nodes in our location mechanism do not route queries; instead, they organize into an overlay network and distribute location information. We contend that this approach works well in environments where replica location queries are prevalent but the dynamic component of the system (e.g., node and network failures, replica add/delete operations) cannot be neglected. We argue that a replica location mechanism that combines probabilistic representations of replica location information with soft-state protocols and a flat overlay network of nodes brings important benefits: genuine decentralization, low query latency, and flexibility to introduce adaptive communication schedules. We support these claims in two ways. First, we provide a rough resource consumption evaluation: we show that, for environments similar to those encountered in large scientific data analysis projects, generated network traffic is limited and, more importantly, is comparable to the traffic generated by a request routing scheme. Second, we provide encouraging performance data from a prototype implementation.
The Small World of File Sharing Iamnitchi, A; Ripeanu, M; Santos-Neto, E ...
IEEE transactions on parallel and distributed systems,
07/2011, Volume 22, Issue 7
Journal Article
Peer reviewed
Web caches, content distribution networks, peer-to-peer file-sharing networks, distributed file systems, and data grids all have in common that they involve a community of users who use shared data. In each case, overall system performance can be improved significantly by first identifying and then exploiting the structure of the community's data access patterns. We propose a novel perspective for analyzing data access workloads that considers the implicit relationships that form among users based on the data they access. We propose a new structure, the interest-sharing graph, that captures common user interests in data, and justify its utility with studies on four data-sharing systems: a high-energy physics collaboration, the Web, the Kazaa peer-to-peer network, and a BitTorrent file-sharing community. We find small-world patterns in the interest-sharing graphs of all four communities. We investigate analytically and experimentally some of the potential causes that lead to this pattern and conclude that user preferences play a major role. The significance of small-world patterns is twofold: it provides rigorous support for intuition and it suggests the potential to exploit these naturally emerging patterns. As a proof of concept, we design and evaluate an information dissemination system that exploits the small-world interest-sharing graphs by building an interest-aware network overlay. We show that this approach leads to improved information dissemination performance.
DiPerF Dumitrescu, Catalin; Raicu, Ioan; Ripeanu, Matei ...
Fifth IEEE/ACM International Workshop on Grid Computing,
11/2004
Conference Proceeding
We present DiPerF, a distributed performance-testing framework, aimed at simplifying and automating service performance evaluation. DiPerF coordinates a pool of machines that test a target service, collects and aggregates performance metrics, and generates performance statistics. The aggregate data collected provide information on service throughput, on service 'fairness' when serving multiple clients concurrently, and on the impact of network latency on service performance. Furthermore, using these data, it is possible to build predictive models that estimate a service's performance given the service load. We have tested DiPerF on 100+ machines on two testbeds, Grid3 and PlanetLab, and explored the performance of job submission services (pre-WS GRAM and WS GRAM) included with Globus Toolkit® 3.2.
GPUs as Storage System Accelerators Al-Kiswany, S.; Gharaibeh, A.; Ripeanu, M.
IEEE transactions on parallel and distributed systems,
08/2013, Volume 24, Issue 8
Journal Article
Peer reviewed
Open access
Massively multicore processors, such as graphics processing units (GPUs), provide, at a comparable price, an order of magnitude higher peak performance than traditional CPUs. This drop in the cost of computation, like any order-of-magnitude drop in the cost per unit of performance for a class of system components, creates the opportunity to redesign systems and to explore new ways to engineer them to recalibrate the cost-to-performance relation. This project explores the feasibility of harnessing GPUs' computational power to improve the performance, reliability, or security of distributed storage systems. In this context, we present the design of a storage system prototype that uses GPU offloading to accelerate a number of computationally intensive primitives based on hashing, and introduce techniques to efficiently leverage the processing power of GPUs. We evaluate the performance of this prototype under two configurations: as a content addressable storage system that facilitates online similarity detection between successive versions of the same file and as a traditional system that uses hashing to preserve data integrity. Further, we evaluate the impact of offloading to the GPU on competing applications' performance. Our results show that this technique can bring tangible performance gains without negatively impacting the performance of concurrently running applications.
This paper explores the feasibility of a storage architecture that offers the reliability and access performance characteristics of a high-end system, yet is cost-efficient. We propose ThriftStore, a storage architecture that integrates two types of components: volatile, aggregated storage and dedicated, yet low-bandwidth durable storage. On the one hand, the durable storage forms a back end that enables the system to restore the data the volatile nodes may lose. On the other hand, the volatile nodes provide a high-throughput front end. Although integrating these components has the potential to offer a unique combination of high throughput and durability at a low cost, a number of concerns need to be addressed to architect and correctly provision the system. To this end, we develop analytical and simulation-based tools to evaluate the impact of system characteristics (e.g., bandwidth limitations on the durable and the volatile nodes) and design choices (e.g., the replica placement scheme) on data availability and the associated system costs (e.g., maintenance traffic). Moreover, to demonstrate the high-throughput properties of the proposed architecture, we prototype a GridFTP server based on ThriftStore. Our evaluation demonstrates a transfer throughput of up to 800 Mbps for the new GridFTP service.
Web caches, content distribution networks, peer-to-peer file sharing networks, distributed file systems, and data grids all have in common that they involve a community of users who generate requests for shared data. In each case, overall system performance can be improved significantly if we can first identify and then exploit interesting structure within a community's access patterns. To this end, we propose a novel perspective on file sharing that considers the relationships that form among users based on the files in which they are interested. We propose a new structure that captures common user interests in data (the data-sharing graph) and justify its utility with studies on three data-distribution systems: a high-energy physics collaboration, the Web, and the Kazaa peer-to-peer network. We find small-world patterns in the data-sharing graphs of all three communities. We analyze these graphs and propose some probable causes for these emergent small-world patterns. The significance of small-world patterns is twofold: it provides rigorous support for intuition and, perhaps most importantly, it suggests ways to design mechanisms that exploit these naturally emerging patterns.