Mapping the Gnutella network Ripeanu, M.; Iamnitchi, A.; Foster, I.
IEEE Internet Computing, Jan.-Feb. 2002, Volume: 6, Issue: 1
Journal Article
Peer reviewed
We studied the topology and protocols of the public Gnutella network. Its substantial user base and open architecture make it a good large-scale, if uncontrolled, testbed. We captured the network's ...topology, generated traffic, and dynamic behavior to determine its connectivity structure and how well (if at all) Gnutella's overlay network topology maps to the physical Internet infrastructure. Our analysis of the network allowed us to evaluate costs and benefits of the peer-to-peer (P2P) approach and to investigate possible improvements that would allow better scaling and increased reliability in Gnutella and similar networks. A mismatch between Gnutella's overlay network topology and the Internet infrastructure has critical performance implications.
Distributed computing systems employ replication to improve overall system robustness, scalability, and performance. A replica location service (RLS) offers a mechanism to maintain and provide ...information about physical locations of replicas. This paper defines a design framework for RLSs that supports a variety of deployment options. We describe the RLS implementation that is distributed with the Globus toolkit and is in production use in several grid deployments. Features of our modular implementation include the use of soft-state protocols to populate a distributed index and Bloom filter compression to reduce overheads for distribution of index information. Our performance evaluation demonstrates that the RLS implementation scales well for individual servers with millions of entries and up to 100 clients. We describe the characteristics of existing RLS deployments and discuss how RLS has been integrated with higher-level data management services.
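The Bloom filter compression mentioned in this abstract can be illustrated with a minimal sketch. It is not the Globus RLS code: the class name, the SHA-256 based hashing scheme, the filter parameters, and the logical file names below are all assumptions chosen for illustration. The idea is that a server can summarize the set of names it knows in a fixed-size bit array and ship that array to an index instead of the full name list.

```python
import hashlib


class BloomFilter:
    """Fixed-size set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting one hash function
        # (an illustrative choice, not the scheme used by any real RLS).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical logical file names known to one local catalog.
known_names = ["lfn://run-001.dat", "lfn://run-002.dat", "lfn://run-003.dat"]

summary = BloomFilter(num_bits=1024, num_hashes=4)
for name in known_names:
    summary.add(name)

# The index receives only summary.bits (128 bytes here), not the names.
print(all(name in summary for name in known_names))
```

A lookup that misses the filter is definitely absent from that server; a hit may be a false positive and still has to be confirmed against the server itself.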
Web caches, content distribution networks, peer-to-peer file sharing networks, distributed file systems, and data grids all have in common that they involve a community of users who generate requests ...for shared data. In each case, overall system performance can be improved significantly if we can first identify and then exploit interesting structure within a community's access patterns. To this end, we propose a novel perspective on file sharing that considers the relationships that form among users based on the files in which they are interested. We propose a new structure that captures common user interests in data - the data-sharing graph - and justify its utility with studies on three data-distribution systems: a high-energy physics collaboration, the Web, and the Kazaa peer-to-peer network. We find small-world patterns in the data-sharing graphs of all three communities. We analyze these graphs and propose some probable causes for these emergent small-world patterns. The significance of small-world patterns is twofold: it provides a rigorous support to intuition and, perhaps most importantly, it suggests ways to design mechanisms that exploit these naturally emerging patterns.
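As a rough illustration of the data-sharing graph idea, the sketch below builds a graph whose nodes are users and whose edges link users with common interests. The abstract does not give the construction rule, so the rule used here (an edge when two users requested at least `min_shared` files in common) and the trace format, user names, and function name are assumptions.

```python
from collections import defaultdict
from itertools import combinations


def build_data_sharing_graph(requests, min_shared=1):
    """Return the set of user pairs sharing at least min_shared files.

    requests is an iterable of (user, file_id) pairs, a hypothetical
    trace format chosen for this sketch.
    """
    files_by_user = defaultdict(set)
    for user, file_id in requests:
        files_by_user[user].add(file_id)

    edges = set()
    # Compare every pair of users; fine for a sketch, quadratic in users.
    for a, b in combinations(sorted(files_by_user), 2):
        if len(files_by_user[a] & files_by_user[b]) >= min_shared:
            edges.add((a, b))
    return edges


# Hypothetical request trace.
trace = [
    ("alice", "f1"), ("alice", "f2"),
    ("bob", "f1"), ("bob", "f2"),
    ("carol", "f2"), ("carol", "f3"),
]

print(sorted(build_data_sharing_graph(trace, min_shared=1)))
print(sorted(build_data_sharing_graph(trace, min_shared=2)))
```

Raising `min_shared` keeps only the stronger interest overlaps. Small-world properties such as clustering and path length would then be measured on the resulting edge set.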
A peer-to-peer approach to resource location in grid environments Iamnitchi, A.; Foster, I.; Nurmi, D.C.
High Performance Distributed Computing: Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing; 24-26 July 2002,
2002
Conference Proceeding
Computational grids provide mechanisms for sharing and accessing large and heterogeneous collections of remote resources such as computers, online instruments, storage space, data, and applications. ...Resources are requested by specifying a set of desired attributes. Resource attributes have various degrees of dynamism, from mostly static attributes, such as operating system version, to highly dynamic ones, such as available network bandwidth or CPU load. Another dimension of dynamism is introduced by variable and highly diverse sharing policies: resources are made available to the grid community based on locally defined and potentially changing policies.
Information dissemination is a fundamental and frequently occurring problem in large, dynamic, distributed systems. We propose a novel approach to this problem, interest-aware information ...dissemination, that takes advantage of small-world usage patterns in data-sharing communities. These small-world characteristics suggest that users naturally form groups of common interest. We propose algorithms for identifying these groups dynamically, without a need for explicit classification of topics or declaration of user interests. These algorithms use information about the data consumed by users to identify, via online computation, groups with similar interests. As a proof of concept, we apply this methodology to the problem of locating files in large user communities. Using real-world traces from a scientific community and from a peer-to-peer system, we show that proactive information dissemination within groups of common interest can reduce the search load by up to 70%. In addition, this approach naturally supports the efficient discovery of collections of files, a requirement specific to scientific data analysis tasks. We hypothesize that our algorithms can find numerous other uses in distributed systems, such as reputation management.
The Small World of File Sharing Iamnitchi, A; Ripeanu, M; Santos-Neto, E ...
IEEE Transactions on Parallel and Distributed Systems, 07/2011, Volume: 22, Issue: 7
Journal Article
Peer reviewed
Web caches, content distribution networks, peer-to-peer file-sharing networks, distributed file systems, and data grids all have in common that they involve a community of users who use shared data. ...In each case, overall system performance can be improved significantly by first identifying and then exploiting the structure of the community's data access patterns. We propose a novel perspective for analyzing data access workloads that considers the implicit relationships that form among users based on the data they access. We propose a new structure - the interest-sharing graph - that captures common user interests in data and justify its utility with studies on four data-sharing systems: a high-energy physics collaboration, the Web, the Kazaa peer-to-peer network, and a BitTorrent file-sharing community. We find small-world patterns in the interest-sharing graphs of all four communities. We investigate analytically and experimentally some of the potential causes that lead to this pattern and conclude that user preferences play a major role. The significance of small-world patterns is twofold: it provides rigorous support for intuition, and it suggests the potential to exploit these naturally emerging patterns. As a proof of concept, we design and evaluate an information dissemination system that exploits the small-world interest-sharing graphs by building an interest-aware network overlay. We show that this approach leads to improved information dissemination performance.
The idle computers on a local area, campus area, or even wide area network represent a significant computational resource - one that is, however, also unreliable, heterogeneous, and opportunistic. We ...describe an algorithm that allows branch-and-bound problems to be solved in such environments. In designing this algorithm, we faced two challenges: (1) scalability, to effectively exploit the variably sized pools of resources available, and (2) fault tolerance, to ensure the reliability of services. We achieve scalability through a fully decentralized algorithm, in which the dynamically available resources are managed through a membership protocol. We guarantee fault tolerance in the sense that the loss of up to all but one resource will not affect the quality of the solution. For propagating information reliably, we use epidemic communication for both the membership protocol and the fault-tolerance mechanism. We have developed a simulation framework that allows us to evaluate design alternatives. Results obtained in this framework suggest that our techniques can execute scalably and reliably.
As the Internet's hourglass architecture connects various resources to various applications, an infrastructure that collects information from various social signals can support an ever-evolving set ...of socially aware applications and services. Among the proposed infrastructure's features are social sensors to capture and interpret social signals from user interactions, a personal social information aggregator, and a set of social-inference functions as its API for social applications.
Efficient data sharing in global peer-to-peer systems is complicated by erratic node failure, unreliable network connectivity and limited bandwidth. Replicating data on multiple nodes can improve ...availability and response time. Yet determining when and where to replicate data in order to meet performance goals in large-scale systems with many users and files, dynamic network characteristics, and changing user behavior is difficult. We propose an approach in which peers create replicas automatically in a decentralized fashion, as required to meet availability goals. The aim of our framework is to maintain a threshold level of availability at all times. We identify a set of factors that hinder data availability and propose a model that decides when more replication is necessary. We evaluate the accuracy and performance of the proposed model using simulations. Our preliminary results show that the model is effective in predicting the required number of replicas in the system.
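The abstract does not spell out the availability model, so the following is only a textbook-style sketch of the kind of calculation involved, not the authors' model. It assumes every peer is online independently with the same probability `p`, so a file with `n` replicas is reachable with probability 1 - (1 - p)^n. The function name and the sample values are hypothetical.

```python
def replicas_needed(p: float, target: float) -> int:
    """Smallest n such that 1 - (1 - p)**n >= target.

    Assumes independent, identically available peers (an assumption of
    this sketch, not a claim about the model in the abstract).
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")

    n = 1
    # Add replicas one at a time until the availability threshold is met.
    while 1.0 - (1.0 - p) ** n < target:
        n += 1
    return n


# Peers online half the time, 90% availability goal.
print(replicas_needed(p=0.5, target=0.9))
# Less reliable peers and a stricter goal need many more replicas.
print(replicas_needed(p=0.3, target=0.99))
```

A decentralized scheme in this spirit would have a peer compare the current replica count against such a threshold and create a new replica only when the count falls short.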