One role for workload generation is as a means for understanding how servers and networks respond to variation in load. This enables management and capacity planning based on current and projected usage. This paper applies a number of observations of Web server usage to create a realistic Web workload generation tool which mimics a set of real users accessing a server. The tool, called Surge (Scalable URL Reference Generator), generates references matching empirical measurements of 1) server file size distribution; 2) request size distribution; 3) relative file popularity; 4) embedded file references; 5) temporal locality of reference; and 6) idle periods of individual users. This paper reviews the essential elements required in the generation of a representative Web workload. It also addresses the technical challenges of satisfying this large set of simultaneous constraints on the properties of the reference stream, the solutions we adopted, and their associated accuracy. Finally, we present evidence that Surge exercises servers in a manner significantly different from other Web server benchmarks.
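The distributions Surge matches (file sizes, idle periods) are heavy-tailed, which is what distinguishes its load from conventional benchmarks. As an illustrative sketch only (the parameters below are hypothetical, not Surge's fitted values), heavy-tailed samples can be drawn by inverse-CDF sampling from a Pareto distribution:

```python
import random

def pareto_sample(alpha, k):
    # Inverse-CDF sampling from a Pareto(k, alpha) distribution; heavy
    # tails of this kind are characteristic of Web file sizes.
    u = 1.0 - random.random()  # u in (0, 1], avoids division by zero
    return k / (u ** (1.0 / alpha))

random.seed(0)
sizes = [pareto_sample(alpha=1.2, k=1000) for _ in range(10000)]
# Heavy tail: a small fraction of very large files dominates total bytes.
```

With a tail index near 1, the sample mean is dominated by rare, very large files, so a server sees occasional bursts of long transfers rather than uniform load.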
Topology discovery systems are starting to be introduced in the form of easily and widely deployed software. However, little consideration has been given as to how to perform large-scale topology discovery efficiently and in a network-friendly manner. In prior work, we have described how large numbers of traceroute monitors can coordinate their efforts to map the network while reducing their impact on routers and end-systems. The key is for them to share information regarding the paths they have explored. However, such sharing introduces considerable communication overhead. Here, we show how to improve the communication scaling properties through the use of Bloom filters to encode a probing stop set. Also, any system in which every monitor traces routes towards every destination has inherent scaling problems. We propose capping the number of monitors per destination, and dividing the monitors into clusters, each cluster focusing on a different destination list.
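The stop set trades a small false-positive rate for compactness: a false positive only makes a monitor stop probing slightly early, which is safe for this application. A minimal Bloom filter sketch (sizes and hash count here are illustrative, not the paper's parameters):

```python
import hashlib

class BloomFilter:
    # Fixed-size bit array with k hash functions; membership queries may
    # return false positives but never false negatives.
    def __init__(self, m=1 << 16, k=4):
        self.m, self.k, self.bits = m, k, bytearray(m // 8)

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

# Encode a probing stop set of (interface, destination) pairs.
stop_set = BloomFilter()
stop_set.add(("192.0.2.1", "198.51.100.7"))
```

Monitors can then exchange the fixed-size bit array instead of the full list of explored (interface, destination) pairs, which is where the communication savings come from.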
The ability of an ISP to infer traffic volumes that are not directly measurable can be useful for research, engineering, and business intelligence. Previous work has shown that traffic matrix completion is possible, but there is as yet no clear understanding of which ASes are likely to be able to perform TM completion, and which traffic flows can be inferred.
In this paper we investigate the relationship between the AS-level topology of the Internet and the ability of an individual AS to perform traffic matrix completion. We take a three-stage approach, starting from abstract analysis on idealized topologies, and then adding realistic routing and topologies, and finally incorporating realistic traffic on which we perform actual TM completion.
Our first set of results identifies which ASes are best-positioned to perform TM completion. We show, surprisingly, that for TM completion it does not help for an AS to have many peering links. Rather, the most important factor enabling an AS to perform TM completion is the number of direct customers it has. Our second set of results focuses on which flows can be inferred. We show that topologically close flows are easier to infer, and that flows passing through customers are particularly well suited for inference.
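TM completion exploits the approximate low-rank structure of traffic matrices: missing entries are recovered by finding a low-rank matrix consistent with the observed ones. As a hedged sketch (one simple iterative SVD-imputation scheme, not necessarily the method used in the paper), on a toy rank-1 traffic matrix with a single unobserved flow:

```python
import numpy as np

def tm_complete(M, mask, rank=1, iters=200):
    # Iterative low-rank completion: fill missing entries with the mean
    # of observed traffic, truncate to the target rank via SVD, and
    # re-impose the observed entries; repeat until convergence.
    X = np.where(mask, M, M[mask].mean())
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X[mask] = M[mask]
    return X

# Toy rank-1 traffic matrix with one flow not directly measurable.
rng = np.random.default_rng(0)
a, b = rng.random(5) + 0.5, rng.random(5) + 0.5
M = np.outer(a, b)
mask = np.ones_like(M, dtype=bool)
mask[2, 3] = False  # the unobserved flow
est = tm_complete(M, mask, rank=1)
```

When the true matrix really is low rank, the missing entry is recovered almost exactly; real traffic matrices are only approximately low rank, so recovery quality depends on which flows an AS observes, which is the question the paper studies.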
In this paper we develop a framework for analyzing patterns of a disease or pandemic such as Covid. Given a dataset which records information about the spread of a disease over a set of locations, we consider the problem of identifying both the disease's intrinsic waves (temporal patterns) and their respective spatial epicenters. To do so we introduce a new method of spatio-temporal decomposition which we call diffusion NMF (D-NMF). Building upon classic matrix factorization methods, D-NMF takes into consideration a spatial structuring of locations (features) in the data and supports the idea that locations which are spatially close are more likely to experience the same set of waves. To illustrate the use of D-NMF, we analyze Covid case data at various spatial granularities. Our results demonstrate that D-NMF is very useful in separating the waves of an epidemic and identifying a few centers for each wave.
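The starting point for D-NMF is classic nonnegative matrix factorization of a locations-by-time case matrix, where each factor pair is a wave (temporal pattern) and its per-location intensity. A minimal sketch of plain NMF with Lee–Seung multiplicative updates follows; note that the spatial-diffusion term that defines D-NMF is omitted here, and the synthetic data are hypothetical:

```python
import numpy as np

def nmf(X, r, iters=2000, seed=0):
    # Multiplicative updates for X ~ W @ H with W, H >= 0. D-NMF augments
    # this objective with a spatial term encouraging nearby locations
    # (rows of W) to load on the same waves; that term is omitted here.
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + 1e-3
    H = rng.random((r, n)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-12)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# Two synthetic "waves" (temporal bumps) mixed across 8 locations.
t = np.linspace(0, 1, 60)
waves = np.vstack([np.exp(-((t - 0.3) / 0.1) ** 2),
                   np.exp(-((t - 0.7) / 0.1) ** 2)])
mix = np.random.default_rng(1).random((8, 2))
X = mix @ waves
W, H = nmf(X, r=2)
```

Rows of H recover the two temporal waves and columns of W their per-location weights; D-NMF additionally uses the spatial structure of locations so that each wave's weights concentrate around a coherent epicenter.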
Do online reviews reflect the true quality of products? Several articles, in both the popular press and the research community, have publicized that the average rating for top review sites is above 4 out of 5 stars. In this paper, we study the phenomena of review rating trends and convergence. We analyze data obtained from a popular restaurant review website, and present several models of increasing sophistication for the dynamics of the review ratings we observe.
Improving the performance of data transfers in the Internet (such as Web transfers) requires a detailed understanding of when and how delays are introduced. Unfortunately, the complexity of data transfers like those using HTTP is great enough that identifying the precise causes of delays is difficult. We describe a method for pinpointing where delays are introduced into applications like HTTP by using critical path analysis. By constructing and profiling the critical path, it is possible to determine what fraction of total transfer latency is due to packet propagation, network variation (e.g., queueing at routers or route fluctuation), packet losses, and delays at the server and at the client. We have implemented our technique in a tool called tcpeval that automates critical path analysis for Web transactions. We show that our analysis method is robust enough to analyze traces taken for two different TCP implementations (Linux and FreeBSD). To demonstrate the utility of our approach, we present the results of critical path analysis for a set of Web transactions taken over 14 days under a variety of server and network conditions. The results show that critical path analysis can shed considerable light on the causes of delays in Web transfers, and can expose subtleties in the behavior of the entire end-to-end system.
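At its core, critical path analysis treats the transfer as a dependency graph of packet and application events and finds the longest chain of delays; only delays on that chain bound the total transfer time. A toy sketch (the event names and delays below are invented for illustration, not tcpeval output):

```python
def critical_path(events, edges):
    # events: event ids in topological (time) order.
    # edges: (src, dst, delay) dependencies between events.
    # Returns the total delay along the longest chain and the chain itself.
    dist = {e: 0.0 for e in events}
    pred = {e: None for e in events}
    adj = {e: [] for e in events}
    for u, v, d in edges:
        adj[u].append((v, d))
    for u in events:                      # longest-path DP over the DAG
        for v, d in adj[u]:
            if dist[u] + d > dist[v]:
                dist[v] = dist[u] + d
                pred[v] = u
    end = max(events, key=dist.get)
    path, n = [], end
    while n is not None:
        path.append(n)
        n = pred[n]
    return dist[end], path[::-1]

# Toy HTTP transfer: handshake, request, server delay, a retransmission.
events = ["syn", "synack", "req", "resp1", "resp2", "done"]
edges = [("syn", "synack", 40), ("synack", "req", 1),
         ("req", "resp1", 40 + 120),   # propagation + server delay
         ("resp1", "resp2", 40),       # recovery of a lost segment
         ("resp1", "done", 5), ("resp2", "done", 5)]
total, path = critical_path(events, edges)
```

Attributing each edge on the returned path to a cause (propagation, queueing, loss recovery, server, client) yields the per-cause latency breakdown described above.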
A major limiting factor for prediction algorithms is the forecasting of new, never-before-visited locations. Conventional personal models, which rely solely on personal location data, perform poorly when it comes to the discovery of new regions, because their predictions draw only on previously visited (known) locations. As a side effect, locations a user has never visited before (explorations) disturb the prediction of known locations; moreover, such explorations cannot themselves be accurately predicted. We argue that tackling this limitation first requires identifying the purpose of the next probable movement. In this context, we propose a novel framework that adjusts prediction resolution when probable explorations are about to happen. As recently demonstrated [3, 15], there exist regularities in returning and exploring visits. Moreover, the geographical occurrences of explorations are far from random at a coarser-grained spatial resolution. Exploiting these properties, instead of directly predicting a user's next location, we design a two-step predictive framework. First, we infer an individual's next type of transition: (i) a return, i.e., a visit to a previously known location, or (ii) an exploration, i.e., a discovery of a new place. Next, we predict the next location or the next coarse-grained zone, depending on the inferred type of movement. We conduct extensive experiments on three real-world GPS mobility traces. The results demonstrate substantial improvements in prediction accuracy, thanks to accurately forecasting the coarse-grained zones in which exploration activities occur. To the best of our knowledge, we are the first to propose a framework based solely on personal location data that tackles the prediction of visits to new places.
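The two-step structure can be sketched as follows. Everything here is hypothetical scaffolding: `explore_classifier` stands in for the paper's return-vs-exploration inference, `zone_of` for its coarse-grained spatial mapping, and the frequency-based predictors are deliberately simplistic placeholders:

```python
def predict_next(history, zone_of, explore_classifier):
    # Step 1: infer the next transition type (return vs. exploration).
    # Step 2: predict at the matching spatial resolution.
    if explore_classifier(history):
        # Exploration: predict a coarse zone; here, the zone where past
        # first-time visits (explorations) most often occurred.
        seen, zones = set(), []
        for loc in history:
            if loc not in seen:
                zones.append(zone_of(loc))
                seen.add(loc)
        return ("zone", max(set(zones), key=zones.count))
    # Return: predict a known location; here, the most revisited one.
    return ("location", max(set(history), key=history.count))
```

The point of the split is that an exploration, unpredictable at the level of exact locations, is still predictable at the level of coarse zones, so the framework degrades the resolution of the answer rather than the accuracy.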
Delegation forwarding
Erramilli, Vijay; Crovella, Mark; Chaintreau, Augustin; ...
Proceedings of the 9th ACM international symposium on Mobile ad hoc networking and computing, 05/2008
Conference Proceeding
Mobile opportunistic networks are characterized by unpredictable mobility, heterogeneity of contact rates, and lack of global information. Successful delivery of messages at low costs and delays in such networks is thus challenging. Most forwarding algorithms avoid the cost associated with flooding the network by forwarding only to nodes that are likely to be good relays, using a quality metric associated with nodes. However, it is non-trivial to decide whether an encountered node is a good relay at the moment of encounter. Thus the problem is in part one of online inference of the quality distribution of nodes from sequential samples, and has connections to optimal stopping theory. Based on these observations we develop a new strategy for forwarding, which we refer to as delegation forwarding.
We analyse two variants of delegation forwarding and show that while naive forwarding to high-contact-rate nodes has cost linear in the population size, the cost of delegation forwarding is proportional to the square root of the population size. We then study delegation forwarding with different metrics using real mobility traces and show that delegation forwarding performs as well as previously proposed algorithms at much lower cost. In particular, we show that the delegation scheme based on destination contact rate does particularly well.
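The delegation rule itself is simple: a message holder forwards only to encountered nodes whose quality exceeds the highest quality it has seen so far, then raises its own threshold accordingly. This "keep only record-breakers" behavior is what drives the expected number of copies down to order square root of the population size. A minimal sketch (class and field names are illustrative):

```python
class DelegationNode:
    # A message holder delegates (copies the message) only to nodes whose
    # quality metric beats the best quality it has observed so far.
    def __init__(self, node_id, quality):
        self.node_id, self.quality = node_id, quality
        self.threshold = quality  # highest quality seen so far

    def on_encounter(self, other):
        # Returns True if the message should be delegated to `other`.
        if other.quality > self.threshold:
            self.threshold = other.quality
            return True
        return False

a = DelegationNode("a", quality=0.2)
forwarded = [a.on_encounter(DelegationNode(n, q))
             for n, q in [("b", 0.1), ("c", 0.5), ("d", 0.4), ("e", 0.9)]]
# Only the record-breaking encounters ("c", then "e") receive a copy.
```

Because the sequence of record-breaking values in a random sequence grows only logarithmically per holder, forwarding decays quickly as thresholds rise, which is the intuition behind the sublinear cost result.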