Open-access distributed systems such as peer-to-peer systems are particularly vulnerable to sybil attacks, where a malicious user creates multiple fake identities (called sybil nodes). Without a trusted central authority that can tie identities to real human beings, defending against sybil attacks is quite challenging. Among the small number of decentralized approaches, our recent SybilGuard protocol leverages a key insight on social networks to bound the number of sybil nodes accepted. Despite its promising direction, SybilGuard can allow a large number of sybil nodes to be accepted. Furthermore, SybilGuard assumes that social networks are fast-mixing, which has never been confirmed in the real world. This paper presents the novel SybilLimit protocol that leverages the same insight as SybilGuard, but offers dramatically improved and near-optimal guarantees. The number of sybil nodes accepted is reduced by a factor of Θ(√n), or around 200 times in our experiments for a million-node system. We further prove that SybilLimit's guarantee is at most a log n factor away from optimal when considering approaches based on fast-mixing social networks. Finally, based on three large-scale real-world social networks, we provide the first evidence that real-world social networks are indeed fast-mixing. This validates the fundamental assumption behind SybilLimit's and SybilGuard's approach.
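The fast-mixing assumption highlighted above is empirically checkable: on a fast-mixing graph, a random walk's end-point distribution approaches the degree-weighted stationary distribution within O(log n) steps. Below is a minimal Python sketch of such a check; the ring-with-chords toy graph, trial counts, and all parameters are our illustrative assumptions, not the paper's datasets or methodology.

import random
from collections import Counter

def total_variation(p, q, nodes):
    """Total variation distance between two distributions over 'nodes'."""
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in nodes)

def walk_distribution(adj, start, steps, trials=20000):
    """Empirical end-point distribution of a 'steps'-step random walk."""
    ends = Counter()
    for _ in range(trials):
        v = start
        for _ in range(steps):
            v = random.choice(adj[v])
        ends[v] += 1
    return {v: c / trials for v, c in ends.items()}

# Toy graph: a ring plus random chords, a crude stand-in for a social network.
random.seed(1)
n = 200
adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
for _ in range(n):
    a, b = random.randrange(n), random.randrange(n)
    if a != b and b not in adj[a]:
        adj[a].append(b)
        adj[b].append(a)

two_m = sum(len(adj[v]) for v in adj)               # twice the edge count
stationary = {v: len(adj[v]) / two_m for v in adj}  # degree-proportional

# If the graph is fast-mixing, the distance should fall quickly with steps.
for steps in (1, 2, 4, 8, 16, 32):
    d = total_variation(walk_distribution(adj, 0, steps), stationary, adj)
    print(f"{steps:3d} steps: total variation distance = {d:.3f}")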
A case for richer cross-layer abstractions Vijaykumar, Nandita; Jain, Abhilasha; Majumdar, Diptesh ...
2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA),
06/2018
Conference Proceeding
This paper makes a case for a new cross-layer interface, Expressive Memory (XMem), to communicate higher-level program semantics from the application to the system software and hardware architecture. XMem provides (i) a flexible and extensible abstraction, called an Atom, enabling the application to express key program semantics in terms of how the program accesses data and the attributes of the data itself, and (ii) new cross-layer interfaces to make the expressed higher-level information available to the underlying OS and architecture. By providing key information that is otherwise unavailable, XMem exposes a new, rich view of the program data to the OS and the different architectural components that optimize memory system performance (e.g., caches, memory controllers).
By bridging the semantic gap between the application and the underlying memory resources, XMem provides two key benefits. First, it enables architectural/system-level techniques to leverage key program semantics that are challenging to predict or infer. Second, it improves the efficacy and portability of software optimizations by alleviating the need to tune code for specific hardware resources (e.g., cache space). While XMem is designed to enhance and enable a wide range of memory optimizations, we demonstrate the benefits of XMem using two use cases: (i) improving the performance portability of software-based cache optimization by expressing the semantics of data locality in the optimization and (ii) improving the performance of OS-based page placement in DRAM by leveraging the semantics of data structures and their access properties.
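To make the Atom abstraction concrete, here is a hypothetical Python rendering based only on the description above; the class names, fields, and registry calls are our illustrative assumptions, not the interface the paper defines.

from dataclasses import dataclass, field

@dataclass
class Atom:
    """One logical data region plus the semantics of how the program uses it."""
    name: str
    data_type: str            # e.g., "matrix", "hash_table" (hypothetical tags)
    access_pattern: str       # e.g., "streaming", "random", "strided"
    working_set_bytes: int    # approximate reuse footprint
    read_write_ratio: float   # fraction of accesses that are reads
    addr_ranges: list = field(default_factory=list)

class XMemRegistry:
    """Stand-in for the OS/architecture side of the cross-layer interface."""
    def __init__(self):
        self.atoms = {}

    def create(self, atom: Atom) -> int:
        atom_id = len(self.atoms)
        self.atoms[atom_id] = atom
        return atom_id

    def map_range(self, atom_id: int, start: int, length: int) -> None:
        # Associate a virtual address range with the atom, so caches or
        # memory controllers could look up semantics by address.
        self.atoms[atom_id].addr_ranges.append((start, start + length))

# Usage: tag a streaming buffer so, e.g., a cache could insert its lines at
# low priority instead of letting them evict high-reuse data.
registry = XMemRegistry()
aid = registry.create(Atom("frame_buffer", "array", "streaming",
                           working_set_bytes=64 * 1024 * 1024,
                           read_write_ratio=0.9))
registry.map_range(aid, start=0x7f0000000000, length=64 * 1024 * 1024)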
Previous approaches for computing duplicate-sensitive aggregates in wireless sensor networks have used a tree topology, in order to conserve energy and to avoid double-counting sensor readings. However, a tree topology is not robust against node and communication failures, which are common in sensor networks. In this article, we present synopsis diffusion, a general framework for achieving significantly more accurate and reliable answers by combining energy-efficient multipath routing schemes with techniques that avoid double-counting. Synopsis diffusion avoids double-counting through the use of order- and duplicate-insensitive (ODI) synopses that compactly summarize intermediate results during in-network aggregation. We provide a surprisingly simple test that makes it easy to check the correctness of an ODI synopsis. We show that the properties of ODI synopses and synopsis diffusion create implicit acknowledgments of packet delivery. Such acknowledgments enable energy-efficient adaptation of message routes to dynamic message loss conditions, even in the presence of asymmetric links. Finally, we illustrate using extensive simulations the significant robustness, accuracy, and energy-efficiency improvements of synopsis diffusion over previous approaches.
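A classic order- and duplicate-insensitive synopsis for Count is the Flajolet-Martin sketch, whose merge is a bitwise OR: idempotent, commutative, and associative, so reordered or re-delivered packets cannot change the answer. The Python sketch below is our illustration of that ODI property; the parameters and single-sketch estimator are simplifications, and practical systems average many independent sketches.

import hashlib

NUM_BITS = 32

def lowest_set_bit(h: int) -> int:
    """Index of the lowest set bit of h (geometrically distributed, p = 1/2)."""
    i = 0
    while not (h >> i) & 1 and i < NUM_BITS - 1:
        i += 1
    return i

def synopsis_for_reading(sensor_id: str) -> int:
    h = int.from_bytes(hashlib.sha256(sensor_id.encode()).digest()[:4], "big")
    return 1 << lowest_set_bit(h)

def merge(s1: int, s2: int) -> int:
    return s1 | s2   # OR-merge: order- and duplicate-insensitive

def estimate_count(synopsis: int) -> float:
    i = 0            # position of the lowest zero bit
    while (synopsis >> i) & 1:
        i += 1
    return (2 ** i) / 0.7735   # standard Flajolet-Martin bias correction

# Each sensor contributes once; duplicate deliveries caused by multipath
# routing are absorbed by the idempotent merge.
readings = [f"sensor-{i}" for i in range(1000)]
s = 0
for r in readings + readings[:300]:   # 300 duplicate deliveries
    s = merge(s, synopsis_for_reading(r))
print("true count = 1000, single-sketch estimate =", round(estimate_count(s)))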
We introduce a set of new Compression-Aware Management Policies (CAMP) for on-chip caches that employ data compression. Our management policies are based on two key ideas. First, we show that it is possible to build a more efficient management policy for compressed caches if the compressed block size is directly used in calculating the value (importance) of a block to the cache. This leads to Minimal-Value Eviction (MVE), a policy that evicts the cache blocks with the least value, based on both the size and the expected future reuse. Second, we show that, in some cases, compressed block size can be used as an efficient indicator of the future reuse of a cache block. We use this idea to build a new insertion policy called Size-based Insertion Policy (SIP) that dynamically prioritizes cache blocks using their compressed size as an indicator. We compare CAMP (and its global variant G-CAMP) to prior on-chip cache management policies (both size-oblivious and size-aware) and find that our mechanisms are more effective in using compressed block size as an extra dimension in cache management decisions. Our results show that the proposed management policies (i) decrease off-chip bandwidth consumption (by 8.7% in single-core), (ii) decrease memory subsystem energy consumption (by 7.2% in single-core) for memory-intensive workloads compared to the best prior mechanism, and (iii) improve performance (by 4.9%/9.0%/10.2% on average in single-/two-/four-core workload evaluations and up to 20.1%). CAMP is effective for a variety of compression algorithms and different cache designs with local and global replacement strategies.
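Based only on the abstract, here is a minimal Python sketch of how the two policies could fit together; the value function, reuse proxy, and size threshold are our simplifications, not the paper's hardware formulation.

from dataclasses import dataclass

@dataclass
class Block:
    tag: int
    compressed_size: int   # bytes after compression
    reuse_score: int       # proxy for expected future reuse (hypothetical)

def mve_victim(cache_set: list) -> Block:
    # Minimal-Value Eviction: cheap-to-keep blocks with high expected reuse
    # are the most valuable; large blocks with little reuse go first.
    return min(cache_set, key=lambda b: b.reuse_score / b.compressed_size)

def sip_priority(block: Block, small_blocks_reuse_well: bool,
                 size_threshold: int = 16) -> str:
    # Size-based Insertion Policy: if small compressed blocks have been
    # observed to be reused more, insert them with high priority.
    if small_blocks_reuse_well and block.compressed_size <= size_threshold:
        return "high"
    return "low"

cache_set = [Block(0xA, 64, 3), Block(0xB, 16, 2), Block(0xC, 40, 1)]
print(f"evict {mve_victim(cache_set).tag:#x}")   # 0xc: low reuse, fairly large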
Tributaries and deltas Manjhi, Amit; Nath, Suman; Gibbons, Phillip B.
International Conference on Management of Data: Proceedings of the 2005 ACM SIGMOD international conference on Management of data; 14-16 June 2005,
06/2005
Conference Proceeding
Existing energy-efficient approaches to in-network aggregation in sensor networks can be classified into two categories, tree-based and multi-path-based, with each having unique strengths and weaknesses. In this paper, we introduce Tributary-Delta, a novel approach that combines the advantages of the tree and multi-path approaches by running them simultaneously in different regions of the network. We present schemes for adjusting the regions in response to changes in network conditions, and show how many useful aggregates can be readily computed within this new framework. We then show how a difficult aggregate for this context---finding frequent items---can be efficiently computed within the framework. To this end, we devise the first algorithm for frequent items (and for quantiles) that provably minimizes the worst case total communication for non-regular trees. In addition, we give a multi-path algorithm for frequent items that is considerably more accurate than previous approaches. These algorithms form the basis for our efficient Tributary-Delta frequent items algorithm. Through extensive simulation with real-world and synthetic data, we show the significant advantages of our techniques. For example, in computing Count under realistic loss rates, our techniques reduce answer error by up to a factor of 3 compared to any previous technique.
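The communication-optimal frequent-items algorithm itself is developed in the paper; as background, the Python sketch below shows the standard Misra-Gries counter summary and how two summaries merge at an internal tree node, the kind of building block counter-based in-network schemes rest on (our illustrative choice, not necessarily the paper's exact algorithm).

def mg_summarize(stream, k):
    """Misra-Gries with at most k counters; any item with frequency greater
    than n/(k+1) is guaranteed to keep a counter."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            for y in list(counters):   # decrement-all step
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    return counters

def mg_merge(a, b, k):
    """Merge two child summaries at an internal tree node, keeping k counters."""
    merged = dict(a)
    for x, c in b.items():
        merged[x] = merged.get(x, 0) + c
    if len(merged) > k:
        # Subtract the (k+1)-th largest count from all; drop non-positive ones.
        cut = sorted(merged.values(), reverse=True)[k]
        merged = {x: c - cut for x, c in merged.items() if c - cut > 0}
    return merged

left = mg_summarize([1, 1, 2, 3, 1, 2, 1], k=2)
right = mg_summarize([1, 4, 1, 4, 4, 1], k=2)
print(mg_merge(left, right, k=2))   # the heavy item 1 survives the merge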
Many modern high-performance processors prefetch blocks into the on-chip cache. Prefetched blocks can potentially pollute the cache by evicting more useful blocks. In this work, we observe that both accurate and inaccurate prefetches lead to cache pollution, and propose a comprehensive mechanism to mitigate prefetcher-caused cache pollution. First, we observe that over 95% of useful prefetches in a wide variety of applications are not reused after the first demand hit (in secondary caches). Based on this observation, our first mechanism simply demotes a prefetched block to the lowest priority on a demand hit. Second, to address pollution caused by inaccurate prefetches, we propose a self-tuning prefetch accuracy predictor to predict if a prefetch is accurate or inaccurate. Only predicted-accurate prefetches are inserted into the cache with a high priority. Evaluations show that our final mechanism, which combines these two ideas, significantly improves performance compared to both the baseline LRU policy and two state-of-the-art approaches to mitigating prefetcher-caused cache pollution (up to 49%, and 6% on average for 157 two-core multiprogrammed workloads). The performance improvement is consistent across a wide variety of system configurations.
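The first mechanism is easy to picture as a one-line change to LRU promotion. The following behavioral model in Python is our reconstruction from the abstract, not the paper's hardware design.

class CacheSet:
    """Tiny LRU set model; index 0 is LRU (evicted first), the end is MRU."""
    def __init__(self, ways: int):
        self.ways = ways
        self.blocks = []                  # entries are (tag, was_prefetched)

    def access(self, tag: int, is_prefetch: bool = False) -> bool:
        for i, (t, pf) in enumerate(self.blocks):
            if t == tag:
                self.blocks.pop(i)
                if pf and not is_prefetch:
                    # First demand hit on a prefetched block: demote it to
                    # the LRU position instead of promoting it to MRU.
                    self.blocks.insert(0, (tag, False))
                else:
                    self.blocks.append((tag, False))   # normal MRU promotion
                return True
        if len(self.blocks) == self.ways:
            self.blocks.pop(0)            # evict the LRU block
        self.blocks.append((tag, is_prefetch))
        return False

s = CacheSet(ways=4)
s.access(0x1, is_prefetch=True)   # prefetch fill
s.access(0x1)                     # demand hit: demoted, not promoted
s.access(0x2); s.access(0x3)
print([hex(t) for t, _ in s.blocks])   # 0x1 now sits at the LRU end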
Multi-party communication complexity involves distributed computation of a function over inputs held by multiple distributed players. A key focus of distributed computing research, since the very beginning, has been to tolerate failures. It is thus natural to ask “If we want to compute a certain function in a fault-tolerant way, what will the communication complexity be?” For this question, this article will focus specifically on (i) tolerating node crash failures, and (ii) computing the function over general topologies (instead of, e.g., just cliques).
One way to approach this question is to first develop results in a simpler failure-free setting, and then “amend” the results to take into account the impact of failures. Whether this approach is effective largely depends on how big a difference failures can make. This article proves that the impact of failures is significant, at least for the Sum aggregate function in general topologies: As our central contribution, we prove that there exists (at least) an exponential gap between the non-fault-tolerant and fault-tolerant communication complexity of Sum. This gap attests that fault-tolerant communication complexity needs to be studied separately from non-fault-tolerant communication complexity, instead of being considered as an “amended” version of the latter. Such an exponential gap is not obvious: For some other functions, such as the Max aggregate function, the gap is only logarithmic.
Part of our results are obtained via a novel reduction from a new two-party problem UnionSizeCP that we introduce. UnionSizeCP comes with a novel cycle promise, which is the key enabler of our reduction. We further prove that this cycle promise and UnionSizeCP likely play a fundamental role in reasoning about fault-tolerant communication complexity.
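One intuition for why the gap depends on the function (our simplification; the article's argument is a formal reduction): under crash failures, messages are retransmitted and a value can be incorporated more than once. Max is idempotent under such duplication, while Sum is not, as the toy Python example below shows.

values = [3, 7, 2]
duplicated = values + [7, 2]            # retransmissions after suspected loss

print(max(duplicated) == max(values))   # True: duplicates are harmless to Max
print(sum(duplicated) == sum(values))   # False: every duplicate corrupts Sum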
Hash join algorithms suffer from extensive CPU cache stalls. This article shows that the standard hash join algorithm for disk-oriented databases (i.e., GRACE) spends over 80% of its user time stalled on CPU cache misses, and explores the use of CPU cache prefetching to improve its cache performance. Applying prefetching to hash joins is complicated by the data dependencies, multiple code paths, and inherent randomness of hashing. We present two techniques, group prefetching and software-pipelined prefetching, that overcome these complications. These schemes achieve 1.29--4.04X speedups for the join phase and 1.37--3.49X speedups for the partition phase over GRACE and simple prefetching approaches. Moreover, compared with previous cache-aware approaches (i.e., cache partitioning), the schemes are at least 36% faster on large relations and do not require exclusive use of the CPU cache to be effective. Finally, comparing the elapsed real times when disk I/Os are in the picture, our cache prefetching schemes achieve 1.12--1.84X speedups for the join phase and 1.06--1.60X speedups for the partition phase over the GRACE hash join algorithm.
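To show the loop restructuring that group prefetching implies, here is a schematic of the probe phase as we read it from the abstract. Python has no prefetch instruction, so prefetch() below is a placeholder for what would be __builtin_prefetch in C, and all names are our own.

def prefetch(obj):
    pass   # placeholder: a C implementation would issue __builtin_prefetch here

def probe_group(group, hash_fn, buckets):
    # Stage 1: hash every tuple in the group and prefetch its bucket, so the
    # memory accesses of different tuples overlap instead of serializing.
    idxs = []
    for key, payload in group:
        i = hash_fn(key) % len(buckets)
        prefetch(buckets[i])             # bucket is needed in stage 2
        idxs.append(i)
    # Stage 2: by the time we return to each tuple, its bucket should be
    # cache-resident, hiding the miss latency.
    results = []
    for (key, payload), i in zip(group, idxs):
        for k, v in buckets[i]:
            if k == key:
                results.append((payload, v))
    return results

buckets = [[(0, "a")], [(1, "b"), (5, "c")], [], [(3, "d")]]
print(probe_group([(1, "x"), (3, "y")], hash, buckets))  # [('x','b'), ('y','d')]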
As robots increasingly permeate modern society, it is crucial for the system and hardware research community to bridge its long-standing gap with robotics. This divide has persisted due to the lack of (i) a systematic performance evaluation of robotics on different computing platforms and (ii) a comprehensive, open-source, cross-platform benchmark suite. To address these gaps, we present a systematic performance study of robotics on modern hardware and introduce RoWild, an open-source benchmark suite for robotics that is comprehensive and cross-platform. Our workloads encompass a broad range of robots, including driverless vehicles, pilotless drones, and stationary robotic arms, and we evaluate their performance on a spectrum of modern computing platforms, from low-end embedded CPUs to high-end server-grade GPUs. The source code of the benchmark suite is available at https://cmu-roboarch.github.io/rowild/. Our findings reveal that current architectures experience significant inefficiencies when executing robotic workloads, highlighting the need for architectural advancements that satisfy the primary requirements of robotic tasks. We discuss approaches for meeting these requirements, offering insights for improving the performance of robotics.