State-of-the-art approaches to designing, developing, and optimizing software packet-processing programs are based on static compilation: the compiler's input is a description of the forwarding-plane semantics, and the output is a binary that can accommodate any control-plane configuration or input traffic. In this paper, we demonstrate that tracking control-plane actions and packet-level traffic dynamics at run time opens up new opportunities for code specialization. We present a system that works alongside static compilers and continuously optimizes the targeted networking code. We introduce a number of new techniques, from static code analysis to adaptive code instrumentation, and we implement a toolbox of domain-specific optimizations that are not restricted to a specific data-plane framework or programming language. We apply our system to several targets, from eBPF and DPDK programs, including Katran, Meta's production-grade load balancer, to container orchestration solutions such as Kubernetes. We compare it to state-of-the-art optimization frameworks and show that it can deliver up to a 2x throughput improvement while halving the 99th-percentile latency.
An intrusion detection system (IDS) checks the headers and payload of packets to detect intrusions from the network. It is an essential function for network security. Traditionally, an IDS such as Snort, a widely used open-source IDS, is implemented as a program running in user space on a hardware server. Recently, with the availability of Extended BPF (eBPF) in the Linux kernel, efficiently checking and filtering arriving packets directly in the kernel has become feasible. In this work, we design and implement an IDS with two parts working together. The first part runs in the Linux kernel. It uses eBPF to perform fast pattern matching and pre-drop the very large portion of packets that have no chance of matching any rule. The second part runs in user space. It examines the packets left by the first part to find the rules that match them. Using a modified version of Snort's registered ruleset, experimental results show that the maximum throughput of our IDS outperforms that of Snort by a factor of 3 under many tested conditions.
Memory-intensive applications, such as in-memory databases, caching systems, and key-value stores, increasingly demand larger main memory to fit their working sets. Conventional swapping can enlarge memory capacity by paging out inactive pages to backend stores. However, existing swapping solutions suffer from several performance and compatibility issues, making them unsuitable for high-concurrency, memory-intensive applications. In this paper, we redesign the swapping system and propose Lightswap, a high-performance user-space swapping solution that supports paging with both local SSDs and remote memories. First, to avoid kernel involvement, we propose to leverage the extended Berkeley Packet Filter (eBPF) for handling page faults in user space, and further eliminate the heavy I/O stack with the help of user-space I/O drivers. Then, we co-design page fault handling with lightweight thread (LWT) scheduling to improve system throughput and reduce end-to-end page fault latency. Finally, we propose a try-catch framework in Lightswap to deal with swap-in errors, which have been exacerbated by scaling in process technology. We implement Lightswap in our production-level system and evaluate it with various benchmarks. Results show that Lightswap achieves scalable page fault notification latency (4 μs under 128 LWTs), reduces page fault handling latency by 3-5x, and improves the throughput of memcached by more than 40% compared with state-of-the-art swapping systems.
Serverless computing promises an efficient, low-cost compute capability in cloud environments. However, existing solutions, epitomized by open-source platforms such as Knative, include heavyweight components that undermine this goal of serverless computing. Additionally, such serverless platforms lack dataplane optimizations to achieve efficient, high-performance function chains that facilitate the popular microservices development paradigm. Their use of unnecessarily complex and duplicate capabilities for building function chains severely degrades performance. 'Cold-start' latency is another deterrent. We describe a lightweight, high-performance, responsive serverless framework. It exploits shared-memory processing and dramatically improves the scalability of the dataplane by avoiding unnecessary protocol processing and serialization-deserialization overheads. It extensively leverages event-driven processing with the extended Berkeley Packet Filter (eBPF). We creatively use eBPF's socket message mechanism to support shared-memory processing, with overheads being strictly load-proportional. Compared to constantly-running, polling-based DPDK, our framework achieves the same dataplane performance with 10x less CPU usage under realistic workloads. Additionally, eBPF lets us replace heavyweight serverless components, allowing us to keep functions 'warm' with negligible penalty. Our preliminary experimental results show that our framework achieves an order-of-magnitude improvement in throughput and latency compared to Knative, while substantially reducing CPU usage, and obviates 'cold-starts'.
Segment Routing with IPv6 (SRv6) is a leading Hybrid SDN (HSDN) architecture, as it fully exploits standard IP routing and forwarding in both the control plane and the data plane. In this paper we design, implement, and evaluate a programmable data plane solution for Linux routers called HIKE (HybrId Kernel/eBPF forwarding), integrated in an HSDN/SRv6 architecture. HIKE integrates conventional Linux kernel packet forwarding with a custom-designed eBPF/XDP (extended Berkeley Packet Filter/eXpress Data Path) bypass to speed up the performance of SRv6 software routers. Thus, in addition to the hybrid IP/SDN forwarding, we foster an additional hybrid approach inside the Linux forwarding engine, combining eBPF/XDP and kernel-based forwarding and taking the best of both worlds. Considering these two different conceptual levels of hybridization, we call our overall solution Hybrid squared, or H^2.
We have applied the H^2 solution to Performance Monitoring (PM) in Hybrid SDNs, and we show how our HIKE data plane architecture supports SRv6 networking and Performance Monitoring (in particular, Loss Monitoring) while allowing a significant increase in performance: our implementation results show a remarkable throughput improvement (5x) with respect to a conventional Linux-based solution.
An increasing number of 5G core network modules are being implemented in x86 cloud environments. Improving the modules' packet-processing ability can accelerate the 5G data plane, and eBPF is a promising way to reduce packet-processing overhead in the Linux kernel. In this paper, we propose Cable, a framework that accelerates the 5G data plane using eBPF. Cable not only reduces packet processing time in a single UPF, but also schedules PDU sessions to suitable UPFs based on monitoring information. We implemented Cable in the open-source 5G core project Free5GC and made it publicly available. Evaluation in a simulation environment shows that Cable can reduce packet processing time in a single UPF by over 30% and improve the whole system's UPF usage by 25%.
NFV and SDN enable flexibility and programmability at the data plane. In addition, offloading packet processing to hardware frees processing resources for other workloads. However, fulfilling requirements such as high throughput and low latency with a flexible and programmable data plane is challenging. This paper introduces eBPFlow, a platform for seamlessly accelerating network computation. It builds upon eBPF, combining the flexibility and programmability of software with the high performance of an FPGA. We implemented our system on the NetFPGA SUME and performed tests on a physical testbed with a range of NFs. Our results show that eBPFlow supports offloading NFs at line-rate throughput, with latency between 20 μs and 40 μs, communication with the host, and power consumption of 22 W. Moreover, eBPFlow processes 12.05 Mpps more than the kernel and achieves 2.59 Gbps higher throughput than hXDP, a system similar to eBPFlow.
Most congestion control algorithms (CCAs) are designed for specific network environments. As such, there is no known algorithm that achieves uniformly good performance in all scenarios for all flows. Rather than devising a one-size-fits-all algorithm (a likely impossible task), we propose a system to dynamically switch between the most suitable CCAs for specific flows in specific environments. This raises a number of challenges, which we address through the design and implementation of Antelope, a system that can dynamically reconfigure the stack to use the most suitable CCA for individual flows. We build a machine learning model to learn which algorithm works best under individual conditions, and implement kernel-level support for dynamically switching between CCAs. The framework also takes applications' performance requirements into consideration to fine-tune the selection based on application-layer needs. Moreover, to reduce the overhead introduced by machine learning on individual front-end servers, we (optionally) implement the CCA selection process in the cloud, which allows models and selection results to be shared among front-end servers. We have implemented Antelope in Linux and evaluated it in both emulated and production networks. The results demonstrate the effectiveness of Antelope in dynamically adjusting the CCAs for individual flows. Specifically, Antelope achieves an average 16% improvement in throughput compared with BBR, and an average 19% improvement in throughput and a 10% reduction in delay compared with CUBIC.
By moving network functionality from dedicated hardware to software running on end-hosts, Network Functions Virtualization (NFV) brings the benefits of cloud computing to packet processing. While most NFV frameworks today rely on kernel-bypass approaches, little attention has been given to kernel packet processing, which has always proved hard to evolve and to program. In this article, we present Polycube, a software framework whose main goal is to bring the power of NFV to in-kernel packet-processing applications, enabling a level of flexibility and customization that was unthinkable before. Polycube enables the creation of arbitrary and complex network function chains, where each function can include an efficient in-kernel data plane and a flexible user-space control plane with strong characteristics of isolation, persistence, and composability. Polycube network functions, called Cubes, can be dynamically generated and injected into the kernel networking stack without requiring custom kernels or specific kernel modules, simplifying debugging and introspection, two fundamental properties in modern cloud environments. We validate the framework by showing significant improvements over existing applications, and we demonstrate the generality of the Polycube programming model through the implementation of complex use cases such as a network provider for Kubernetes.