Over-privileged Linux containers can put the underlying OS at risk by permitting unnecessary system calls that could be exploited as entry points into the kernel. However, finding such security profiles is difficult because it requires examining the implementation and operation of containers without knowing which system calls they require. In this article, we propose a hybrid approach to limit system call usage during the execution of containers. Specifically, given an application container, we maintain an initial fine-grained whitelist obtained through dynamic tracking to control run-time security, along with a complementary whitelist extracted via static analysis that preserves the container's functionality and addresses the coverage limitation of dynamic analysis. Our method automatically analyzes the container's behavior to identify three execution phases and dynamically enforces the corresponding fine-grained system call whitelists. Each invoked system call is compared against both whitelists to decide whether it should be killed to guarantee the container's security or logged for further analysis. Our evaluation with 193 Docker images demonstrates that our approach significantly reduces the system calls required during the applications' life cycle. Furthermore, we discuss the reduced attack surface and demonstrate the efficiency of our approach through empirical analysis.
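A minimal sketch of the kill-or-log decision described above, under our own reading of it: a call found in the dynamic whitelist of the current phase is allowed, a call found only in the static whitelist is allowed but logged, and anything else is killed. The phase names `startup`, `running`, and `shutdown` are hypothetical placeholders; the abstract says three phases are identified but does not name them.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"  # in the fine-grained dynamic whitelist for this phase
    LOG = "log"      # only in the complementary static whitelist
    KILL = "kill"    # in neither whitelist


# Hypothetical phase names (assumption, not taken from the paper).
PHASES = ("startup", "running", "shutdown")


def decide(syscall: str, phase: str,
           dynamic_whitelists: dict[str, set[str]],
           static_whitelist: set[str]) -> Verdict:
    """Compare one invoked system call against both whitelists."""
    if phase not in dynamic_whitelists:
        raise ValueError(f"unknown phase: {phase}")
    if syscall in dynamic_whitelists[phase]:
        return Verdict.ALLOW
    if syscall in static_whitelist:
        # Covers code paths that dynamic tracking missed; keep running, record it.
        return Verdict.LOG
    return Verdict.KILL


# Illustrative whitelists only; real ones would come from tracing and static analysis.
dynamic = {
    "startup": {"execve", "mmap", "openat"},
    "running": {"read", "write", "epoll_wait"},
    "shutdown": {"close", "exit_group"},
}
static = {"read", "write", "openat", "close", "exit_group", "mmap", "futex"}
```

For example, `decide("openat", "running", dynamic, static)` yields `Verdict.LOG`, because `openat` was only traced during startup but static analysis shows the application may still call it.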
A container is a group of processes isolated from other groups via distinct kernel namespaces and resource allocation quotas. Attacks against containers often leverage kernel exploits through the system call interface. In this paper, we present an approach that mines sandboxes and enables fine-grained sandbox enforcement for containers. We first explore the behavior of a container by running test cases and monitoring the accessed system calls, including their types and arguments, during testing. We then characterize the types and arguments of system call invocations and translate them into sandbox rules for the container. The mined sandbox restricts the container's access to system calls that were not seen during testing and thus reduces the attack surface. In our experiments, mining a sandbox takes less than eleven minutes for each container. The estimated system call coverage of sandbox mining ranges from 96.4% to 99.8% across the containers, under the limiting assumptions that the test cases are complete and only static system/application paths are used. Enforcing the mined sandboxes incurs low performance overhead. The mined sandboxes effectively reduce the attack surface of containers and can protect them against real-world security breaches.
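The mining step can be illustrated with a small sketch, assuming a trace is a list of `(syscall_name, args)` tuples and that a rule records, per argument position, the set of values observed during testing. The functions `mine_sandbox` and `permitted` are hypothetical names; the paper's actual rule language may characterize arguments differently (for example, by type rather than by exact value).

```python
from collections import defaultdict


def mine_sandbox(trace):
    """Build rules: syscall name -> {argument position -> observed values}.

    `trace` is an iterable of (syscall_name, args_tuple) pairs recorded
    while running the container's test cases.
    """
    rules = defaultdict(lambda: defaultdict(set))
    for name, args in trace:
        per_position = rules[name]  # registers the syscall even if it has no args
        for i, value in enumerate(args):
            per_position[i].add(value)
    # Freeze into plain dicts so enforcement cannot accidentally grow the rules.
    return {name: {i: frozenset(vals) for i, vals in pos.items()}
            for name, pos in rules.items()}


def permitted(rules, name, args):
    """Allow a call only if its name and every argument value were seen in testing."""
    if name not in rules:
        return False  # system call never observed during testing
    per_position = rules[name]
    return all(i in per_position and value in per_position[i]
               for i, value in enumerate(args))


trace = [
    ("openat", ("/etc/hosts", "O_RDONLY")),
    ("read", (3,)),
    ("openat", ("/etc/passwd", "O_RDONLY")),
]
rules = mine_sandbox(trace)
```

Because values are checked per position, this sketch slightly over-approximates: any combination of individually observed argument values is accepted, which is one reason the coverage assumption about static paths matters.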
While container adoption has grown significantly as a means of operating large-scale applications, this increased attention has also attracted adversaries who exploit the numerous vulnerabilities present in contemporary containers. Unfortunately, existing security solutions have largely overlooked the need to restrict container access to the shared host kernel, and in particular have critical limitations in enforcing least privilege for containers at runtime. Hence, we propose Optimus, an automated and comprehensive system that confines container operations and governs their interactions with the host kernel using association-based system call filtering. Optimus efficiently identifies the essential system calls required by containers and strengthens their security posture by dynamically enforcing the minimal set of system calls for each container at runtime. This is achieved through (1) lightweight system call monitoring leveraging eBPF, (2) system call validation via association analysis, and (3) dynamic system call filtering through covert container renewal. Our evaluation shows that Optimus effectively minimizes the set of system calls permitted to containers while maintaining their serviceability and operational efficiency at runtime.
Container-based clouds, in which containers are the basic unit of isolation, face security concerns because, unlike virtual machines, containers directly interface with the underlying highly privileged kernel through the wide and vulnerable system call interface. Regardless of whether a container itself requires dangerous system calls, a compromised or malicious container sharing the host (a bad neighbor) can compromise the host kernel using a vulnerable syscall, thereby compromising all other containers sharing the host.
In this paper, rather than attempting to eliminate host compromise, we limit the effectiveness of attacks by bad neighbors to a subset of the cluster. To do this, we propose a new metric dubbed Extraneous System call Exposure (ExS). Scheduling containers to minimize ExS reduces the number of nodes that expose a vulnerable system call and, as a result, the number of affected containers in the cluster. Experimenting with 42 popular containers on SySched, our greedy scheduler implementation in Kubernetes, we demonstrate that, compared to the default Kubernetes scheduler, SySched reduces the number of victim nodes by up to 46% and the number of victim containers by up to 48%, while also reducing the overall host attack surface by 20%.
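A sketch of how an ExS-style metric could drive greedy placement. The definition below is our assumption, not necessarily the paper's exact formula: a node exposes the union of its containers' system calls, and each container on the node contributes the number of exposed calls it does not itself need. The greedy step places a new container on the node whose ExS increases the least. `node_exs` and `greedy_place` are hypothetical names.

```python
def node_exs(containers):
    """Assumed ExS of one node: sum over containers of |exposed - own syscalls|.

    `containers` is a list of syscall-name sets, one per container on the node.
    """
    if not containers:
        return 0
    exposed = set().union(*containers)  # every call some container on the node needs
    return sum(len(exposed - needed) for needed in containers)


def greedy_place(new_container, nodes):
    """Pick the node whose ExS grows least when `new_container` is added.

    `nodes` maps node name -> list of syscall sets already placed there.
    Ties are broken by node name so the choice is deterministic.
    """
    def increase(name):
        before = node_exs(nodes[name])
        after = node_exs(nodes[name] + [new_container])
        return after - before

    return min(sorted(nodes), key=increase)


nodes = {
    "node-a": [{"read", "write"}],
    "node-b": [{"read", "write", "ptrace", "mount"}],
}
```

Here `greedy_place({"read", "write"}, nodes)` prefers `node-a`: placing the container on `node-b` would expose `ptrace` and `mount` to it without need, raising that node's ExS by 2. A real scheduler plugin would also have to respect CPU and memory constraints, which this sketch ignores.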
BinWrap: Hybrid Protection against Native Node.js Add-ons
Christou, George; Ntousakis, Grigoris; Lahtinen, Eric ...
Proceedings of the 2023 ACM Asia Conference on Computer and Communications Security,
07/2023
Conference Proceeding
Open access
Modern applications, written in high-level programming languages, enjoy the security benefits of memory and type safety. Unfortunately, even a single memory-unsafe library can wreak havoc on the rest of an otherwise safe application, nullifying all the security guarantees offered by the high-level language and its managed runtime. We perform a study across the Node.js ecosystem to understand the usage patterns of binary add-ons. Taking the identified trends into account, we propose a new hybrid permission model aimed at protecting both a binary add-on and its language-specific wrapper. The permission model is applied around the entire native add-on and is enforced through a hybrid language-binary scheme that interposes on accesses to sensitive resources from all parts of the native library. We infer the add-on's permission set automatically over both its binary and JavaScript sides via a set of novel program analyses. Applied to a wide variety of native add-ons, our framework, BinWrap, reduces access to sensitive resources, defends against real-world exploits, and imposes an overhead ranging from 0.71% to 10.4%.
With the widespread use of container technology, attackers may attack the kernel by maliciously executing certain system calls, causing damage to the host and other containers. To reduce the attack surface of the underlying system, Docker supports specifying a container's allowlist of system calls through seccomp configurations. Java is a mainstream programming language among container projects on Docker Hub, but how to generate the system call allowlist for Java containers is still an open question. First, most previous efforts on container system call allowlists focused on C/C++ binary code rather than Java bytecode. Second, existing work on Java bytecode has mainly focused on security vulnerability analysis and cannot be used to analyze the system calls required by Java programs. In this paper, we propose the first bytecode-based system call analysis approach, named SWAT4J, tailored for Java containers running on the x86_64 architecture. SWAT4J generates the allowlist of system calls required by Java containers by combining static and dynamic analysis. In its static analysis, SWAT4J identifies the indirect calling relationships between Java bytecode and system calls and determines the system calls required by a containerized application. In its dynamic analysis, SWAT4J traces the system calls required for container startup. The seccomp configuration file is then optimized using the combined set of system calls. Finally, we evaluated SWAT4J on five types of popular open-source Java container projects from Docker Official Images. Compared to the 323 system calls in Ubuntu 16.04, SWAT4J reduces the number of allowed system calls by 56.04% to 59.44%, lowering exposure to kernel vulnerabilities without affecting the functionality of the containers.
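Combining the static and dynamic results into a Docker seccomp profile can be sketched as follows, assuming both analyses already produce sets of system call names. The profile layout (`defaultAction`, `architectures`, and a `syscalls` list of `names`/`action` entries) follows Docker's seccomp profile JSON format; the helper names `build_seccomp_profile` and `reduction_percent` are hypothetical, and this is not SWAT4J's own output code.

```python
import json


def build_seccomp_profile(static_calls, dynamic_calls):
    """Allow the union of statically inferred and dynamically traced syscalls.

    Any syscall outside the union fails with an errno (default action).
    """
    allowed = sorted(set(static_calls) | set(dynamic_calls))
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": ["SCMP_ARCH_X86_64"],  # SWAT4J targets x86_64
        "syscalls": [{"names": allowed, "action": "SCMP_ACT_ALLOW"}],
    }


def reduction_percent(total, allowed):
    """Share of the platform's system calls that the allowlist removes."""
    return 100.0 * (total - allowed) / total


# Illustrative inputs only: tiny stand-ins for the two analyses' outputs.
profile = build_seccomp_profile(
    static_calls={"read", "write", "futex", "mmap"},   # e.g. from bytecode analysis
    dynamic_calls={"read", "execve", "clone"},          # e.g. traced at startup
)
profile_json = json.dumps(profile, indent=2)
```

The resulting file can be passed to Docker with `docker run --security-opt seccomp=profile.json ...`. As a sanity check on the reported figures, allowing 142 of 323 system calls corresponds to `reduction_percent(323, 142)`, roughly 56.04%.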
Software vulnerabilities undermine the security of applications. By blocking unused functionality, the impact of potential exploits can be reduced. While seccomp provides a solution for filtering syscalls, it requires manual implementation of filter rules for each individual application. Recent work has investigated approaches to automate this task. However, as we show, these approaches make assumptions that are not necessary or require overly time-consuming analysis.
In this paper, we propose Chestnut, an automated approach for generating strict syscall filters with fewer requirements and limitations. Chestnut comprises two phases. The first phase consists of two static components, a compiler and a binary analyzer, that statically extract the used syscalls. Chestnut's compiler-based approach is up to 73 times faster than previous approaches with the same accuracy. On the binary level, our approach goes beyond previous ones by also supporting non-PIC binaries. An optional second phase of Chestnut performs dynamic refinement to restrict the set of allowed syscalls further. We demonstrate that, on a set of 18 applications, Chestnut on average blocks 302 syscalls (86.5%) via the compiler and 288 (82.5%) via the binary analysis. Chestnut blocks the dangerous exec syscall in 50% and 77.7% of the tested applications using the compiler- and binary-based approach, respectively. For the tested applications, Chestnut blocks exploitation of more than 61% of the 175 CVEs that target the kernel via syscalls.
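The optional refinement phase can be pictured as narrowing the static allowlist to calls actually observed at run time. This is a sketch of one plausible reading, not Chestnut's implementation: we intersect the two sets and separately report observed calls that static analysis missed, since silently blocking those would break the application. `refine` and `blocked_stats` are hypothetical names.

```python
def refine(static_allowed, observed):
    """Dynamic refinement sketch.

    Returns (refined, unexpected):
      refined    - statically allowed syscalls that were also observed at run time
      unexpected - observed syscalls the static phase did not allow, which
                   point to a gap in the static analysis rather than extra attack surface
    """
    static_allowed, observed = set(static_allowed), set(observed)
    return static_allowed & observed, observed - static_allowed


def blocked_stats(total_syscalls, allowed):
    """Number and percentage of the platform's syscalls that the filter blocks."""
    blocked = total_syscalls - len(allowed)
    return blocked, 100.0 * blocked / total_syscalls


refined, unexpected = refine(
    static_allowed={"read", "write", "execve", "mmap"},
    observed={"read", "write", "mmap", "brk"},
)
```

In this toy run, refinement drops `execve` (statically reachable but never exercised) and flags `brk` as unexpected. The trade-off is the usual one for dynamic analysis: any code path not exercised during refinement loses its syscalls, which is presumably why the phase is optional.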
Container escape, which exploits vulnerabilities in the shared kernel to break container isolation, is a severe security threat in cloud-native computing. To alleviate this threat, we should allow only the minimum set of system calls required by each container, but determining which system calls an arbitrary container will need is a challenging problem. This paper presents Prof-gen, a tool that automatically creates a restrictive system call policy using static binary analysis and dynamic analysis without any prior knowledge. The tool requires only a container image and a run command. We compared the generated system call policies with the results of Confine, a recent study on container attack surface reduction. For 120 official images, Prof-gen reduced the attack surface by 20.2% compared to Confine. All test containers to which the profiles generated in the application-specific tests were applied ran without failure.
One critical attack that exploits kernel vulnerabilities through system call invocations is considered a serious threat to container security because it results in privilege escalation followed by the infamous container escape. The seccomp kernel feature provides the first line of defense against it, and secure container runtimes such as gVisor also use it to strengthen security. However, seccomp is known to be brittle because it operates at the granularity of individual system calls: inadvertently filtering necessary system calls may inhibit correct execution, while overly generous rules allow the attacks. We believe that, by looking at sequences of system calls, we can block attacks in containers more accurately and effectively. To this end, we built a software tool, Nimos, that performs a combination of static and dynamic analyses of exploit code in an automated way, and used it to investigate whether such commonly occurring system call sequences exist. We then assessed feasibility by analyzing the expected defensive power of sequence-based filtering mechanisms against a large set of collected kernel vulnerabilities. We found a significant number and variety of commonly appearing system call sequences that can serve as a clear signature of this class of attacks. We characterize these common system call sequences across the exploit code and evaluate the expected effectiveness of a sequence-based system call filtering mechanism for containers.
One critical attack that exploits kernel vulnerabilities through system call invocations is privilege escalation followed by the infamous container escape. Seccomp provides the first line of defense against it. However, it is known to be brittle because it operates at the granularity of individual system calls: inadvertently filtering necessary system calls may inhibit correct execution, while overly generous rules allow the attacks. We believe that, by looking at sequences of system calls, we can block attacks in containers more accurately and effectively. To this end, we assessed the feasibility of sequence-based filtering mechanisms by thoroughly analyzing a large set of collected kernel vulnerabilities to estimate their expected defensive power.
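The difference between per-call and sequence-based filtering can be shown with a small sketch, assuming a signature is an ordered tuple of system call names that may appear with unrelated calls in between. Each call is allowed on its own; only the call that completes a signature is blocked. The signature `("unshare", "socket", "setsockopt")` is a hypothetical example, not one reported by Nimos, and `SequenceFilter` and `contains_sequence` are hypothetical names.

```python
from collections.abc import Iterable, Sequence


def contains_sequence(trace: Iterable[str], signature: Sequence[str]) -> bool:
    """Offline check: does `signature` occur in `trace` in order (gaps allowed)?"""
    it = iter(trace)
    # Each `any` consumes the shared iterator up to the next match, enforcing order.
    return all(any(call == wanted for call in it) for wanted in signature)


class SequenceFilter:
    """Online filter: block the call that completes any known signature."""

    def __init__(self, signatures):
        self.signatures = [tuple(s) for s in signatures]
        if any(len(s) == 0 for s in self.signatures):
            raise ValueError("signatures must be non-empty")
        # Index of the next expected call for each signature (greedy matching).
        self.progress = [0] * len(self.signatures)

    def check(self, call: str) -> bool:
        """Return True to allow `call`, False to block it."""
        allow = True
        for i, sig in enumerate(self.signatures):
            if sig[self.progress[i]] == call:
                self.progress[i] += 1
                if self.progress[i] == len(sig):
                    allow = False
                    self.progress[i] = 0  # reset so the filter can keep running
        return allow


# Hypothetical signature for illustration only.
SIGNATURE = ("unshare", "socket", "setsockopt")
trace = ["read", "unshare", "write", "socket", "setsockopt"]
flt = SequenceFilter([SIGNATURE])
verdicts = [flt.check(call) for call in trace]
```

Here every call in `trace` would pass an allowlist that contains all five names, yet `verdicts` ends in `False` because `setsockopt` completes the signature. Greedy earliest matching is sufficient for detecting an ordered subsequence, so each signature needs only one progress counter.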