Context: Run-time detection of system anomalies at the host level remains a challenging task. Existing techniques suffer from high rates of false alarms, hindering large-scale deployment of anomaly detection techniques in commercial settings.
Objective: To reduce the false alarm rate, we present a new anomaly detection system based on a novel feature extraction technique, which combines the frequency with the temporal information from system call traces, and on a one-class support vector machine (OC-SVM) detector.
Method: The proposed feature extraction approach starts by segmenting the system call traces into multiple n-grams of variable length and mapping them to fixed-size sparse feature vectors, which are then used to train OC-SVM detectors.
Results: The results achieved on a real-world system call dataset show that our feature vectors with up to 6-grams outperform the term vector models (using the most common weighting schemes) proposed in related work. More importantly, our anomaly detection system using OC-SVM with a Gaussian kernel, trained on our feature vectors, achieves a higher level of detection accuracy (with a lower false alarm rate) than that achieved by Markovian and n-gram based models as well as by the state-of-the-art anomaly detection techniques.
Conclusion: The proposed feature extraction approach from traces of events provides new and general data representations that are suitable for training standard one-class machine learning algorithms, while preserving the temporal dependencies among these events.
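The Method step above (variable-length n-grams mapped to fixed-size sparse vectors) can be illustrated with a minimal sketch. The abstract does not say how n-grams are mapped to vector positions, so the hashing scheme below is an assumption made purely for illustration, as are the function name `ngram_features`, the `dim` parameter, and the sample trace. The resulting vectors could then be passed to a one-class SVM with a Gaussian (RBF) kernel, for example scikit-learn's `OneClassSVM(kernel="rbf")`; that training step is not shown here.

```python
from collections import Counter
import zlib


def ngram_features(trace, max_n=6, dim=1024):
    """Map a system call trace to a fixed-size sparse vector {index: count}.

    Every n-gram with 1 <= n <= max_n is hashed into one of `dim` buckets.
    The hashing step is an illustrative assumption, not the paper's mapping.
    """
    features = Counter()
    for n in range(1, max_n + 1):
        # range() is empty when the trace is shorter than n
        for i in range(len(trace) - n + 1):
            gram = " ".join(trace[i:i + n])
            # crc32 is deterministic across runs, unlike the built-in hash()
            index = zlib.crc32(gram.encode("utf-8")) % dim
            features[index] += 1
    return dict(features)


# Hypothetical trace of system call names
trace = ["open", "read", "read", "write", "close"]
vector = ngram_features(trace, max_n=3, dim=64)
```

Because n-grams of several lengths are counted together, the vector records both how often calls occur (short n-grams) and in which order they occur (longer n-grams), which is the combination of frequency and temporal information the abstract describes.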
Open science is a practice that makes scientific research publicly accessible to anyone, and is hence highly beneficial. Given the benefits, the software engineering (SE) community has been diligently advocating open science policies during peer reviews and publication processes. However, to date, there have been few studies that look into the status and issues of open science in SE from a systematic perspective. In this paper, we set out to start filling this gap. Given the great breadth of SE in general, we constrained our scope to a particular topic area in SE as an example case. Recently, an increasing number of deep learning (DL) approaches have been explored in SE, including DL-based software vulnerability detection, a popular, fast-growing topic that addresses an important problem in software security. We exhaustively searched the literature in this area and identified 55 relevant works that propose a DL-based vulnerability detection approach. We then comprehensively investigated the four integral aspects of open science: availability, executability, reproducibility, and replicability. Among other findings, our study revealed that only a small percentage (25.5%) of the studied approaches provided publicly available tools. Some of these available tools did not provide sufficient documentation and complete implementation, making them not executable or not reproducible. The use of balanced or artificially generated datasets led to significantly overrated performance of the respective techniques, making most of them not replicable. Based on our empirical results, we made actionable suggestions on improving the state of open science in each of the four aspects. We note that our results and recommendations on most of these aspects (availability, executability, reproducibility) are not tied to the nature of the chosen topic (DL-based vulnerability detection) and hence are likely applicable to other SE topic areas.
We also believe our results and recommendations on replicability to be applicable to other DL-based topics in SE as they are not tied to (the particular application of DL in) detecting software vulnerabilities.
Observability and explainability are on the pathway to assembling data, tools, methods and architectures to gain insights into complex behaviors and transparency in the context of decision making in software systems.
Ensemble-based anomaly detection systems (ADSs), using Boolean combination, have been shown to reduce the false alarm rate over that of a single detector. However, the existing Boolean combination methods rely on an exponential number of combinations, making them impractical even for a small number of detectors. In this paper, we propose weighted pruning-based Boolean combination, an efficient approach for selecting and combining accurate and diverse anomaly detectors. It works in three phases. The first phase selects a subset of the available diverse base soft detectors by pruning all the redundant soft detectors based on a weighted version of Cohen's kappa measure of agreement. The second phase selects a subset of diverse and accurate crisp detectors from the base soft detectors (selected in Phase 1) based on the unweighted kappa measure. The selected complementary crisp detectors are then combined in the final phase using Boolean combinations. The results on two large-scale datasets show that the proposed weighted pruning approach is able to maintain and even improve the accuracy of existing Boolean combination techniques, while significantly reducing the combination time and the number of detectors selected for combination.
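The kappa-based pruning idea can be sketched as follows. This is a simplified illustration only: it uses the standard unweighted Cohen's kappa on binary (crisp) decisions, and the greedy selection order, the `threshold` value, and the detector names are assumptions not taken from the abstract. The weighted kappa used for soft detectors and the final Boolean combination phase are not shown.

```python
def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two lists of binary decisions (0/1)."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a1 = sum(a) / n  # fraction of alarms raised by detector a
    p_b1 = sum(b) / n  # fraction of alarms raised by detector b
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if expected == 1.0:
        return 1.0  # both detectors are constant and identical
    return (observed - expected) / (1 - expected)


def prune_redundant(decisions, threshold=0.8):
    """Greedily keep detectors that agree with every kept detector
    at a kappa level below `threshold` (hypothetical cut-off)."""
    kept = []
    for name, output in decisions.items():
        if all(cohen_kappa(output, decisions[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical crisp outputs of three detectors on eight samples
decisions = {
    "d1": [1, 1, 0, 0, 1, 0, 1, 0],
    "d2": [1, 1, 0, 0, 1, 0, 1, 0],  # identical to d1, hence redundant
    "d3": [1, 0, 0, 1, 0, 0, 1, 1],
}
selected = prune_redundant(decisions)
```

In this toy example `d2` is dropped because its agreement with `d1` is perfect, while `d3` is kept because its agreement with `d1` is no better than chance. Only the surviving detectors would need to be considered for combination, which is how pruning reduces the number of combinations.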
Empirical study of android repackaged applications
Khanmohammadi, Kobra; Ebrahimi, Neda; Hamou-Lhadj, Abdelwahab ...
Empirical software engineering: an international journal, 12/2019, Volume 24, Issue 6
Journal Article
Peer reviewed
The growing popularity of Android applications has generated increased concerns over the danger of piracy and the spread of malware, and particularly of adware: malware that seeks to present unwanted advertisements to the user. A popular way to distribute malware in the mobile world is through repackaging of legitimate apps. This process consists of downloading, unpacking, manipulating, and recompiling an application, and publishing it again in an app store. In this paper, we conduct an empirical study of over 15,000 apps to gain insights into the factors that drive the spread of repackaged apps. We also examine the motivations of developers who publish repackaged apps and those of users who download them, as well as the factors that determine which apps are chosen for repackaging, and the ways in which the apps are modified during the repackaging process. Having observed that adware is particularly prevalent in repackaged apps, we focus on this type of malware and examine how an app is modified when adware is injected into its code. Our findings shed much-needed light on this class of malware that can be useful to security experts, and allow us to make recommendations that could lead to the creation of more effective malware detection tools. Furthermore, on the basis of our results, we propose a novel app indexing scheme that minimizes the number of comparisons needed to detect repackaged apps.
Model-Driven Engineering is a development paradigm that uses models instead of code as primary development artifacts. In this paper, we focus on executable models, which are used to abstract the behavior of systems for the purpose of verifying and validating (V&V) a system’s properties. Model execution tracing (i.e., obtaining and analyzing traces of model executions) is an important enabler for many V&V techniques including testing, model checking, and system comprehension. This may explain the increase in the number of proposed approaches to tracing model executions in recent years. Despite the increased attention, there is currently no clear understanding of the state of the art in this research field, making it difficult to identify research gaps and opportunities. The goal of this paper is to survey and classify existing work on model execution tracing, and identify promising future research directions. To achieve this, we conducted a systematic mapping study where we examined 64 primary studies out of 645 found publications. We found that the majority of model execution tracing approaches have been developed for the purpose of testing and dynamic analysis. Furthermore, most approaches target specific modeling languages and rely on custom trace representation formats, hindering the synergy among tools and exchange of data. This study also revealed that most existing approaches were not validated empirically, raising doubts as to their effectiveness in practice. Our results suggest that future research should focus on developing a common trace exchange format, designing scalable trace representations, as well as conducting empirical studies to assess the effectiveness of proposed approaches.
Model transformation plays an important role in developing software systems using the model-driven engineering paradigm. Examples of applications of model transformation include forward engineering, reverse engineering of code into models, and refactoring. Poor-quality model transformation code is costly and hard to maintain. There is a need to develop techniques and tools that can support transformation engineers in designing high-quality model transformations. The goal of this paper is to present a process, called MUPPIT (method for using proper patterns in model transformations), which can be used by transformation engineers to improve the quality of model transformations by detecting anti-patterns in the transformations and automatically applying pattern solutions. MUPPIT consists of four phases: (1) identifying a transformation anti-pattern, (2) proposing a pattern-solution, (3) applying the pattern-solution, and (4) evaluating the transformation model. MUPPIT takes a transformation design model (TDM), which is a representation of the given transformation, to search for the presence of an anti-pattern of interest. If found, MUPPIT proposes a pattern solution from a catalogue of patterns to the transformation engineer. The application of the pattern solution results in the restructuring of the TDM. While MUPPIT, as a process, is independent of any transformation language and transformation engineering framework, we have implemented an instance of it as a tool using transML and MeTAGeM, which support exogenous transformations using rule-based transformation and OCL-based languages such as ATL and ETL. We evaluate MUPPIT through a number of case studies in which we show how MUPPIT can detect four anti-patterns and propose the corresponding pattern solutions. We also evaluate MUPPIT by collecting a number of metrics to assess the quality of the resulting transformations.
The results show that MUPPIT optimizes the transformations by improving reusability, modularity, simplicity, and maintainability, as well as decreasing the complexity. MUPPIT can help transformation engineers to produce high-quality transformations using a pattern-based approach. An immediate future direction would be to experiment with more anti-patterns and pattern solutions. Moreover, we need to implement MUPPIT using other transformation engineering frameworks.
• We look into the execution-structural underpinnings of Android app behaviors via a multi-faceted, longitudinal dynamic characterization.
• Our study reveals a number of new findings about app behaviors in addition to novel understanding about the evolutionary dynamics of apps in Android.
• Our study provides a first look into the security implications of run-time app behaviors in terms of code-level execution structures.
• We offer insights into the implications of our findings for enhancing app understanding, code analysis, and security defense.
• Our datasets are shared publicly to facilitate reproduction and future research on mobile software engineering and security.
The constant evolution of the Android platform and its applications has imposed significant challenges to both understanding and securing the Android ecosystem. Yet, despite the growing body of relevant research, it remains unclear how Android apps evolve in terms of their run-time behaviors in ways that impede our gaining consistent empirical knowledge about the workings of the ecosystem and developing effective technical solutions for defending it against security threats. Intuitively, an essential step towards addressing these challenges is to first understand the evolution itself. Among others, one avenue to examining a program’s run-time behavior is to dissect the program’s execution in terms of its syntactic and semantic structure.
In this paper, we study how benign Android apps execute differently from malware over time, in terms of their execution structures measured by the distribution and interaction among functionality scopes, app components, and callbacks. In doing so, we attempt to reveal how relevant app execution structure is to app security orientation (i.e., benign or malicious).
By tracing the method calls and inter-component communications (ICCs) of 15,451 benign apps and 15,183 malware samples developed over eight years (2010–2017), we systematically characterized the execution structure of malware versus benign apps and revealed similarities and disparities between them that were not previously known.
Our results show, among other findings, that (1) despite their similarity in execution distribution over functionality scopes, malware accessed framework functionalities mainly through third-party libraries, while benign apps were dominated by calls within the framework; (2) use of the Activity component had been rising in malware while benign apps saw a continuous drop in such uses; (3) malware invoked significantly more Services but fewer Content Providers than benign apps during the evolution of both groups; (4) malware carried ICC data significantly less often via standard data fields than benign apps, although neither group carried any data in most ICCs; and (5) newer malware tended to have a more even distribution of callbacks among event-handler categories, while the distribution remained constant in benign apps over time.
We discussed how these findings inform the understanding of app behaviors, the optimization of static and dynamic code analysis of Android apps, and the development of sustainable app security defense solutions.
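As a rough illustration of what measuring the execution distribution over functionality scopes could look like, the sketch below computes the share of traced calls for each (caller scope, callee scope) pair. The scope labels, the sample call list, and the function name `scope_distribution` are hypothetical; the abstract does not describe the study's actual tracing format or metrics.

```python
from collections import Counter


def scope_distribution(calls):
    """Return the share of calls for each (caller scope, callee scope) pair."""
    counts = Counter(calls)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: count / total for pair, count in counts.items()}


# Hypothetical traced calls, each labeled with caller and callee scope
calls = [
    ("library", "framework"),
    ("library", "framework"),
    ("user", "library"),
    ("framework", "framework"),
]
shares = scope_distribution(calls)
```

Under these assumptions, a finding such as "malware accessed framework functionalities mainly through third-party libraries" would correspond to a large share for the `("library", "framework")` pair, and "calls within the framework" to the `("framework", "framework")` pair.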
High Performance Computing (HPC) systems are used in a variety of industrial and research sectors to solve complex problems that require powerful computing platforms. For these systems to remain reliable, we should be able to debug and analyze their behavior in order to detect root causes of potential poor performance. Execution traces hold important information regarding the events and interactions among communicating processes, which are essential for the debugging of inter-process communication. Traces, however, tend to be considerably large, hindering their applicability. In previous work, we presented an approach for automatically detecting communication patterns and segmenting large HPC traces into execution phases. The goal is to reduce the effort of analyzing traces by allowing software analysts to focus on smaller parts of interest. In this paper, we propose an approach for detecting and localizing inefficient communication patterns using statistical and trace segmentation methods. In addition, we use the Analytic Hierarchy Process to categorize slow communication patterns based on their severity and complexity levels. Using our approach, an analyst can quickly locate slow communication patterns that may be the cause of important performance problems. We show the effectiveness of our approach by applying it to large traces from three HPC systems.
• This paper describes a novel approach for detecting inefficient communication patterns in HPC system execution traces using information theory and statistical analysis methods.
• The approach can categorize inefficient patterns based on their various severity and complexity levels to better guide analysts.
• The approach is useful for software developers and engineers when debugging and analyzing performance issues in HPC systems.
• The effectiveness of the approach is shown by applying it to five large traces from three open HPC programs that use MPI for inter-process communication.
• The threats to validity and the limitations of the approach are carefully discussed along with important future directions in the field.
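The abstract above mentions using the Analytic Hierarchy Process (AHP) to categorize slow communication patterns by severity and complexity. A minimal sketch of the core AHP step, deriving priority weights from a pairwise comparison matrix, is given below. It uses the row geometric mean method, a common approximation of the principal eigenvector. The comparison values, pattern names, and scores are hypothetical and are not taken from the paper.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights with the row geometric mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgment: severity is 3 times as important as complexity
criteria = ["severity", "complexity"]
matrix = [
    [1.0, 3.0],
    [1.0 / 3.0, 1.0],
]
weights = ahp_weights(matrix)

# Hypothetical normalized (severity, complexity) scores per slow pattern
patterns = {"p1": (0.9, 0.2), "p2": (0.4, 0.8)}
scores = {
    name: sum(w * v for w, v in zip(weights, values))
    for name, values in patterns.items()
}
# Patterns ordered from highest to lowest weighted score
ranked = sorted(scores, key=scores.get, reverse=True)
```

With these made-up judgments the weights come out as 0.75 for severity and 0.25 for complexity, so a pattern that is severe but simple (`p1`) ranks above one that is complex but mild (`p2`). A fuller treatment would also check the consistency of the comparison matrix, which this sketch omits.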