Heteroepitaxial growth is a process of profound fundamental importance as well as an avenue to realize nanostructures such as Ge/Si quantum dots (QDs), with appealing properties for applications in opto‐ and nanoelectronics. However, controlling the Ge/Si QD size, shape, and composition remains a major obstacle to their practical implementation. Here, Ge nanostructures on Si(111) were investigated in situ and in real‐time by low energy electron microscopy (LEEM), enabling the observation of the transition from wetting layer formation to 3D island growth and decay. The island size, shape, and distribution depend strongly on the growth temperature. As the deposition temperature increases, the islands become larger and sparser, consistent with Brownian nucleation and capture dynamics. At 550°C, two distinct Ge/Si nanostructures are formed with bright and dark appearances that correspond to flat, atoll‐like and tall, faceted islands, respectively. During annealing, the faceted islands increase in size at the expense of the flat ones, indicating that the faceted islands are thermodynamically more stable. In contrast, triangular islands with uniform morphology are obtained from deposition at 600°C, suggesting that the growth more closely follows the ideal shape. During annealing, the islands formed at 600°C initially show no change in morphology and size and then rupture simultaneously, signaling a homogeneous chemical potential of the islands. These observations reveal the role of dynamics and energetics in the evolution of Ge/Si QDs, which can serve as a step toward precise control over the Ge nanostructure size, shape, composition, and distribution on Si(111).
Real‐time low energy electron microscopy observations reveal the growth dynamics and stability of Ge quantum dots formed on Si(111) at temperatures between 450°C and 600°C. A mix of metastable, flat islands and tall, faceted islands is produced at 550°C and below, whereas growth at 600°C yields uniform, large, triangular islands.
Code clone detection is an important aspect of software development and maintenance. The extensive research in this domain has helped reduce the complexity and increase the robustness of source code, thereby assisting bug detection tools. However, the majority of the clone detection literature is confined to a single language. With the increasing prevalence of cross-platform applications, functionality replication across multiple languages is common, resulting in code fragments having similar functionality but belonging to different languages. Since such clones are syntactically unrelated, single-language clone detection tools are not applicable in their case. In this article, we propose a semi-supervised deep learning-based tool, Rubhus, capable of detecting clones across different programming languages. Rubhus uses the control and data flow enriched abstract syntax trees (ASTs) of code fragments to leverage their syntactic and structural information and then applies graph neural networks (GNNs) to extract this information for the task of clone detection. We demonstrate the effectiveness of our proposed system through experiments conducted over datasets consisting of Java, C, and Python programs and evaluate its performance in terms of precision, recall, and F1 score. Our results indicate that Rubhus outperforms the state-of-the-art cross-language clone detection tools.
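For readers unfamiliar with the metrics named in the abstract above, the following is a minimal sketch of how precision, recall, and F1 score are conventionally computed for a clone detector from counts of true-positive, false-positive, and false-negative clone pairs. The function name and the example counts are hypothetical and do not come from the Rubhus evaluation.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from counts of reported clone pairs.

    tp: pairs reported as clones that really are clones
    fp: pairs reported as clones that are not clones
    fn: real clone pairs that the detector missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not results from the paper).
    p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The zero-denominator guards are a design choice of this sketch; some evaluations instead treat those cases as undefined.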
Instruction reordering and interleavings in program execution under relaxed memory semantics result in non-intuitive behaviors, making it difficult to provide assurances about program correctness. Studies have shown that up to 90% of the concurrency bugs reported by state-of-the-art static analyzers are false alarms. As a result, filtering false alarms and detecting real concurrency bugs is a challenging problem. Unsurprisingly, this problem has attracted the interest of the research community over the past few decades. Nonetheless, many of the existing techniques rely on analyzing source code, rarely consider the effects introduced by compilers, and assume a sequentially consistent memory model. In a practical setting, however, developers often do not have access to the source code, and even commodity architectures such as x86 and ARM are not sequentially consistent. In this work, we present Bird, a prototype tool that dynamically detects harmful data races in x86 binaries under the relaxed memory models TSO and PSO. Bird employs source-DPOR to explore all distinct feasible interleavings for a multithreaded application. Our evaluation of Bird on 42 publicly available benchmarks and its comparison with the state-of-the-art tools indicate Bird’s potential in effectively detecting data races in software binaries.
Code clones are duplicate code fragments that share (nearly) similar syntax or semantics. Code clone detection plays an important role in software maintenance, code refactoring, and reuse. A substantial amount of research has been conducted in the past to detect clones. A majority of these approaches use lexical and syntactic information to detect clones. However, only a few of them target semantic clones. Recently, motivated by the success of deep learning models in other fields, including natural language processing and computer vision, researchers have attempted to adopt deep learning techniques to detect code clones. These approaches use lexical information (tokens) and/or syntactic structures like abstract syntax trees (ASTs) to detect code clones. However, they do not make sufficient use of the available structural and semantic information, hence limiting their capabilities. This paper addresses the problem of semantic code clone detection using program dependency graphs and geometric neural networks, leveraging the structured syntactic and semantic information. We have developed a prototype tool, Holmes, based on our novel approach and empirically evaluated it on popular code clone benchmarks. Our results show that Holmes performs considerably better than the other state-of-the-art tool, TBCCD. We also assessed Holmes on unseen projects and performed cross-dataset experiments to evaluate the generalizability of Holmes. Our results affirm that Holmes outperforms TBCCD since most of the pairs that Holmes detected were either undetected or suboptimally reported by TBCCD.
Mining Similar Methods for Test Adaptation Sondhi, Devika; Jobanputra, Mayank; Rani, Divya ...
IEEE transactions on software engineering,
07/2022, Volume 48, Issue 7
Journal Article
Peer reviewed
Developers may choose to implement a library despite the existence of similar libraries, considering factors such as computational performance, language or platform dependency, accuracy, convenience, and completeness of an API. As a result, GitHub hosts several library projects that have overlaps in their functionalities. These overlaps have been of interest to developers from the perspective of code reuse or the preference of one implementation over the other. Through an empirical study, we explore the extent and nature of these similarities in the library functions. We have further studied whether the similarity of functions across different libraries and their associated test suites can be leveraged to reveal defects in one another. We see scope for effectively using the mining of test suites from the perspective of revealing defects in a program or its documentation. Another noteworthy observation made in the study is that similar functions may exist across libraries implemented in the same language as well as in different languages. Identifying the challenges that lie in building a testing tool, we automate the entire process in Metallicus, a test mining and recommendation tool. Metallicus returns a test suite for the given input of a query function and a template for its test suite. On a dataset of query functions taken from libraries implemented in Java or Python, Metallicus revealed 46 defects.
Concurrency errors due to poorly handled exceptions are common. Developers often make mistakes in writing proper code logic to relinquish the resources in the exception handlers and cleanup blocks such as finally in Java. Our observations suggest that these mistakes often go unnoticed because the exception-handling code is generally not tested in the development phase. These errors materialize when the exception handlers execute within the production environment. Therefore, verifying multi-threaded programs augmented with their exception handlers is necessary to guarantee their correctness. This paper proposes a dynamic technique to verify exception-handling code in concurrent libraries. The technique detects the presence of deadlocks originating from exception-handling code. We also present a prototype called Lumina that implements our technique to demonstrate that it can detect the deadlocks effectively, unlike the state-of-the-art dynamic verifier Java PathFinder (JPF).
Interprocedural alias analyses often sacrifice precision for scalability. Thus, modern compilers such as GCC and LLVM implement more scalable but less precise intraprocedural alias analyses. This compromise makes the compilers miss out on potential optimization opportunities, affecting the performance of the application. Modern compilers implement loop-versioning with dynamic checks for pointer disambiguation to enable the missed optimizations. Polyhedral access range analysis and symbolic range analysis enable O(1) range checks for non-overlapping of memory accesses inside loops. However, these approaches work only for the loops in which the loop bounds are loop invariants. To address this limitation, researchers proposed a technique that requires O(log n) memory accesses for pointer disambiguation. Others improved the performance of dynamic checks to a single memory access by constraining the object size and alignment. However, the former approach incurs noticeable overhead due to its dynamic checks, whereas the latter has a noticeable allocator overhead. Thus, scalability remains a challenge. In this work, we present a tool, Rapid, that further reduces the overheads of the allocator and dynamic checks proposed in the existing approaches. The key idea is to identify objects that need disambiguation checks using a profiler and allocate them in different regions, which are disjoint memory areas. The disambiguation checks simply compare the regions corresponding to the objects. The regions are aligned such that the top 32 bits in the addresses of any two objects allocated in different regions are always different. As a consequence, the dynamic checks do not require any memory access to ensure that the objects belong to different regions, making them efficient. Rapid achieved a maximum performance benefit of around 52.94% for Polybench and 1.88% for CPU SPEC 2017 benchmarks.
The maximum CPU overhead of our allocator is 0.57% with a geometric mean of -0.2% for CPU SPEC 2017 benchmarks. Due to the low overhead of the allocator and dynamic checks, Rapid could improve the performance of 12 out of 16 CPU SPEC 2017 benchmarks. In contrast, a state-of-the-art approach used in the comparison could improve only five CPU SPEC 2017 benchmarks.
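The region comparison described in the Rapid abstract can be illustrated with a minimal sketch. It assumes 64-bit addresses modeled as Python integers and takes "top 32 bits" to mean the address shifted right by 32; the function names and the example addresses are hypothetical, and this is not Rapid's actual implementation.

```python
REGION_SHIFT = 32  # assumption: the region id is the upper 32 bits of a 64-bit address


def region_of(addr: int) -> int:
    """Return the (hypothetical) region id encoded in the top 32 bits of addr."""
    return addr >> REGION_SHIFT


def in_different_regions(p: int, q: int) -> bool:
    """Check whether two addresses lie in different regions.

    Only the address values are inspected; no memory is read, which mirrors
    the abstract's point that the check needs no memory access.
    """
    return region_of(p) != region_of(q)


if __name__ == "__main__":
    a = 0x0000_0001_0000_0010  # hypothetical object in region 1
    b = 0x0000_0002_0000_0010  # hypothetical object in region 2
    print(in_different_regions(a, b))
```

In a loop-versioning setting, a result of True would let the optimized loop version run; otherwise execution would fall back to the unoptimized version.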
Segate Sondhi, Devika; Purandare, Rahul
2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE),
11/2019
Conference Proceeding
Automated testing techniques are often assessed on coverage-based metrics. However, despite giving good coverage, the test cases may miss the gap between functional specification and the code implementation. This gap may be subtle in nature, arising due to the absence of logical checks, either in the implementation or in the specification, resulting in inconsistencies in the input definition. The inconsistencies may be prevalent especially for structured inputs, commonly specified using string-based data types. Our study on defects reported over popular libraries reveals that such gaps may not be limited to input validation checks. We propose a test generation technique for structured string inputs where we infer inconsistencies in input definition to expose semantic gaps in the method under test and the method specification. We assess this technique using our tool Segate, Semantic Gap Tester. Segate uses static analysis and automaton modeling to infer the gap and generate test cases. On our benchmark dataset, comprising defects reported in 15 popular open-source libraries written in Java, Segate was able to generate tests to expose 80% of the defects.
The gender gap is a significant concern facing the software industry as development becomes more geographically distributed. Widely shared reports indicate that gender differences may be specific to each region. However, how complete can these reports be with little to no research reflective of the Open Source Software (OSS) process and communities software is now commonly developed in? Our study presents a multi-region geographical analysis of gender inclusion on GitHub. This mixed-methods approach includes quantitatively investigating differences in gender inclusion in projects across geographic regions and investigating these trends over time using data from contributions to 21,456 project repositories. We also qualitatively examine the unique experiences of developers contributing to these projects through a survey that is strategically targeted to developers in various regions worldwide. Our findings indicate that gender diversity is low across all parts of the world, with no substantial difference across regions. However, there has been statistically significant improvement in diversity worldwide since 2014, with certain regions such as Africa improving at a faster pace. We also find that most motivations and barriers to contributions (e.g., lack of resources to contribute and poor working environment) were shared across regions; however, some insightful differences, such as how to make projects more inclusive, did arise. From these findings, we derive and present implications for tools that can foster inclusion in open source software communities and empower contributions from everyone, everywhere.
Context-sensitive inter-procedural alias analyses are more precise than intra-procedural alias analyses. However, context-sensitive inter-procedural alias analyses are not scalable. As a consequence, most of the production compilers sacrifice precision for scalability and implement intra-procedural alias analysis. The alias analysis is used by many compiler optimizations, including loop transformations. Due to the imprecision of alias analysis, the program’s performance may suffer, especially in the presence of loops.
Previous work proposed a general approach based on code-versioning with dynamic checks to disambiguate pointers at runtime. However, the overhead of dynamic checks in this approach is O(log n), which is too high to enable interesting optimizations. Other suggested approaches, e.g., polyhedral and symbolic range analysis, have O(1) overheads, but they only work for loops with certain constraints. Production compilers, such as LLVM and GCC, use scalar evolution analysis to compute an O(1) range check for loops to resolve memory dependencies at runtime. However, this approach can also be applied only to loops with certain constraints.
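The general idea of code versioning guarded by an O(1) range check can be sketched as follows. This is a toy model under stated assumptions: a Python list stands in for memory, list indices stand in for addresses, and the names `ranges_disjoint` and `increment_copy` are hypothetical. It does not reproduce how LLVM, GCC, or the cited work implement their checks.

```python
def ranges_disjoint(base_a: int, len_a: int, base_b: int, len_b: int) -> bool:
    """O(1) check that [base_a, base_a+len_a) and [base_b, base_b+len_b) do not overlap."""
    return base_a + len_a <= base_b or base_b + len_b <= base_a


def increment_copy(mem: list, dst: int, src: int, n: int) -> None:
    """Compute mem[dst+i] = mem[src+i] + 1 for i in range(n), using two loop versions."""
    if ranges_disjoint(dst, n, src, n):
        # "Optimized" version: all loads happen before any store, which is
        # only safe because the check proved the ranges cannot alias.
        mem[dst:dst + n] = [x + 1 for x in mem[src:src + n]]
    else:
        # Fallback version: original element-by-element order, so a store
        # may legitimately feed a later load when the ranges overlap.
        for i in range(n):
            mem[dst + i] = mem[src + i] + 1


if __name__ == "__main__":
    mem = list(range(8))
    increment_copy(mem, dst=4, src=0, n=4)  # disjoint ranges take the fast path
    print(mem)
```

The check costs the same regardless of `n`, which is what makes it O(1); its limitation, as the abstract notes, is that the accessed range must be computable before the loop runs.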
In this work, we present our tool, Scout, that can disambiguate two pointers at runtime using a single memory access. Scout is based on the key idea of constraining the allocation size and alignment during memory allocations. Scout can also disambiguate array accesses within a loop for which the existing O(1) range-check technique cannot be applied. In addition, Scout uses feedback from static optimizations to reduce the number of dynamic checks needed for optimizations.
Our technique enabled new opportunities for loop-invariant code motion, dead store elimination, loop vectorization, and load elimination in already optimized code. Our performance improvements are up to 51.11% for Polybench and up to 0.89% for CPU SPEC 2017 suites. The geometric means for our allocator’s CPU and memory overheads for CPU SPEC 2017 benchmarks are 1.05% and 7.47%, respectively. For Polybench benchmarks, the geometric means of CPU and memory overheads are 0.21% and 0.13%, respectively.