Stack Overflow, the world's largest software Q&A (SQA) website, is facing a significant traffic drop due to the emergence of generative AI techniques. ChatGPT is banned by Stack Overflow after only 6 ...days from its release. The main reason provided by the official Stack Overflow is that the answers generated by ChatGPT are of low quality. To verify this, we conduct a comparative evaluation of human-written and ChatGPT-generated answers. Our methodology employs both automatic comparison and a manual study. Our results suggest that human-written and ChatGPT-generated answers are semantically similar, however, human-written answers outperform ChatGPT-generated ones consistently across multiple aspects, specifically by 10% on the overall score. We release the data, analysis scripts, and detailed results at https://anonymous.4open.science/r/GAI4SQA-FD5C.
In this work, we thoroughly investigated the dispersion in SiO2-based photonic crystal fibers with a C7H8-infiltrated hollow core. By cleverly modifying the air hole diameters and lattice constants ...in the structural design, we achieved ultra-flat near-zero dispersion as small as 0.462 ps·nm–1·km–1 and diverse dispersion properties of PCFs, very beneficial for supercontinuum generation. Based on the simulation results, we propose three optimal structures with small and flat dispersion capable of generating a broad and smooth supercontinuum spectrum. The results of our study can be advantageous for fabricating fibers in low-cost all-fiber laser systems.
Due to convenience, open-source software is widely used. For beneficial reasons, open-source maintainers often fix the vulnerabilities silently, exposing their users unaware of the updates to ...threats. Previous works all focus on black-box binary detection of the silent dependency alerts that suffer from high false-positive rates. Open-source software users need to analyze and explain AI prediction themselves. Explainable AI becomes remarkable as a complementary of black-box AI models, providing details in various forms to explain AI decisions. Noticing there is still no technique that can discover silent dependency alert on time, in this work, we propose a framework using an encoder-decoder model with a binary detector to provide explainable silent dependency alert prediction. Our model generates 4 types of vulnerability key aspects including vulnerability type, root cause, attack vector, and impact to enhance the trustworthiness and users' acceptance to alert prediction. By experiments with several models and inputs, we confirm CodeBERT with both commit messages and code changes achieves the best results. Our user study shows that explainable alert predictions can help users find silent dependency alert more easily than black-box predictions. To the best of our knowledge, this is the first research work on the application of Explainable AI in silent dependency alert prediction, which opens the door of the related domains.
The 2019 edition of Stack Overflow developer survey highlights that, for the first time, Python outperformed Java in terms of popularity. The gap between Python and Java further widened in the 2020 ...edition of the survey. Unfortunately, despite the rapid increase in Python's popularity, there are not many testing and debugging tools that are designed for Python. This is in stark contrast with the abundance of testing and debugging tools for Java. Thus, there is a need to push research on tools that can help Python developers. One factor that contributed to the rapid growth of Java testing and debugging tools is the availability of benchmarks. A popular benchmark is the Defects4J benchmark; its initial version contained 357 real bugs from 5 real-world Java programs. Each bug comes with a test suite that can expose the bug. Defects4J has been used by hundreds of testing and debugging studies and has helped to push the frontier of research in these directions. In this project, inspired by Defects4J, we create another benchmark database and tool that contain 493 real bugs from 17 real-world Python programs. We hope our benchmark can help catalyze future work on testing and debugging tools that work on Python programs.
Existing work on software patches often use features specific to a single task. These works often rely on manually identified features, and human effort is required to identify these features for ...each task. In this work, we propose CC2Vec, a neural network model that learns a representation of code changes guided by their accompanying log messages, which represent the semantic intent of the code changes. CC2Vec models the hierarchical structure of a code change with the help of the attention mechanism and uses multiple comparison functions to identify the differences between the removed and added code. To evaluate if CC2Vec can produce a distributed representation of code changes that is general and useful for multiple tasks on software patches, we use the vectors produced by CC2Vec for three tasks: log message generation, bug fixing patch identification, and just-in-time defect prediction. In all tasks, the models using CC2Vec outperform the state-of-the-art techniques.
The right to be forgotten (RTBF) allows individuals to request the removal of personal information from online platforms. Researchers have proposed machine unlearning algorithms as a solution for ...erasing specific data from trained models to support RTBF. However, these methods modify how data are fed into the model and how training is done, which may subsequently compromise AI ethics from the fairness perspective. To help AI practitioners make responsible decisions when adopting these unlearning methods, we present the first study on machine unlearning methods to reveal their fairness implications. We designed and conducted experiments on two typical machine unlearning methods (SISA and AmnesiacML) along with a retraining method (ORTR) as baseline using three fairness datasets under three different deletion strategies. Results show that non-uniform data deletion with the variant of SISA leads to better fairness compared to ORTR and AmnesiacML, while initial training and uniform data deletion do not necessarily affect the fairness of all three methods. This research can help practitioners make informed decisions when implementing RTBF solutions that consider potential trade-offs on fairness.
Full text
Available for:
EMUNI, FIS, FZAB, GEOZS, GIS, IJS, IMTLJ, KILJ, KISLJ, MFDPS, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, SBMB, SBNM, UKNU, UL, UM, UPUK, VKSCE, ZAGLJ
Autonomous vehicle is the topic that gains a lot of attentions in recent years. To enable autonomous vehicle, a huge amount of data needs to be collected and processed. One of the most promising ...method to improve the accuracy of data collection is using vehicle communication. With the help of vehicle communication, controlling and warning signal can be transmitted between vehicles. Consequently, necessary information such as identity, position, speed, moving direction, future behavior can be obtained with higher accuracy. This paper focuses on a type of vehicle communication system called optical camera communication. One of the most important components of optical camera communication system is image processing. In this paper, a deep learning-based image processing algorithm is proposed for optical camera communication used for proactive data collecting system for autonomous vehicle. The experiment results show that the proposed algorithm can achieve much higher performance compared to its counterparts.
Using the culturomics approach, the previously unknown strain Marseille-P8953
T
, was isolated and classified within the
Weizmannia
genus. Strain Marseille-P8953
T
was isolated from the faeces of a ...healthy subject and consisted of Gram-stain positive, spore-forming, motile rod-shaped cells. A 99.2% similarity was observed between the 16S rRNA gene of strain Marseille-P8953
T
(accession number LR735539) and that of
Weizmannia coagulans
strain NBRC 12583
T
(accession number KX261624), its closest phylogenetic relative, while the genome of strain Marseille-P8953
T
(3.5 Mpb long, 46.5% GC content) shared the average nucleotide identity by Orthology and digital DNA-DNA Hybridisation values of 95 and 60.4%, respectively. Given the phylogenetic classification and phenotypic characteristics of strain Marseille-P8953
T
, we propose the creation of a new species within the
Weizmannia
genus named
Weizmannia faecalis
(= CSUR P8953
T
= CECT 9904
T
).
Full text
Available for:
EMUNI, FIS, FZAB, GEOZS, GIS, IJS, IMTLJ, KILJ, KISLJ, MFDPS, NLZOH, NUK, OILJ, PNG, SAZU, SBCE, SBJE, SBMB, SBNM, UKNU, UL, UM, UPUK, VKSCE, ZAGLJ
Automated software engineering has been greatly empowered by the recent advances in Large Language Models (LLMs) for programming. While current benchmarks have shown that LLMs can perform various ...software engineering tasks like human developers, the majority of their evaluations are limited to short and self-contained algorithmic tasks. Solving challenging and practical programming tasks requires the capability of utilizing diverse function calls as tools to efficiently implement functionalities like data analysis and web development. In addition, using multiple tools to solve a task needs compositional reasoning by accurately understanding complex instructions. Fulfilling both of these characteristics can pose a great challenge for LLMs. To assess how well LLMs can solve challenging and practical programming tasks, we introduce Bench, a benchmark that challenges LLMs to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained programming tasks. To evaluate LLMs rigorously, each programming task encompasses 5.6 test cases with an average branch coverage of 99%. In addition, we propose a natural-language-oriented variant of Bench, Benchi, that automatically transforms the original docstrings into short instructions only with essential information. Our extensive evaluation of 60 LLMs shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%. The results underscore the need for further advancements in this area.
Linux kernel stable versions serve the needs of users who value stability of the kernel over new features. The quality of such stable versions depends on the initiative of kernel developers and ...maintainers to propagate bug fixing patches to the stable versions. Thus, it is desirable to consider to what extent this process can be automated. A previous approach relies on words from commit messages and a small set of manually constructed code features. This approach, however, shows only moderate accuracy. In this paper, we investigate whether deep learning can provide a more accurate solution. We propose PatchNet, a hierarchical deep learning-based approach capable of automatically extracting features from commit messages and commit code and using them to identify stable patches. PatchNet contains a deep hierarchical structure that mirrors the hierarchical and sequential structure of commit code, making it distinctive from the existing deep learning models on source code. Experiments on 82,403 recent Linux patches confirm the superiority of PatchNet against various state-of-the-art baselines, including the one recently-adopted by Linux kernel maintainers.