Social collaborative platforms such as GitHub and Stack Overflow have been increasingly used to improve work productivity via collaborative efforts. To improve user experiences in these platforms, it is desirable to have a recommender system that can suggest not only items (e.g., a GitHub repository) to a user, but also activities to be performed on the suggested items (e.g., forking a repository). To this end, we propose a new approach dubbed Keen2Act, which decomposes the recommendation problem into two stages: the Keen and Act steps. The Keen step identifies, for a given user, a (sub)set of items in which he/she is likely to be interested. The Act step then recommends to the user which activities to perform on the identified set of items. This decomposition provides a practical approach to tackling complex activity recommendation tasks while producing higher recommendation quality. We evaluate our proposed approach using two real-world datasets and obtain promising results whereby Keen2Act outperforms several baseline models.
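The two-stage decomposition described in the abstract can be sketched as follows. This is a hypothetical illustration only: the scoring functions and thresholds here are stand-ins, not the models Keen2Act actually learns.

```python
# Hypothetical sketch of the Keen/Act decomposition: first filter items by
# interest, then recommend activities on the surviving items. The scores
# below are toy values for illustration, not the paper's learned models.

def keen_step(user, items, interest_score, threshold=0.5):
    """Keen: keep only items the user is likely interested in."""
    return [i for i in items if interest_score(user, i) >= threshold]

def act_step(user, items, act_score, activities, threshold=0.5):
    """Act: for each kept item, recommend which activities to perform."""
    return {i: [a for a in activities if act_score(user, i, a) >= threshold]
            for i in items}

# Toy interest and activity scores (illustrative placeholders).
interest = {("u1", "repoA"): 0.9, ("u1", "repoB"): 0.2}
activity = {("u1", "repoA", "fork"): 0.8, ("u1", "repoA", "watch"): 0.3}

kept = keen_step("u1", ["repoA", "repoB"],
                 lambda u, i: interest.get((u, i), 0.0))
recs = act_step("u1", kept,
                lambda u, i, a: activity.get((u, i, a), 0.0),
                ["fork", "watch"])
```

The design point is that the Act step only ever scores activities for items that survived the Keen filter, which is what makes the joint task tractable.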
Water quality (WQ) is everyone's concern, so remediation measures must be adopted to protect human lives. This paper presents an assessment of surface water quality based on the Water Quality Index (WQI) under the Vietnamese standard QCVN 08-MT:2015/BTNMT, combined with satellite data for the Dau Tieng reservoir. The method correlates the water quality indicators with spectral features of satellite imagery to build a regression function that simulates the spatial distribution over the entire reservoir. Sentinel-2A satellite images with reflectance bands in the visible and near-infrared (NIR) spectral ranges were used. The regression functions showed that the correlation of the indicators is tied to individual bands, while the band ratios involve the blue and NIR, red/blue, green/blue, and NIR/blue bands. The simulated spatial distribution of the WQI showed that the water in the Dau Tieng reservoir is mostly polluted, with WQI values ranging from 0 to 50. The Dau Tieng irrigation system plays an important role in the water distribution system for Tay Ninh province and the key southern economic region. Appropriate treatment measures must therefore be taken before the water can be used for household supply.
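The regression step described above can be sketched as a simple least-squares fit of WQI against a band ratio. The values below are synthetic placeholders, not measurements from Dau Tieng, and a single-ratio linear model is a simplification of the multi-band functions the study builds.

```python
import numpy as np

# Hedged sketch: fit a linear regression of WQI against one spectral band
# ratio (e.g., green/blue). The data points are synthetic and exactly linear
# so the fit is easy to inspect; real data would carry noise and need
# validation against in-situ samples.

ratio = np.array([0.8, 1.0, 1.2, 1.4, 1.6])   # synthetic band-ratio values
wqi = np.array([20.0, 28.0, 36.0, 44.0, 52.0])  # synthetic WQI responses

# Least-squares fit: wqi ~= a * ratio + b
A = np.vstack([ratio, np.ones_like(ratio)]).T
a, b = np.linalg.lstsq(A, wqi, rcond=None)[0]

def predict_wqi(r):
    """Predict WQI for a given band-ratio value using the fitted line."""
    return a * r + b
```

Applying the fitted function per pixel of a Sentinel-2A scene is what yields the spatial WQI map the abstract refers to.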
This work proposes PatchNet, an automated tool based on hierarchical deep learning for classifying patches by extracting features from log messages and code changes. PatchNet contains a deep hierarchical structure that mirrors the hierarchical and sequential structure of a code change, differentiating it from the existing deep learning models on source code. PatchNet provides several options allowing users to select parameters for the training process. The tool has been validated in the context of automatic identification of stable-relevant patches in the Linux kernel and is potentially applicable to automate other software engineering tasks that can be formulated as patch classification problems. Our video demonstration on the performance of PatchNet is publicly available at https://goo.gl/CZjG6X.
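The hierarchical idea the abstract describes, mirroring a patch's structure of commit message plus files, hunks, and lines, can be illustrated with a toy sketch. PatchNet learns these representations with neural networks; the hand-made "embeddings" and averaging below are placeholders for that machinery.

```python
# Toy illustration of a hierarchical patch representation:
# patch = commit-message lines + files -> hunks -> changed lines.
# Each level is embedded and pooled into its parent, mirroring the
# structure PatchNet's deep model exploits. The embeddings here are
# stand-ins (length and word count), not learned features.

def embed_line(line):
    # Stand-in "embedding": character length and word count.
    return [len(line), len(line.split())]

def pool(vectors):
    # Average-pool a list of equal-length vectors into one vector.
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def embed_patch(message_lines, files):
    """files: list of files, each a list of hunks, each a list of lines."""
    msg_vec = pool([embed_line(l) for l in message_lines])
    file_vecs = [pool([pool([embed_line(l) for l in hunk]) for hunk in f])
                 for f in files]
    return msg_vec + pool(file_vecs)  # concatenate message and code features
```

A classifier would then be trained on the concatenated vector; the point of the hierarchy is that pooling happens level by level rather than over a flat token stream.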
Software quality assurance efforts often focus on identifying defective code. To find likely defective code early, change-level defect prediction - a.k.a. Just-In-Time (JIT) defect prediction - has been proposed. JIT defect prediction models identify likely defective changes and are trained using machine learning techniques under the assumption that historical changes are similar to future ones. Most existing JIT defect prediction approaches make use of manually engineered features. Unlike those approaches, in this paper we propose an end-to-end deep learning framework, named DeepJIT, that automatically extracts features from commit messages and code changes and uses them to identify defects. Experiments on two popular software projects (i.e., QT and OPENSTACK) under three evaluation settings (i.e., cross-validation, short-period, and long-period) show that the best variant of DeepJIT (DeepJIT-Combined), compared with the best-performing state-of-the-art approach, achieves improvements of 10.36-11.02% for the project QT and 9.51-13.69% for the project OPENSTACK in terms of the Area Under the Curve (AUC).
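The AUC metric used in the evaluation above can be computed directly from its rank-statistic definition: the probability that a randomly chosen defective change is scored higher than a randomly chosen clean one. The labels and scores below are made-up toy values, not DeepJIT outputs.

```python
# Minimal AUC computation via the rank-statistic formulation, applied to
# toy change-level defect predictions (1 = defect-inducing change).
# Ties between a positive and a negative score count as half a win.

def auc(labels, scores):
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0, 0]             # hypothetical ground truth
scores = [0.9, 0.3, 0.6, 0.7, 0.1]   # hypothetical model scores
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why relative AUC improvements are the abstract's headline numbers.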
Developers often spend much effort and resources to debug a program. To help the developers debug, numerous information retrieval (IR)-based and spectrum-based bug localization techniques have been devised. IR-based techniques process textual information in bug reports, while spectrum-based techniques process program spectra (i.e., a record of which program elements are executed for each test case). While both techniques ultimately generate a ranked list of program elements that likely contain a bug, they only consider one source of information--either bug reports or program spectra--which is not optimal. In light of this deficiency, this paper presents a new approach dubbed Network-clustered Multi-modal Bug Localization (NetML), which utilizes multi-modal information from both bug reports and program spectra to localize bugs. NetML facilitates an effective bug localization by carrying out a joint optimization of bug localization error and clustering of both bug reports and program elements (i.e., methods). The clustering is achieved through the incorporation of network Lasso regularization, which incentivizes the model parameters of similar bug reports and similar program elements to be close together. To estimate the model parameters of both bug reports and methods, NetML employs an adaptive learning procedure based on Newton's method that updates the parameters on a per-feature basis. Extensive experiments on 355 real bugs from seven software systems have been conducted to benchmark NetML against various state-of-the-art localization methods. The results show that NetML surpasses the best-performing baseline by 31.82%, 22.35%, 19.72%, and 19.24% in terms of the number of bugs successfully localized when a developer inspects the top 1, 5, and 10 methods, and in terms of Mean Average Precision (MAP), respectively.
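The network Lasso regularization mentioned above can be sketched as a penalty summed over the edges of a similarity graph: each edge pulls the parameter vectors of its two endpoints (similar bug reports or similar methods) toward each other. The graph, weights, and vectors below are illustrative, not NetML's actual parameterization.

```python
import numpy as np

# Hedged sketch of a network-Lasso-style penalty: for each weighted edge
# (i, j, w_ij) of a similarity graph, add w_ij * ||theta_i - theta_j||_2.
# Minimizing loss + this penalty drives connected nodes' parameters to
# cluster, which is the effect the abstract describes.

def network_lasso_penalty(params, edges):
    return sum(w * np.linalg.norm(params[i] - params[j])
               for i, j, w in edges)

# Illustrative parameter vectors for three nodes and two weighted edges.
theta = {0: np.array([1.0, 0.0]),
         1: np.array([1.0, 0.0]),
         2: np.array([0.0, 2.0])}
edges = [(0, 1, 1.0), (1, 2, 0.5)]
```

Because the penalty uses an (unsquared) L2 norm per edge, it tends to make connected parameters exactly equal, producing clusters rather than merely shrinking differences.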
An efficient and commuter-friendly public transportation system is a critical part of a thriving and sustainable city. As cities experience fast-growing resident populations, their public transportation systems will have to cope with more demands for improvements. In this paper, we propose a crowdsensing and analysis framework to gather and analyze real-time commuter feedback from Twitter. We perform a series of text mining tasks: identifying feedback comments that capture bus-related micro-events; extracting relevant entities; and predicting event and sentiment labels. We conduct a series of experiments involving more than 14K labeled tweets. The experiments show that incorporating domain knowledge or domain-specific labeled data into text analysis methods improves the accuracies of the above tasks. We further apply the tasks on nearly 200M public tweets from Singapore over a six-month period to show that interesting insights about bus services and bus events can be derived in a scalable manner.
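The domain-knowledge point above can be illustrated with the simplest possible filter: a small lexicon of transit terms used to flag candidate feedback tweets before the heavier classification steps. The lexicon below is a made-up miniature; in practice it would include service numbers, stop names, and operator handles.

```python
# Toy illustration of injecting domain knowledge into tweet filtering:
# a hand-built lexicon of bus-service terms flags candidate commuter
# feedback. The word list is illustrative only.

BUS_LEXICON = {"bus", "breakdown", "delay", "crowded", "interchange"}

def is_bus_feedback(tweet):
    """True if the tweet mentions any term from the transit lexicon."""
    words = {w.strip(".,!?").lower() for w in tweet.split()}
    return bool(words & BUS_LEXICON)
```

A lexicon match like this is only a recall-oriented first pass; the abstract's supervised event and sentiment models would then label the surviving tweets.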
Artificial Intelligence (AI) systems, which benefit from the availability of large-scale datasets and increasing computational power, have become effective solutions to various critical tasks, such as natural language understanding, speech recognition, and image processing. The advancement of these AI systems is inseparable from open-source software (OSS). Specifically, many benchmarks, implementations, and frameworks for constructing AI systems are made open source and accessible to the public, allowing researchers and practitioners to reproduce the reported results and broaden the application of AI systems. The development of AI systems follows a data-driven paradigm and is sensitive to hyperparameter settings and data separation. Developers may encounter unique problems when employing open-source AI repositories. This paper presents an empirical study that investigates the issues in open-source AI repositories to assist developers in understanding problems during the process of employing AI systems. We collect 576 repositories from the PapersWithCode platform. Among these repositories, we find 24,953 issues by utilizing the GitHub REST API. Our empirical study includes three phases. First, we manually analyze these issues to categorize the problems that developers are likely to encounter in open-source AI repositories. Specifically, we provide a taxonomy of 13 categories related to AI systems. The two most common issues are runtime errors (23.18%) and unclear instructions (19.53%). Second, we see that 67.5% of issues are closed. We also find that half of these issues are resolved within four days. Moreover, issue management features, e.g., label and assign, are not widely adopted in open-source AI repositories. In particular, only 7.81% and 5.9% of repositories label issues and assign these issues to assignees, respectively.
Finally, we empirically show that employing GitHub issue management features and writing issues with detailed descriptions facilitate the resolution of issues. Based on our findings, we make recommendations for developers to help better manage the issues of open-source AI repositories and improve their quality.
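The data-collection and issue-management analysis described above can be sketched against the GitHub REST API's issues endpoint (`GET /repos/{owner}/{repo}/issues`). The repository name below is a placeholder, real use needs an auth token for rate limits, and the statistics function mirrors, in miniature, the closed/labeled/assigned counts the study reports.

```python
import json
from urllib.request import Request, urlopen

# Hedged sketch of collecting repository issues via the GitHub REST API
# and computing simple issue-management statistics like those in the study.

def issues_url(owner, repo, state="all", page=1):
    """URL for listing a repository's issues, 100 per page."""
    return (f"https://api.github.com/repos/{owner}/{repo}/issues"
            f"?state={state}&per_page=100&page={page}")

def fetch_issues(owner, repo, token=None):
    """Fetch one page of issues (network call; token avoids rate limits)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    with urlopen(Request(issues_url(owner, repo), headers=headers)) as resp:
        return json.load(resp)

def closure_stats(issues):
    """Fraction closed, plus counts of labeled and assigned issues."""
    closed = sum(i["state"] == "closed" for i in issues)
    labeled = sum(bool(i.get("labels")) for i in issues)
    assigned = sum(bool(i.get("assignees")) for i in issues)
    return {"closed_ratio": closed / len(issues),
            "labeled": labeled, "assigned": assigned}
```

Note that this endpoint also returns pull requests (distinguished by a `pull_request` key), so a real pipeline would filter those out before computing statistics.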
The advent of Generative AI has marked a significant milestone in artificial intelligence, demonstrating remarkable capabilities in generating realistic images, texts, and data patterns. However, these advancements come with heightened concerns over data privacy and copyright infringement, primarily due to the reliance on vast datasets for model training. Traditional approaches like differential privacy, machine unlearning, and data poisoning only offer fragmented solutions to these complex issues. Our paper delves into the multifaceted challenges of privacy and copyright protection within the data lifecycle. We advocate for integrated approaches that combine technical innovation with ethical foresight, holistically addressing these concerns by investigating and devising solutions that are informed by the lifecycle perspective. This work aims to catalyze a broader discussion and inspire concerted efforts towards data privacy and copyright integrity in Generative AI.
CCS CONCEPTS: Software and its engineering → Software architectures; Information systems → World Wide Web; Security and privacy → Privacy protections; Social and professional topics → Copyrights; Computing methodologies → Machine learning.