Past studies have proposed solutions that analyze Stack Overflow content to help users find desired information or to aid various downstream software engineering tasks. A common step performed by those solutions is to extract suitable representations of posts, typically in the form of meaningful vectors. These vectors are then used for different tasks, for example, tag recommendation, relatedness prediction, post classification, and API recommendation. Intuitively, the quality of the vector representations of posts determines the effectiveness of the solutions in performing the respective tasks. In this work, to aid existing studies that analyze Stack Overflow posts, we propose a specialized deep learning architecture, Post2Vec, which extracts distributed representations of Stack Overflow posts. Post2Vec is aware of the different types of content present in Stack Overflow posts, i.e., title, description, and code snippets, and integrates them seamlessly to learn post representations. Tags provided by Stack Overflow users, which serve as a common vocabulary that captures the semantics of posts, are used to guide Post2Vec in its task. To evaluate the quality of Post2Vec's deep learning architecture, we first investigate its end-to-end effectiveness in the tag recommendation task. The results are compared to those of state-of-the-art tag recommendation approaches that also employ deep neural networks. We observe that Post2Vec achieves a 15-25 percent improvement in terms of F1-score@5 at a lower computational cost. Moreover, to evaluate the value of the representations learned by Post2Vec, we use them for three other tasks, i.e., relatedness prediction, post classification, and API recommendation. We demonstrate that the representations can be used to boost the effectiveness of state-of-the-art solutions for the three tasks by substantial margins (by 10, 7, and 10 percent in terms of F1-score, F1-score, and correctness, respectively).
We release our replication package at https://github.com/maxxbw/Post2Vec .
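The abstract reports results as F1-score@5 but does not define the metric. As a minimal sketch, assuming the plain definition (precision and recall computed over the top k recommended tags of a single post), it could be computed as follows. The function name `f1_at_k` and the example tags are hypothetical, and the paper may use a variant of this definition.

```python
from typing import Sequence, Set


def f1_at_k(ranked_tags: Sequence[str], true_tags: Set[str], k: int = 5) -> float:
    """F1@k for one post, assuming plain precision@k and recall@k (hypothetical helper)."""
    if k <= 0 or not true_tags:
        return 0.0
    # Count how many of the top-k recommended tags are among the true tags.
    hits = len(set(ranked_tags[:k]) & true_tags)
    precision = hits / k
    recall = hits / len(true_tags)
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Made-up example: five recommended tags, three true tags, all three recovered.
predicted = ["python", "pandas", "numpy", "dataframe", "csv"]
actual = {"python", "pandas", "csv"}
score = f1_at_k(predicted, actual, k=5)
```

In an evaluation, a per-post score like this would typically be averaged over all test posts.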
Linux kernel stable versions serve the needs of users who value stability of the kernel over new features. The quality of such stable versions depends on the initiative of kernel developers and maintainers to propagate bug-fixing patches to the stable versions. Thus, it is desirable to consider to what extent this process can be automated. A previous approach relies on words from commit messages and a small set of manually constructed code features. This approach, however, shows only moderate accuracy. In this paper, we investigate whether deep learning can provide a more accurate solution. We propose PatchNet, a hierarchical deep learning-based approach capable of automatically extracting features from commit messages and commit code and using them to identify stable patches. PatchNet contains a deep hierarchical structure that mirrors the hierarchical and sequential structure of commit code, making it distinct from the existing deep learning models on source code. Experiments on 82,403 recent Linux patches confirm the superiority of PatchNet over various state-of-the-art baselines, including the one recently adopted by Linux kernel maintainers.
Network-Clustered Multi-Modal Bug Localization Hoang, Thong; Oentaryo, Richard J.; Le, Tien-Duy B. ...
IEEE Transactions on Software Engineering, 10/2019, Volume: 45, Issue: 10
Journal Article
Peer reviewed
Open access
Developers often spend much effort and resources to debug a program. To help developers debug, numerous information retrieval (IR)-based and spectrum-based bug localization techniques have been devised. IR-based techniques process textual information in bug reports, while spectrum-based techniques process program spectra (i.e., a record of which program elements are executed for each test case). While both techniques ultimately generate a ranked list of program elements that likely contain a bug, they consider only one source of information (either bug reports or program spectra), which is not optimal. In light of this deficiency, this paper presents a new approach dubbed Network-clustered Multi-modal Bug Localization (NetML), which utilizes multi-modal information from both bug reports and program spectra to localize bugs. NetML facilitates effective bug localization by carrying out a joint optimization of bug localization error and clustering of both bug reports and program elements (i.e., methods). The clustering is achieved through the incorporation of network Lasso regularization, which incentivizes the model parameters of similar bug reports and similar program elements to be close together. To estimate the model parameters of both bug reports and methods, NetML employs an adaptive learning procedure based on Newton's method that updates the parameters on a per-feature basis. Extensive experiments on 355 real bugs from seven software systems have been conducted to benchmark NetML against various state-of-the-art localization methods. The results show that NetML surpasses the best-performing baseline by 31.82, 22.35, 19.72, and 19.24 percent in terms of the number of bugs successfully localized when a developer inspects the top 1, 5, and 10 methods and in terms of Mean Average Precision (MAP), respectively.
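The clustering effect of network Lasso regularization can be illustrated with a small sketch. It assumes the standard network Lasso penalty, a weighted sum of unsquared Euclidean distances between the parameter vectors of connected nodes. The abstract does not give NetML's exact objective, similarity graph, or weights, so the function name, node names, and numbers below are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def network_lasso_penalty(
    params: Dict[str, List[float]],
    edges: List[Tuple[str, str, float]],
) -> float:
    """Sum of weight * ||theta_i - theta_j||_2 over the edges of a similarity graph."""
    total = 0.0
    for i, j, weight in edges:
        # Unsquared L2 distance between the two parameter vectors.
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(params[i], params[j])))
        total += weight * distance
    return total


# Made-up parameter vectors for three bug reports.
params = {"bug_1": [1.0, 0.0], "bug_2": [1.0, 0.0], "bug_3": [4.0, 4.0]}
# Made-up similarity edges: (node, node, similarity weight).
edges = [("bug_1", "bug_2", 1.0), ("bug_1", "bug_3", 0.5)]
penalty = network_lasso_penalty(params, edges)
```

Because the distance is not squared, the penalty can drive the parameters of strongly connected nodes to be exactly equal, which is what produces clusters. In the example, the identical vectors of `bug_1` and `bug_2` contribute nothing, while the edge to `bug_3` contributes its full weighted distance.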
The prediction of weld bead geometry parameters is an important aspect of welding processes because it is related to the strength of the welded joint. This research focuses on using statistical design techniques and a deep learning neural network to predict the weld bead shape parameters of shielded metal arc welding (SMAW), metal inert gas (MIG), and tungsten inert gas (TIG) welding processes. With the statistical design techniques, experiments were carried out to obtain the data for generating the regression models. Establishing mathematical models that show the relationship between welding process parameters and weld bead size is significant for practical applications. The mathematical model enables the determination of the weld bead size when setting specific welding process parameters. In this research, experimental results were obtained to build mathematical models showing the relationship between welding process parameters and weld bead geometries for SMAW, MIG, and TIG welding processes. The research results serve as the basis for establishing predictive systems or optimizing welding process parameters. With deep learning neural network techniques, we developed an artificial intelligence-based system for predicting complicated relations between the welding process parameters and the weld bead size. Both the regression model and the deep learning model result in a good correlation between the welding process parameters and the weld bead geometry.
In the last two decades, the development of satellite technology has made it easier and more convenient to apply remote sensing in warning of and solving environmental problems, especially the tracking, monitoring, and evaluation of environmental objects. Algae blooms are one of the top environmental issues of concern today. The blooms cause many harmful effects on the water environment and ecosystems in the area, such as reducing dissolved oxygen levels and producing harmful toxins, which deprive aquatic organisms of oxygen, poison them, and kill them. This study presents research results on monitoring and calculating the concentration of algae in the Dau Tieng Reservoir by remote sensing. By constructing a regression function between monitoring data and qualitative algorithms from Landsat image spectral reflectance, the study quantified and mapped the distribution of algae concentration in the Dau Tieng Reservoir area. The calculation results show that the qualitative algorithm 3BDA(3), built from the GREEN, RED, and near-infrared (NIR) spectral bands, shows a reasonable degree of correlation with the monitoring data. On this basis, the study mapped the distribution of algae concentration and the current status of algae blooms on the reservoir at three time points. The study's results show the feasibility of applying remote sensing technology in monitoring, evaluating, and analyzing the concentration of algae in Dau Tieng Reservoir. The calculation results are an essential source of advice in managing reservoir water quality to prevent and minimize environmental and ecosystem damage in local water bodies.
Wildfire is an environmental hazard that has both local and global effects, causing economic losses and various severe environmental problems. Due to the adverse effects of climate change and anthropogenic activities, wildfires are anticipated to become more frequent and extreme; therefore, new and more efficient tools for forest fire prevention and control are essential. This study proposes a new deep neural computing approach for spatial prediction of wildfire in a tropical climate area. For this purpose, deep neural computing (Deep-NC) with a structure of 3 hidden layers was proposed. The Rectified Linear Unit (ReLU) activation function was adopted to infer wildfire dangers from the input factors. To search and optimize the weights of the model, Stochastic Gradient Descent (SGD), Root Mean Square Propagation (RMSProp), Adaptive Moment Estimation (Adam), and Adadelta optimizers were employed. Also, this study has established a Geographic Information System (GIS) database for Gia Lai province (Vietnam) to train and verify the newly developed deep computing approach. Twelve ignition factors, namely, slope, aspect, elevation, curvature, land use, NDVI, NDWI, NDMI, temperature, wind speed, relative humidity, and rainfall, have been used to characterize the study area with respect to forest fire susceptibility. According to the experimental results, the Adam-optimized Deep-NC model delivered the highest predictive accuracy (AUC = 0.894, Kappa = 0.63). Accordingly, this model has been employed to establish a forest fire susceptibility map for Gia Lai province. The proposed Deep-NC model and the newly constructed forest fire susceptibility map can help local authorities in land use planning and hazard mitigation/prevention.
• The performance of Deep-NC for wildfire danger modeling was assessed.
• The Adam-optimized Deep-NC performed best and outperformed the other models.
• The Adam-based Deep-NC is a tool for spatial prediction of wildfire danger.
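The network shape described in the abstract (twelve input factors, three hidden layers, ReLU activations) can be sketched as a plain forward pass. This is an illustration under stated assumptions, not the authors' implementation: the hidden layer width of 16, the sigmoid output unit, the random untrained weights, and all function names are hypothetical, and the training step with SGD, RMSProp, Adam, or Adadelta is omitted.

```python
import math
import random


def relu(values):
    """Rectified Linear Unit applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    """Random, untrained weights (for illustration only) and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """Pass 12 factor values through 3 ReLU hidden layers and a sigmoid output."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    out_weights, out_biases = layers[-1]
    logit = dense(activations, out_weights, out_biases)[0]
    return 1.0 / (1.0 + math.exp(-logit))  # assumed sigmoid output in (0, 1)


rng = random.Random(0)
sizes = [12, 16, 16, 16, 1]  # 12 inputs, 3 hidden layers (assumed width 16), 1 output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
sample = [rng.random() for _ in range(12)]  # made-up, normalized factor values
danger = forward(sample, layers)
```

With trained weights, evaluating `forward` for every cell of a raster would give the kind of susceptibility surface the abstract describes.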
The catalytic conversion of lignin model compounds was performed using Ru/C catalysts and an autoclave reactor. The Ru/C catalysts were prepared by the impregnation method using highly porous homemade activated carbon and characterized by XRD, SEM, and specific surface area measurements. The catalytic reactions were performed in a high pressure/temperature reactor at different temperatures and with different solvents. The results showed that the novel Ru/C catalysts prepared from carbon supports activated by the KOH agent showed higher catalytic activity than the commercial catalyst. Ethanol and 2-propanol were suitable solvents for the cleavage of the β–O–4 ether bond of 2-phenoxy-1-phenylethanol (~65–70% conversion) over a Ru/C-KOH-2 catalyst at 220 °C in comparison to tert-butanol and 1-propanol solvents (~43–47% conversion of 2-phenoxy-1-phenylethanol). Also, the increase in reaction temperature from 200 °C to 240 °C enhanced the cleavage of the ether bond, with an increase in phenol selectivity from 9.4% to 19.5%, and improved the catalytic conversion of 2-phenoxy-1-phenylethanol from 46.6% to 98.5% over the Ru/C-KOH-2 catalyst in ethanol solvent. The Ru/C-KOH-2 catalyst showed outstanding conversion (98.5%) of 2-phenoxy-1-phenylethanol at 240 °C after 1 h in ethanol solvent. This novel hierarchical porous activated carbon-supported ruthenium catalyst (Ru/C-KOH-2) can be applied for the further conversion of the lignin compound.
Taxono-genomics is an innovative concept coined for the description of new bacterial species. Phenotypic characteristics were combined with a genomic approach to describe two new species within the Clostridium sensu stricto genus: Clostridium culturomicium strain CL-6ᵀ and Clostridium jeddahitimonense strain CL-2ᵀ, both isolated from the gut microbiota of an obese man from Saudi Arabia. Strains CL-6ᵀ and CL-2ᵀ shared a similarity of 98.4% with the 16S rRNA gene of Clostridium subterminale strain JCM 1417ᵀ (accession number NR113027) and 98% with that of Clostridium disporicum strain DS1ᵀ (accession number NR026491), respectively. The highest OrthoANI values were shared with Clostridium punense for strain CL-6ᵀ (70.8%) and with Clostridium disporicum for strain CL-2ᵀ (87.1%). Additionally, strain CL-6ᵀ and strain CL-2ᵀ shared a 16S rRNA similarity of 91.4%. Both strains were anaerobic, spore-forming, and Gram-stain-positive non-motile bacilli. The genome of Clostridium culturomicium strain CL-6ᵀ is 4,325,182 bp long with 32.2% GC content. As for Clostridium jeddahitimonense strain CL-2ᵀ, the genome is 4,074,758 bp long with 29.2% GC content.
Water quality (WQ) pollution is a matter of concern to everyone, and it is necessary to take remedial measures to protect human life. This paper presents the assessment of surface WQ through the Water Quality Index (WQI) following the Vietnamese standard QCVN 08-MT:2015/BTNMT, combined with satellite data for the Dau Tieng reservoir. The implementation method is based on the correlation of each WQ indicator with the spectral features of satellite images to build a regression function, simulating the spatial distribution over the entire reservoir. Sentinel-2A satellite images were used, with reflective bands in the visible and near-infrared (NIR) spectral region. The regression functions showed that the WQ indicators were correlated with single bands and band ratios, including blue and NIR, red/blue, green/blue, and NIR/blue. Simulation results of the WQI spatial distribution indicated that the WQ of the Dau Tieng reservoir was mostly in a polluted state, with WQI values largely in the range of 0–50. The Dau Tieng irrigation system plays a vital role in the water distribution system for Tay Ninh province and the southern key economic region. Therefore, appropriate treatment measures should be taken if the water is to be used for domestic water supply purposes.
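The idea of regressing a WQ indicator on a band ratio can be sketched with ordinary least squares. This is only an illustration under assumptions: the abstract does not state the functional form of the regression, the reflectance values and indicator values below are synthetic, and the helper name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic surface reflectance at four sampling points (not real Sentinel-2A data).
red = [0.08, 0.10, 0.12, 0.15]
blue = [0.10, 0.10, 0.10, 0.10]
ratio = [r / b for r, b in zip(red, blue)]  # red/blue band ratio per point

# Synthetic indicator values generated from a known line, so the fit is checkable.
indicator = [2.0 * q + 1.0 for q in ratio]

slope, intercept = fit_line(ratio, indicator)

# Applying the fitted function to the ratio of any other pixel gives a predicted value.
predicted = slope * (0.11 / 0.10) + intercept
```

Applying such a fitted function to every pixel of an image is one way to obtain a spatial distribution like the one the abstract describes.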
Are We Ready to Embrace Generative AI for Software Q&A? Xu, Bowen; Nguyen, Thanh-Dat; Le-Cong, Thanh ...
2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2023-Sept.-11
Conference Proceeding
Stack Overflow, the world's largest software Q&A (SQA) website, is facing a significant traffic drop due to the emergence of generative AI techniques. ChatGPT was banned by Stack Overflow only 6 days after its release. The main reason officially given by Stack Overflow is that the answers generated by ChatGPT are of low quality. To verify this, we conduct a comparative evaluation of human-written and ChatGPT-generated answers. Our methodology employs both automatic comparison and a manual study. Our results suggest that human-written and ChatGPT-generated answers are semantically similar; however, human-written answers outperform ChatGPT-generated ones consistently across multiple aspects, specifically by 10% on the overall score. We release the data, analysis scripts, and detailed results at https://github.com/maxxbw54/GAI4SQA.