Software defect prediction is one of the most popular research topics in software engineering. It aims to predict defect-prone software modules before defects are discovered, so it can be used to better prioritise software quality assurance effort. In recent years, especially the last three, many new defect prediction studies have been proposed. The goal of this study is to comprehensively review, analyse and discuss the state of the art in defect prediction. The authors survey almost 70 representative defect prediction papers from recent years (January 2014–April 2017), most of which were published in prominent software engineering journals and top conferences. The selected defect prediction papers are summarised under four aspects: machine learning-based prediction algorithms, data manipulation, effort-aware prediction and empirical studies. The research community still faces a number of challenges in building methods, and many research opportunities exist. The identified challenges can give some practical guidelines for both software engineering researchers and practitioners in future software defect prediction.
Heterogeneous defect prediction (HDP) refers to predicting defect-proneness of software modules in a target project using heterogeneous metric data from other projects. Existing HDP methods mainly focus on predicting target instances with a single source. In practice, there exist plenty of external projects. Multiple sources can generally provide more information than a single project. Therefore, it is meaningful to investigate whether HDP performance can be improved by employing multiple sources. However, a precondition of conducting HDP is that the external sources are available. Due to privacy concerns, most companies are not willing to share their data. To facilitate data sharing, it is essential to study how to protect the privacy of data owners before they release their data. In this paper, we study the above two issues in HDP. Specifically, to utilize multiple sources effectively, we propose a multi-source selection based manifold discriminant alignment (MSMDA) approach. To protect the privacy of data owners, a sparse representation based double obfuscation algorithm is designed and applied to HDP. Through a case study of 28 projects, our results show that MSMDA can achieve better performance than a range of baseline methods. The improvement is 3.4-15.3 percent in g-measure and 3.0-19.1 percent in AUC.
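The two evaluation measures named above can be illustrated with a minimal sketch. It assumes the definition of g-measure commonly used in defect prediction (the harmonic mean of the probability of detection and one minus the probability of false alarm) and a pairwise-ranking definition of the area under the ROC curve; the abstract does not spell out either formula, and the function names and sample numbers are hypothetical.

```python
def g_measure(tp, fp, tn, fn):
    """Harmonic mean of recall (pd) and specificity (1 - pf).

    Assumed definition; the abstract does not state the formula.
    """
    pd = tp / (tp + fn) if (tp + fn) else 0.0   # probability of detection
    pf = fp / (fp + tn) if (fp + tn) else 0.0   # probability of false alarm
    specificity = 1.0 - pf
    denom = pd + specificity
    return 2.0 * pd * specificity / denom if denom else 0.0


def auc(scores, labels):
    """Area under the ROC curve as the fraction of (defective, clean)
    pairs in which the defective module gets the higher score.
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one module of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical confusion matrix and scores, for illustration only.
    print(g_measure(tp=40, fp=10, tn=90, fn=10))
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

Both measures are insensitive to the class ratio of the target project, which is presumably why they suit defect data, where defective modules are typically the minority.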
Person re-identification has been widely studied due to its importance in surveillance and forensics applications. In practice, gallery images are high resolution (HR), while probe images are usually low resolution (LR) in identification scenarios with large variations in illumination, weather, or camera quality. Person re-identification in this kind of scenario, which we call super-resolution (SR) person re-identification, has not been well studied. In this paper, we propose a semi-coupled low-rank discriminant dictionary learning (SLD²L) approach for the SR person re-identification task. With the HR and LR dictionary pair and mapping matrices learned from the features of HR and LR training images, SLD²L can convert the features of the LR probe images into HR features. To ensure that the converted features have favorable discriminative capability and that the learned dictionaries can well characterize the intrinsic feature spaces of the HR and LR images, we design a discriminant term and a low-rank regularization term for SLD²L. Moreover, considering that low resolution results in different degrees of loss for different types of visual appearance features, we propose a multi-view SLD²L (MVSLD²L) approach, which can learn the type-specific dictionary pair and mappings for each type of feature. Experimental results on multiple publicly available data sets demonstrate the effectiveness of our proposed approaches for the SR person re-identification task.
Background
This study aimed to explore high-risk factors associated with survival outcomes in patients with advanced primary laryngeal carcinoma and to develop and validate a survival-predicting model to help select the appropriate treatment for each patient.
Methods
Data of patients with advanced primary laryngeal cancer in 2003–2015 were extracted from the Surveillance, Epidemiology, and End Results (SEER) database. High-risk factors were identified and integrated to build a nomogram, which was internally validated using bootstrap and externally validated with a patient cohort from China. The impact of various treatments was examined on model-defined high-, moderate- and low-risk patient groups, respectively.
Results
A total of 6070 patients were analyzed. Patients’ age, gender, tumor T stage, N stage, and differentiation grade were identified as risk factors and integrated into the model. The concordance index of this model (0.602) was significantly higher than that of the TNM staging system (0.547). The calibration curve showed good agreement between model-predicted and actual survival outcomes. Patients were categorized into three subgroups with incremental risks of overall mortality. The roles of the three treatment strategies varied across these subgroups.
Conclusion
In this large SEER-based study, we established a practical model to predict overall survival for patients with advanced primary laryngeal cancer. For patients identified as high-risk and moderate-risk, surgery plus adjuvant therapy is recommended, while for patients in the low-risk group, surgery alone plus regular re-examination is recommended as the primary treatment strategy.
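The concordance index reported in the Results can be illustrated with a minimal sketch of Harrell's C-index, a common way to compute it for survival models. Whether the study used exactly this estimator is an assumption, and the function name and sample data below are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index (assumed estimator, not confirmed by the study).

    times  -- follow-up time per patient
    events -- 1 if death was observed, 0 if censored
    risks  -- model-predicted risk score (higher = worse prognosis)

    A pair (i, j) is comparable when patient i died and did so strictly
    before patient j's follow-up time. It is concordant when the model
    gave i the higher risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy cohort, for illustration only.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.4, 0.5, 0.1]
    print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported 0.602 against 0.547.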
As an irreversible post-translational modification, protein carbonylation is closely related to many diseases and aging. Predicting protein carbonylation for affected patients is important because it can help clinicians choose appropriate therapeutic schemes. Because carbonylation sites can be used to indicate change or loss of protein function, integrating protein carbonylation site data has been a promising approach to prediction. Based on these data, several protein carbonylation prediction methods have been proposed. However, most of the data are highly class imbalanced: the number of un-carbonylation sites greatly exceeds that of carbonylation sites. Unfortunately, existing methods have not addressed this issue adequately. In this work, we propose a novel two-way rebalancing strategy based on the attention technique and a generative adversarial network (Carsite_AGan) for identifying protein carbonylation sites. Specifically, Carsite_AGan proposes a novel undersampling method based on the attention technique, which obtains an importance value for each sample and allows sites with high importance values to be selected from the un-carbonylation sites. Meanwhile, Carsite_AGan designs a generative adversarial network-based oversampling method that generates high-feasibility carbonylation sites through its generator and discriminator. Finally, we use a classifier such as a nonlinear support vector machine to identify protein carbonylation sites. Experimental results demonstrate that our approach significantly outperforms other resampling methods. Using our approach to resample carbonylation data can significantly improve the identification of protein carbonylation sites.
Multi-spectral face recognition has been attracting increasing interest. In the last decade, several multi-spectral face recognition methods have been presented. However, how to jointly learn effective features with favorable discriminability from multiple spectra, even when multi-spectral face images are severely contaminated by noise, has not been well studied. Multi-view dictionary learning is an effective feature learning technique, which learns dictionaries from multiple views of the same object and has achieved state-of-the-art classification results. In this paper, we introduce the multi-view dictionary learning technique into the field of multi-spectral face recognition for the first time and propose a multi-spectral low-rank structured dictionary learning (MLSDL) approach. It learns multiple structured dictionaries, including a spectrum-common dictionary and multiple spectrum-specific dictionaries, which can fully explore both the correlated information and the complementary information among multiple spectra. Each dictionary contains a set of class-specified sub-dictionaries. Based on low-rank matrix recovery theory, we apply low-rank regularization in the multi-spectral dictionary learning procedure so that MLSDL can handle multi-spectral face recognition under high levels of noise. We also design a low-rank structural incoherence term for multi-spectral dictionary learning, so as to reduce the redundancy among multiple spectrum-specific dictionaries. In addition, to enhance the efficiency of the classification procedure, we design a low-rank structured collaborative representation classification scheme for MLSDL. Experimental results on the HK PolyU, CMU and UWA hyper-spectral face databases demonstrate the effectiveness of the proposed approach.
• We propose a multi-spectral low-rank structured dictionary learning approach.
• We learn a spectrum-common dictionary and spectrum-specific dictionaries.
• Low-rank structured regularization and incoherence terms are designed.
• Low-rank structured collaborative representation classification is provided.
Abstract
Background
Second primary malignancy (SPM) represents the leading long-term cause of death among patients with index head and neck squamous cell carcinoma (HNSCC). We aimed to quantify the association between postoperative radiotherapy (PORT) and the risk of SPM development for index HNSCC among adolescent and young patients, who are particularly vulnerable to radiation-associated impacts due to their increased tissue susceptibilities and longer life expectancies.
Methods
This study used the Surveillance, Epidemiology, and End Results (SEER) database to collect data on 5-year survivors of index young-onset HNSCC from 1975 to 2011. The outcome of interest was SPM, a new, metachronous malignancy after the index HNSCC. Standardized incidence ratios (SIRs) and excess absolute risks (EARs) were used to quantify the PORT-associated risks externally, and relative risks (RRs) were estimated by multivariate Poisson regression analysis to quantify the PORT-associated risks internally.
Results
Of the 2771 included 5-year survivors with index young-onset HNSCC, receipt of PORT (37.6%) was associated with a higher risk of SPM (RR, 1.23; 95% CI 1.07 to 1.43). PORT-associated risks were elevated for the majority of sites, including head and neck (RR, 1.19; 95% CI 0.95 to 1.50) and lung (RR, 1.67; 95% CI 1.18 to 2.34). With regard to the subsites of head and neck, RRs were above unity in oral cavity squamous cell carcinoma (SCC) (RR, 1.68; 95% CI 1.39 to 2.03) and laryngeal SCC (RR, 1.02; 95% CI 0.73 to 1.43). A relatively greater RR was observed for patients younger than 35 years (RR, 1.44; 95% CI 0.37 to 5.57) and those diagnosed with localized disease (RR, 1.16; 95% CI 0.9 to 1.5). PORT-associated risks were increased remarkably after 15 years of follow-up (RR, 1.24; 95% CI 0.97 to 1.58).
Conclusions
An association was discovered between PORT treatment and increased long-term risk of SPM among patients with index young-onset HNSCC. The findings suggest long-term follow-up surveillance for these patients, particularly those with oral cavity SCC or laryngeal SCC.
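The external risk measures named in the Methods can be illustrated with a minimal sketch. It assumes the conventional definitions: the SIR is the ratio of observed to expected cases, and the EAR is the excess number of cases per 10,000 person-years at risk. The abstract does not state the scaling it used, and the function names and numbers below are hypothetical, not taken from the study.

```python
def standardized_incidence_ratio(observed, expected):
    """SIR = observed cases / cases expected from reference population rates."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def excess_absolute_risk(observed, expected, person_years, per=10_000):
    """EAR = excess cases per `per` person-years at risk.

    The per-10,000 scaling is an assumed convention.
    """
    if person_years <= 0:
        raise ValueError("person-years must be positive")
    return (observed - expected) / person_years * per


if __name__ == "__main__":
    # Hypothetical cohort: 30 SPMs observed, 20 expected, 5,000 person-years.
    print(standardized_incidence_ratio(30, 20))
    print(excess_absolute_risk(30, 20, 5_000))
```

The SIR is a relative measure and the EAR an absolute one, so a modest SIR can still correspond to a sizeable EAR when the baseline rate is high.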
Colorization is the computer-assisted application of color to a gray scale image, which presents two problems to modern deep learning-based approaches. One is to provide colorization models with both high expressibility and strong learning ability, as current models have difficulty both excelling at coloring and being easy to train. The other is to return a picture without uneven overlap. This paper proposes a deep convolutional network framework called Color-UNet++ for the end-to-end solution of these colorization problems. Color-UNet++ is adjusted to settle gradient dispersion and explosion by capturing more transfer and intermediate results during backpropagation. We adjust the de-convolution structure to solve the problem of uneven overlap. We design the model in YUV instead of RGB color space, with an objective function that is appropriate to the coloring problem and can capture a wide range of colors. A large number of experimental results on LFW and LSUN datasets confirm the method’s superiority.
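The choice of YUV over RGB can be illustrated with a minimal sketch of the color-space conversion. In YUV the luminance channel Y is separate from the two chrominance channels, so a colorization model can plausibly take Y from the gray scale input and predict only U and V. The abstract does not say which YUV variant is used; the BT.601-style constants and function names below are assumptions.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to YUV.

    Uses BT.601-style analog constants (an assumed variant).
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance: the gray scale value
    u = 0.492 * (b - y)                     # blue-difference chrominance
    v = 0.877 * (r - y)                     # red-difference chrominance
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Invert rgb_to_yuv by solving the three equations above."""
    r = y + v / 0.877
    b = y + u / 0.492
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b


if __name__ == "__main__":
    # A gray pixel carries no chrominance: U and V are (numerically) zero.
    print(rgb_to_yuv(0.5, 0.5, 0.5))
    # Round trip on an arbitrary color.
    print(yuv_to_rgb(*rgb_to_yuv(0.8, 0.2, 0.4)))
```

Under this split the model has two output channels instead of three, and the input luminance is preserved exactly when the image is converted back to RGB.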
Hypopharyngeal squamous cell carcinoma (HPSCC) has the worst prognosis among head and neck squamous cell carcinomas. The lack of available tumor cell lines poses a significant obstacle to the development of efficient treatments for HPSCC. In this study, we successfully established a novel cell line, named CZH1, from the postcricoid region of a Chinese male patient with a T3N0M0 HPSCC. Short tandem repeat analysis confirmed the uniqueness of CZH1. The cell line was characterized by its phenotypes, biomarkers, and genetics. Importantly, CZH1 cells retained the typical features of epithelial malignancy, similar to the primary tumor tissue. Furthermore, CZH1 demonstrated a greater capacity for invasion and increased susceptibility to irradiation in comparison to FaDu, which is the most commonly used HPSCC cell line. Whole-exome sequencing analysis revealed that CZH1 cells had typical genomic features of HNSCC, including mutations of TP53 and amplifications of multiple transcripts. Therefore, our newly developed CZH1 cell line could serve as an efficient tool for the in vitro investigation of the etiology, pathogenesis, and preclinical treatment of HPSCC.
Person re-identification plays an important role in video surveillance and forensics applications. In many cases, person re-identification needs to be conducted between an image and a video clip, e.g., re-identifying a suspect from large quantities of pedestrian videos given a single image of the suspect. We call re-identification in this scenario image-to-video person re-identification (IVPR). In practice, image and video are usually represented with different features, and there usually exist large variations between frames within each video. These factors make matching between image and video a very challenging task. In this paper, we propose a joint feature projection matrix and heterogeneous dictionary pair learning (PHDL) approach for IVPR. Specifically, PHDL jointly learns an intra-video projection matrix and a pair of heterogeneous image and video dictionaries. With the learned projection matrix, the influence of the variations within each video on the matching can be reduced. With the learned dictionary pair, the heterogeneous image and video features can be transformed into coding coefficients with the same dimension, such that the matching can be conducted using the coding coefficients. Furthermore, to ensure that the obtained coding coefficients have favorable discriminability, PHDL designs a point-to-set coefficient discriminant term. To make better use of the complementary spatial-temporal and visual appearance information contained in pedestrian video data, we further propose a multi-view PHDL approach, which can fuse different video information effectively in the dictionary learning process. Experiments on four publicly available person sequence data sets demonstrate the effectiveness of the proposed approaches.