Fairness has become a critical value online, and recent studies consider it in many problems. In recommender systems, fairness is important because the systems control the visibility of items. Previous fairness-aware recommender systems assume that sufficient relationship data between users and items are available. However, new users and items are introduced frequently, and they have no relationship data yet. In this paper, we study recommendation methods that enhance fairness in a cold-start state, where fairness is even more significant because the preference of a user or the popularity of an item is unknown. We propose a meta-learning-based cold-start recommendation framework, called FaRM, to alleviate the unfairness of recommendations. The proposed framework consists of three steps. First, we propose a fairness-aware meta-path generation method to eliminate bias in sensitive attributes. Second, we construct fairness-aware user representations through a meta-path aggregation approach. Third, we propose a novel fairness objective function and introduce a joint learning method to reduce the trade-off between relevance and fairness. Extensive experiments with various cold-start scenarios show that FaRM is significantly superior to previous work in fairness while preserving relevance accuracy.
Code comments explain the operational process of a computer program and increase the long-term productivity of programming tasks such as debugging and maintenance. Therefore, methods that automatically generate natural language comments from program code are needed. With the development of deep learning, various high-performing models from the natural language processing domain have been applied to comment generation, and recent studies have improved performance by simultaneously using the lexical information of code tokens and the syntactic information obtained from the syntax tree. In this paper, to improve the accuracy of automatic comment generation, we introduce a novel syntactic sequence, the Code-Aligned Type sequence (CAT), which aligns the order and length of lexical and syntactic information, and we propose a new neural network model, the Aligned Lexical and Syntactic information-Transformer (ALSI-Transformer), a transformer-based model that encodes the aligned multi-modal information with convolution and embedding aggregation layers. Through in-depth experiments, we compare ALSI-Transformer with current baseline methods using standard machine translation metrics and demonstrate that the proposed method achieves state-of-the-art performance in code comment generation.
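The idea of a type sequence aligned one-to-one with code tokens can be illustrated with a small, hypothetical sketch. The function `aligned_sequences` below is not the CAT construction from the paper, whose exact definition is not given here. It assumes ASCII Python source and labels each token with the class name of the deepest AST node that starts at that token's position, falling back to the token's lexical category, so that the lexical and syntactic sequences share the same order and length.

```python
import ast
import io
import token
import tokenize

# Layout-only tokens that carry no lexical content for this illustration.
SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
        tokenize.ENDMARKER, tokenize.COMMENT, tokenize.ENCODING}


def aligned_sequences(source):
    """Hypothetical sketch: return (tokens, types) of equal length.

    Assumes ASCII source, since AST column offsets are UTF-8 byte offsets
    while tokenize reports character offsets.
    """
    starts = {}
    # ast.walk is breadth-first, so a deeper node starting at the same
    # position is visited later and overwrites its ancestor's label.
    for node in ast.walk(ast.parse(source)):
        if hasattr(node, "lineno"):
            starts[(node.lineno, node.col_offset)] = type(node).__name__

    tokens, types = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        tokens.append(tok.string)
        # Fall back to the lexical category (e.g. NAME, OP) when no AST
        # node starts at this token.
        types.append(starts.get(tok.start, token.tok_name[tok.type]))
    return tokens, types
```

For `def add(a, b):` followed by `return a + b`, the sketch pairs `def` with `FunctionDef`, each parameter with `arg`, `return` with `Return`, and the operands with `Name`, while punctuation falls back to `OP`. Any real alignment scheme would need a richer mapping than this.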
The World Wide Web is the most useful source of information. To achieve high publishing productivity, the webpages of many websites are automatically populated with content using common templates. Templates give readers easy access to the content through consistent structures. For machines, however, templates are considered harmful because their irrelevant terms degrade the accuracy and performance of web applications. Thus, template detection techniques have recently received a lot of attention as a way to improve the performance of search engines and of web document clustering and classification. In this paper, we present novel algorithms for extracting templates from a large number of web documents generated from heterogeneous templates. We cluster the web documents based on the similarity of their underlying template structures so that the template for each cluster is extracted simultaneously. We develop a novel goodness measure for clustering, together with a fast approximation of it, and provide a comprehensive analysis of our algorithm. Our experimental results with real-life data sets confirm the effectiveness and robustness of our algorithm compared to state-of-the-art template detection algorithms.
In general, as the amount of training data increases, a deep learning model achieves higher training accuracy. Labeling training data for supervised learning requires human effort, which incurs time and monetary costs. Therefore, if a sufficient amount of training data cannot be constructed owing to cost constraints, it becomes necessary to select the training data that maximize the accuracy of the deep learning model with only a limited amount of training data. However, although conventional studies on training data selection consider the labeling cost, they do not consider the selection cost incurred by the selection process itself. We therefore introduce a training data selection problem that considers a selection cost constraint in addition to the labeling cost constraint, and we propose novel algorithms to solve it. An advantage of the proposed algorithms is that they can be applied to any network model or data model in deep learning. Their performance was verified through experiments using various network models and data.
Data sizes are becoming much larger, and distributed data processing is becoming very important for managing such huge data. MapReduce, well known as Google's data processing environment, is the most popular distributed platform, offering good scalability and fault tolerance. Many traditional algorithms designed for a single machine are being adapted to the MapReduce platform. In this paper, we analyze a novel algorithm that generates wavelet synopses on the distributed MapReduce framework. The wavelet synopsis is one of the most popular dimensionality reduction methods and has been studied in various areas such as query optimization, approximate query answering, and feature selection. With the proposed algorithm, the wavelet synopsis can be calculated in a single MapReduce phase, and, by minimizing the amount of data communicated over the network of the distributed MapReduce platform, all computations are processed in almost linear time. We theoretically study the properties of constructing wavelet synopses on partitioned data sets and the correctness of the proposed algorithm.
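For readers unfamiliar with wavelet synopses, a minimal single-machine sketch of the classic Haar wavelet synopsis may help. It is not the MapReduce algorithm described above. It assumes the input length is a power of two, computes the unnormalized Haar decomposition (pairwise averages and half-differences), keeps the `b` coefficients with the largest normalized magnitude (which minimizes the L2 reconstruction error), and rebuilds an approximation from them. All function names are illustrative.

```python
import math


def haar_decompose(data):
    """Unnormalized Haar decomposition in error-tree order:
    [overall average, details from coarsest to finest]."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a positive power of two")
    averages = [float(x) for x in data]
    details = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Coarser details are prepended so the finest ones end up last.
        details = [(a - b) / 2 for a, b in pairs] + details
        averages = [(a + b) / 2 for a, b in pairs]
    return averages + details


def level(i):
    """Resolution level of coefficient i (0 = coarsest)."""
    return 0 if i == 0 else i.bit_length() - 1


def wavelet_synopsis(data, b):
    """Keep the b coefficients with the largest normalized magnitude."""
    coeffs = haar_decompose(data)
    ranked = sorted(range(len(coeffs)),
                    key=lambda i: abs(coeffs[i]) / math.sqrt(2 ** level(i)),
                    reverse=True)
    return {i: coeffs[i] for i in ranked[:b]}


def reconstruct(synopsis, n):
    """Approximate the original n values; dropped coefficients count as 0."""
    coeffs = [0.0] * n
    for i, c in synopsis.items():
        coeffs[i] = c
    averages, pos = [coeffs[0]], 1
    while len(averages) < n:
        details = coeffs[pos:pos + len(averages)]
        pos += len(averages)
        # Each (average, detail) pair expands back into two values.
        averages = [v for a, d in zip(averages, details) for v in (a + d, a - d)]
    return averages
```

For the input `[2, 2, 0, 2, 3, 5, 4, 4]`, `haar_decompose` returns `[2.75, -1.25, 0.5, 0.0, 0.0, -1.0, -1.0, 0.0]`, and keeping all eight coefficients reconstructs the input exactly. Each detail coefficient depends only on a contiguous block of the input, which is what makes partitioned computation of synopses worth studying.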
NoSQL systems are increasingly adopted for Web applications that require scalability that relational database systems cannot provide. Although NoSQL systems were not designed to support joins, the need to support joins has emerged as they are applied to a wide variety of applications. Furthermore, joins performed in NoSQL systems are generally similarity joins, which find similar pairs of records, rather than exact-match joins. Since Web applications often use the MapReduce framework, we develop a solution to perform similarity joins in NoSQL systems using the MapReduce framework.
•We developed a set-similarity join solution in NoSQL using MapReduce.
•Our set-similarity join algorithm avoids redundant comparisons between join attribute values in the MapReduce framework.
•We substantially decreased the amount of network traffic in the MapReduce framework.
•We reduced the number of comparisons needed to find all similar pairs by extending the prefix filtering technique to the MapReduce framework.
•Our solution achieved up to an order-of-magnitude improvement in performance over the most efficient existing solution.
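The prefix filtering technique mentioned above can be sketched on a single machine for a Jaccard self-join. This is the classic, non-distributed form of the filter, not the MapReduce extension developed in this work. Tokens are ordered globally from rarest to most frequent; two sets with Jaccard similarity of at least `t` must then share a token within the first `|x| - ceil(t*|x|) + 1` tokens of each set, so only records that share a prefix token are compared. The function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    return len(a & b) / len(a | b)


def prefix_length(size, t):
    # The small epsilon guards against float error such as t * size
    # landing just above an integer; a longer prefix is always safe.
    return size - math.ceil(t * size - 1e-9) + 1


def set_similarity_self_join(records, t):
    """Return (i, j, sim) for all record pairs i < j with Jaccard >= t."""
    freq = Counter(tok for r in records for tok in set(r))
    # Global token order: rarest first, ties broken by the token itself.
    ordered = [sorted(set(r), key=lambda tok: (freq[tok], tok)) for r in records]
    index = defaultdict(list)  # token -> ids of earlier records with it in their prefix
    results = []
    for rid, toks in enumerate(ordered):
        candidates = set()
        for tok in toks[:prefix_length(len(toks), t)]:
            candidates.update(index[tok])
            index[tok].append(rid)
        current = set(toks)
        # Verification step: compute the exact similarity only for candidates.
        for cid in candidates:
            sim = jaccard(current, set(ordered[cid]))
            if sim >= t:
                results.append((cid, rid, sim))
    return sorted(results)
```

With records `{a, b, c}`, `{a, b, d}`, `{x, y}`, and `{a, b, c, d}` and `t = 0.5`, the join reports pairs (0, 1), (0, 3), and (1, 3), and never compares `{x, y}` with the others. In a MapReduce setting, a common way to parallelize this filter is to have mappers emit each record once per prefix token and let reducers verify the records grouped under each token.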
The Internet and Web technologies were originally developed assuming an ideal world in which all users are honorable. However, a dark side has emerged and bedeviled the world. It includes spam, malware, hacking, phishing, denial-of-service attacks, click fraud, invasion of privacy, defamation, fraud, violation of digital property rights, etc. Responses to the dark side of the Internet have included technologies, legislation, law enforcement, litigation, public awareness efforts, etc. In this paper, we explore and provide taxonomies of the causes and costs of the attacks and of the types of responses to them.
► We provide a taxonomy of the dark side of the Internet, summarizing the damage done by people misusing or abusing the Internet.
► We analyze the causes of the dark side.
► We analyze the technology responses to the dark side.
► We provide a brief prognosis of the future of the dark side.
A nearly complete collection of gene-deletion mutants (96% of annotated open reading frames) of the yeast Saccharomyces cerevisiae has been systematically constructed. Tag microarrays are widely used to measure the fitness of each mutant in a mutant mixture. Tag array experiments can have complex experimental designs, such as time-course measurements and drug treatment at multiple dosages.
TagSmart is a web application for the analysis and visualization of Saccharomyces cerevisiae mutant fitness data measured by tag microarrays. It implements a robust statistical approach to assess concentration differences among S. cerevisiae mutant strains. It also provides an interactive environment for data analysis and visualization. TagSmart has the following advantages over previously described analysis procedures: 1) it is user-friendly software rather than merely a description of an analytical procedure; 2) it can handle complicated experimental designs, such as multiple time points and treatment at multiple dosages; 3) it has higher sensitivity and specificity; 4) it allows users to mask out "bad" tags in the analysis. Two biological tests were performed to illustrate the performance of TagSmart. First, we generated titration mixtures of mutant strains in which the relative concentration of each strain was controlled. We used tag microarrays to measure the number of tag copies in each titration mixture. The data were analyzed with TagSmart, and the results showed high precision and recall. Second, TagSmart was applied to a dataset in which heterozygous deletion strain mixture pools were treated with a new drug, Cincreasin. TagSmart identified 53 mutant strains as sensitive to Cincreasin treatment. We individually tested each identified mutant and found that 52 of the 53 predicted mutants were indeed sensitive to Cincreasin.
TagSmart is provided "as is" to analyze tag array data produced by Affymetrix and Agilent arrays. The TagSmart web application is accessible to Windows, Mac, and Linux users. A downloadable version is also available for PCs running Windows. TagSmart is available for academic use at: http://biocomp.bioen.uiuc.edu/tagsmart.
As the Internet flourishes, online advertising has become essential to business marketing campaigns. To run a marketing campaign, advertisers provide their advertisements to Internet publishers, who are paid commissions based on the clicks made on the posted advertisements or on purchases of the advertised products. Since the payment to a publisher is proportional to the number of clicks received by the advertisements the publisher hosts, dishonest publishers are motivated to inflate the number of clicks on the advertisements on their web sites. Because click fraud critically undermines the reliability of online advertising, online advertisers make efforts to prevent it effectively. However, the methods used for click fraud are also becoming more complex and sophisticated.
In this paper, we study the problem of detecting coalition attacks in click fraud. Coalition attacks are among the latest sophisticated click-fraud techniques, because by joining a coalition, fraudsters can obtain more gain with a lower probability of being detected. We introduce new definitions of coalition and propose a novel algorithm called CATCH to find such coalitions. Extensive experiments with synthetic and real-life data sets confirm that our notion of coalition allows us to detect coalitions much more effectively than that of previous work.
► We study the problem of detecting coalition attacks in click fraud.
► We introduce a new definition of coalition attack based on the ratio of gain to cost.
► We develop a novel algorithm called CATCH to efficiently detect coalition attacks.
► Experiments confirm the effectiveness and efficiency of our notion of coalition.