On June 17, 2013, the MilkyWay-2 (Tianhe-2) supercomputer was crowned the fastest supercomputer in the world on the 41st TOP500 list. This paper provides an overview of the MilkyWay-2 project and describes the design of its hardware and software systems. The key architectural features of MilkyWay-2 are highlighted, including neo-heterogeneous compute nodes integrating commodity off-the-shelf processors and accelerators that share a similar instruction set architecture; powerful networks that employ proprietary interconnection chips to support massively parallel message-passing communications; a proprietary 16-core processor designed for scientific computing; and efficient software stacks that provide a high-performance file system, an emerging programming model for heterogeneous systems, and intelligent system administration. We perform an extensive evaluation with wide-ranging applications, from the LINPACK and Graph500 benchmarks to massively parallel software deployed in the system.
On the 41st Top500 list announced in June 2013, the MilkyWay-2 system produced by the National University of Defense Technology (NUDT) in China took first place with a LINPACK test result of 33.86 PFLOPS. It has been one and a half years since its predecessor, MilkyWay-1 (TH-1), reached the same place for the first time. On the newest Top500 list published in November 2013, MilkyWay-2 retained first place.
Knowledge-enhanced pre-trained language models attempt to use the structured knowledge stored in knowledge graphs to strengthen pre-trained language models, so that they can learn not only general semantic knowledge from free text but also the factual entity knowledge behind the text. In this way, the enhanced models can effectively solve downstream knowledge-driven tasks. Although this is a promising research direction, current work is still at an exploratory stage, and there is no comprehensive summary or systematic organization of it. This paper aims to address the lack of comprehensive reviews of this direction. To this end, on the basis of summarizing and sorting out a large number of relevant works, this paper first explains the background from three aspects (the reasons for, the advantages of, and the difficulties of introducing knowledge) and summarizes the basic concepts involved in knowledge-enhanced pre-trained language models. Then, it discusses three types of knowledge
There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions.
To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes.
Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers in both assembly length and accuracy. We also provide an updated assembly of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which are 3-fold and 50-fold longer than in the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower at the point of peak memory consumption.
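For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows how it is conventionally computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the example lengths are hypothetical and are not taken from SOAPdenovo2.

```python
def n50(lengths):
    """Return the N50 of a non-empty collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point issues with total / 2.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in kbp), for illustration only.
example_contigs = [80, 70, 50, 40, 30, 20]
print(n50(example_contigs))
```

A larger N50 means that half of the assembled bases sit in longer sequences, which is why it serves as a continuity measure.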
The Type II clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system is a powerful genome editing technology that is increasingly popular in gene function analysis. In CRISPR/Cas, an RNA guides the Cas nuclease to the target site to perform DNA modification.
The performance of CRISPR/Cas depends on a well-designed single guide RNA (sgRNA). However, the off-target effect of sgRNA leads to undesired mutations in the genome and limits the use of CRISPR/Cas. Here, we present OffScan, a universal and fast CRISPR off-target detection tool.
OffScan is not limited by the number of mismatches and allows a custom protospacer-adjacent motif (PAM), which is the site recognized by the Cas protein. In addition, OffScan adopts the FM-index, which improves query speed and reduces memory consumption.
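To illustrate how an FM-index can support mismatch-tolerant site search with a configurable PAM, here is a minimal sketch. It is not OffScan's implementation: the function names, the naive suffix-array construction, the assumption that the PAM lies immediately 3' of the protospacer on the forward strand, and the toy sequence are all hypothetical choices made for illustration.

```python
def build_fm_index(genome):
    """Build a tiny FM-index (suffix array, C table, Occ table) for `genome`."""
    text = genome + "$"  # the sentinel sorts before A, C, G, T
    # Naive suffix sorting: fine for a demo, far too slow for real genomes.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in suffix_array]  # Burrows-Wheeler transform
    alphabet = sorted(set(text))
    # first[c]: number of characters in text that sort strictly before c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] for c in alphabet}
    for ch in bwt:
        for c in alphabet:
            occ[c].append(occ[c][-1] + (ch == c))
    return suffix_array, first, occ


def find_sites(protospacer, index, max_mismatches):
    """Return sorted (position, mismatches) pairs within the mismatch budget."""
    suffix_array, first, occ = index
    hits = []

    def extend(pos, lo, hi, used):
        # [lo, hi) is the suffix-array interval matching protospacer[pos + 1:].
        if pos < 0:
            hits.extend((suffix_array[r], used) for r in range(lo, hi))
            return
        for base in "ACGT":
            if base not in first:
                continue
            cost = used + (base != protospacer[pos])
            if cost > max_mismatches:
                continue
            new_lo = first[base] + occ[base][lo]
            new_hi = first[base] + occ[base][hi]
            if new_lo < new_hi:
                extend(pos - 1, new_lo, new_hi, cost)

    extend(len(protospacer) - 1, 0, len(suffix_array), 0)
    return sorted(hits)


def pam_matches(genome, start, pam):
    """Check the bases at `start` against a PAM pattern; N matches any base."""
    window = genome[start:start + len(pam)]
    return len(window) == len(pam) and all(
        p in ("N", g) for p, g in zip(pam, window)
    )


def off_target_sites(genome, protospacer, pam="NGG", max_mismatches=1):
    """Sites matching the protospacer that are directly followed by the PAM."""
    index = build_fm_index(genome)
    return [
        (pos, mm)
        for pos, mm in find_sites(protospacer, index, max_mismatches)
        if pam_matches(genome, pos + len(protospacer), pam)
    ]


toy_genome = "TTACGTAGGTTACCTCGGTT"  # hypothetical sequence for illustration
print(off_target_sites(toy_genome, "ACGT"))
```

On the toy sequence, the exact site at position 2 and the one-mismatch site at position 11 are reported, while the one-mismatch site at position 6 is dropped because it is not followed by an NGG motif. A production tool would also search the reverse strand and use a compressed index rather than full occurrence tables.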
Bug reports, as a frequently consulted software asset, are maintained and evolved in software communities. A large number of bug reports with complex discussions accumulate during software evolution. It has been shown that an accurate and concise summary can help developers reduce the time and effort spent going through the entire content of bug reports. Prior works select salient sentences that contain the most semantic information to form summaries. Their performance is limited because they do not consider controversial standpoints among developers’ comments or the redundancy in sentences. In this paper, we study the possibility of assessing comments’ opinions from discussions, and which kinds of sentences are more likely to contain redundant information. Based on these studies, we propose two new factors, Believability and Informativeness. The former measures the degree to which a sentence is approved or disapproved within discussions, and the latter assesses the amount of information contained in the summary. Accordingly, we design BugSum, a supervised approach to generate summaries with a two-phase method. In the measuring phase, we propose a classification method that combines the advantages of Deep Pyramid CNN and Random Forest to assess the believability of sentences in bug reports. In the selection phase, BugSum integrates an auto-encoder network for semantic feature extraction with the believability of sentences, and optimizes the informativeness of generated summaries through a dynamic selection of salient sentences. Extensive experiments show that our approach outperforms 8 comparative approaches over two public datasets and one customized dataset. In particular, the probability of adding controversial sentences that are clearly disapproved by other developers into the summary is reduced by up to 64.7%.
Despite the importance of log statements in postmortem debugging, developers find it difficult to establish good logging practices. There are two main reasons. First, there are no rigorous specifications or systematic processes to guide logging practices. Second, logging code evolves with bug fixes and feature updates. Because they do not consider the impact of software evolution, previous works on log enhancement can partially relieve the first problem but can hardly solve the second. To fill this gap, this paper proposes to guide log revisions by learning from evolution history. Motivated by code clones, we assume that logging code with similar context is pervasive and deserves similar modifications, and we conduct an empirical study on 12 open-source projects to validate this assumption. Building on this, we design and implement LogTracker, an automatic tool that learns log revision rules by mining the correlation between logging context and modifications and recommends candidate log revisions by applying these rules. With an enhanced modeling of logging context, LogTracker can guide more intricate log revisions that cannot be covered by existing tools. Our experiments show that LogTracker can detect 369 candidate instances when applied to the latest versions of software. So far, we have reported 79 of them, and 52 have been accepted.
In this paper, we present a new fourth-order method for finding multiple roots of nonlinear equations. It requires one evaluation of the function and two of its first derivative per iteration. Finally, some numerical examples are given to show the performance of the presented method compared with some known third-order methods.
Influence maximization aims to find the top-K influential individuals to maximize the influence spread within a social network, and it remains an important yet challenging problem. Proven to be NP-hard, the influence maximization problem has attracted extensive study. Though there exist basic greedy algorithms that may provide a good approximation to the optimal result, they mainly suffer from low computational efficiency and excessively long execution time, limiting their application to large-scale social networks. In this paper, we present IMGPU, a novel framework to accelerate influence maximization by leveraging the parallel processing capability of the graphics processing unit (GPU). We first improve the existing greedy algorithms and design a bottom-up traversal algorithm with a GPU implementation, which contains inherent parallelism. To best fit the proposed influence maximization algorithm to the GPU architecture, we further develop an adaptive K-level combination method to maximize the parallelism and reorganize the influence graph to minimize the potential divergence. We carry out comprehensive experiments with both real-world and synthetic social network traces and demonstrate that, with the IMGPU framework, we are able to outperform the state-of-the-art influence maximization algorithm by up to a factor of 60, and show potential to scale up to extraordinarily large-scale networks.
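As background for the basic greedy algorithms mentioned above, the sketch below shows a plain CPU version of greedy seed selection with Monte Carlo spread estimation. It is not the IMGPU algorithm. It assumes the independent cascade diffusion model, which the abstract does not name, and all function names, parameters, and the toy graph are hypothetical.

```python
import random


def simulate_cascade(graph, seeds, rng):
    """Run one independent-cascade simulation; return the number of active nodes.

    `graph` maps each node to a list of (neighbor, activation_probability).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor, prob in graph.get(node, []):
                # Each newly active node gets one chance to activate a neighbor.
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, rounds, rng):
    """Estimate the expected spread of `seeds` by averaging `rounds` simulations."""
    return sum(simulate_cascade(graph, seeds, rng) for _ in range(rounds)) / rounds


def greedy_seeds(graph, nodes, k, rounds=200, seed=0):
    """Pick up to k seeds, each time adding the node with the best marginal gain."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(min(k, len(nodes))):
        base = expected_spread(graph, chosen, rounds, rng) if chosen else 0.0
        best, best_gain = None, float("-inf")
        for candidate in nodes:
            if candidate in chosen:
                continue
            gain = expected_spread(graph, chosen + [candidate], rounds, rng) - base
            if gain > best_gain:
                best, best_gain = candidate, gain
        chosen.append(best)
    return chosen


# Hypothetical toy network; probability 1.0 makes the spread deterministic.
toy_graph = {"a": [("b", 1.0), ("c", 1.0), ("d", 1.0)], "e": [("f", 1.0)]}
toy_nodes = ["a", "b", "c", "d", "e", "f", "g"]
print(greedy_seeds(toy_graph, toy_nodes, k=2, rounds=5))
```

Every greedy step re-simulates the cascade for every remaining candidate, so the cost grows with the number of nodes, the number of seeds, and the number of simulation rounds. This repeated and largely independent work is the kind of cost that makes the basic approach slow on large networks.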
VISPR is an interactive visualization and analysis framework for CRISPR screening experiments. However, it only supports the output of MAGeCK, and requires installation and manual configuration. Furthermore, VISPR is designed to run on a single computer, and data sharing between collaborators is challenging.
To make the tool easily accessible to the community, we present VISPR-online, a web-based general application allowing users to visualize, explore, and share CRISPR screening data online with a few simple steps. VISPR-online provides an exploration of screening results and visualization of read count changes. Apart from MAGeCK, VISPR-online supports two more popular CRISPR screening analysis tools: BAGEL and JACKS. It provides an interactive environment for exploring gene essentiality, viewing guide RNA (gRNA) locations, and resuming and sharing screening results.
VISPR-online allows users to visualize, explore and share CRISPR screening data online. It is freely available at http://vispr-online.weililab.org, and the source code is available at https://github.com/lemoncyb/VISPR-online.