NONCODE (http://www.bioinfo.org/noncode/) is an interactive database that aims to present the most complete collection and annotation of non-coding RNAs, especially long non-coding RNAs (lncRNAs). The recently reduced cost of RNA sequencing has produced an explosion of newly identified data. Revolutionary third-generation sequencing methods have also contributed to more accurate annotations, and accumulating experimental data provide more comprehensive knowledge of lncRNA functions. In this update, NONCODE has added six new species, bringing the total to 16. The number of lncRNAs in NONCODE has increased from 210,831 to 527,336; for human and mouse, the counts are 167,150 and 130,558, respectively. NONCODE 2016 also introduces three important new features: (i) conservation annotation; (ii) relationships between lncRNAs and diseases; and (iii) an interface for choosing high-quality datasets based on predicted scores, literature support and long-read sequencing support. NONCODE is also accessible at http://www.noncode.org/.
DaDianNao: A Machine-Learning Supercomputer. Yunji Chen; Tao Luo; Shaoli Liu et al.
2014 47th Annual IEEE/ACM International Symposium on Microarchitecture,
December 2014.
Conference Proceeding
Many companies are deploying services, for consumers or industry, that rely heavily on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computationally and memory intensive. A number of recently proposed neural network accelerators offer a high computational capacity/area ratio but remain hampered by memory accesses. However, unlike the memory wall faced by processors on general-purpose workloads, the CNN/DNN memory footprint, while large, is not beyond the capacity of the on-chip storage of a multi-chip system. This property, combined with CNN/DNN algorithmic characteristics, can lead to high internal bandwidth and low external communication, which in turn enables a high degree of parallelism at a reasonable area cost. In this article, we introduce a custom multi-chip machine-learning architecture along those lines. We show that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU and reduce energy by 150.31x on average for a 64-chip system. We implement the node down to place and route at 28 nm, containing a combination of custom storage and computational units, with industry-grade interconnects.
smORFs are small open reading frames of fewer than 100 codons. Recent low-throughput experiments showed that many smORF-encoded peptides (SEPs) play crucial roles in processes such as regulation of transcription or translation, transport through membranes and antimicrobial activity. To gather more functional SEPs, genome-wide prediction tools are needed to guide low-throughput experiments. In this study, we present a functional smORF-encoded peptides predictor (FSPP), which predicts authentic SEPs and their functions in a high-throughput manner. FSPP takes the overlap of SEPs detected by Ribo-seq and mass spectrometry as its target set. From expression data at the transcription and translation levels, FSPP builds two co-expression networks; combining these with co-location relations, it constructs a compound network and then annotates SEPs with the functions of adjacent nodes. Tested on 38 sequenced samples from 5 human cell lines, FSPP successfully predicted 856 of 960 annotated proteins. Interestingly, FSPP also highlighted 568 functional SEPs in these samples. On comparison, the roles predicted by FSPP were consistent with known functions. These results suggest that FSPP is a reliable tool for identifying functional small peptides. The FSPP source code is available at https://www.bioinfo.org/FSPP.
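The annotate-by-adjacency step can be sketched as a simple guilt-by-association pass over the compound network. Everything below (the node names, function labels and the plain union rule) is an illustrative assumption, not FSPP's actual implementation:

```python
# Toy guilt-by-association annotation: an unannotated node (a SEP) inherits
# the union of the functions of its annotated neighbors in the compound
# (co-expression + co-location) network. All names are hypothetical.
from collections import defaultdict

def annotate_seps(edges, known_functions):
    """edges: iterable of (node_a, node_b) pairs in the compound network.
    known_functions: dict mapping annotated proteins to sets of functions.
    Returns predicted function sets for the unannotated nodes."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    predictions = {}
    for node, adj in neighbors.items():
        if node in known_functions:
            continue  # already annotated; skip
        funcs = set()
        for n in adj:
            funcs |= known_functions.get(n, set())
        if funcs:
            predictions[node] = funcs
    return predictions

edges = [("SEP1", "P53"), ("SEP1", "MYC"), ("SEP2", "ACTB")]
known = {"P53": {"apoptosis"}, "MYC": {"transcription"}, "ACTB": {"cytoskeleton"}}
print(annotate_seps(edges, known))  # SEP1 inherits from P53 and MYC
```

In the real tool the network edges would come from co-expression and co-location evidence rather than a hand-written list.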
Graph-parallel computation has become a crucial component of emerging applications in web search, data analytics and machine learning. In practice, most graphs derived from real-world phenomena are very large and scale-free. Unfortunately, distributed graph-parallel computation on these natural graphs still suffers from serious scalability issues on contemporary multicore clusters. To embrace the multicore architecture in distributed graph-parallel computation, we propose the Graphine framework, which features: (i) a Scatter-Combine computation abstraction, evolved from the traditional vertex-centric approach by fusing the paired scatter and gather operations, previously executed separately on the two sides of an edge, into a one-sided scatter. Coupled with an active-message mechanism, this reduces intermediate message cost and enables fine-grained parallelism on multicore architectures. (ii) An Agent-Graph data model, which leverages an idea similar to vertex-cut but conceptually splits each remote replica into two agent types, scatter and combine, resulting in less communication. We implement the Graphine framework and evaluate it using several representative algorithms on six large real-world graphs and a series of synthetic graphs with power-law degree distributions. We show that Graphine achieves sublinear scalability with the number of cores per node, the number of nodes, and graph size (up to one billion vertices), and is 2-15 times faster than the state-of-the-art PowerGraph on a cluster of 16 multicore nodes.
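As a rough single-node illustration of the one-sided Scatter-Combine idea (a sketch under assumed semantics, not Graphine's actual API), here is one PageRank-style superstep in which each vertex scatters one message per out-edge and messages bound for the same target are combined by summation before the update:

```python
# One Scatter-Combine superstep (illustrative): each vertex scatters its
# rank share along its out-edges in a single pass; a combiner sums all
# partial messages addressed to the same target before the apply phase.
from collections import defaultdict

def scatter_combine_step(out_edges, rank, damping=0.85):
    combined = defaultdict(float)           # combiner: sum partial messages
    for u, targets in out_edges.items():
        if not targets:
            continue
        share = rank[u] / len(targets)      # one-sided scatter along out-edges
        for v in targets:
            combined[v] += share
    # apply phase: standard damped PageRank update
    return {v: (1 - damping) + damping * combined.get(v, 0.0) for v in rank}

out_edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = {"a": 1.0, "b": 1.0, "c": 1.0}
print(scatter_combine_step(out_edges, rank))  # b -> 0.575, c -> 1.425
```

In a distributed setting the combiner would run per partition, which is what keeps intermediate message cost low.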
A high-power, high-quality picosecond laser is crucial in MEMS micromachining. Optimal seed-beam coupling is an important precondition for enhancing laser efficiency; however, purely empirical coupling limits its development. In this paper, the physical parameters related to coupling are determined, and the relationships among them are established under optical mode-matching constraints to achieve optimal seed-beam coupling. From the theoretical analysis, the focal-length cut-off and the optimal coupling position of the coupling lens are obtained. A maximum transmittance of 87.2% is achieved with a 6 W input seed power in the validation experiment. In further power-amplification experiments, a diffraction-limited beam quality of Mx² = 1.111 and My² = 1.017 is achieved, with an optical efficiency of 60.5% and a slope efficiency of 66%, benefiting from the theoretical guidance.
The authors designed an accelerator architecture for large-scale neural networks, with an emphasis on the impact of memory on accelerator design, performance, and energy. In this article, they present a concrete design at 65 nm that can perform 496 16-bit fixed-point operations in parallel every 1.02 ns, that is, 452 GOP/s, in a 3.02 mm², 485 mW footprint (excluding main memory accesses).
Resource efficiency and quality of service (QoS) have both been long-pursued goals for cloud providers over the last decade, yet hardly any cloud platform achieves both even today. Improving resource efficiency or utilization often causes complicated resource contention between colocated cloud applications across different resources, spanning the underlying hardware to the software stack, leading to unexpected performance degradation. The low-entropy cloud proposes a new software-hardware codesigned technology stack to holistically curb performance interference from the bottom up and obtain both high resource efficiency and high application performance. In this paper, we introduce a new computer architecture for the low-entropy cloud stack, called labeled von Neumann architecture (LvNA), which incorporates a set of label-powered control mechanisms that enable shared on-chip components and resources to differentiate, isolate, and prioritize user-defined application requests when they compete for hardware resources. With these mechanisms, LvNA can protect the performance of certain applications, such as latency-critical ones, from disorderly resource contention while improving resource utilization. We further built and taped out Beihai, a 1.2 GHz 8-core RISC-V processor based on the LvNA architecture. The evaluation results show that Beihai drastically reduces the performance degradation caused by memory-bandwidth contention, from 82.8% to 0.4%. While raising CPU utilization above 70%, Beihai reduces the 99th-percentile tail latency of Redis from 115 ms to 18.1 ms. Furthermore, Beihai realizes hardware virtualization, booting two unmodified virtual machines concurrently without the intervention of any software hypervisor.
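A toy software model of the label idea may help fix intuition. The label names, priority table and queue policy below are illustrative assumptions of mine, not the Beihai hardware design: requests tagged latency-critical win arbitration for a shared resource over best-effort ones.

```python
# Illustrative label-based arbitration: each request carries a label, and a
# shared resource grants labeled latency-critical requests before
# best-effort ones, FIFO within each label class.
import heapq
from itertools import count

class LabeledArbiter:
    PRIORITY = {"latency-critical": 0, "best-effort": 1}  # lower = served first

    def __init__(self):
        self._queue = []
        self._order = count()  # FIFO tie-break within a label class

    def submit(self, label, request):
        heapq.heappush(self._queue,
                       (self.PRIORITY[label], next(self._order), request))

    def grant(self):
        return heapq.heappop(self._queue)[2]

arb = LabeledArbiter()
arb.submit("best-effort", "batch-read")
arb.submit("latency-critical", "redis-get")
print(arb.grant())  # the latency-critical request wins arbitration
```

The real mechanism acts in hardware on shared resources such as memory bandwidth and cache capacity, rather than on a software queue.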
Distributed computation on directed graphs has become increasingly important in emerging big-data analytics. However, partitioning huge real-world graphs, such as social and web networks, is known to be challenging because of their skewed (power-law) degree distributions. In this paper, by investigating two representative k-way balanced edge-cut methods (the LDG streaming heuristic and METIS) on 12 real social and web graphs, we empirically find that both LDG and METIS can partition page-level web graphs with extremely high quality, but fail to generate low-cut balanced partitions for social networks and host-level web graphs. Our analysis identifies the global star-motif structures around high-degree vertices as the main obstacle to high-quality partitioning. Based on this empirical study, we further propose a new distributed graph model, namely Agent-Graph, and the Agent+ framework that partitions power-law graphs in the Agent-Graph model. Agent-Graph is a vertex-cut variant in the context of message passing, in which any high-degree vertex is factored into arbitrary computational agents in remote partitions for message combining and scattering. The Agent+ framework filters the high-degree vertices to form a residual graph, which is then partitioned with high quality by existing edge-cut methods, and finally refills the high-degree vertices as agents to construct an agent-graph. Experiments show that the Agent+ approach consistently generates high-quality partitions for all tested real-world skewed graphs. In particular, for 64-way partitioning on social networks and host-level web graphs, Agent+ reduces the edge cut by 27%-79% for LDG and 23%-82% for METIS.
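The filter / partition / refill pipeline can be sketched in a few lines. The degree threshold, the round-robin stand-in for a real edge-cut partitioner, and all names below are assumptions for illustration only:

```python
# Sketch of the Agent+ pipeline: (1) filter out high-degree vertices,
# (2) partition the residual graph with an existing edge-cut method
# (a toy round-robin partitioner stands in for LDG/METIS here),
# (3) refill each high-degree vertex as agents in every partition that
# holds one of its neighbors, for message combining and scattering.
def round_robin_partition(graph, k):
    return {v: i % k for i, v in enumerate(sorted(graph))}

def agent_plus_partition(adj, k, threshold, partition_fn=round_robin_partition):
    high = {v for v, nbrs in adj.items() if len(nbrs) > threshold}
    residual = {v: [n for n in nbrs if n not in high]
                for v, nbrs in adj.items() if v not in high}
    assignment = partition_fn(residual, k)          # edge-cut on residual graph
    agents = {v: sorted({assignment[n] for n in adj[v] if n in assignment})
              for v in high}                        # refill as remote agents
    return assignment, agents

adj = {"hub": ["a", "b", "c", "d"],
       "a": ["hub", "b"], "b": ["a", "hub"],
       "c": ["hub", "d"], "d": ["c", "hub"]}
assignment, agents = agent_plus_partition(adj, k=2, threshold=3)
print(assignment, agents)  # the hub gets agents in both partitions
```

The point of the construction is that the residual graph, stripped of star motifs, is exactly the kind of graph LDG and METIS already partition well.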
With the explosive growth of information, more and more organizations are deploying private cloud systems or renting public cloud systems to process big data. However, no existing benchmark suite evaluates cloud performance at the whole-system level. To the best of our knowledge, this paper proposes the first benchmark suite, CloudRank-D, to benchmark and rank cloud computing systems that are shared for running big data applications. We analyze the limitations of previous metrics, e.g., floating-point operations per second, for evaluating a cloud computing system, and propose two simple, complementary metrics: data processed per second and data processed per Joule. We detail the design of CloudRank-D, which considers representative applications, diversity of data characteristics, and dynamic behaviors of both applications and system software platforms. Through experiments, we demonstrate the advantages of our proposed metrics. In several case studies, we evaluate two small-scale deployments of cloud computing systems using CloudRank-D.
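The two proposed metrics are straightforward ratios. The sketch below, with made-up workload numbers rather than results from the paper, shows why they are complementary:

```python
# The two CloudRank-D metrics as simple ratios. The workload and energy
# figures below are invented for illustration, not measured results.
def data_processed_per_second(bytes_processed, wall_clock_seconds):
    return bytes_processed / wall_clock_seconds

def data_processed_per_joule(bytes_processed, energy_joules):
    return bytes_processed / energy_joules

workload = 10 * 10**9  # a hypothetical 10 GB run on two deployments
a = (data_processed_per_second(workload, 80.0),   # slower...
     data_processed_per_joule(workload, 5.0e4))   # ...but frugal: 50 kJ
b = (data_processed_per_second(workload, 60.0),   # faster...
     data_processed_per_joule(workload, 9.0e4))   # ...but hungry: 90 kJ
# Deployment B has the higher throughput while A processes more data per
# Joule, which is exactly why a single metric cannot rank them.
```

A FLOPS-style metric would miss this trade-off entirely, since data-intensive cloud workloads are rarely bound by floating-point arithmetic.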