Motivation: A reticulate network is a model for displaying and quantifying the effects of complex reticulate processes on the evolutionary history of species undergoing reticulate evolution. A central computational problem on reticulate networks is: given a set of phylogenetic trees (each for some region of the genomes), reconstruct the most parsimonious reticulate network (called the minimum reticulate network) that combines the topological information contained in the given trees. This problem is well known to be NP-hard. Thus, existing approaches for this problem either work with only two input trees or make simplifying topological assumptions. Results: We present novel results on the minimum reticulate network problem. Unlike existing approaches, we address the fully general problem: there is no restriction on the number of trees that are input, and there is no restriction on the form of the allowed reticulate network. We present lower and upper bounds on the minimum number of reticulation events in the minimum reticulate network (and infer an approximately parsimonious reticulate network). A program called PIRN implements these methods and also outputs a graphical representation of the inferred network. Empirical results on simulated and biological data show that our methods are practical for a wide range of data. More importantly, the lower and upper bounds match for many datasets (especially when the number of trees is small or the reticulation level is low), and this allows us to solve the minimum reticulate network problem exactly for these datasets. Availability: A software tool, PIRN, is available for download from the web page: http://www.engr.uconn.edu/~ywu. Contact: ywu@engr.uconn.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Motivation: Subtree prune and regraft (SPR) is one kind of tree rearrangement that has seen applications in solving several computational biology problems. The minimum number of rooted SPR (rSPR) operations needed to transform one rooted binary tree to another is called the rSPR distance between the two trees. Computing the rSPR distance has been actively studied in recent years. Currently, there is a lack of practical software tools for computing the rSPR distance for relatively large trees with large rSPR distance. Results: In this article, we present a simple and practical method that computes the exact rSPR distance with integer linear programming. By applying this new method to several simulated and real biological datasets, we show that our new method outperforms existing software tools in terms of accuracy and efficiency. Our experimental results indicate that our method can compute the exact rSPR distance for many large trees with large rSPR distance. Availability: A software tool, SPRDist, is available for download from the web page: http://www.engr.uconn.edu/~ywu. Contact: ywu@engr.uconn.edu
Structural variation (SV), which ranges from 50 bp to ~3 Mb in size, is an important type of genetic variation. Deletion is a type of SV in which a part of a chromosome or a sequence of DNA is lost during DNA replication. Three types of signals, including discordant read-pairs, read depth and split reads, are commonly used for SV detection from high-throughput sequence data. Many tools have been developed for detecting SVs by using one or more of these signals.
In this paper, we develop a new method called EigenDel for detecting germline submicroscopic genomic deletions. EigenDel first uses discordant read-pairs and clipped reads to obtain initial deletion candidates, and then clusters similar candidates by using unsupervised learning methods. After that, EigenDel uses a carefully designed approach for calling true deletions from each cluster. We conduct various experiments to evaluate the performance of EigenDel on low-coverage sequence data.
Our results show that EigenDel outperforms other major methods in balancing accuracy and sensitivity and in reducing bias. EigenDel can be downloaded from https://github.com/lxwgcool/EigenDel.
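The discordant read-pair signal mentioned above can be illustrated with a minimal sketch. This is not EigenDel's implementation: the function name, the input layout (outer coordinates of each read pair) and the threshold of three standard deviations are assumptions made only for illustration. A pair whose apparent insert size is much larger than the library's expected insert size suggests that sequence present in the reference is missing from the sample between the two reads.

```python
def deletion_candidates(pairs, lib_mean, lib_sd, n_sd=3.0):
    """Return read pairs whose span is unusually large for the library.

    pairs    : iterable of (left_start, right_end) reference coordinates
               (hypothetical input layout, chosen for this sketch)
    lib_mean : expected insert size of the sequencing library
    lib_sd   : standard deviation of the insert size
    n_sd     : how many standard deviations above the mean counts as
               discordant (an assumed cutoff, not taken from EigenDel)
    """
    threshold = lib_mean + n_sd * lib_sd
    candidates = []
    for left_start, right_end in pairs:
        span = right_end - left_start
        # A span well above the expected insert size hints at a deletion
        # located somewhere between the two mapped reads.
        if span > threshold:
            candidates.append((left_start, right_end))
    return candidates


pairs = [(1000, 1480), (2000, 3200), (5000, 5650)]
print(deletion_candidates(pairs, lib_mean=500, lib_sd=50))
```

With a mean of 500 and a standard deviation of 50, only pairs spanning more than 650 bp are kept as candidates. A real caller would also use clipped reads and cluster the candidates, as the abstract describes.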
Gene expression and regulation in eukaryotes is controlled by orchestrated binding of regulatory proteins, including both activators and repressors, to promoters and other cis-regulatory DNA elements. An increasing number of plant genomes have been sequenced; however, a similar effort to the ENCODE project, which aimed to identify all functional elements in the human genome, has yet to be initiated in plants. Here we report genome-wide high-resolution mapping of DNase I hypersensitive (DH) sites in the model plant Arabidopsis thaliana. We identified 38,290 and 41,193 DH sites in leaf and flower tissues, respectively. The DH sites were depleted of bulk nucleosomes and were tightly associated with RNA polymerase II binding sites. Approximately 90% of the binding sites of two well-characterized MADS domain transcription factors, APETALA1 and SEPALLATA3, were covered by the DH sites. We demonstrate that protein binding footprints within a specific genomic region can be revealed using the DH site data sets in combination with known or putative protein binding motifs and gene expression data sets. Thus, genome-wide DH site mapping will be an important tool for systematic identification of all cis-regulatory DNA elements in plants.
China is the largest producer and consumer of refined copper in the world. The large amount of copper consumption not only creates added pressure surrounding resource availability but also causes prominent environmental problems. Although copper can be recycled to alleviate resource pressure, there are significant differences between mining primary copper and recycling scrap copper in terms of resources, energy consumption, and pollution emissions. These factors were analyzed to better understand the total environmental effects of refined copper from extracting primary ore and recycling scrap copper. The results of this analysis showed that the most serious environmental impacts of refined copper were human toxicity, abiotic depletion potential, and global warming potential. The environmental impacts were mainly caused by the mining and smelting of primary copper by pyrometallurgy. For secondary copper, refining and electrolysis were the main factors. Thus, these main processes, which cause major environmental impacts, should be improved technologically. According to the results, the total environmental impact of secondary copper was only 1/8 that of the primary copper production process, which indicates that regeneration has better environmental benefits. Furthermore, the sensitivity analysis showed that electricity was the most sensitive factor for both technologies. Optimizing the energy structure and increasing the proportion of regeneration can also reduce the environmental impact. It was suggested that the energy structure should be improved and that secondary copper should be given more attention and be developed vigorously. Finally, ways to reduce the environmental impact of the primary copper and secondary copper industries were recommended.
Scattered and precious metals (SPMs: Se, Te, Au, Ag, Pt, and Pd) play an irreplaceable role in advanced methods and materials, and their global consumption has been growing in recent decades. However, SPM consumption and recycling are very unbalanced, resulting in a shortage of supply and some uncertain risk regarding their sustainability. Copper anode slime (CAS) is an important component of secondary resources, and it contains a large amount of SPMs. Because of the complicated occurrence state of SPMs, the technique of extracting and separating them from CAS is quite different from that of raw ore. This paper focuses on the distribution of minerals and current non-cyanide hydrometallurgical methods for extracting Se, Te, Au, Ag, Pt, and Pd from CAS. In particular, in terms of recovery technology, some representative hydrometallurgical methods, including selective separation, extraction, precipitation, and reduction, as well as the recovery process, chemical reaction formulas, and the optimization and recycling situations of SPMs are reviewed, and the recycling potential, value, and supply risks of CAS are elaborated. Although these methods have achieved quite satisfactory results in recovering certain SPMs from CAS, they undeniably still face challenges for further promotion. In addition, from the perspective of economic assessment of recovery potential, supply sustainability, and technical improvement, future strategies for recovering SPMs from CAS are proposed. This paper is intended to serve as a guide for future development research on CAS, and provides detailed information on the promotion of SPM recycling.
• The main components and mineral characteristics of copper anode slime are summarized.
• The separation and extraction methods of main metals in copper anode slime are presented.
• Recovery process and recycling situation of Se, Te, Au, Ag, Pt, and Pd are reviewed.
• Recovery potential and future strategies of recovering copper anode slime are analyzed.
Circular RNA is a type of non-coding RNA, which has a circular structure. Many circular RNAs are stable and contain exons, but are not translated into proteins. Circular RNA has important functions in gene regulation and plays an important role in some human diseases. Several biological methods, such as RNase R treatment, have been developed to identify circular RNA. Multiple bioinformatics tools have also been developed for circular RNA detection with high-throughput sequence data.
In this paper, we present CircDBG, a new method for circular RNA detection using a de Bruijn graph. We conduct various experiments to evaluate the performance of CircDBG on both simulated and real data. Our results show that CircDBG finds more reliable circular RNAs with low bias, runs faster, and balances accuracy and sensitivity better than existing methods. As a byproduct, we also introduce a new method to classify circular RNAs based on read alignment. Finally, we report a potential chimeric circular RNA that is found by CircDBG in real sequence data. CircDBG can be downloaded from https://github.com/lxwgcool/CircDBG.
We develop a new method called CircDBG for circular RNA detection, which is based on a de Bruijn graph. We conduct extensive experiments and demonstrate that CircDBG outperforms existing tools, especially in reducing running time, reducing bias and balancing accuracy and sensitivity. We also introduce a new method to classify circular RNAs and report a potential case of chimeric circular RNA.
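As a rough illustration of the data structure named above, the sketch below builds a de Bruijn graph from reads and checks it for a cycle. It is a generic textbook construction, not CircDBG's algorithm: the function names, the toy sequences and the value of k are assumptions chosen for the example. The idea it shows is that a read spanning a back-splice junction (the end of a sequence joined to its start) adds edges that close a cycle in an otherwise linear graph.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: prefix -> suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def has_cycle(graph):
    """Depth-first search with three colors to detect a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:          # back edge: a cycle exists
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


linear_read = "ACGTTGCA"      # toy sequence with no repeated 3-mers
junction_read = "GCAACG"      # toy read joining the end back to the start

print(has_cycle(build_de_bruijn([linear_read], 4)))
print(has_cycle(build_de_bruijn([linear_read, junction_read], 4)))
```

The recursive search is adequate for toy inputs; graphs built from real sequencing data would need an iterative traversal and handling of repeats, which also create cycles.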
Calling genetic variations from sequence reads is an important problem in genomics. There are many existing methods for calling various types of variations. Recently, Google developed a method for calling single nucleotide polymorphisms (SNPs) based on deep learning. Their method visualizes sequence reads in the form of images. These images are then used to train a deep neural network model, which is used to call SNPs. This raises a research question: can deep learning be used to call more complex genetic variations such as structural variations (SVs) from sequence data?
In this paper, we extend this high-level approach to the problem of calling structural variations. We present DeepSV, an approach based on deep learning for calling long deletions from sequence reads. DeepSV is based on a novel method of visualizing sequence reads. The visualization is designed to capture multiple sources of information in the sequence data that are relevant to long deletions. DeepSV also implements techniques for working with noisy training data. DeepSV trains a model from the visualized sequence reads and calls deletions based on this model. We demonstrate that DeepSV outperforms existing methods in terms of accuracy and efficiency of deletion calling on the data from the 1000 Genomes Project.
Our work shows that deep learning can potentially lead to effective calling of different types of genetic variations that are more complex than SNPs.
China is the largest lead-acid battery (LAB) consumer and recycler, but suffers from lead contamination caused by problems in spent-lead recycling. This paper describes a comparative study of five typical LAB recycling processes in China by compiling data about the input materials, energy consumption, pollution emissions, and final products. We compared the environmental impacts of these processes in six categories using the Chinese Life Cycle Database (CLCD) and analyzed their economic efficiencies using technology cost modeling (TCM) based on the local market prices of materials and energy. The results show that not all of the innovative hydrometallurgical processes are healthy alternatives. We should pay attention to indirect emissions in environmental inspection, take pollution treatment costs into account in green profit analysis, and recommend the best process as the resource supply structure of society changes.
• Five typical processes of lead recycling in China are compiled and compared.
• Indirect emissions should not be ignored in future environmental inspection.
• Pollution treatment costs should be taken into account in green profit analysis.
• The most suitable process will change based on the resource supply structure.
• Different attitudes toward WEEE are shown in developed and developing countries.
• Price, convenience and canonicity can affect consumers when selecting a collector.
• Stable cooperation within tier two collectors is formed by profit-driven individuals.
• WEEE processing fund is shared by collectors and intensifies confusion of recovery.
• Multi-agent cost-benefit analysis model is built to calculate collector profitability.
Different attitudes and practices toward WEEE recovery and recycling are found in developed and developing countries. In China, the largest developing country in the world, WEEE is widely regarded as a valuable product, and the resources contained in it offer potential profit for informal collectors. Unlike the formal collectors, who are supported by the government, informal collectors can only rely on themselves. However, formal and informal collectors often comfortably coexist in developing countries, and there are even cases of informal collectors dominating the market. Obsolete televisions (OTV) in Beijing are used as a case study. A questionnaire survey and a multi-agent cost-benefit analysis are used to analyze the stability and profitability of the informal collector. The results show the following: (1) The factors of price, convenience and canonicity can affect consumers’ motivations when selecting a collector, with price and convenience having the strongest impact. Informal collectors can better meet consumers’ demands. (2) Stable cooperation within tier two collectors is formed by profit-driven individuals. The secondhand market can expand the profits of the first-tier collectors, while the remaining OTV, which cannot be reused, is delivered to the middleman, ensuring that the first-tier collectors are not menaced from the “rear”. (3) The informal backyard recycler expands the profit of the middleman. Meanwhile, the WEEE processing fund is mostly shared by collectors and intensifies the confusion of the recovery market. We then propose a new fund system to improve the regulation of OTV recovery. To make formal collection work, we suggest that the government charge an extra 40 Yuan for the fund when consumers buy new TVs and distribute it to formal collectors in order to exclude the informal collectors.