Data science allows the extraction of practical insights from large-scale data. Here, we contextualize it as an umbrella term, encompassing several disparate subdomains. We focus on how genomics fits as a specific application subdomain, in terms of well-known 3 V data and 4 M process frameworks (volume-velocity-variety and measurement-mining-modeling-manipulation, respectively). We further analyze the technical and cultural "exports" and "imports" between genomics and other data-science subdomains (e.g., astronomy). Finally, we discuss how data value, privacy, and ownership are pressing issues for data science applications, in general, and are especially relevant to genomics, due to the persistent nature of DNA.
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
Weak solutions of problems with m equations with source terms are proposed using an augmented Riemann solver defined by m + 1 states instead of increasing the number of involved equations. These weak solutions use propagating jump discontinuities connecting the m + 1 states to approximate the Riemann solution. The average of the propagated waves in the computational cell leads to a reinterpretation of Roe's approach and of the upwind treatment of the source term of Vázquez-Cendón. It is derived that the numerical scheme cannot be formulated by evaluating the physical flux function at the position of the initial discontinuities, as is usually done in the homogeneous case. Positivity requirements over the values of the intermediate states are the only way to control the global stability of the method. It is also shown that the definition of well-balanced equilibrium in trivial cases is not sufficient to provide correct results: it is necessary to provide discrete evaluations of the source term that ensure energy dissipating solutions when demanded. The one and two dimensional shallow water equations with source terms due to the bottom topography and friction are presented as a case study. The stability region is shown to differ from the one defined for the case without source terms, and it can be derived that the appearance of negative values of the thickness of the water layer in the proximity of the wet/dry front is a particular case of the wet/wet fronts. The consequence is a severe reduction in the magnitude of the allowable time step size compared with the one obtained for the homogeneous case. Starting from this result, 1D and 2D numerical schemes are developed for both quadrilateral and triangular grids, enforcing conservation and positivity over the solution, and allowing computationally efficient simulations by means of a reconstruction technique for the inner states of the weak solution that allows a recovery of the time step size.
Recent advances in the simulation of free surface flows over mobile beds have shown that accurate and stable results in realistic problems can be provided if an appropriate coupling between the shallow water equations (SWE) and the Exner equation is performed. This coupling can be achieved by using a suitable Jacobian matrix. As a result, faithful numerical predictions are available for a wide range of flow conditions and empirical bed load discharge formulations, allowing the best option to be investigated in each case study, which is mandatory in this type of environmental problem. When coupling the equations, the SWE are considered together with an extra conservation law for the sediment dynamics. The computational cost may then become unrealistic in situations, involving large time and space scales, where the SWE over a rigid bed could otherwise be applied without giving up an adequate level of mesh refinement. Therefore, to restore numerical efficiency, the coupling technique is simplified, not by decreasing the number of waves involved in the Riemann problem but by simplifying their definitions. The effects of the approximations made are tested against experimental data, which include transient problems over an erodible bed. The simplified model is formulated under a general framework that can accommodate any desired solid discharge formula.
Formulation of targets and establishing which factors in different contexts will achieve these targets are critical to successful decarbonization of the building sector. To contribute to this, we have performed an evidence map of roadmaps for zero and low energy and carbon buildings (ZLECB) worldwide, including a list and classification of documents in an on-line geographical map, a description of gaps, and a narrative review of the knowledge gluts. We have retrieved 1219 scientific documents from Scopus, extracted metadata from 274 documents, and identified 117 roadmaps, policies or plans from 27 countries worldwide. We find that there is a coverage bias towards more developed regions. The identified scientific studies are mostly recommendations to policy makers, different types of case studies, and demonstration projects. The geographical inequalities found in the coverage of the scientific literature are even more extreme in the coverage of the roadmaps. These underexplored world regions represent an area for further investigation and increased research/policy attention. Our review of the more substantial amount of literature and roadmaps for developed regions shows differences in target metrics and enforcement mechanisms but that all regions dedicate some efforts at national and local levels. Roadmaps generally focus more on new and public buildings than existing buildings, despite the fact that the latter are naturally larger in number and total floor area, and perform less energy efficiently. A combination of efficiency, technical upgrades, and renewable generation is generally proposed in the roadmaps, with behavioral measures only reflected in the use of information and communication technologies, and minimal focus being placed on lifecycle perspectives. We conclude that insufficient progress is being made in the implementation of ZLECB.
More work is needed to couple the existing climate goals with realistic, enforceable policies to make the carbon savings a reality for different contexts and stakeholders worldwide.
Shallow water flows are found in a variety of engineering problems, always dominated by the presence of bed friction and irregular bathymetry. These source terms completely determine the possible evolution of the flooded area in time. It is well known that appropriate numerical schemes for this type of flow must be well-balanced. Well-balanced numerical schemes are based on the preservation of cases of quiescent equilibrium over variable bed elevation. Commonly they are formulated as an adaptation of numerical solvers defined for cases without source terms. This procedure is insufficient when applied to real situations. It can therefore be argued that appropriate numerical schemes cannot arise directly from those derived for the simplest homogeneous case without source terms. New solutions are presented in this work by defining weak solutions that include the presence of source terms. To do that, the solvers presented in this work extend the number of waves in the well-known HLL and HLLC solvers by involving a stationary jump in the solution. This is done without modifying the original solution vector of conserved quantities. The resulting approximate Riemann solvers include variable bed level surface and friction. Solvers are systematically assessed via a series of test problems with exact solutions for one and two dimensions, including steady and unsteady flow configurations, variation of the flooded area in time and comparisons with experimental data. The obtained results indicate that the new method is able to predict faithfully the overall behavior of the solution and of any type of waves.
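As a point of reference for the solvers mentioned above, the classical two-wave HLL flux for the homogeneous 1D shallow water equations can be sketched as follows. This is the textbook starting point, not the augmented solver with a stationary jump proposed in the work: there is no bed slope, no friction and no wet/dry treatment. The function names and the simple wave speed estimates are assumptions made for illustration.

```python
import numpy as np

G = 9.81  # gravitational acceleration (m/s^2)


def physical_flux(h, hu):
    """Flux of the homogeneous 1D shallow water equations for a wet state (h, hu)."""
    u = hu / h
    return np.array([hu, hu * u + 0.5 * G * h * h])


def hll_flux(hL, huL, hR, huR):
    """Classical HLL flux between wet left/right states (no source terms).

    Wave speeds use the simple min/max estimate of the characteristic
    speeds u -/+ sqrt(g h); this choice is an assumption of the sketch.
    """
    uL, uR = huL / hL, huR / hR
    cL, cR = np.sqrt(G * hL), np.sqrt(G * hR)
    sL = min(uL - cL, uR - cR)  # slowest wave speed estimate
    sR = max(uL + cL, uR + cR)  # fastest wave speed estimate

    FL, FR = physical_flux(hL, huL), physical_flux(hR, huR)
    if sL >= 0.0:  # all waves travel to the right
        return FL
    if sR <= 0.0:  # all waves travel to the left
        return FR

    UL, UR = np.array([hL, huL]), np.array([hR, huR])
    return (sR * FL - sL * FR + sL * sR * (UR - UL)) / (sR - sL)
```

For identical left and right states the function returns the physical flux, which is the consistency property any such numerical flux should satisfy; for a dam-break state such as `hll_flux(2.0, 0.0, 1.0, 0.0)` the mass flux is directed from the deeper to the shallower side.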
Transient flow over an erodible bed is solved in this work assuming that the dynamics of the bed load problem is described by two mathematical models: the hydrodynamic model, assumed to be well formulated by means of the depth averaged shallow water equations, and the Exner equation. The Exner equation is written assuming that bed load transport is governed by a power law of the flow velocity and by a flow/sediment interaction parameter variable in time and space. The complete system is formed by four coupled partial differential equations and a genuinely Roe-type first order scheme has been used to solve it on triangular unstructured meshes. Exact solutions have been derived for the particular case of initial value Riemann problems with variable bed level and depending on particular forms of the solid discharge formula. The model, supplied with the corresponding solid transport formulae, is tested by comparing with the exact solutions. The model is validated against laboratory experimental data of different unsteady problems over an erodible bed.
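A common way to write the bed evolution described above is sketched below. The notation is assumed, not taken from the work: z is the bed elevation, p the bed porosity, q_s the solid discharge vector, u the depth averaged velocity vector, and A_g the flow/sediment interaction parameter. The cubic exponent corresponds to a Grass-type law and is an assumption, since the abstract only states that a power law of the velocity is used.

```latex
\frac{\partial z}{\partial t} + \xi\, \nabla \cdot \mathbf{q}_s = 0,
\qquad \xi = \frac{1}{1 - p},
\qquad \mathbf{q}_s = A_g\, |\mathbf{u}|^{2}\, \mathbf{u} .
```

Together with the mass and two momentum equations of the 2D shallow water system, this gives the four coupled partial differential equations mentioned in the abstract.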
In this work, the source term discretization in hyperbolic conservation laws with source terms is considered using an approximate augmented Riemann solver. The technique is applied to the shallow water equations with bed slope and friction terms, with the focus on the friction discretization. The augmented Roe approximate Riemann solver provides a family of weak solutions for the shallow water equations that are the basis of the upwind treatment of the source term. This has proved successful in explaining and avoiding the appearance of instabilities and negative values of the thickness of the water layer in cases of variable bottom topography. Here, this strategy is extended to capture the peculiarities that may arise when defining more ambitious scenarios that may include relevant stresses in cases of mud/debris flow. The conclusions of this analysis lead to the definition of an accurate and robust first order finite volume scheme, able to handle correctly transient problems considering frictional stresses in both clean water and debris flow, including in this last case a correct modelling of stopping conditions.
In this study, we investigated the effect of five feature selection approaches on the performance of a mixed model (G-BLUP) and a Bayesian (Bayes C) prediction method. We predicted height, high density lipoprotein cholesterol (HDL) and body mass index (BMI) within 2,186 Croatian and into 810 UK individuals using genome-wide SNP data. Using all SNP information Bayes C and G-BLUP had similar predictive performance across all traits within the Croatian data, and for the highly polygenic traits height and BMI when predicting into the UK data. Bayes C outperformed G-BLUP in the prediction of HDL, which is influenced by loci of moderate size, in the UK data. Supervised feature selection of a SNP subset in the G-BLUP framework provided a flexible, generalisable and computationally efficient alternative to Bayes C; but careful evaluation of predictive performance is required when supervised feature selection has been used.
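To make the G-BLUP side of this comparison concrete, a minimal sketch of genomic prediction with a genomic relationship matrix is given below. It is illustrative only and not the study's pipeline: the function names are hypothetical, the relationship matrix uses the widely used VanRaden scaling, the only fixed effect is the training mean, and the variance ratio `lam` (residual over genetic variance) is assumed known rather than estimated, for example by REML.

```python
import numpy as np


def vanraden_grm(M):
    """Genomic relationship matrix from an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    p = M.mean(axis=0) / 2.0  # allele frequencies estimated from M itself
    Z = M - 2.0 * p           # centre each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(G, y_train, train_idx, test_idx, lam):
    """Predict phenotypes of test individuals from training phenotypes.

    lam is the ratio sigma_e^2 / sigma_g^2, assumed known in this sketch.
    """
    mu = y_train.mean()  # the only fixed effect: an overall mean
    G_tt = G[np.ix_(train_idx, train_idx)]  # train x train relationships
    G_st = G[np.ix_(test_idx, train_idx)]   # test x train relationships
    alpha = np.linalg.solve(G_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + G_st @ alpha
```

Supervised feature selection in this framework would amount to building `M` from a chosen SNP subset before calling `vanraden_grm`; as the abstract cautions, the selection must not use the test phenotypes or the resulting predictive performance will be overstated.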