Robust methods to detect DNA-binding proteins from structures of unknown function are important for structural biology. This paper describes a method for identifying such proteins that have (i) a solvent-accessible structural motif necessary for DNA binding and (ii) a positive electrostatic potential around the binding region. We focus on three structural motifs: helix–turn–helix (HTH), helix–hairpin–helix (HhH) and helix–loop–helix (HLH). We find that the combination of these variables detects 78% of proteins with an HTH motif, which is a substantial improvement over previous work based purely on structural templates and is comparable to more complex methods of identifying DNA-binding proteins. Similar true positive fractions are achieved for the HhH and HLH motifs. We see evidence of wide evolutionary diversity for DNA-binding proteins with an HTH motif, and much smaller diversity for those with an HhH or HLH motif.
Probes with runs of four or more guanines (G-stacks) in their sequences can exhibit a level of hybridization that is unrelated to the expression levels of the mRNA that they are intended to measure. This is most likely caused by the formation of G-quadruplexes, in which guanines from different probes form Hoogsteen hydrogen bonds. We demonstrate that for a specific microarray data set using the Human HG_U133A Affymetrix GeneChip and RMA normalization there is significant bias in the expression levels, the fold change and the correlations between expression levels. These effects grow more pronounced as the number of G-stack probes in a probe set increases. Approximately 14% of the probe sets are directly affected. The analysis was repeated for a number of other normalization pipelines, and two of them, FARMS and PLIER, minimized the bias to some extent. We estimate that ∼15% of the data sets deposited in the GEO database are susceptible to the effect. The inclusion of G-stack probes in the affected data sets can bias key parameters used in the selection and clustering of genes. In such affected data sets, the benefit of eliminating these probes from any analysis outweighs the resulting increase in noise in the signal.
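The G-stack criterion stated above (a run of four or more guanines in a probe sequence) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names and the example probe sequences are hypothetical, invented only to show how affected probes in a probe set might be flagged and counted.

```python
import re

# A G-stack is taken here to be four or more consecutive guanines,
# following the definition given in the abstract.
G_STACK = re.compile(r"G{4,}")


def has_g_stack(probe_sequence: str) -> bool:
    """Return True if the probe contains a run of four or more guanines."""
    return G_STACK.search(probe_sequence.upper()) is not None


def count_g_stack_probes(probe_set) -> int:
    """Count how many probes in a probe set contain a G-stack."""
    return sum(1 for seq in probe_set if has_g_stack(seq))


# Hypothetical probe sequences, made up for illustration only.
example_probe_set = [
    "ATCGGGGTACCATGCATGCATGCAT",  # contains GGGG
    "ATCGGGTACCATGCATGCATGCATG",  # longest guanine run is three
    "TTTTGGGGGGACGTACGTACGTACG",  # contains GGGGGG
]

print(count_g_stack_probes(example_probe_set))
```

A count of this kind per probe set is one way to express "the number of G-stack probes in a probe set"; whether to drop the flagged probes before summarization is a separate analysis decision.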
The systemic challenges of the COVID-19 pandemic require cross-disciplinary collaboration in a global and timely fashion. Such collaboration needs open research practices and the sharing of research outputs, such as data and code, thereby facilitating research and research reproducibility and timely collaboration beyond borders. The Research Data Alliance COVID-19 Working Group recently published a set of recommendations and guidelines on data sharing and related best practices for COVID-19 research. These guidelines include recommendations for clinicians, researchers, policy- and decision-makers, funders, publishers, public health experts, disaster preparedness and response experts, infrastructure providers from the perspective of different domains (Clinical Medicine, Omics, Epidemiology, Social Sciences, Community Participation, Indigenous Peoples, Research Software, Legal and Ethical Considerations), and other potential users. Several overarching themes have emerged from this document, such as the need to balance the creation of data adherent to FAIR principles (findable, accessible, interoperable and reusable) with the need for quick data release; the use of trustworthy research data repositories; the use of well-annotated data with meaningful metadata; and practices of documenting methods and software. The resulting document marks an unprecedented cross-disciplinary, cross-sectoral, and cross-jurisdictional effort authored by over 160 experts from around the globe.
This letter summarises key points of the Recommendations and Guidelines, highlights the relevant findings, shines a spotlight on the process, and suggests how these developments can be leveraged by the wider scientific community.
Data science skills are rapidly becoming a necessity in modern science. In response to this need, institutions and organizations around the world are developing research data science curricula to teach the programming and computational skills that are needed to build and maintain data infrastructures and maximize the use of available data. To date, however, few of these courses have included an explicit ethics component, and developing such components can be challenging. This paper describes a novel approach to teaching data ethics on short courses developed for the CODATA-RDA Schools for Research Data Science. The ethics content of these schools is centred on the concept of open and responsible (data) science citizenship that draws on virtue ethics to promote ethics of practice. Although little formal teaching time is available, this concept of citizenship is made central to the course by distributing ethics content across technical modules. Ethics instruction uses a wide range of techniques, including stand-alone lectures, group discussions and mini-exercises linked to technical modules. This multi-level approach enables students to develop an understanding both of “responsible and open (data) science citizenship” and of how such responsibilities are implemented in daily research practices within their home environment. This approach successfully locates ethics within daily data science practice, and allows students to see how small actions build into larger ethical concerns. This emphasises that ethics are not something “removed from daily research” or the remit of data generators/end users, but rather are a vital concern for all data scientists.
We evaluate recent efforts to further the effective teaching of FAIR data principles by examining existing and developing educational frameworks focused upon FAIR, training initiatives that have informed teaching on FAIR skills topics, and a number of key sources for discovering FAIR training materials, together with the extent to which those sources provide descriptive information about the materials. FAIR4S, which provides a coherent description of skills and competencies, is analyzed by target audience using the description of actors found in a European Open Science Cloud ecosystem report and by comparison of the coverage and extent of description of educational and training materials available from the list of sources for finding such materials. Our analysis elucidates the importance of linking resources to FAIR-related educational frameworks, providing consistent descriptions of them using a community-based metadata scheme, and developing an instructor community of practice where ideas and methods can be shared on how to teach FAIR data skills.
There are a variety of initiatives in teaching, training materials, and educational frameworks to inform the teaching of FAIR skills. FAIR4S as an educational framework gives a good, if very broad, overview of the necessary skills. No one source of training materials gives a complete overview of the necessary materials. There is a need to provide a method to annotate materials to make them more findable.
A method to detect DNA-binding sites on the surface of a protein structure is important for functional annotation. This work describes the analysis of residue patches on the surface of DNA-binding proteins and the development of a method of predicting DNA-binding sites using a single feature of these surface patches. Surface patches and the DNA-binding sites were initially analysed for accessibility, electrostatic potential, residue propensity, hydrophobicity and residue conservation. From this, it was observed that the DNA-binding sites were, in general, amongst the top 10% of patches with the largest positive electrostatic scores. This knowledge led to the development of a prediction method in which patches of surface residues were selected such that they excluded residues with negative electrostatic scores. This method was used to make predictions for a data set of 56 non-homologous DNA-binding proteins. Correct predictions were made for 68% of the data set.
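The patch-selection idea described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the published method: the per-residue scores, the residue labels, the summed-score ranking and the `fraction` parameter are all hypothetical. The abstract states only that residues with negative electrostatic scores were excluded and that binding sites tended to lie in the top 10% of patches by positive electrostatic score.

```python
import math


def filter_patch(patch, scores):
    """Drop residues whose electrostatic score is negative."""
    return [res for res in patch if scores[res] >= 0.0]


def top_patches(patches, scores, fraction=0.10):
    """Rank filtered patches by summed score and keep the top fraction.

    Summing the per-residue scores is an assumption made for this sketch.
    """
    filtered = [filter_patch(p, scores) for p in patches]
    filtered = [p for p in filtered if p]  # discard patches left empty
    ranked = sorted(
        filtered,
        key=lambda p: sum(scores[r] for r in p),
        reverse=True,
    )
    keep = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:keep]


# Hypothetical residues and electrostatic scores, for illustration only.
scores = {"K12": 2.5, "R15": 3.0, "E20": -1.8, "S22": 0.4, "D30": -2.2, "G31": 0.1}
patches = [
    ["K12", "R15", "E20"],
    ["S22", "D30", "G31"],
    ["E20", "D30"],
]

print(top_patches(patches, scores))
```

With real data, the scores would come from an electrostatics calculation on the protein structure and the patches from a surface-residue neighbourhood definition, neither of which is specified in the abstract.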
The CODATA-RDA Data Steward School
Daniel Bangert; Joy Davidson; Steve Diggs ...
International Journal of Digital Curation, 7/2020, Volume 15, Issue 1
Journal Article, Peer reviewed, Open access
Given the expected increase in demand for Data Stewards and Data Stewardship skills, it is clear that there is a need to develop training, education and CPD (continuous professional development) in this area.
This paper provides a brief introduction to the origin of definitions of Data Stewardship and notes the present tendency towards equivalence between Data Stewardship skills and FAIR principles. It then focuses on one specific training event: the pilot Data Stewardship strand of the CODATA-RDA Research Data Science schools that by the time of the IDCC meeting will have been held in Trieste in August 2019. The paper will discuss the overall curriculum for the pilot school, how it matches with the FAIR4S framework, and plans for getting feedback from the students.
Finally, the paper discusses future plans for the school, in particular how to deepen the integration of the Data Stewardship strand with the Early Career Researcher strand.
Topological defects in lattice gauge theories
Davis, Anne C; Kibble, Tom W.B; Rajantie, Arttu ...
The journal of high energy physics, 11/2000, Volume 2000, Issue 11
Journal Article, Peer reviewed, Open access
We present a non-perturbative formalism for measuring defect free energies (monopole mass or vortex tension) in 3D SU(2)+adjoint Higgs models. Starting from twisted, translation invariant boundary conditions, we perform a change of variables that allows us to express the defect free energies in terms of 't Hooft loops. We propose that the defect free energies can be used to distinguish between phases in this model, and also more generally in other gauge field theories where no local order parameters exist. In the case of monopoles, our construction can also be used in 4D pure-gauge SU(2) theory, where it gives the monopole mass in the maximally abelian gauge without the need for actually fixing the gauge in the simulation. Moreover, the expression is manifestly independent of the choice of the abelian projection as long as it is compatible with the classical 't Hooft-Polyakov solution.