Extracting biological interactions from published literature helps us understand complex biological systems, accelerate research, and support decision-making in drug or treatment development. Despite efforts to automate the extraction of biological relations using text mining tools and machine learning pipelines, manual curation continues to serve as the gold standard. However, the rapidly increasing volume of literature pertaining to biological relations poses challenges in its manual curation and refinement. These challenges are further compounded because only a small fraction of the published literature is relevant to biological relation extraction, and the embedded sentences of relevant sections have complex structures, which can lead to incorrect inference of relationships. To overcome these challenges, we propose GIX, an automated and robust Gene Interaction Extraction framework, based on pre-trained Large Language Models fine-tuned through extensive evaluations on various gene/protein interaction corpora including LLL and RegulonDB. GIX identifies relevant publications with minimal keywords, optimises sentence selection to reduce computational overhead, simplifies sentence structure while preserving meaning, and provides a confidence factor indicating the reliability of extracted relations. GIX's Stage-2 relation extraction method performed well on benchmark protein/gene interaction datasets, assessed using 10-fold cross-validation, surpassing state-of-the-art approaches. We demonstrated that the proposed method, although fully automated, performs as well as manual relation extraction, with enhanced robustness. We also observed GIX's capability to augment existing datasets with new sentences, incorporating newly discovered biological terms and processes. Further, we demonstrated GIX's real-world applicability in inferring E. coli gene circuits.
Cellular senescence, the permanent arrest of cycling in normally proliferating cells such as fibroblasts, both contributes to age‐related loss of mammalian tissue homeostasis and acts as a tumour suppressor mechanism. The pathways leading to establishment of senescence are proving to be more complex than was previously envisaged. Combining in‐silico interactome analysis and functional target gene inhibition, stochastic modelling and live cell microscopy, we show here that there exists a dynamic feedback loop that is triggered by a DNA damage response (DDR) and which, after a delay of several days, locks the cell into an actively maintained state of ‘deep’ cellular senescence. The essential feature of the loop is that long‐term activation of the checkpoint gene CDKN1A (p21) induces mitochondrial dysfunction and production of reactive oxygen species (ROS) via serial signalling through GADD45‐MAPK14(p38MAPK)‐GRB2‐TGFBR2‐TGFβ. These ROS in turn replenish short‐lived DNA damage foci and maintain an ongoing DDR. We show that this loop is both necessary and sufficient for the stability of growth arrest during the establishment of the senescent phenotype.
Synopsis
The phenomenon of cellular ‘senescence’, the permanent arrest of division in normally proliferating mammalian cells such as fibroblasts, is thought to be a central component of the ageing process. Senescence contributes both to age‐related loss of tissue homeostasis, as the loss of division capacity leads to impaired cell renewal, and to protection against cancer, because it blocks the uncontrolled proliferation of cells that may give rise to a malignant tumour. Replicative senescence is triggered by uncapped telomeres or by ‘unrepairable’ non‐telomeric DNA damage. Both lesions initiate the same canonical DNA damage response (DDR) (d'Adda di Fagagna, 2008). This response is characterized by activation of sensor kinases (ATM/ATR, DNA‐PK), formation of DNA damage foci containing activated H2A.X (γH2A.X) and ultimately induction of cell cycle arrest through activation of checkpoint proteins, notably p53 (TP53) and the CDK inhibitor p21 (CDKN1A). This signalling pathway continues to contribute actively to the stability of the G0 arrest in fully senescent cells long after induction of senescence (d'Adda di Fagagna et al, 2003). However, senescence is more complex than mere CDKI‐mediated growth arrest. Senescent cells alter their expression of literally hundreds of genes (Shelton et al, 1999), prominent among these being pro‐inflammatory secretory genes (Coppe et al, 2008) and marker genes for a retrograde response induced by mitochondrial dysfunction (Passos et al, 2007a).
There is growing evidence that multiple mechanisms interact to underpin ageing at the cellular level (Kirkwood, 2005; Passos et al, 2007b), necessitating a systems biology approach if the complex mechanisms of ageing are to be understood (Kirkwood, 2008). With respect to cell senescence, the two major unanswered questions are (i) How does a DNA lesion that can be repaired, at least in principle, induce and maintain irreversible growth arrest? and (ii) How does a growth arrest trigger a completely different cellular phenotype as soon as it becomes irreversible?
To address these questions, we performed a kinetic analysis of the establishment phase of senescence initiated by DNA damage or telomere dysfunction, focussing on pathways downstream of the classical DDR. Using an approach that combined (i) in‐silico interactome analysis, (ii) functional target gene inhibition, (iii) stochastic modelling, and (iv) live cell microscopy, we identified a positive feedback loop between DDR and mitochondrial production of reactive oxygen species (ROS) as necessary and sufficient for long‐term maintenance of growth arrest. Using pathway log likelihood scores calculated by a quantitative in‐silico interactome analysis to guide siRNA and small molecule inhibition experiments, and using results of sequential and combined inhibition experiments to refine the predictions from the interactome analysis, we found that DDR triggered mitochondrial dysfunction leading to enhanced ROS activation via linear signal transduction through TP53, CDKN1A, GADD45A, p38 (MAPK14), GRB2, TGFBR2 and TGFβ (Figure 2D). We hypothesized that these ROS stochastically generate novel DNA damage in the nucleus, thus forming a positive feedback loop contributing to the long‐term maintenance of DDR (Figure 3A). First confirmation came from static inhibitor experiments as before, showing that nuclear DNA damage foci frequencies in senescent cells were reduced if feedback signalling was suppressed. To formally establish the existence of a feedback loop and its relevance for senescence, we used live cell microscopy in combination with quantitative modelling.
We transformed the conceptual model shown in Figure 3A into a stochastic mechanistic model of the DDR feedback loop by extending the previously published model of the TP53/Mdm2 circuit (Proctor and Gray, 2008) to include reactions for synthesis/activation and degradation/deactivation/repair of CDKN1A, GADD45, MAPK14, ROS and DNA damage. The model replicated very precisely the kinetic behaviour of activated TP53, CDKN1A, ROS and DNA damage foci after initiation of senescence by irradiation. Having established its concordance with the experimental data, the model was then used to predict the effects of intervening in the feedback loop. The model predicted that any intervention reducing ROS levels by about half would decrease average DNA damage foci frequencies from six to four foci/nucleus within about 15 h. It further predicted that this would be sufficient to reduce CDKN1A to basal levels continuously for at least 6 h in about 20% of the treated cells, thus allowing a significant fraction of cells to escape from growth arrest and to resume proliferation. This should happen even if the intervention into the feedback loop was started at a late time point (e.g. 6 days) after induction of senescence.
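The idea of a stochastic model of a damage/ROS feedback loop can be illustrated with a deliberately small sketch. The code below is not the published model: it is a toy Gillespie (stochastic simulation algorithm) run with only two species, ROS and DNA damage foci, and it omits TP53, Mdm2, CDKN1A, GADD45 and MAPK14 entirely. The function name, the reaction scheme and all rate constants are assumptions chosen for illustration, and time is in arbitrary units.

```python
import random


def simulate_feedback_loop(k_basal, k_feedback, k_clear, k_damage, k_repair,
                           t_end, seed=0):
    """Toy Gillespie simulation; returns the time-averaged foci count.

    Hypothetical reactions (all rate constants are illustrative, not fitted):
      0: basal ROS production              rate k_basal
      1: damage-dependent ROS production   rate k_feedback * foci
      2: ROS clearance                     rate k_clear * ros
      3: ROS creates a new damage focus    rate k_damage * ros (ROS not consumed)
      4: repair of a damage focus          rate k_repair * foci
    """
    rng = random.Random(seed)
    t, ros, foci = 0.0, 0, 0
    foci_time_integral = 0.0
    while True:
        rates = [k_basal, k_feedback * foci, k_clear * ros,
                 k_damage * ros, k_repair * foci]
        total = sum(rates)
        if total == 0:
            # Nothing can happen any more: hold the state until t_end.
            foci_time_integral += foci * (t_end - t)
            break
        # Waiting time to the next reaction is exponentially distributed.
        dt = rng.expovariate(total)
        if t + dt >= t_end:
            foci_time_integral += foci * (t_end - t)
            break
        foci_time_integral += foci * dt
        t += dt
        # Pick which reaction fires, with probability proportional to its rate.
        reaction = rng.choices(range(5), weights=rates)[0]
        if reaction in (0, 1):
            ros += 1
        elif reaction == 2:
            ros -= 1
        elif reaction == 3:
            foci += 1
        else:
            foci -= 1
    return foci_time_integral / t_end


if __name__ == "__main__":
    with_loop = simulate_feedback_loop(3.0, 0.5, 1.0, 0.5, 0.5, t_end=2000.0)
    without_loop = simulate_feedback_loop(3.0, 0.0, 1.0, 0.5, 0.5, t_end=2000.0)
    print(f"mean foci with loop: {with_loop:.2f}, without: {without_loop:.2f}")
```

For these illustrative constants the corresponding deterministic rate equations settle at 6 foci per nucleus with the loop active and 3 with the feedback term set to zero, so the toy shows only the qualitative effect of interrupting the loop; it makes no attempt to reproduce the quantitative predictions described above.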
To analyse DNA damage foci dynamics, we used a reporter construct (AcGFP–53BP1c) that quantitatively reports single DNA damage foci kinetics in time‐resolved live cell microscopy (Nelson et al, 2009). Foci frequency measurements quantitatively confirmed the prediction from the stochastic model. More importantly, we found that many individual foci in both telomere‐ and stress‐dependent senescence had short lifespans with half‐lives below 15 h. Feedback loop inhibition reduced only the frequencies of short‐lived DNA damage foci, in accordance with the hypothesis that ROS production contributed to DDR by constant replenishment of short‐lived DNA damage foci.
Finally, we inhibited signalling through the loop at different time points after induction of senescence by ionizing radiation and measured ROS levels, DNA damage foci frequencies and proliferation markers. Treatments with the MAPK14 inhibitor SB203580 or the free radical scavenger PBN were used to block the loop. The results quantitatively confirmed the model prediction and indicated that the feedback loop between DDR and ROS production was both necessary and sufficient to maintain cell cycle arrest for at least 6–10 days after induction of senescence. Interestingly, the loop was still active at later time points and in deep senescence, but proliferation arrest was then stabilized by additional factor(s). This indicated that certain features of the senescent phenotype, such as ROS production, that might be responsible for the negative impact of senescent cells on their tissue environment can be successfully inhibited even in deep senescence. This may prove relevant for novel therapeutic studies aiming to modulate intracellular ROS levels in both ageing and cancer.
The sustained activation of CDKN1A (p21/Waf1/Cip1) by a DNA damage response induces mitochondrial dysfunction and reactive oxygen species (ROS) production via signalling through CDKN1A‐GADD45A‐MAPK14‐GRB2‐TGFBR2‐TGFβ in senescing primary human and mouse cells in vitro and in vivo.
Enhanced ROS production in senescing cells generates additional DNA damage. Although this damage is repairable and transient, it elevates the average levels of DNA damage response permanently, thus forming a positive feedback loop.
This loop is necessary and sufficient to maintain the stability of growth arrest until a ‘point of no return’ is reached during establishment of senescence.
Psoriasis is a common chronic skin disorder, but the mechanisms involved in the resolution and clearance of plaques remain poorly defined. We investigated the mechanism of action of UVB, which is highly effective in clearing psoriasis and inducing remission, and tested the hypothesis that apoptosis is a key mechanism. To distinguish bystander effects, equal erythemal doses of two UVB wavelengths were compared following in vivo irradiation of psoriatic plaques; one is clinically effective (311 nm) and one has no therapeutic effect on psoriasis (290 nm). Only 311 nm UVB induced significant apoptosis in lesional epidermis, and most apoptotic cells were keratinocytes. To determine clinical relevance, we created a computational model of psoriatic epidermis. Modeling predicted apoptosis would occur in both stem and transit-amplifying cells to account for plaque clearance; this was confirmed and quantified experimentally. The median time from onset of keratinocyte apoptosis to cell death was 20 minutes. These data were fed back into the model and demonstrated that the observed level of keratinocyte apoptosis was sufficient to explain UVB-induced plaque resolution. Our human studies combined with a systems biology approach demonstrate that keratinocyte apoptosis is a key mechanism in psoriatic plaque clearance, providing the basis for future molecular investigation and therapeutic development.
Relation extraction from biological publications plays a pivotal role in accelerating scientific discovery and advancing medical research. While vast amounts of this knowledge are stored within the published literature, extracting it manually from this continually growing volume of documents is becoming increasingly arduous. Recently, attention has focused on automatically extracting such knowledge using pre-trained Large Language Models (LLMs) and deep-learning algorithms. However, the complex syntactic structure of biological sentences, with nested entities and domain-specific terminology, and insufficient annotated training corpora pose major challenges in accurately capturing entity relationships from the unstructured data. To address these issues, in this paper, we propose a Knowledge-based Intelligent Text Simplification (KITS) approach focused on the accurate extraction of biological relations. KITS accurately captures the relational context among the binary relations within a sentence while preventing changes in the meaning of the sentences it simplifies. Experiments using well-known performance metrics show that the proposed technique yielded a 21% increase in precision, with only 25% of sentences simplified, on the Learning Language in Logic (LLL) dataset. Combined with BioBERT, a popular pre-trained LLM, the proposed method outperformed other state-of-the-art methods.
The need for the automated computational design of genetic circuits is becoming increasingly apparent with the advent of ever more complex and ambitious synthetic biology projects. Currently, most circuits are designed through the assembly of models of individual parts such as promoters, ribosome binding sites and coding sequences. These low level models are combined to produce a dynamic model of a larger device that exhibits a desired behaviour. The larger model then acts as a blueprint for physical implementation at the DNA level. However, the conversion of models of complex genetic circuits into DNA sequences is a non-trivial undertaking due to the complexity of mapping the model parts to their physical manifestation. Automating this process is further hampered by the lack of computationally tractable information in most models.
We describe a method for automatically generating DNA sequences from dynamic models implemented in CellML and Systems Biology Markup Language (SBML). We also identify the metadata needed to annotate models to facilitate automated conversion, and propose and demonstrate a method for the markup of these models using RDF. Our algorithm has been implemented in a software tool called MoSeC.
The software is available from the authors' web site http://research.ncl.ac.uk/synthetic_biology/downloads.html.
Synthetic biology is a relatively young field, although it builds upon disciplines whose roots go back centuries. Recently, its practitioners have tended to move into the field out of interest or by chance, and come from a wide variety of backgrounds. It is also a very fast‐moving field; new protocols, laboratory equipment, computational facilities and algorithms are being developed at a rapid pace. Students who start studying synthetic biology at an undergraduate or postgraduate level will, in the course of their careers, work with technologies as yet undreamt of, and will do so mostly in the context of highly interdisciplinary teams. In this study, the authors identify some of the key areas required for the education of new synthetic biologists to equip them with both adequate background and sufficient flexibility to tackle these challenges and therefore to future‐proof synthetic biology.
Networks of interactions evolve in many different domains. They tend to have topological characteristics in common, possibly due to common factors in the way the networks grow and develop. It has recently been suggested that one such common characteristic is the presence of a hierarchically modular organization. In this paper, we describe a new algorithm for the detection and quantification of hierarchical modularity, and demonstrate that the yeast protein–protein interaction network does have a hierarchically modular organization. We further show that such organization is evident in artificial networks produced by computational evolution using a gene duplication operator, but not in those developing via preferential attachment of new nodes to highly connected existing nodes.
BacillOndex is an extension of the Ondex data integration system, providing a semantically annotated, integrated knowledge base for the model Gram-positive bacterium Bacillus subtilis. This application allows a user to mine a variety of B. subtilis data sources, and analyse the resulting integrated dataset, which contains data about genes, gene products and their interactions. The data can be analysed either manually, by browsing using Ondex, or computationally via a Web services interface. We describe the process of creating a BacillOndex instance, and describe the use of the system for the analysis of single nucleotide polymorphisms in B. subtilis Marburg. The Marburg strain is the progenitor of the widely-used laboratory strain B. subtilis 168. We identified 27 SNPs with predictable phenotypic effects, including genetic traits for known phenotypes. We conclude that BacillOndex is a valuable tool for the systems-level investigation of, and hypothesis generation about, this important biotechnology workhorse. Such understanding contributes to our ability to construct synthetic genetic circuits in this organism.
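The two growth processes contrasted in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's computational evolution setup and not its hierarchical modularity algorithm: the function names, the two-node seed graph, the edge retention probability `keep_prob` and the `links_per_node` parameter are all hypothetical choices made here.

```python
import random


def grow_by_duplication(n_nodes, keep_prob=0.5, seed=0):
    """Grow a graph by copying a random node and keeping each copied edge
    with probability keep_prob (an assumed duplication-divergence rule)."""
    rng = random.Random(seed)
    adjacency = {0: {1}, 1: {0}}  # assumed seed graph: one edge
    while len(adjacency) < n_nodes:
        parent = rng.randrange(len(adjacency))
        child = len(adjacency)
        kept = {nb for nb in sorted(adjacency[parent])
                if rng.random() < keep_prob}
        adjacency[child] = kept
        for neighbour in kept:
            adjacency[neighbour].add(child)
    return adjacency


def grow_by_preferential_attachment(n_nodes, links_per_node=1, seed=0):
    """Grow a graph in which each new node links to existing nodes chosen
    with probability proportional to their current degree."""
    rng = random.Random(seed)
    adjacency = {0: {1}, 1: {0}}
    endpoints = [0, 1]  # every node appears once per incident edge
    while len(adjacency) < n_nodes:
        child = len(adjacency)
        targets = set()
        # Sampling from the endpoint list is degree-proportional sampling.
        while len(targets) < min(links_per_node, child):
            targets.add(rng.choice(endpoints))
        adjacency[child] = set(targets)
        for target in sorted(targets):
            adjacency[target].add(child)
            endpoints.extend([target, child])
    return adjacency


if __name__ == "__main__":
    for name, graph in [("duplication", grow_by_duplication(200)),
                        ("preferential", grow_by_preferential_attachment(200))]:
        degrees = sorted((len(nbs) for nbs in graph.values()), reverse=True)
        print(name, "top degrees:", degrees[:5])
```

Graphs produced this way could then be passed to a modularity detection method of the reader's choice; the sketch only generates the networks and prints their highest degrees.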