Simple sequence repeats (SSRs) have become important molecular markers for a broad range of applications, such as genome mapping and characterization, phenotype mapping, marker-assisted selection of crop plants, and a range of molecular ecology and diversity studies. With the increasing availability of DNA sequence information, an automated process to identify SSR loci and design PCR primers for their amplification would be a useful tool in plant breeding programs. We report an application, SSR Primer, that integrates SPUTNIK, an SSR repeat finder, with Primer3, a PCR primer design program, into one pipeline tool. On submission of multiple FASTA-formatted sequences, the script screens each sequence for SSRs using SPUTNIK. The results are then parsed and passed to Primer3 for locus-specific primer design. The script uses a Web-based interface, enabling remote use. Availability: This program has been written in Perl and is freely available to non-commercial users on request from the authors. The Web-based version may be accessed at http://hornbill.cspp.latrobe.edu.au/
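The SSR-screening step can be illustrated with a minimal sketch. This is not SPUTNIK's scoring algorithm: it is a regex stand-in that reports only perfect tandem repeats, and the motif lengths and minimum copy number below are hypothetical thresholds.

```python
import re
from typing import List, Tuple


def _is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AGAG')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq: str, min_unit: int = 2, max_unit: int = 5,
              min_repeats: int = 4) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, copy number) for perfect tandem repeats.

    Hypothetical thresholds: 2-5 bp motifs, at least 4 copies.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # ([ACGT]{unit}) captures a motif; \1{n,} demands at least n further copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # skip motifs such as 'AA' or 'AGAG' that repeat a shorter unit
            if _is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)
```

For example, `find_ssrs("GGC" + "AG" * 6 + "CCC" + "ACT" * 5 + "GG")` reports an (AG)6 repeat at position 3 and an (ACT)5 repeat at position 18. A real finder would also tolerate imperfect repeats, which this sketch does not.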
Federated learning has generated significant interest, with nearly all works focused on a "star" topology where nodes/devices are each connected to a central server. We move away from this architecture and extend it through the network dimension to the case where there are multiple layers of nodes between the end devices and the server. Specifically, we develop multi-stage hybrid federated learning (MH-FL), a hybrid of intra- and inter-layer model learning that considers the network as a multi-layer cluster-based structure. MH-FL considers the topology among the nodes in each cluster, including local networks formed via device-to-device (D2D) communications, and presumes a semi-decentralized architecture for federated learning. It orchestrates the devices at different network layers in a collaborative/cooperative manner (i.e., using D2D interactions) to reach local consensus on the model parameters, and combines this with multi-stage parameter relaying between the layers of the tree-shaped hierarchy. We derive an upper bound on the convergence of MH-FL with respect to parameters of the network topology (e.g., the spectral radius) and of the learning algorithm (e.g., the number of D2D rounds in different clusters). We obtain a set of policies for the number of D2D rounds at different clusters that guarantee either a finite optimality gap or convergence to the global optimum. We then develop a distributed control algorithm for MH-FL that tunes the D2D rounds in each cluster over time to meet specific convergence criteria. Our experiments on real-world datasets verify our analytical results and demonstrate the advantages of MH-FL in terms of resource utilization metrics.
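The interplay between D2D consensus rounds and inter-layer relaying can be shown with a toy sketch. It assumes scalar "models", a hypothetical doubly stochastic mixing matrix `W` per cluster, and a cluster head that simply forwards its own value upward, weighted by cluster size. It is not the MH-FL update rule itself. For the 3-node `W` used below, the eigenvalues are 1 and 0.25, so each D2D round shrinks the deviation from the cluster mean by a factor of 0.25. Stopping after a few rounds leaves a small residual gap at the top layer.

```python
from typing import List

Matrix = List[List[float]]


def d2d_rounds(W: Matrix, x: List[float], rounds: int) -> List[float]:
    """Apply `rounds` consensus steps x <- W x inside one cluster."""
    for _ in range(rounds):
        x = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return x


def relay_up(clusters: List[List[float]]) -> float:
    """Next-layer aggregation: each cluster head forwards its own (approximately
    agreed) value, weighted by the number of nodes in its cluster."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) * c[0] for c in clusters) / total


# Hypothetical fully connected 3-node cluster and a 2-node cluster.
W3 = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
W2 = [[0.5, 0.5], [0.5, 0.5]]
c1 = d2d_rounds(W3, [0.0, 3.0, 6.0], rounds=3)  # deviations shrink by 0.25 per round
c2 = d2d_rounds(W2, [10.0, 14.0], rounds=1)     # exact agreement after one round
print(relay_up([c1, c2]))  # close to, but not equal to, the true mean 6.6
```

More D2D rounds in the 3-node cluster drive the relayed value toward the global average of 6.6, mirroring the rounds-versus-optimality-gap tradeoff described above.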
Simple sequence repeat (SSR) molecular genetic markers have become important tools for a broad range of applications, such as genome mapping and genetic diversity studies. SSRs are readily identified within DNA sequence data, and PCR primers can be designed for their amplification. These PCR primers frequently cross-amplify within related species. We report a web-based tool, SSR Primer, that integrates SPUTNIK, an SSR repeat finder, with Primer3, a primer design program, within one pipeline. On submission of multiple FASTA-formatted sequences, the script screens each sequence for SSRs using SPUTNIK. The results are then parsed and passed to Primer3 for locus-specific primer design. We have applied this tool to discover SSRs within the complete GenBank database, and have designed PCR amplification primers for over 13 million SSRs. The SSR Taxonomy Tree server provides web-based searching and browsing of species and taxa for the visualisation and download of these SSR amplification primers. These tools are available at http://bioinformatics.pbcbasc.latrobe.edu.au/ssrdiscovery.html.
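The hand-off from the repeat finder to Primer3 can be sketched as building one Boulder-IO record per SSR. The tag names below are those of Primer3 release 2.x; older 1.x releases used different names, and the abstract does not say which release or settings SSR Primer uses. `SEQUENCE_TARGET` is written as start,length. With Primer3's default `PRIMER_FIRST_BASE_INDEX=0`, the start is 0-based. The product-size range is a hypothetical setting.

```python
def primer3_record(seq_id: str, seq: str, ssr_start: int, ssr_len: int,
                   product_range: str = "100-300") -> str:
    """Build one Boulder-IO record asking Primer3 for primers that flank an SSR.

    ssr_start is 0-based (Primer3's default PRIMER_FIRST_BASE_INDEX=0);
    product_range is a hypothetical PRIMER_PRODUCT_SIZE_RANGE value.
    """
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={seq}",
        f"SEQUENCE_TARGET={ssr_start},{ssr_len}",  # primers must flank this span
        f"PRIMER_PRODUCT_SIZE_RANGE={product_range}",
        "=",  # a line containing only "=" terminates a Boulder-IO record
    ]
    return "\n".join(lines) + "\n"
```

Concatenating one record per detected SSR yields a single input stream for `primer3_core`, which keeps the per-locus design independent of how the repeats were found.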
Recently, deep learning approaches have provided solutions to difficult problems in wireless positioning (WP). Although these WP algorithms have attained excellent and consistent performance in complex channel environments, the computational complexity of processing high-dimensional features can be prohibitive for mobile applications. In this work, we design a novel positioning neural network (P-NN) that uses minimum description features to substantially reduce the complexity of deep learning-based WP. P-NN's feature selection strategy relies on the maximum power measurements and their temporal locations to convey the information needed for WP. We improve P-NN's learning ability by processing two different types of inputs: sparse image and measurement matrices. Specifically, we implement a self-attention layer to reinforce the training ability of our network. We also develop a technique to adapt the feature space size, optimizing over the expected information gain and the classification capability, quantified with information-theoretic measures on signal bin selection. Numerical results show that P-NN achieves a significant advantage in the performance-complexity tradeoff over deep learning baselines that leverage the full power delay profile (PDP). In particular, we find that P-NN achieves a large improvement in performance at low SNR, as unnecessary measurements are discarded in our minimum description features.
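The core idea of keeping only the strongest power measurements and their delay positions can be sketched as a top-k selection over a PDP. This assumes linear power values and a fixed, hypothetical `k`. The adaptive choice of feature-space size, the information-gain criterion, and the self-attention network are not shown.

```python
from typing import List, Tuple


def min_description_features(pdp: List[float], k: int) -> List[Tuple[int, float]]:
    """Keep the k strongest PDP bins as (delay_bin, power) pairs, in delay order."""
    # rank bins by descending power; ties go to the earlier delay bin
    strongest = sorted(range(len(pdp)), key=lambda i: (-pdp[i], i))[:k]
    return [(i, pdp[i]) for i in sorted(strongest)]


def sparse_profile(pdp: List[float], k: int) -> List[float]:
    """Zero out every bin except the k strongest (a 1-D 'sparse image')."""
    kept = dict(min_description_features(pdp, k))
    return [kept.get(i, 0.0) for i in range(len(pdp))]
```

The pair list plays the role of a compact measurement input, and the zero-filled profile plays the role of a sparse image. Weak bins, which at low SNR are mostly noise, never reach the network.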
The adenomatous polyposis coli (APC) tumour suppressor gene is mutated in about 80% of colorectal cancers (CRC) (Brannon et al., 2014). APC is a large multifunctional protein that regulates many biological functions, including Wnt signalling (through the regulation of β-catenin stability) (Reya and Clevers, 2005), cell migration (Kroboth et al., 2007; Sansom et al., 2004), mitosis (Kaplan et al., 2001), cell adhesion (Faux et al., 2004; Carothers et al., 2001) and differentiation (Sansom et al., 2004). Although the role of APC in CRC is often described in terms of deregulated Wnt signalling, its other biological functions suggest that additional factors contribute to the onset of adenomas and the progression of CRC upon truncation of APC. To identify genes and pathways that are dysregulated as a consequence of loss of APC function, we compared the gene expression profiles of the APC-mutated human CRC cell line SW480 following reintroduction of wild-type APC (SW480+APC) or of an empty control vector (SW480+vector control) (Faux et al., 2004). Here we describe the RNA-seq data derived from three biological replicates of parental SW480, SW480+vector control and SW480+APC cells, and present the bioinformatics pipeline used to test for differential gene expression and pathway enrichment. A total of 1735 genes showed significant differential expression when APC was restored, and these were enriched for genes associated with cell polarity, Wnt signalling and the epithelial to mesenchymal transition. There was additional enrichment for genes involved in cell–cell adhesion, cell–matrix junctions, angiogenesis, axon morphogenesis and cell movement. The raw and analysed RNA-seq data have been deposited in the Gene Expression Omnibus (GEO) database under accession number GSE76307. This dataset is useful for further investigations of the impact of APC mutation on the properties of colorectal cancer cells.
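The abstract does not name the statistical test or the multiple-testing correction behind the 1735 significant genes. As an illustration only, the Benjamini-Hochberg false-discovery-rate adjustment, a common choice in RNA-seq differential-expression tools, can be sketched as follows. The per-gene p-values would come from a test that is not shown, and the significance cutoff is left to the caller.

```python
from typing import List


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene would then be called differentially expressed when its adjusted value falls below the chosen FDR threshold.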
Terrestrial networks face limitations such as restricted rural broadband coverage and service outages during disasters. To address these challenges, non-terrestrial networks (NTNs) are a promising alternative, using aerial vehicles and satellites to enhance coverage and support diverse user applications. However, existing works lack comprehensive consideration of transmitter availability, resource allocation, and cost constraints, leading to suboptimal network performance and design. This work explores the potential of heterogeneous NTNs, including unmanned aerial vehicles, high-altitude platforms, and satellites. Using stochastic geometry, we analyze downlink performance while accounting for interruptions in communication service for both low-altitude transmitters (due to recharging needs) and high-altitude transmitters (due to fluctuating solar energy harvesting). Our analysis derives connection probabilities, i.e., the likelihood that transmitters establish downlink connections with ground users. We propose a resource allocation framework using convex optimization techniques to maximize the downlink connection probability subject to economic cost and communication quality-of-service requirements. Numerical evaluations highlight the significance of incorporating low-Earth orbit satellites and demonstrate the influence of economic cost and SINR-related constraints. Results indicate that allocating resources to higher-altitude layers is favorable under stringent cost constraints, while lower-altitude layers are preferred under strict SINR constraints due to their better propagation conditions.
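A simplified version of the connection-probability idea follows from a standard stochastic-geometry result. Suppose each layer's transmitters form an independent homogeneous Poisson point process of density λ, and each is available with probability p. Removing unavailable transmitters independently gives another Poisson process of density pλ. A ground user then sees at least one available transmitter within horizontal reach R with probability 1 − exp(−Σ pλπR²). The sketch below uses this range-based criterion rather than an SINR threshold, so it is not the paper's derivation. The per-layer tuples are assumed inputs, and any numbers passed to it are placeholders.

```python
import math
from typing import Iterable, Tuple


def connection_probability(layers: Iterable[Tuple[float, float, float, float]]) -> float:
    """P(at least one available transmitter is within range) over independent PPP layers.

    Each layer: (density per km^2, availability in [0, 1], altitude km, max link range km).
    Range-based criterion only; no interference or SINR threshold is modeled.
    """
    mean_in_range = 0.0  # expected number of available transmitters in reach
    for density, availability, altitude, max_range in layers:
        if max_range <= altitude:
            continue  # this layer can never reach a ground user
        r2 = max_range ** 2 - altitude ** 2  # squared horizontal reach
        mean_in_range += availability * density * math.pi * r2
    return 1.0 - math.exp(-mean_in_range)
```

Because the layers are independent, their expected counts simply add in the exponent. Lowering one layer's availability, for example during recharging, can therefore be offset by density or reach in another layer.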
We propose cooperative edge-assisted dynamic federated learning (CE-FL). CE-FL introduces a distributed machine learning (ML) architecture in which data collection is carried out at the end devices, while model training is conducted cooperatively at the end devices and the edge servers, enabled by data offloading from the end devices to the edge servers through base stations. CE-FL also introduces a floating aggregation point: the local models generated at the devices and the servers are aggregated at an edge server that varies from one model training round to another, to cope with network evolution in terms of data distribution and user mobility. CE-FL considers the heterogeneity of network elements in terms of communication/computation models and proximity to one another. CE-FL further presumes a dynamic environment with online variation of data at the network devices, which causes drift in ML model performance. We model the processes taken during CE-FL and analyze the convergence of its ML model training. We then formulate network-aware CE-FL, which aims to adaptively optimize all the network elements by tuning their contribution to the learning process; this turns out to be a non-convex mixed-integer problem. Motivated by the large scale of the system, we propose a distributed optimization solver that breaks down the computation of the solution across the network elements. Finally, we demonstrate the effectiveness of our framework with data collected from a real-world testbed.
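The floating aggregation point can be illustrated with a toy sketch. In each round, an edge server is picked as the aggregator and the participating models are averaged, weighted by sample count. The selection rule here, the lowest total link cost to the round's participants, is a hypothetical stand-in. The paper instead chooses the aggregator as part of its non-convex mixed-integer optimization.

```python
from typing import Dict, List


def pick_aggregator(servers: List[str], link_cost: Dict[str, Dict[str, float]],
                    participants: List[str]) -> str:
    """Hypothetical rule: the server with the lowest total link cost to this round's participants."""
    return min(servers, key=lambda s: sum(link_cost[p][s] for p in participants))


def aggregate(models: Dict[str, List[float]], n_samples: Dict[str, int]) -> List[float]:
    """Sample-size-weighted average of local model vectors (FedAvg-style)."""
    total = sum(n_samples[k] for k in models)
    dim = len(next(iter(models.values())))
    return [sum(n_samples[k] * models[k][d] for k in models) / total for d in range(dim)]
```

When users move, so that a different set of devices participates in a round, the cheapest aggregator can change from round to round. This is the behavior the floating aggregation point is meant to exploit.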
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices via iterative local updates (at the devices) and global aggregations (at the server). In this paper, we develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions: (i) Network, allowing decentralized cooperation among the devices via device-to-device (D2D) communications; (ii) Heterogeneity, interpreted at three levels: (ii-a) Learning: PSL considers heterogeneous numbers of stochastic gradient descent iterations with different mini-batch sizes at the devices; (ii-b) Data: PSL presumes a dynamic environment with data arrival and departure, where the distributions of local datasets evolve over time, captured via a new metric for model/concept drift; (ii-c) Device: PSL considers devices with different computation and communication capabilities; and (iii) Proximity, where devices have different distances to each other and to the access point. PSL considers the realistic scenario where global aggregations are conducted with idle times in between them to improve resource efficiency, and incorporates data dispersion and model dispersion with local model condensation into FedL. Our analysis sheds light on the notion of cold vs. warmed-up models, and of model inertia in distributed machine learning. We then propose network-aware dynamic model tracking to optimize the model learning vs. resource efficiency tradeoff, which we show is an NP-hard signomial programming problem. Finally, we propose a general optimization solver for this problem. Our numerical results reveal new findings on the interdependencies between the idle times in between global aggregations, model/concept drift, and the D2D cooperation configuration.
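The learning-heterogeneity dimension can be illustrated with a toy sketch. Each device runs its own number of mini-batch SGD steps with its own batch size before a data-size-weighted global aggregation. The sketch assumes a scalar model, a squared-error loss, and deterministic cyclic mini-batches. D2D cooperation, data dispersion, model condensation, and the drift metric are not modeled, and the gap between aggregations appears only as each device's local step count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    data: List[float]  # local samples (scalar targets)
    steps: int         # local SGD iterations before the next global aggregation
    batch: int         # mini-batch size


def local_sgd(w: float, dev: Device, lr: float) -> float:
    """Mini-batch SGD on f(w) = mean((w - x)^2) / 2; batches cycle through local data."""
    pos = 0
    for _ in range(dev.steps):
        mb = [dev.data[(pos + j) % len(dev.data)] for j in range(dev.batch)]
        pos += dev.batch
        grad = sum(w - x for x in mb) / dev.batch  # gradient of the mini-batch loss
        w -= lr * grad
    return w


def global_round(w: float, devices: List[Device], lr: float) -> float:
    """All devices start from the global model; the server takes a data-size-weighted average."""
    total = sum(len(d.data) for d in devices)
    return sum(len(d.data) * local_sgd(w, d, lr) for d in devices) / total
```

A device that takes more local steps moves further toward its own data before aggregation. This effect grows as local data distributions diverge or drift.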
Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.
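The kind of query such a database supports can be sketched as finding SSR markers whose sequences map to a given Arabidopsis region. This is a deliberately tiny, hypothetical schema. All table and column names are assumptions, the annotation, GO, and BAC-end tables are omitted, and Python's built-in `sqlite3` stands in for MySQL so the sketch runs without a server.

```python
import sqlite3

# Hypothetical minimal schema; the real database structure is not described in the abstract.
SCHEMA = """
CREATE TABLE est (id TEXT PRIMARY KEY, sequence TEXT NOT NULL);
CREATE TABLE ssr (est_id TEXT REFERENCES est(id), start INTEGER,
                  motif TEXT, repeats INTEGER);
CREATE TABLE arabidopsis_hit (est_id TEXT REFERENCES est(id),
                              chromosome TEXT, position INTEGER);
"""


def ssrs_in_region(conn: sqlite3.Connection, chromosome: str, lo: int, hi: int):
    """SSRs on sequences whose Arabidopsis mapping falls within [lo, hi]."""
    return conn.execute(
        """SELECT s.est_id, s.motif, s.repeats, h.position
           FROM ssr AS s JOIN arabidopsis_hit AS h ON s.est_id = h.est_id
           WHERE h.chromosome = ? AND h.position BETWEEN ? AND ?
           ORDER BY h.position""",
        (chromosome, lo, hi),
    ).fetchall()
```

Usage: `conn = sqlite3.connect(":memory:")`, then `conn.executescript(SCHEMA)`, insert rows, and call `ssrs_in_region(conn, ...)`. Joining through the Arabidopsis mapping is what lets markers be browsed by syntenic position rather than by Brassica sequence identifier.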
Device-to-device (D2D) communication is expected to be a critical enabler of distributed computing in edge networks at scale. A key challenge in providing this capability is the need for judicious management of the heterogeneous communication and computation resources that exist at the edge to meet processing needs. In this paper, we develop an optimization methodology that considers the network topology jointly with device and network resource allocation to minimize total D2D overhead, which we quantify in terms of the time and energy required for task processing. Variables in our model include task assignment, CPU allocation, subchannel selection, and beamforming design for multiple-input multiple-output (MIMO) wireless devices. We propose two methods to solve the resulting non-convex mixed integer program: semi-exhaustive search optimization, which represents a "best-effort" attempt at obtaining the optimal solution, and efficient alternate optimization, which is more computationally efficient. As a component of these two methods, we develop a novel coordinated beamforming algorithm, which we show obtains the optimal beamformer for a common receiver characteristic. Through numerical experiments, we find that our methodology yields substantial improvements in network overhead compared with local computation and partially optimized methods, which validates our joint optimization approach. Further, we find that the efficient alternate optimization scales well with the number of nodes, and thus can be a practical solution for D2D computing in large networks.
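The trade-off between local computation and D2D offloading can be illustrated with a toy brute-force search. Overhead is modeled as a weighted sum of time and energy. Computation energy follows the common κ·cycles·f² CMOS model, and transmit energy is power times upload time. The sketch assumes fixed CPU frequencies and link rates. The value of `KAPPA`, the unit weights, and the one-task-per-helper rule are hypothetical, and CPU allocation, subchannel selection, and beamforming are left out. It is neither the paper's semi-exhaustive search nor its alternate optimization, and its cost grows as (helpers + 1)^tasks.

```python
from itertools import product
from typing import List, Tuple

KAPPA = 1e-27  # effective switched capacitance (hypothetical value)


def overhead(cycles: float, bits: float, f_cpu: float, rate: float = 0.0,
             p_tx: float = 0.0, w_t: float = 1.0, w_e: float = 1.0) -> float:
    """Weighted time + energy of one task; rate == 0 means it is computed locally."""
    t_comp = cycles / f_cpu
    e_comp = KAPPA * cycles * f_cpu ** 2     # common CMOS dynamic-energy model
    t_tx = bits / rate if rate > 0 else 0.0  # upload time over the D2D link
    e_tx = p_tx * t_tx                       # transmit energy at the task owner
    return w_t * (t_comp + t_tx) + w_e * (e_comp + e_tx)


def best_assignment(tasks: List[Tuple[float, float, float, float]],
                    helpers: List[Tuple[float, float]]) -> Tuple[float, Tuple[int, ...]]:
    """Brute-force assignment: -1 = compute locally, j = offload to helper j.

    tasks: (cycles, bits, local CPU Hz, transmit power W)
    helpers: (helper CPU Hz, D2D rate bit/s); each helper takes at most one task.
    """
    best: Tuple[float, Tuple[int, ...]] = (float("inf"), ())
    for choice in product(range(-1, len(helpers)), repeat=len(tasks)):
        used = [c for c in choice if c >= 0]
        if len(used) != len(set(used)):
            continue  # infeasible: a helper was assigned twice
        total = 0.0
        for (cycles, bits, f_local, p_tx), c in zip(tasks, choice):
            if c < 0:
                total += overhead(cycles, bits, f_local)
            else:
                f_help, rate = helpers[c]
                total += overhead(cycles, bits, f_help, rate, p_tx)
        best = min(best, (total, choice))
    return best
```

With two tasks and one helper, the search offloads only the task whose owner has the slower CPU. The other task stays local, because a faster processor spends more energy per cycle under the κf² model.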