Python for Natural Language Processing Goto, Isao
The Journal of The Institute of Image Information and Television Engineers, 2018, Volume: 72, Issue: 11
Journal Article
Motivation: Biologists often wish to use their knowledge of a few experimental models of a given molecular system to identify homologs in genomic data. We developed a generic tool for this purpose. ...Results: Macromolecular System Finder (MacSyFinder) provides a flexible framework to model the properties of molecular systems (cellular machinery or pathway), including their components, evolutionary associations with other systems and genetic architecture. Modelled features also include functional analogs and the multiple uses of the same component by different systems. Models are used to search for molecular systems in complete genomes or in unstructured data like metagenomes. The components of the systems are searched by sequence similarity using hidden Markov model (HMM) protein profiles. The assignment of hits to a given system is decided based on compliance with the content and organization of the system model. A graphical interface, MacSyView, facilitates the analysis of the results by showing overviews of component content and genomic context. To exemplify the use of MacSyFinder, we built models to detect and classify CRISPR-Cas systems following a previously established classification. We show that MacSyFinder makes it easy to define an accurate "Cas-finder" using publicly available protein profiles. Availability and Implementation: MacSyFinder is a standalone application implemented in Python. It requires Python 2.7, HMMER and makeblastdb (version 2.2.28 or higher). It is freely available with its source code under a GPLv3 license at https://github.com/gem-pasteur/macsyfinder. It is compatible with all platforms supporting Python and HMMER/makeblastdb. The "Cas-finder" (models and HMM profiles) is distributed as a compressed tarball archive as Supporting Information.
Python for Scientists and Engineers Millman, K. Jarrod; Aivazis, Michael
Computing in science & engineering, March-April 2011, Volume: 13, Issue: 2
Journal Article
Peer reviewed
Open access
Python has arguably become the de facto standard for exploratory, interactive, and computation-driven scientific research. This issue discusses Python's advantages for scientific research and ...presents several of the core Python libraries and tools used in scientific research.
Albany is a parallel C++ finite element library for solving forward and inverse problems involving partial differential equations (PDEs). In this paper we introduce PyAlbany, a newly developed Python ...interface to the Albany library. PyAlbany can be used to effectively drive Albany enabling fast and easy analysis and post-processing of applications based on PDEs that are pre-implemented in Albany. PyAlbany relies on the library PyBind11 to bind Python with C++ Albany code. Here we detail the implementation of PyAlbany and showcase its capabilities through a number of examples targeting a heat-diffusion problem. In particular we consider (1) the generation of samples for a Monte Carlo application, (2) a scalability study, (3) a study of parameters on the performance of a linear solver, and (4) a tool for performing eigenvalue decompositions of matrix-free operators for a Bayesian inference application.
PacBio high fidelity (HiFi) sequencing reads are both long (15-20 kb) and highly accurate (> Q20). Because of these properties, they have revolutionised genome assembly leading to more accurate and ...contiguous genomes. In eukaryotes the mitochondrial genome is sequenced alongside the nuclear genome often at very high coverage. A dedicated tool for mitochondrial genome assembly using HiFi reads is still missing.
MitoHiFi was developed within the Darwin Tree of Life Project to assemble mitochondrial genomes from the HiFi reads generated for target species. The input for MitoHiFi is either the raw reads or the assembled contigs, and the tool outputs a mitochondrial genome sequence fasta file along with annotation of protein and RNA genes. Variants arising from heteroplasmy are assembled independently, and nuclear insertions of mitochondrial sequences are identified and not used in organellar genome assembly. MitoHiFi has been used to assemble 374 mitochondrial genomes (368 Metazoa and 6 Fungi species) for the Darwin Tree of Life Project, the Vertebrate Genomes Project and the Aquatic Symbiosis Genome Project. Inspection of 60 mitochondrial genomes assembled with MitoHiFi for species that already have reference sequences in public databases showed the widespread presence of previously unreported repeats.
MitoHiFi is able to assemble mitochondrial genomes from a wide phylogenetic range of taxa from PacBio HiFi data. MitoHiFi is written in Python and is freely available on GitHub ( https://github.com/marcelauliano/MitoHiFi ). MitoHiFi is available with its dependencies as a Docker container on GitHub (ghcr.io/marcelauliano/mitohifi:master).
Array programming with NumPy Harris, Charles R; Millman, K Jarrod; van der Walt, Stéfan J ...
Nature (London), 09/2020, Volume: 585, Issue: 7825
Journal Article
Peer reviewed
Open access
Array programming provides a powerful, compact and expressive syntax for accessing, manipulating and operating on data in vectors, matrices and higher-dimensional arrays. NumPy is the primary array ...programming library for the Python language. It has an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, materials science, engineering, finance and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves and in the first imaging of a black hole. Here we review how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring and analysing scientific data. NumPy is the foundation upon which the scientific Python ecosystem is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Owing to its central position in the ecosystem, NumPy increasingly acts as an interoperability layer between such array computation libraries and, together with its application programming interface (API), provides a flexible framework to support the next decade of scientific and industrial analysis.
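As a rough illustration of the "few fundamental array concepts" the abstract mentions, the sketch below touches three of them: vectorized reductions, broadcasting and boolean-mask indexing. The data and variable names are made up for the example and do not come from the article.

```python
import numpy as np

# Hypothetical data: 3 samples (rows) x 4 measurements (columns).
data = np.arange(12, dtype=float).reshape(3, 4)

# Vectorized reduction: one mean per column, with no explicit Python loop.
col_means = data.mean(axis=0)

# Broadcasting: the shape-(4,) vector is applied to each of the 3 rows.
centered = data - col_means

# Boolean-mask indexing: keep only the values above their column mean.
above = data[centered > 0]

print(col_means)
print(centered)
print(above)
```

Each step operates on whole arrays at once, which is the sense in which array programming is described as compact and expressive.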
The Conventional Gait Model (CGM), known by a variety of different names, is widely used in clinical gait analysis. We present pyCGM2, an open-source implementation of the CGM with two versions. The ...first, CGM1.0, is a clone of Vicon Plug In Gait (PiG) with all its variants. CGM1.0 provides a platform to test the effect of modifications to the CGM on data collected and processed retrospectively or to provide backward compatibility. The second version, CGM1.1, offers some practical modifications and includes three well documented improvements.
How do improvements of the conventional gait model affect joint kinematics and kinetics?
The practical modifications include the possibility to use a medial knee epicondyle marker, during static calibration only, to define the medio-lateral axis of the femur in place of the knee alignment device. The three improvements correspond to the change of pelvis angle decomposition sequence, the adoption of a single tibia coordinate system, and the default decomposition of the joint moments in the joint coordinate system. We validated the outputs of version CGM1.0 against Vicon-PiG, and estimated the effect of the modifications included in version CGM1.1 using gait data collected in 16 healthy participants.
Kinematics and kinetics of CGM1.0 superimposed on those of Vicon-PiG, with root mean square differences below 0.04° for kinematics and below 0.05 N·m·kg⁻¹ for kinetics.
The differences between CGM1.1 and CGM1.0 were minimal in the healthy participant cohort, but we discuss the expected differences in participants with different gait pathologies. We hope that pyCGM2 will facilitate the systematic testing and use of improved processing methods for the conventional gait model.
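The validation reported in this abstract rests on root mean square differences between two sets of model outputs. A minimal sketch of that metric follows. The function name `rms_difference` and the sample values are hypothetical and are not taken from pyCGM2 or from the study's data.

```python
import numpy as np

def rms_difference(a, b):
    """Root mean square difference between two equally shaped curves."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("curves must have the same shape")
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Made-up joint-angle curves in degrees, for illustration only.
reference = np.array([10.0, 20.0, 30.0, 40.0])
candidate = reference + np.array([0.03, -0.03, 0.03, -0.03])

rmsd = rms_difference(reference, candidate)
print(f"RMS difference: {rmsd:.3f} degrees")
```

In practice the two curves would be time-normalized outputs of the two implementations for the same trial. How the study aggregated differences across trials and participants is not stated in the abstract.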