With increasing interest in RNA therapeutics and in targeting RNA to treat disease, there is a need for the tools used in protein-based drug design, particularly DOCKing algorithms, to be extended or adapted for nucleic acids. Here, we have compiled a test set of RNA-ligand complexes to validate the ability of the DOCK suite of programs to recreate experimentally determined binding poses. With the optimized parameters and a minimal scoring function, 70% of the test set with fewer than seven rotatable ligand bonds and 26% of the test set with fewer than 13 rotatable bonds can be recreated within 2 Å heavy-atom RMSD. When DOCKed conformations are rescored with the implicit solvent models AMBER generalized Born with solvent-accessible surface area (GB/SA) and Poisson-Boltzmann with solvent-accessible surface area (PB/SA), in combination with explicit water molecules and sodium counterions, the success rate increases to 80% with PB/SA for fewer than seven rotatable bonds, and to 58% with AMBER GB/SA and 47% with PB/SA for fewer than 13 rotatable bonds. These results indicate that DOCK can indeed be useful for structure-based drug design aimed at RNA. Our studies also suggest that RNA-directed ligands often differ from typical protein-ligand complexes in their electrostatic properties, but these differences can be accommodated through the choice of potential function. In addition, in the course of the study, we explore a variety of newly added DOCK functions, demonstrating the ease with which new functions can be added to address new scientific questions.
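The success criterion above is a heavy-atom RMSD of at most 2 Å between a docked pose and the crystal pose. Because docking keeps the receptor frame fixed, the RMSD is taken in place, without superposition. The sketch below assumes atoms are already matched one-to-one and ignores ligand symmetry; the function names and the three-atom ligand are hypothetical.

```python
import math


def heavy_atom_rmsd(pose, reference):
    """In-place RMSD over matched heavy atoms (no superposition, no symmetry correction)."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("pose and reference must list the same, non-zero number of atoms")
    sq = sum((p - r) ** 2
             for atom_p, atom_r in zip(pose, reference)
             for p, r in zip(atom_p, atom_r))
    return math.sqrt(sq / len(pose))


def success_rate(rmsds, cutoff=2.0):
    """Fraction of complexes whose top-ranked pose lies within `cutoff` Å."""
    return sum(r <= cutoff for r in rmsds) / len(rmsds)


# Hypothetical three-atom ligand: every atom of the docked pose is shifted 1 Å along z.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pose = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
rmsd = heavy_atom_rmsd(pose, reference)          # 1.0
rate = success_rate([rmsd, 2.5, 0.4, 3.1])         # 2 of 4 within 2 Å -> 0.5
```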
Closed-form expressions are presented for the numbers of edges in the auxiliary pair graphs (APGs) associated with non-spin-orbit and spin-orbit Shavitt graphs for full configuration interaction expansions. A Shavitt graph is a visual representation of a configuration state function expansion space constructed via the graphical unitary group approach (GUGA). An APG is an organisational aid and a programmatic tool generated from a Shavitt graph. The number of edges in an APG determines bounds on the computational scaling as a function of the numbers of electrons and orbitals and the spin multiplicity. The edge counts extend a suite of Shavitt graph statistics based on these functional parameters. The derivation and presentation of the edge-count formulas have been assisted by the bra-ket interchange symmetry and the particle-hole interchange symmetry of the GUGA formalism. These symmetry operators produce one-to-one correspondences between various sets of edges, which yields identities among some of the edge-count formulas. There are 208 possible edge types. Of these, some do not contribute to two-electron operators, some are related by bra-ket interchange symmetry, and some are related by particle-hole interchange symmetry. For the remaining unique edge types, explicit expressions are derived for the numbers of edges.
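The APG edge formulas are not reproduced here. As background, the sketch below builds only the full-CI Shavitt graph that an APG is generated from, using the usual Paldus-table labels (a, b, c) and step numbers d = 0 to 3. It counts nodes, arcs, and walks, and checks the walk count (the number of CSFs) against the Weyl-Paldus dimension formula. The function names are hypothetical.

```python
from math import comb

# Paldus-table increments (da, db, dc) for step numbers d = 0, 1, 2, 3.
STEPS = {0: (0, 0, 1), 1: (0, 1, 0), 2: (1, -1, 1), 3: (1, 0, 0)}


def shavitt_graph(n_orb, n_elec, two_s):
    """Full-CI Shavitt graph for n_orb orbitals, n_elec electrons, spin S = two_s / 2.

    Returns (levels, arcs, n_walks); each tail-to-head walk is one CSF.
    """
    a_h, rem = divmod(n_elec - two_s, 2)
    c_h = n_orb - a_h - two_s
    if rem or a_h < 0 or c_h < 0:
        raise ValueError("no CSFs for this (n, N, 2S)")
    head = (a_h, two_s, c_h)

    def successors(node):
        a, b, c = node
        for d, (da, db, dc) in STEPS.items():
            nxt = (a + da, b + db, c + dc)
            if min(nxt) >= 0:
                yield d, nxt

    # Forward pass from the tail (0, 0, 0): every valid (a, b, c) on each level.
    levels = [{(0, 0, 0)}]
    for _ in range(n_orb):
        levels.append({nxt for node in levels[-1] for _, nxt in successors(node)})
    # Backward pass: keep only the nodes that lie on some walk to the head.
    levels[-1] = {head}
    for k in range(n_orb - 1, -1, -1):
        levels[k] = {node for node in levels[k]
                     if any(nxt in levels[k + 1] for _, nxt in successors(node))}

    # Arcs join surviving nodes on adjacent levels; count walks level by level.
    arcs, walks = [], {(0, 0, 0): 1}
    for k in range(n_orb):
        counts = {}
        for node in levels[k]:
            for d, nxt in successors(node):
                if nxt in levels[k + 1]:
                    arcs.append((k, node, d))
                    counts[nxt] = counts.get(nxt, 0) + walks[node]
        walks = counts
    return levels, arcs, walks[head]


def weyl_dimension(n_orb, n_elec, two_s):
    """Weyl-Paldus count of spin-adapted CSFs in the full-CI space."""
    a = (n_elec - two_s) // 2
    c = n_orb - a - two_s
    return (two_s + 1) * comb(n_orb + 1, a) * comb(n_orb + 1, c) // (n_orb + 1)


levels, arcs, n_csf = shavitt_graph(4, 4, 0)
print(sum(len(lv) for lv in levels), "nodes,", len(arcs), "arcs,", n_csf, "CSFs")
```

For 4 electrons in 4 orbitals with S = 0, both the walk count and the Weyl-Paldus formula give 20 singlet CSFs.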
The All Configuration Mean Energy (ACME) conditions are a special case of state averaging for Multiconfigurational Self-Consistent-Field (MCSCF) orbital optimisation. The method is formulated using the Graphical Unitary Group Approach (GUGA), in which the Configuration State Function (CSF) basis is represented as walks within a Shavitt graph. This graphical formulation leads to efficient recursive algorithms for the energy and reduced density matrices (RDMs) that are independent of the CSF dimension and that scale only as $O(n^2)$, where $n$ is the number of occupied orbitals. The Hamiltonian matrix diagonalization step is obviated, and the CSF expansion coefficients are neither referenced nor required during the orbital optimisation. This allows MCSCF orbital optimisation to be performed for essentially unlimited numbers of active orbitals and arbitrarily large CSF expansions. The discussion includes various types of CSF expansion spaces, the partitioning of the essential and redundant orbital optimisation parameters, the computation of the spin-density, and the formulation of state-specific analytic gradients and nonadiabatic coupling for high-level electronic structure methods that use the ACME MCSCF orbitals.
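ACME weights every state in the CSF space equally. The equal-weight average of all eigenvalues of H equals Tr(H) divided by the CSF dimension, and a trace does not depend on the basis. That is why neither diagonalization nor expansion coefficients are needed for the averaged energy. The NumPy sketch below checks only this identity, using a random symmetric matrix as a stand-in for H (an assumption made for illustration). It does not implement the recursive graphical evaluation of the energy and RDMs.

```python
import numpy as np


def all_state_mean_energy(h):
    """Equal-weight average over all eigenstates of a symmetric H, via the trace."""
    return np.trace(h) / h.shape[0]


rng = np.random.default_rng(seed=7)
m = rng.standard_normal((8, 8))
h = 0.5 * (m + m.T)                        # hypothetical symmetric H in a CSF basis

e_diag = np.linalg.eigvalsh(h).mean()      # diagonalize, then average all roots
e_trace = all_state_mean_energy(h)         # no eigenvectors or coefficients needed

# The trace is basis-invariant: an orthogonal change of basis leaves it unchanged.
q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
e_rotated = all_state_mean_energy(q.T @ h @ q)
```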
The Representation and Parametrization of Orthogonal Matrices
Shepard, Ron; Brozell, Scott R; Gidofalvi, Gergely
The Journal of Physical Chemistry A: Molecules, Spectroscopy, Kinetics, Environment, & General Theory, 07/2015, Volume 119, Issue 28
Journal Article, Peer reviewed
Four representations and parametrizations of orthogonal matrices $Q \in \mathbb{R}^{m \times n}$ in terms of the minimal number of essential parameters {φ} are discussed: the exponential representation, the Householder reflector representation, the Givens rotation representation, and the rational Cayley transform representation. Both the square ($n = m$) and rectangular ($n < m$) cases are considered. Two kinds of parametrization are considered: one in which the individual columns of Q are distinct (the Stiefel manifold), and one in which only span(Q) is significant (the Grassmann manifold). The practical issues of numerical stability, continuity, and uniqueness are discussed. For all of the parametrizations, both the computation of Q from the essential parameters {φ} and the extraction of {φ} from a given Q are considered. The transformation of gradient arrays between the Q and {φ} variables is discussed for all representations. It is our hope that developers of new methods will benefit from this comparative presentation of an important but rarely analyzed subject.
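Two of the four representations can be sketched in a few lines of NumPy for the square case: the Cayley transform $Q = (I - K)^{-1}(I + K)$ of an antisymmetric $K$, and a product of Givens rotations with one angle per pair $i < j$. The sketch also gives the minimal parameter counts for the Stiefel case ($mn - n(n+1)/2$) and the Grassmann case ($n(m - n)$). The packing order of {φ} and the rotation order are hypothetical choices, not the paper's conventions. Both maps yield determinant +1, so the Cayley form cannot reach orthogonal matrices that have an eigenvalue of -1. This is one face of the uniqueness and continuity issues noted above.

```python
import numpy as np


def antisymmetric(phi, n):
    """Pack n(n-1)/2 parameters into the strict upper triangle of an antisymmetric K."""
    k = np.zeros((n, n))
    k[np.triu_indices(n, 1)] = phi
    return k - k.T


def cayley(phi, n):
    """Cayley transform Q = (I - K)^-1 (I + K); I - K is always invertible here."""
    k = antisymmetric(phi, n)
    eye = np.eye(n)
    return np.linalg.solve(eye - k, eye + k)


def givens(phi, n):
    """Ordered product of plane rotations G(i, j, theta), one angle per (i < j) pair."""
    q = np.eye(n)
    for theta, (i, j) in zip(phi, zip(*np.triu_indices(n, 1))):
        g = np.eye(n)
        c, s = np.cos(theta), np.sin(theta)
        g[i, i] = g[j, j] = c
        g[i, j], g[j, i] = -s, s
        q = q @ g
    return q


def n_essential(m, n, grassmann=False):
    """Minimal parameter counts: Stiefel mn - n(n+1)/2, Grassmann n(m - n)."""
    return n * (m - n) if grassmann else m * n - n * (n + 1) // 2


phi = np.array([0.3, -0.7, 1.1])      # hypothetical n(n-1)/2 = 3 parameters for n = 3
q_cay, q_giv = cayley(phi, 3), givens(phi, 3)
```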
An efficient algorithm for computing the maximum-flow path in a network is applied to the identification of the dominant configuration state functions (CSFs) in a graphically contracted function (GCF) configuration interaction wave function. The flow network is a space of spin-adapted CSFs represented by a Shavitt graph, wherein the nodes correspond to orbital occupations and spin quantum numbers. The graph nodes are connected by arcs, and the density of an arc is defined as the sum of the squared coefficients of the CSFs whose walks pass through it. A max-min approach determines an upper bound to the maximum possible incoming flow for each graph node. A backtracking step generates a candidate walk and is followed by a limited search of alternative branching paths for the dominant CSF. The arc density contributions of that CSF are then removed from the graph, and the algorithm is reapplied to the updated graph. The resulting list of walks can be partitioned in order to guarantee that the dominant CSFs have been identified. All of the steps in this algorithm are computationally efficient and do not depend on the potentially large dimension of the underlying linear CSF expansion space. An analysis of low-lying valence states of C2 illustrates the method.
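The max-min pass and the backtracking step can be sketched as a widest-path search on a levelled graph. For each node, the pass records the largest bottleneck arc density over all routes from the tail. This is an upper bound on the squared coefficient of any walk through that node, because each arc density includes that walk's own weight. In this sketch, the arc densities come from an explicit, hypothetical set of four walks. The GCF method obtains them without enumerating walks. The limited alternative-branch search and the partitioning step are not implemented.

```python
from collections import defaultdict


def arc_densities(walk_weights):
    """Arc density: sum of squared CSF coefficients over all walks that use the arc."""
    dens = defaultdict(float)
    for walk, weight in walk_weights.items():
        for arc in zip(walk, walk[1:]):
            dens[arc] += weight
    return dict(dens)


def widest_walk(dens, tail, head):
    """Max-min pass plus backtracking; nodes are (level, label) tuples."""
    best, back = {tail: float("inf")}, {}
    for src, dst in sorted(dens, key=lambda arc: arc[0][0]):   # level by level
        if src in best:
            flow = min(best[src], dens[(src, dst)])    # bottleneck along this route
            if flow > best.get(dst, -1.0):
                best[dst], back[dst] = flow, src
    walk = [head]
    while walk[-1] != tail:                            # backtrack to the tail
        walk.append(back[walk[-1]])
    return tuple(reversed(walk)), best[head]


# Hypothetical 4-walk graph; weights are squared CSF coefficients summing to 1.
T, A, B, C, D, H = (0, "T"), (1, "A"), (1, "B"), (2, "C"), (2, "D"), (3, "H")
remaining = {(T, A, C, H): 0.50, (T, A, D, H): 0.20,
             (T, B, C, H): 0.25, (T, B, D, H): 0.05}
found = []
for _ in range(2):
    walk, bound = widest_walk(arc_densities(remaining), T, H)
    weight = remaining.pop(walk, 0.0)    # remove this walk's arc-density contributions
    found.append((walk, bound, weight))
```

When the bound returned for a candidate exceeds that walk's own weight, the candidate is not guaranteed to be dominant. The alternative-branch search and the partitioning of the walk list are meant to resolve such cases.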
The basic formulation for the multifacet generalization of the graphically contracted function (MFGCF) electronic structure method is presented. The analysis includes discussion of the linear dependence and redundancy of the arc factor parameters, the computation of reduced density matrices, Hamiltonian matrix construction, spin-density matrix construction, the computation of optimization gradients for single-state and state-averaged calculations, graphical wave function analysis, and the efficient computation of configuration state function and Slater determinant expansion coefficients. Timings are given for Hamiltonian matrix element and analytic optimization gradient computations for a range of model problems for full-CI Shavitt graphs, and both the energy and the gradient computations are observed to scale as $O(N^2 n^4)$ for $N$ electrons and $n$ orbitals. The important arithmetic operations are performed within dense matrix-matrix product computational kernels, resulting in a computationally efficient procedure. An initial implementation of the method is applied to several challenging chemical systems, including N2 dissociation, cubic H8 dissociation, the symmetric dissociation of H2O, and the insertion of Be into H2. The results are compared to the exact full-CI values and also to those of the previous single-facet GCF expansion form.
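The sketch below covers only the simpler single-facet GCF form used as the point of comparison. Each arc carries a scalar arc factor, and a CSF coefficient is the product of the arc factors along its walk. The norm $\langle\Psi|\Psi\rangle$ can then be accumulated node by node without enumerating walks. In the multifacet form, roughly speaking, these scalars become small arc-factor matrices, which turns the same kind of recursion into the dense matrix products mentioned above. The graph and the arc-factor values are hypothetical, and a brute-force walk enumeration is included only as a check.

```python
import math


def walks(alpha, node, head):
    """Enumerate every walk from `node` to `head` (brute force, for checking only)."""
    if node == head:
        yield (node,)
        return
    for src, dst in alpha:
        if src == node:
            for rest in walks(alpha, dst, head):
                yield (node,) + rest


def csf_coefficient(alpha, walk):
    """Single-facet GCF: a CSF coefficient is the product of arc factors on its walk."""
    return math.prod(alpha[arc] for arc in zip(walk, walk[1:]))


def norm_by_recursion(alpha, tail, head):
    """<Psi|Psi> accumulated as node weights, without enumerating walks."""
    weight = {tail: 1.0}
    for src, dst in sorted(alpha, key=lambda arc: arc[0][0]):   # level by level
        weight[dst] = weight.get(dst, 0.0) + alpha[(src, dst)] ** 2 * weight.get(src, 0.0)
    return weight[head]


# Hypothetical graph and arc factors; nodes are (level, label) tuples.
T, A, B, C, D, H = (0, "T"), (1, "A"), (1, "B"), (2, "C"), (2, "D"), (3, "H")
alpha = {(T, A): 0.9, (T, B): 0.4, (A, C): 0.8, (A, D): 0.5,
         (B, C): 0.7, (B, D): 0.3, (C, H): 1.0, (D, H): 0.6}
brute = sum(csf_coefficient(alpha, w) ** 2 for w in walks(alpha, T, H))
fast = norm_by_recursion(alpha, T, H)
```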
In conjunction with the recent American Chemical Society symposium titled “Docking and Scoring: A Review of Docking Programs,” the performance of the DOCK6 program was evaluated through (1) pose reproduction and (2) database enrichment calculations on a common set of organizer-specified systems and datasets (ASTEX, DUD, WOMBAT). Representative baseline grid score results averaged over five docking runs yield a relatively high pose identification success rate of 72.5% (symmetry-corrected RMSD) and a sampling rate of 91.9% for the multisite ASTEX set (N = 147) using organizer-supplied structures. Numerous additional docking experiments showed that ligand starting conditions, symmetry, multiple binding sites, clustering, and receptor preparation protocols all affect success. Encouragingly, in some cases, more sophisticated scoring and sampling methods yielded results that were comparable to (Amber score, ligand-movable protocol) or exceeded (LMOD score) the analogous baseline grid-score results. The analysis highlights the potential benefits and challenges of including receptor flexibility and indicates that different scoring functions have system-dependent strengths and weaknesses. Enrichment studies with the DUD database, prepared using the SB2010 preparation protocol and native ligand pairings, yielded individual area under the curve (AUC) values from receiver operating characteristic (ROC) curve analysis ranging from 0.29 (bad enrichment) to 0.96 (good enrichment), with an average value of 0.60 (27/38 have AUC ≥ 0.5). Strong early enrichment was also observed in the critically important 1.0–2.0% region. Somewhat surprisingly, an alternative receptor preparation protocol yielded comparable results. As expected, semi-random pairings yielded poorer enrichments, in particular for unrelated receptors.
Overall, the breadth and number of experiments performed provide a useful snapshot of current capabilities of DOCK6 as well as starting points to guide future development efforts to further improve sampling and scoring.
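The ROC AUC values quoted above have a simple interpretation. The AUC equals the probability that a randomly chosen active is ranked ahead of a randomly chosen decoy, with ties counted as one half (the Mann-Whitney form). The enrichment factor (EF) at a small fraction of the ranked database is one common way to quantify early enrichment. This sketch assumes that lower docking scores are better and uses hypothetical scores. It is not the evaluation script used in the study, which may have measured early enrichment differently.

```python
def roc_auc(active_scores, decoy_scores):
    """P(random active outranks random decoy); lower score = better, ties count 1/2."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def enrichment_factor(active_scores, decoy_scores, fraction):
    """Active rate in the top `fraction` of the ranking, relative to the overall rate."""
    # On score ties, decoys (False) sort ahead of actives (True): a conservative choice.
    ranked = sorted([(s, True) for s in active_scores] + [(s, False) for s in decoy_scores])
    n_top = max(1, int(round(fraction * len(ranked))))
    hits = sum(is_active for _, is_active in ranked[:n_top])
    return (hits / n_top) / (len(active_scores) / len(ranked))


actives = [-50.0, -45.0, -30.0]                          # hypothetical docking scores
decoys = [-40.0, -35.0, -20.0, -10.0, -5.0, 0.0]
auc = roc_auc(actives, decoys)                           # 16 of 18 pairs won
ef = enrichment_factor(actives, decoys, fraction=1 / 3)  # 2 actives in the top 3 of 9
```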
Practical algorithms are presented for the parameterization of orthogonal matrices $Q \in \mathbb{R}^{m \times n}$ in terms of the minimal number of essential parameters {φ}. Both the square ($n = m$) and rectangular ($n < m$) cases are examined. Two kinds of parameterization are considered: one in which the individual columns of Q are distinct, and one in which only span(Q) is significant. The latter is relevant to chemical applications such as the representation of the arc factors in the multifacet graphically contracted function method and the representation of orbital coefficients in SCF and DFT methods. The parameterizations are represented formally using products of elementary Householder reflector matrices. Standard mathematical libraries, such as LAPACK, may be used to perform the basic low-level factorization, reduction, and other algebraic operations. Some care must be taken with the choice of phase factors to ensure stability and continuity. The transformation of gradient arrays between the Q and {φ} parameterizations is also considered. Operation counts for all factorizations and transformations are determined. Numerical results are presented that demonstrate the robustness, stability, and accuracy of these algorithms.
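The NumPy sketch below shows an elementary Householder reflector and one common phase convention. `numpy.linalg.qr` wraps LAPACK's Householder-based QR routines, and the signs of its output columns are not otherwise constrained. Flipping signs so that every diagonal element of R is positive makes Q a unique, continuous function of a full-rank input. This is a standard convention that may differ from the authors'. Extraction of the essential parameters {φ} is not shown, and the function names and input matrix are hypothetical.

```python
import numpy as np


def householder(v):
    """Elementary reflector H = I - 2 v v^T / (v^T v): symmetric and orthogonal."""
    v = np.asarray(v, dtype=float)
    return np.eye(v.size) - 2.0 * np.outer(v, v) / (v @ v)


def orthonormal_factor(a):
    """Householder QR of an m x n matrix (n <= m), phases fixed so diag(R) > 0."""
    q, r = np.linalg.qr(a, mode="reduced")      # LAPACK Householder-based QR
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0                     # zero pivot (rank-deficient): leave as is
    return q * signs, r * signs[:, None]        # Q D and D R, with D = diag(signs)


rng = np.random.default_rng(seed=3)
a = rng.standard_normal((5, 2))                 # hypothetical m = 5, n = 2 input
q, r = orthonormal_factor(a)
# With this phase choice, a tiny change in a gives a correspondingly tiny change in Q.
q_near, _ = orthonormal_factor(a + 1e-9 * rng.standard_normal((5, 2)))
hh = householder([1.0, 2.0, 2.0])
```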