Motivation: Unsupervised analysis of microarray gene expression data attempts to find biologically significant patterns within a given collection of expression measurements. For example, hierarchical clustering can be applied to expression profiles of genes across multiple experiments, identifying groups of genes that share similar expression profiles. Previous work using the support vector machine supervised learning algorithm with microarray data suggests that higher-order features, such as pairwise and tertiary correlations across multiple experiments, may provide significant benefit in learning to recognize classes of co-expressed genes. Results: We describe a generalization of the hierarchical clustering algorithm that efficiently incorporates these higher-order features by using a kernel function to map the data into a high-dimensional feature space. We then evaluate the utility of the kernel hierarchical clustering algorithm using both internal and external validation. The experiments demonstrate that the kernel representation itself is insufficient to provide improved clustering performance. We conclude that mapping gene expression data into a high-dimensional feature space is only a good idea when combined with a learning algorithm, such as the support vector machine, that does not suffer from the curse of dimensionality. Availability: Supplementary data at www.cs.columbia.edu/compbio/hiclust. Software source code available by request.
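As a rough illustration of clustering in a kernel-induced feature space, the sketch below computes distances from kernel values alone, using d(x, y)^2 = K(x, x) - 2 K(x, y) + K(y, y), and feeds them to a naive average-linkage procedure. The polynomial kernel, its degree, the linkage rule, and all function names are assumptions made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
import math

def poly_kernel(x, y, degree=2):
    """Polynomial kernel (x . y + 1)^degree; the degree is an illustrative choice."""
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** degree

def kernel_distance(x, y, kernel=poly_kernel):
    """Euclidean distance between x and y in the feature space implied by the kernel."""
    sq = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(sq, 0.0))  # clamp small negative rounding errors

def average_linkage(profiles, n_clusters, kernel=poly_kernel):
    """Naive agglomerative clustering (average linkage) on kernel distances."""
    clusters = [[i] for i in range(len(profiles))]
    dist = [[kernel_distance(p, q, kernel) for q in profiles] for p in profiles]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean pairwise distance between members of the two clusters.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Because only kernel evaluations are needed, the high-dimensional feature vectors are never built explicitly; swapping in a different kernel changes the implied feature space without touching the clustering loop.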
As-received and heat-treated Ti40Ta and Ti50Ta alloys were evaluated to determine their corrosion and mechanical performance and compared to Ti6Al4V, a common material utilized for orthopedic (surgical) implants. Anodic potentiodynamic tests performed in Plasmalyte showed that all samples, except for the Ti50Ta specimen aged at 400 °C for 3 h, gave a curve similar to that of Ti6Al4V. Optical microscopy and TEM were performed to determine as-received and heat-treated microstructures. As-received materials showed an alpha precipitate in an alpha+beta and martensite matrix. Samples aged at 400 °C showed an increase in the density and the length of the alpha precipitate. Vickers hardness measurements were performed to get an approximation of the tensile strengths. Aged Ti40Ta and Ti50Ta specimens produced the highest tensile values when compared to the Ti6Al4V material, representing a 31% and 56% increase for the 3 h samples and an 18% and 58% increase for the 10 h samples. Of all the materials studied, the Ti50Ta specimen aged for 10 h exhibited the best biocompatibility, showing excellent corrosion resistance combined with the highest tensile strength (1089 MPa and 58% harder/stronger than Ti6Al4V).
Motivation: Classification of protein sequences into functional and structural families based on sequence homology is a central problem in computational biology. Discriminative supervised machine learning approaches provide good performance, but simplicity and computational efficiency of training and prediction are also important concerns. Results: We introduce a class of string kernels, called mismatch kernels, for use with support vector machines (SVMs) in a discriminative approach to the problem of protein classification and remote homology detection. These kernels measure sequence similarity based on shared occurrences of fixed-length patterns in the data, allowing for mutations between patterns. Thus, the kernels provide a biologically well-motivated way to compare protein sequences without relying on family-based generative models such as hidden Markov models. We compute the kernels efficiently using a mismatch tree data structure, allowing us to calculate the contributions of all patterns occurring in the data in one pass while traversing the tree. When used with an SVM, the kernels enable fast prediction on test sequences. We report experiments on two benchmark SCOP datasets, where we show that the mismatch kernel used with an SVM classifier performs competitively with state-of-the-art methods for homology detection, particularly when very few training examples are available. Examination of the highest-weighted patterns learned by the SVM classifier recovers biologically important motifs in protein families and superfamilies. Availability: SVM software is publicly available at http://microarray.cpmc.columbia.edu/gist. Mismatch kernel software is available upon request.
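To make the idea of counting shared fixed-length patterns while allowing mutations concrete, the following sketch evaluates a (k, m)-mismatch kernel by brute force: every k-mer in a sequence contributes to all k-length patterns within m mismatches of it, and the kernel is the dot product of the resulting count vectors. It deliberately does not use the mismatch tree described above, so it is far slower; the function names, default k and m, and the 20-letter alphabet string are assumptions chosen for the example.

```python
from collections import Counter
from itertools import combinations, product

def neighborhood(kmer, m, alphabet):
    """All strings of the same length that differ from kmer in at most m positions."""
    result = set()
    for n_sub in range(m + 1):
        for positions in combinations(range(len(kmer)), n_sub):
            for letters in product(alphabet, repeat=n_sub):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                result.add("".join(chars))
    return result

def mismatch_features(seq, k, m, alphabet):
    """Sparse feature vector: pattern -> number of k-mers in seq within m mismatches."""
    feats = Counter()
    for i in range(len(seq) - k + 1):
        for pattern in neighborhood(seq[i:i + k], m, alphabet):
            feats[pattern] += 1
    return feats

def mismatch_kernel(s, t, k=3, m=1, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Dot product of the two mismatch feature vectors."""
    fs = mismatch_features(s, k, m, alphabet)
    ft = mismatch_features(t, k, m, alphabet)
    return sum(count * ft[pattern] for pattern, count in fs.items())
```

With m = 0 the function reduces to exact k-mer matching, so two sequences that differ by a single substitution inside a k-mer share nothing in that window; with m = 1 they share every pattern reachable from both k-mers by one substitution.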
The protein-protein interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions. This need has prompted the development of a number of methods for predicting protein-protein interactions based on various sources of data and methodologies. The common method for choosing negative examples for training a predictor of protein-protein interactions is based on annotations of cellular localization, and the observation that pairs of proteins that have different localization patterns are unlikely to interact. While this method leads to high-quality sets of non-interacting proteins, we find that this choice can lead to biased estimates of prediction accuracy, because the constraints placed on the distribution of the negative examples make the task easier. The effects of this bias are demonstrated in the context of both sequence-based and non-sequence-based features used for predicting protein-protein interactions.
OBJECTIVE
To study interactions between leptin and the pituitary–thyroid axis, both in euthyroid and dysthyroid states.
SUBJECTS AND MEASUREMENTS
We investigated the relationships of plasma leptin to levels of free thyroid hormones and TSH in 18 patients with newly diagnosed hyperthyroidism, 22 with newly diagnosed primary hypothyroidism, and 32 lean (body mass index [BMI] < 30 kg/m2) and 37 obese (BMI > 30 kg/m2) euthyroid subjects. Hypothyroid patients were restudied during thyroxine replacement treatment.
RESULTS
Median (interquartile range) plasma leptin concentrations were highest in obese euthyroid subjects (31.5 [19.0–48.0]) and in untreated hypothyroid patients (19.2 [11.5–31.5]), and lowest in untreated hyperthyroid patients (8.9 [5.5–11.1]) and lean euthyroid control subjects (6.6 [3.9–14.4] μg/l) (Kruskal–Wallis one‐way analysis of variance; P < 0.0001). In euthyroid subjects, plasma leptin levels were higher in obese than in lean subjects (P < 0.00001). In obese subjects plasma levels of TSH correlated with percentage body fat (r = 0.67; P < 0.001) and plasma leptin (r = 0.61; P < 0.001). In untreated hyperthyroid subjects plasma leptin was unrelated to free T3, and in untreated hypothyroidism plasma leptin was unrelated to either free T3 or TSH concentrations (all P = NS). In untreated hyperthyroid, but not hypothyroid, patients plasma leptin concentrations correlated with BMI (r = 0.57; P = 0.02). Treatment of hypothyroidism with thyroxine resulted in a significant reduction in plasma leptin concentrations from 20.8 (11.8–31.6) to 12.9 (4.6–21.2) μg/l (P = 0.005), but BMI did not change significantly in the hypothyroid subjects being studied prospectively.
CONCLUSIONS
(i) In euthyroid subjects, plasma leptin and TSH levels correlate, and both are positively correlated with adiposity. (ii) Plasma leptin was significantly elevated in hypothyroid subjects, to levels similar to those seen in obese euthyroid subjects. (iii) Treatment of hypothyroidism resulted in a reduction in the raised plasma leptin levels. The data are consistent with the hypothesis that leptin and the pituitary–thyroid axis interact in the euthyroid state, and that hypothyroidism reversibly increases leptin concentrations.
A widespread proteomics procedure for characterizing a complex mixture of proteins combines tandem mass spectrometry and database search software to yield mass spectra with identified peptide sequences. The same peptides are often detected in multiple experiments, and once they have been identified, the respective spectra can be used for future identifications. We present a method for collecting previously identified tandem mass spectra into a reference library that is used to identify new spectra. Query spectra are compared to references in the library to find the ones that are most similar. A dot product metric is used to measure the degree of similarity. With our largest library, the search of a query set finds 91% of the spectrum identifications and 93.7% of the protein identifications that could be made with a SEQUEST database search. A second experiment demonstrates that queries acquired on an LCQ ion trap mass spectrometer can be identified with a library of references acquired on an LTQ ion trap mass spectrometer. The dot product similarity score provides good separation of correct and incorrect identifications.
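A dot product comparison between a query spectrum and library references might look like the sketch below. Peaks are summed into fixed-width m/z bins, each binned spectrum is scaled to unit length, and the score is the dot product of the two vectors, so identical spectra score 1 and spectra with no shared bins score 0. The bin width, the normalization, the absence of any intensity transform or precursor-mass filter, and the function and peptide names are all assumptions for illustration; the abstract does not give these details.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=1.0):
    """Sum (m/z, intensity) peaks into m/z bins and scale the result to unit length."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz // bin_width)] += intensity
    norm = math.sqrt(sum(v * v for v in bins.values()))
    return {b: v / norm for b, v in bins.items()} if norm else {}

def dot_product(query, reference, bin_width=1.0):
    """Normalized dot product between two peak lists."""
    q = bin_spectrum(query, bin_width)
    r = bin_spectrum(reference, bin_width)
    return sum(v * r.get(b, 0.0) for b, v in q.items())

def best_match(query, library, bin_width=1.0):
    """Return (peptide, score) for the library reference most similar to the query."""
    scored = ((pep, dot_product(query, ref, bin_width)) for pep, ref in library.items())
    return max(scored, key=lambda item: item[1])
```

In practice a score threshold would be needed to separate correct from incorrect matches; choosing it is outside the scope of this sketch.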
Motivation: Sequence similarity often suggests evolutionary relationships between protein sequences that can be important for inferring similarity of structure or function. The most widely used pairwise sequence comparison algorithms for homology detection, such as BLAST and PSI-BLAST, often fail to detect less conserved, remotely related targets. Results: In this paper, we propose a new general graph-based propagation algorithm called MotifProp to detect more subtle similarity relationships than pairwise comparison methods. MotifProp is based on a protein-motif network, in which edges connect proteins and the k-mer based motif features that they contain. We show that our new motif-based propagation algorithm can improve the ranking results over a base algorithm, such as PSI-BLAST, that is used to initialize the ranking. Despite the complex structure of the protein-motif network, MotifProp can be easily interpreted using the top-ranked motifs and motif-rich regions induced by the propagation, both of which are helpful for discovering conserved structural components in remote homologies. Availability: http://www.cs.columbia.edu/compbio/motifprop Contact: cleslie@cs.columbia.edu
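The general shape of score propagation on a protein-motif network can be sketched as follows. Protein scores are initialized from a base ranking, motif scores are set to the mean score of the proteins containing them, and protein scores are then updated as a mix of the initial score and the mean score of their motifs. This averaging rule, the mixing weight alpha, the iteration count, and all names are assumptions made for the example; the abstract does not state MotifProp's actual update equations, so this should be read as a generic bipartite propagation scheme rather than the published algorithm.

```python
def propagate(edges, init_scores, alpha=0.5, n_iter=20):
    """Propagate scores over a bipartite protein-motif graph.

    edges: dict mapping each protein to the set of motifs it contains.
    init_scores: dict mapping proteins to base-ranking scores (missing = 0).
    Returns (protein_scores, motif_scores).
    """
    motifs = {m for ms in edges.values() for m in ms}
    motif_to_proteins = {
        m: [p for p, ms in edges.items() if m in ms] for m in motifs
    }
    protein_scores = {p: init_scores.get(p, 0.0) for p in edges}
    motif_scores = {}
    for _ in range(n_iter):
        # Motif score: mean score of the proteins that contain the motif.
        motif_scores = {
            m: sum(protein_scores[p] for p in ps) / len(ps)
            for m, ps in motif_to_proteins.items()
        }
        # Protein score: mix of the base score and the mean score of its motifs.
        protein_scores = {
            p: alpha * init_scores.get(p, 0.0)
            + (1.0 - alpha)
            * (sum(motif_scores[m] for m in ms) / len(ms) if ms else 0.0)
            for p, ms in edges.items()
        }
    return protein_scores, motif_scores
```

After propagation, sorting proteins by score gives a re-ranked list, and sorting motifs by score indicates which motifs carried the similarity signal.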
The purpose of this study was to perform a comparative analysis of powder-bed-based additive manufacturing (AM) technologies during the production of metallic components using Inconel 625 powder material. The AM technologies explored in this study include electron beam powder bed fusion (EPBF), laser powder bed fusion (LPBF), and binder jetting technology. Samples were fabricated in two build directions (X and Z build orientations) for this evaluation process, where all specimens underwent a hot isostatic pressing (HIP) post-process. The comparison was made in terms of microstructure and mechanical properties including ultimate tensile strength (UTS), yield strength (YS), percent elongation, and modulus of elasticity (E). Microstructural characterization showed evidence of equiaxed grain formation for binder jetting and LPBF parts, whereas EPBF parts displayed a more columnar grain formation parallel to the build direction. Six specimens were tested per technology, three built in the X orientation and three built in the Z orientation. All six specimens were built in a single run of each AM machine. Results indicated that all three technologies are capable of meeting the minimum requirements of the ASTM F3056-14 standard for parts produced in the X orientation, with properties that are similar to wrought Inconel 625. In the Z orientation, however, only LPBF was able to meet the minimum standard requirements. Through the comparative analysis of the mechanical properties, this work showed that LPBF outperformed the other technologies in a majority of the evaluated properties, followed by EPBF and binder jetting. An analysis of the fracture surfaces of tensile specimens was also performed, and it indicated ductile fracture (dimple rupture) for the specimens produced with all three of the AM technologies studied.
Nevertheless, the characterization also showed certain differences in the fractured surfaces, such as the presence of un-sintered powder particles for the binder jetting processed Inconel 625, or the development of the so-called woody structure for the EPBF processed material. This study can be used to distinguish the characteristics of the three powder-bed-based technologies for the fabrication of Inconel 625, and similar approaches can be extended to other technologies and materials.
New results are presented from the teleseismic component of the Jemez Tomography Experiment conducted across Valles caldera in northern New Mexico. We invert 4872 relative P wave arrival times recorded on 50 portable stations to determine velocity structure to depths of 40 km. The three principal features of our model for Valles caldera are: (1) near‐surface low velocities of −17% beneath the Toledo embayment and the Valle Grande, (2) midcrustal low velocities of −23% in an ellipsoidal volume underneath the northwest quadrant of the caldera, and (3) a broad zone of low velocities (−15%) in the lower crust or upper mantle. Crust shallower than 20 km is generally fast to the northwest of the caldera and slow to the southeast. Near‐surface low velocities are interpreted as thick deposits of Bandelier tuff and postcaldera volcaniclastic rocks. Lateral variation in the thickness of these deposits supports increased caldera collapse to the southeast, beneath the Valle Grande. We interpret the midcrustal low‐velocity zone to contain a minimum melt fraction of 10%. While we cannot rule out the possibility that this zone is the remnant 1.2 Ma Bandelier magma chamber, the eruption history and geochemistry of the volcanic rocks erupted in Valles caldera following the Bandelier tuff make it more likely that the magma results from a new pulse of intrusion, indicating that melt flux into the upper crust beneath Valles caldera continues. The low‐velocity zone near the crust‐mantle boundary is consistent with either partial melt in the lower crust or mafic rocks without partial melt in the upper mantle. In either case, this low‐velocity anomaly indicates that underplating by mantle‐derived melts has occurred.
We carried out a large-scale screen to identify interactions between integral membrane proteins of Saccharomyces cerevisiae by using a modified split-ubiquitin technique. Among 705 proteins annotated as integral membrane, we identified 1,985 putative interactions involving 536 proteins. To ascribe confidence levels to the interactions, we used a support vector machine algorithm to classify interactions based on the assay results and protein data derived from the literature. Previously identified and computationally supported interactions were used to train the support vector machine, which identified 131 interactions of highest confidence, 209 of the next highest confidence, 468 of the next highest, and the remaining 1,085 of low confidence. This study provides numerous putative interactions among a class of proteins that have been difficult to analyze on a high-throughput basis by other approaches. The results identify potential previously undescribed components of established biological processes and roles for integral membrane proteins of ascribed functions.