When the distributional assumptions for a t-test are not met, many analysts default to a rank-based test, such as the Wilcoxon-Mann-Whitney Test, to compare the difference in means between two samples. The Wilcoxon-Mann-Whitney Test presents no danger of tied observations when the data are continuous. In practice, however, observations are discretized for various practical reasons, or the data are ordinal in nature. When ranks are tied, most textbooks recommend replacing the tied ranks with mid-ranks, a practice that affects the distribution of the Wilcoxon-Mann-Whitney Test under the null hypothesis. Other methods for breaking ties have also been proposed. In this study, we examine four tie-breaking methods (average-scores, mid-ranks, jittering, and omission) for their effects on the Type I and Type II error of the Wilcoxon-Mann-Whitney Test and the two-sample t-test across various combinations of sample sizes, underlying population distributions, and percentages of tied observations. We use the results to determine the maximum percentage of ties for which size and power are seriously affected, and which tie-breaking method yields the best Type I and Type II error properties. Not surprisingly, the underlying population distribution of the data has less of an effect on the Wilcoxon-Mann-Whitney Test than on the t-test. Surprisingly, we find that the jittering and omission methods tend to hold Type I error at the nominal level, even for small sample sizes, with no substantial sacrifice in Type II error. Furthermore, the t-test and the Wilcoxon-Mann-Whitney Test are equally affected by ties in terms of Type I and Type II error; we therefore recommend omitting tied observations for both the two-sample t-test and the Wilcoxon-Mann-Whitney Test, because of the bias in Type I error created when tied observations are left in the data (in the case of the t-test) or adjusted using mid-ranks or average-scores (in the case of the Wilcoxon-Mann-Whitney Test).
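As a concrete illustration, the following sketch applies three of the four tie-breaking strategies (the average-scores method is omitted for brevity) to two discretized samples before running the Wilcoxon-Mann-Whitney Test. The function names and simulation settings are our own, not the study's code.

```python
# Illustrative sketch (not the authors' code) of three tie-breaking
# strategies before a Wilcoxon-Mann-Whitney test; the average-scores
# method is omitted for brevity. Function names are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def wmw_midranks(x, y):
    # scipy assigns mid-ranks to tied observations by default
    return stats.mannwhitneyu(x, y, alternative="two-sided")

def wmw_jitter(x, y, scale=1e-6):
    # break ties by adding negligible random noise
    return stats.mannwhitneyu(x + rng.normal(0, scale, len(x)),
                              y + rng.normal(0, scale, len(y)),
                              alternative="two-sided")

def wmw_omit(x, y):
    # drop observations whose values occur in both samples
    shared = np.intersect1d(x, y)
    return stats.mannwhitneyu(x[~np.isin(x, shared)],
                              y[~np.isin(y, shared)],
                              alternative="two-sided")

# rounding a continuous variable induces ties, as happens in practice
x = np.round(rng.normal(0.0, 1.0, 20), 1)
y = np.round(rng.normal(0.0, 1.0, 20), 1)
for test in (wmw_midranks, wmw_jitter, wmw_omit):
    print(test.__name__, round(test(x, y).pvalue, 3))
```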
Sports such as diving, gymnastics, and ice skating rely on expert judges to score performance accurately. Human error and bias can affect the scores, sometimes leading to controversy, especially at high levels. Instant replay or recorded video can be used to assess, and sometimes update, judges’ scores during a competition. For diving in particular, judges are trained to look for certain characteristics of a dive, such as angle of entry, height of splash, and distance of the dive from the end of the board, and to score each dive on a scale of 0 to 10, where a 0 is a failed dive and a 10 is a perfect dive. In an effort to obtain objective comparisons for judges’ scores, a diving meet was filmed and the video footage used to measure certain characteristics of each dive for each participant. The variables measured from the video were the height of the dive at its apex, the angle of entry into the water, and the distance of the dive from the end of the board. These measurements were then used as explanatory variables in a regression model with the judges’ scores as the response. The measurements from the video provide a gold standard specific to the athletic performances at the meet being judged, and supplement judges’ scores with synergistic quantitative and visual information. In this article we show, via a series of regression analyses, that certain aspects of an athlete’s performance measured from video after a meet provide information similar to the judges’ scores. The model fit the data well enough to warrant using characteristics from video footage to supplement judges’ scores in future meets. In addition, we calibrated the results from the model against those of meets where the same divers competed, showing that the measurement data rank divers in approximately the same order as in other meets, and thus that measured data and judges’ scores are consistent from meet to meet. Eventually, our findings could lead to the use of video footage to supplement judges’ scores in real time.
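A hypothetical sketch of the regression described above follows; the variable names, units, and synthetic data are our assumptions for illustration, not the study's measurements.

```python
# Hypothetical sketch of the regression described above: judges' scores
# regressed on video-derived dive measurements. Data are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 60                                      # assumed number of scored dives
height = rng.uniform(3.0, 5.0, n)           # apex height of dive (m)
angle = rng.uniform(70.0, 90.0, n)          # angle of entry (degrees)
distance = rng.uniform(0.5, 2.5, n)         # distance from board (m)
# synthetic scores loosely tied to the three measurements
score = 1.0 * height + 0.05 * angle - 0.8 * distance + rng.normal(0, 0.5, n)

X = sm.add_constant(np.column_stack([height, angle, distance]))
fit = sm.OLS(score, X).fit()
print(fit.summary())  # R-squared shows how well video measures track scores
```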
The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. Here we describe the state of OBI and several applications that use it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution, and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, no single internally consistent resource could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies, such as GO, Chemical Entities of Biological Interest (ChEBI), and the Phenotype Attribute and Trait Ontology (PATO), without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (the Information Artifact Ontology) and methods for importing parts of ontologies (Minimum Information to Reference an External Ontology Term, MIREOT). The OBI project is an open, cross-disciplinary collaborative effort encompassing multiple research communities from around the globe. To date, OBI comprises 2366 classes and 40 relations, along with textual and formal definitions. The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed in association with OBI. The current release of OBI is available at http://purl.obolibrary.org/obo/obi.owl.
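For readers who want to explore the release programmatically, a minimal sketch, assuming the owlready2 Python library (our choice of tool; OBI itself ships only the OWL file), might look like this.

```python
# A minimal sketch, assuming the owlready2 library, for browsing the
# OBI release programmatically; OBI itself ships only the OWL file.
from owlready2 import get_ontology

obi = get_ontology("http://purl.obolibrary.org/obo/obi.owl").load()

print(len(list(obi.classes())), "classes loaded")
# find terms whose label mentions 'assay'
for cls in obi.search(label="*assay*")[:5]:
    print(cls.iri, cls.label)
```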
This paper introduces flowthrough centrality, a node centrality measure derived from the hierarchical maximum concurrent flow problem (HMCFP). Capturing the extent to which a node acts as a hub within a network, this centrality is defined as the fraction of the flow passing through the node relative to the node's total flow capacity. Flowthrough centrality is compared to the commonly used closeness, betweenness, and flow betweenness centralities, as well as to stable betweenness centrality, to measure the stability (i.e., accuracy) of the centralities when knowledge of the network topology is incomplete or in transition. Because flowthrough centrality is based upon flow rather than geodesics, perturbations alter its values less than they alter geodesic-based centrality values. The flowthrough measure thus overcomes the problem of overstating or understating the roles that significant actors play in social networks. Flowthrough centrality is canonical in that it is determined from a natural, realized flow universally applicable to all networks.
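A rough stand-in for the flowthrough ratio is sketched below, using per-pair maximum flows in networkx rather than the paper's HMCFP formulation, which is considerably more involved; the capacity definition and the normalization over source-sink pairs are our simplifications.

```python
# Rough stand-in for the flowthrough ratio (not the paper's HMCFP
# formulation): flow routed through a node, via per-pair maximum flows,
# relative to the node's flow capacity; normalization over pairs is ours.
from itertools import permutations
import networkx as nx

G = nx.DiGraph()
G.add_edges_from(
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("b", "d")],
    capacity=1.0,
)

def flowthrough(G, v):
    # flow capacity of v: total capacity of its incoming edges
    cap = sum(d["capacity"] for _, _, d in G.in_edges(v, data=True))
    if cap == 0:
        return 0.0
    through, pairs = 0.0, 0
    for s, t in permutations(G.nodes, 2):
        if v in (s, t):
            continue
        _, flow = nx.maximum_flow(G, s, t)
        # flow entering v on its way from s to t
        through += sum(flow[u].get(v, 0.0) for u in G.predecessors(v))
        pairs += 1
    return through / (cap * pairs)

for v in G.nodes:
    print(v, round(flowthrough(G, v), 3))
```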
Motivation: Affymetrix GeneChip arrays require summarization to combine the probe-level intensities into one value representing the expression level of a gene. However, probe intensity measurements are expected to be affected by differing levels of non-specific hybridization and cross-hybridization to other transcripts. Here, we present a new summarization technique, the Distribution Free Weighted method (DFW), which uses information about the variability in probe behavior to estimate the extent of non-specific hybridization and cross-hybridization for each probe. The contribution of each probe is weighted accordingly during summarization, without making any distributional assumptions about the probe-level data. Results: We compare DFW with several popular summarization methods on spike-in datasets, via both our own calculations and the ‘Affycomp II’ competition. The results show that DFW outperforms the other methods when sensitivity and specificity are considered simultaneously. On the Affycomp spike-in datasets, the area under the receiver operating characteristic curve for DFW is nearly 1.0 (a perfect value), indicating that DFW can identify all differentially expressed genes with few false positives. The approach is also computationally faster than most other methods in current use. Availability: The R code for DFW is available upon request. Contact: mmcgee@smu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
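The weighting idea can be sketched as follows; this is an illustrative simplification, not the published DFW algorithm, and the inverse-spread weight formula is our assumption.

```python
# Illustrative simplification of variability-based probe weighting in
# the spirit of DFW; the published method's exact weights differ, and
# the inverse-spread formula here is our assumption.
import numpy as np

rng = np.random.default_rng(2)
# probes x arrays matrix of log2 intensities for one probe set
probes = rng.normal(8.0, 1.0, size=(11, 6))
probes[3] += rng.normal(0.0, 3.0, size=6)   # a noisy, cross-hybridizing probe

# erratic probes (large spread across arrays) are down-weighted
spread = probes.std(axis=1)
weights = 1.0 / (spread + 1e-9)
weights /= weights.sum()

expression = weights @ probes  # one weighted summary value per array
print(np.round(expression, 2))
```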
Moderate alcohol intake is associated with lower atherosclerosis risk, presumably due to increased HDL cholesterol (HDL-C) concentrations; however, the metabolic mechanisms of this increase are poorly understood.
We tested the hypothesis that ethanol increases HDL-C by raising the transport rates (TRs) of the major HDL apolipoproteins apoA-I and apoA-II. We measured the turnover of these apolipoproteins in vivo in paired studies with and without alcohol consumption in 14 subjects. The fractional catabolic rate (FCR) and TR of radiolabeled apoA-I and apoA-II were determined in the last 2 weeks of a 4-week Western-type metabolic diet, without (control) or with alcohol in isocaloric exchange for carbohydrates. Alcohol was given as vodka in fixed amounts ranging from 0.20 to 0.81 g·kg⁻¹·d⁻¹ (mean ± SD, 0.45 ± 0.19) to reflect the usual daily intake of each subject. HDL-C concentrations increased 18% with alcohol compared with the control (Wilcoxon matched-pairs test, P=0.002). ApoA-I concentrations increased by 10% (P=0.048) and apoA-II concentrations by 17% (P=0.005), due to higher apoA-I and apoA-II TRs, respectively, whereas the FCRs of both apoA-I and apoA-II did not change. The amount of alcohol consumed correlated with the degree of increase in HDL-C (Pearson's r=0.66, P=0.01) and in apoA-I TR (r=0.57, P=0.03). The increase in HDL-C also correlated with the increase in apoA-I TR (r=0.61, P=0.02).
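The paired analyses reported here can be reproduced in outline with scipy; the numbers below are synthetic stand-ins, not the study data.

```python
# Outline of the paired analyses reported above, using synthetic
# stand-in numbers rather than the study data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 14                                         # subjects, as in the study
hdl_control = rng.normal(45.0, 8.0, n)         # HDL-C (mg/dL), hypothetical
dose = rng.uniform(0.20, 0.81, n)              # alcohol, g/kg per day
hdl_alcohol = hdl_control * (1 + 0.3 * dose) + rng.normal(0, 1.5, n)

stat, p = stats.wilcoxon(hdl_alcohol, hdl_control)   # matched-pairs test
r, pr = stats.pearsonr(dose, hdl_alcohol - hdl_control)
print(f"Wilcoxon P={p:.3f}; Pearson r={r:.2f} (P={pr:.2f})")
```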
Alcohol intake increases HDL-C in a dose-dependent fashion, an effect associated with, and possibly caused by, an increase in the TRs of the HDL apolipoproteins apoA-I and apoA-II.