Peer reviewed Open access
  • On triangle inequalities of...
    Chen, Jiaxing; Ng, Yen Kaow; Lin, Lu; Zhang, Xianglilan; Li, Shuaicheng

    BMC Bioinformatics, 02/2023, Volume: 24, Issue: 1
    Journal Article

    Distance functions are fundamental for evaluating the differences between gene expression profiles. Such a function should output a low value if the profiles are strongly correlated, either negatively or positively, and a high value otherwise. One popular distance function is the absolute correlation distance, Formula: see text, where Formula: see text is a similarity measure such as Pearson or Spearman correlation. However, the absolute correlation distance fails to satisfy the triangle inequality, a property that would guarantee better performance in vector quantization, allow fast data localization, and accelerate data clustering.

    In this work, we propose Formula: see text as an alternative. We prove that the proposed distance satisfies the triangle inequality when the similarity measure is Pearson correlation, Spearman correlation, or cosine similarity. We show, both analytically and experimentally, that it is better than Formula: see text, another variant of the absolute correlation distance that satisfies the triangle inequality. We empirically compared the two distances in gene clustering and sample clustering experiments on real-world biological data. They performed similarly for both gene clustering and sample clustering under hierarchical clustering and PAM (partitioning around medoids) clustering. However, the proposed distance produced more robust clusterings: in the bootstrap experiment, it generated the more robust sample-pair partition more frequently (P-value Formula: see text), and statistics on the time at which a class "dissolved" also support its advantage in robustness.

    The proposed distance, as a variant of the absolute correlation distance, satisfies the triangle inequality and enables more robust clustering.
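    The claim that the absolute correlation distance violates the triangle inequality can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the standard definition of the absolute correlation distance, 1 − |ρ|, with ρ computed as Pearson correlation (or cosine similarity), and uses three hypothetical toy profiles chosen so that x and z are uncorrelated while y = x + z is moderately correlated with both. The helper names `pearson`, `cosine`, `abs_corr_distance`, and `triangle_violations` are illustrative only.

```python
import math
from itertools import permutations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cx = [a - mx for a in x]
    cy = [b - my for b in y]
    num = sum(a * b for a, b in zip(cx, cy))
    den = math.sqrt(sum(a * a for a in cx) * sum(b * b for b in cy))
    return num / den


def cosine(x, y):
    """Cosine similarity of two nonzero vectors (no mean-centering)."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


def abs_corr_distance(x, y, sim=pearson):
    """Absolute correlation distance 1 - |rho|: near 0 for strong positive OR negative correlation."""
    return 1.0 - abs(sim(x, y))


def triangle_violations(points, dist, tol=1e-12):
    """Return every ordered index triple (a, b, c) with dist(a, c) > dist(a, b) + dist(b, c)."""
    bad = []
    for a, b, c in permutations(range(len(points)), 3):
        direct = dist(points[a], points[c])
        detour = dist(points[a], points[b]) + dist(points[b], points[c])
        if direct > detour + tol:
            bad.append((a, b, c))
    return bad


# Hypothetical zero-mean toy profiles: x and z are orthogonal, y = x + z.
x = [1, 0, -1]
z = [1, -2, 1]
y = [2, -2, 0]

print(round(abs_corr_distance(x, y), 4))  # 0.5
print(round(abs_corr_distance(y, z), 4))  # 0.134
print(round(abs_corr_distance(x, z), 4))  # 1.0
print(triangle_violations([x, y, z], abs_corr_distance))  # [(0, 1, 2), (2, 1, 0)]

# The profiles already have mean zero, so cosine similarity gives the same result here.
print(triangle_violations([x, y, z], lambda p, q: abs_corr_distance(p, q, sim=cosine)))
```

    Because x and z are uncorrelated, their distance takes the maximum value 1, yet the detour through y costs only about 0.5 + 0.134 = 0.634. A metric would forbid such a shortcut, which is why pruning tricks in vector quantization, data localization, and clustering that rely on the triangle inequality cannot be applied safely to 1 − |ρ|.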