Peer reviewed, Open access
  • Mining the Risk Types of Hu...
    Park, S. -B.; Hwang, S.; Zhang, B. -T.

    Database and Expert Systems Applications, 2003
    Book Chapter, Conference Proceeding

    Human papillomavirus (HPV) infection is known to be the main risk factor for cervical cancer, which is a leading cause of cancer deaths among women worldwide. Because there are more than 100 types of HPV, it is critical to distinguish the HPVs associated with cervical cancer from those that are not. In this paper, we classify the risk type of HPVs using their textual descriptions. The key issue in this problem is that false negatives must be treated differently from false positives: we must find the high-risk HPVs, even at the cost of misclassifying some low-risk HPVs. For this purpose, we adopt AdaCost, a cost-sensitive learner that assigns different misclassification costs to the training examples. Experimental results on the HPV sequence database show that taking costs into account yields higher performance. The F-score is higher than the accuracy, which implies that most high-risk HPVs are found.
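    The abstract names AdaCost without describing it. As a rough illustration only, the sketch below implements AdaCost-style cost-sensitive boosting with decision stumps, following the weight update and cost-adjustment function from Fan et al. (1999) as commonly stated. It is not the authors' implementation: the one-feature numeric data (a stand-in for the text-derived features), the costs of 1.0 for high-risk and 0.3 for low-risk examples, the rule of picking the stump with the largest cost-weighted edge `r`, and all function names are hypothetical choices made for this example.

```python
import math

# Minimal AdaCost-style sketch (after Fan et al., 1999), NOT the paper's code.
# Labels: +1 = high-risk HPV, -1 = low-risk HPV. Data and costs are hypothetical.


def beta(correct, cost):
    # Cost adjustment: a correct prediction on a costly example lowers its
    # weight less; a mistake on a costly example raises its weight more.
    return -0.5 * cost + 0.5 if correct else 0.5 * cost + 0.5


def stump_predict(x, feature, threshold, polarity):
    # Decision stump: +1 if polarity * (x[feature] - threshold) > 0, else -1.
    return 1 if polarity * (x[feature] - threshold) > 0 else -1


def adacost(X, y, costs, rounds=10):
    n = len(X)
    total = sum(costs)
    weights = [c / total for c in costs]  # initial weights proportional to cost
    ensemble = []
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):
            values = sorted(set(x[f] for x in X))
            # Candidate thresholds: one below the minimum, then midpoints.
            thresholds = [values[0] - 1.0]
            thresholds += [(a + b) / 2 for a, b in zip(values, values[1:])]
            for thr in thresholds:
                for pol in (1, -1):
                    preds = [stump_predict(x, f, thr, pol) for x in X]
                    # Cost-weighted edge r (assumed rule: keep the largest r).
                    r = sum(weights[i] * y[i] * preds[i]
                            * beta(preds[i] == y[i], costs[i]) for i in range(n))
                    if best is None or r > best[0]:
                        best = (r, f, thr, pol, preds)
        r, f, thr, pol, preds = best
        if r <= 0:
            break  # no stump helps under the current weights
        r = min(r, 1 - 1e-10)  # avoid division by zero for a "perfect" stump
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        ensemble.append((alpha, f, thr, pol))
        # Reweight by exp(-alpha * y * h * beta), then renormalize to sum to 1.
        weights = [weights[i] * math.exp(-alpha * y[i] * preds[i]
                                         * beta(preds[i] == y[i], costs[i]))
                   for i in range(n)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return ensemble


def predict(ensemble, x):
    # Sign of the alpha-weighted vote of all stumps.
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score > 0 else -1


def evaluate(y_true, y_pred):
    # Accuracy, plus precision/recall/F-score with high-risk (+1) as positive.
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == -1 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == -1 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return accuracy, precision, recall, f_score


# Hypothetical one-feature data, e.g. a score derived from each HPV description.
X = [[0.1], [0.2], [0.45], [0.55], [0.8], [0.9]]
y = [-1, -1, 1, -1, 1, 1]
# Hypothetical costs: missing a high-risk HPV is penalized more heavily.
costs = [1.0 if label == 1 else 0.3 for label in y]

model = adacost(X, y, costs)
predictions = [predict(model, x) for x in X]
accuracy, precision, recall, f_score = evaluate(y, predictions)
print(predictions, accuracy, recall, f_score)
```

    On this made-up data the heavy cost on high-risk examples steers every round toward a stump that keeps all three high-risk examples, accepting one low-risk false positive. Recall for the high-risk class is then 1.0 and the F-score (6/7) exceeds the accuracy (5/6), the same kind of trade-off the abstract describes, though this toy run says nothing about the paper's actual results.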