  • Bai, Lin; Yang, Lina; Huo, Lin; Li, Taoshen

    2018 International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR), 2018-July
    Conference Proceeding

    Automatically describing the content of an image is a challenging task in computer vision that connects machine learning and natural language processing. In this paper, we present a framework, based on modeling image context, that generates natural sentences describing an image. It consists of two parts: relation modeling and description generation. The former models the mapping from image spatial context to the logical relationships between objects, and is trained to maximize the likelihood of the target linguistic phrase describing the relationship between objects given the training image. The latter builds on the advantages of the syntactic-tree-based method and uses the predicted relationships as key ingredients to facilitate image description generation within a tree-growth process. We conduct extensive experimental evaluations on the MS COCO dataset. Our framework outperforms the state-of-the-art methods. The results demonstrate that our framework provides robust and significant improvements in both relationship prediction between objects and image description generation.