Full text
Peer reviewed
  • Multi-view 3D object retrie...
    Lin, Dongyun; Li, Yiqun; Cheng, Yi; Prasad, Shitala; Nwe, Tin Lay; Dong, Sheng; Guo, Aiyuan

    Knowledge-based systems, 07/2022, Volume: 247
    Journal Article

    In multi-view 3D object retrieval tasks, it is pivotal to aggregate visual features extracted from multiple view images to generate a discriminative representation for a 3D object. Existing multi-view convolutional neural networks employ view pooling for feature aggregation, which ignores both the local view-relevant discriminative information within each view image and the global correlative information across all view images. To leverage both types of information, we propose two self-attention modules, namely the View Attention Module and the Instance Attention Module, to learn view-attentive and instance-attentive features, respectively. The final representation of a 3D object is the aggregation of three features: original, view-attentive, and instance-attentive. Furthermore, we propose employing the ArcFace loss together with a cosine-distance-based triplet-center loss as the metric learning guidance to train our model. As the cosine distance is used to rank the retrieval results, our angular metric learning losses achieve a consistent objective between the training and testing processes, thereby facilitating discriminative feature learning. Extensive experiments and ablation studies are conducted on four publicly available datasets for 3D object retrieval to show the superiority of the proposed method over multiple state-of-the-art methods.
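    The core idea of the abstract — replacing plain view pooling with self-attention so that each view's feature can attend to all other views before aggregation — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the projection matrices, the residual fusion, and the mean aggregation are illustrative assumptions, and NumPy stands in for the actual deep-learning framework.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax over the given axis.
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention_aggregate(view_feats, Wq, Wk, Wv):
        """Aggregate per-view CNN features with one self-attention pass.

        view_feats: (V, D) array, one row per rendered view of the object.
        Wq, Wk, Wv: (D, D) projections (learned in a real model; random here).
        Returns a single (D,) object descriptor.
        """
        q, k, v = view_feats @ Wq, view_feats @ Wk, view_feats @ Wv
        # (V, V) weights: how much each view attends to every other view.
        attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)
        attentive = attn @ v              # (V, D) cross-view attentive features
        fused = view_feats + attentive    # residual fusion of original + attentive
        return fused.mean(axis=0)         # aggregate views into one descriptor

    rng = np.random.default_rng(0)
    V, D = 12, 64                         # 12 rendered views, 64-dim features (toy sizes)
    views = rng.normal(size=(V, D))
    Wq, Wk, Wv = (rng.normal(scale=D**-0.5, size=(D, D)) for _ in range(3))
    descriptor = self_attention_aggregate(views, Wq, Wk, Wv)
    print(descriptor.shape)  # (64,)
    ```

    In the paper's terms, separate attention modules produce the view-attentive and instance-attentive features; the sketch above shows only the shared self-attention mechanism they build on.
    
    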
    • We propose to leverage the aggregation of view and instance attentive features for multi-view 3D object retrieval.
    • To leverage local view-relevant discriminative information within each of the view images, we propose a View Attention Module (VAM) to learn view attentive features for each view image.
    • To leverage global correlative information across all the view images, we propose an Instance Attention Module (IAM) to learn instance attentive features for each view image.
    • We propose to employ the ArcFace loss together with a cosine-distance-based triplet-center loss as the metric learning guidance to learn discriminative representations in the angular feature space.
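    The last highlight pairs two angular metric-learning objectives. A minimal NumPy sketch of both is below, assuming unit-normalized features; the hyperparameters (scale s, margins) and the toy identity-matrix centers are illustrative choices, not values from the paper.

    ```python
    import numpy as np

    def l2norm(x, axis=-1):
        # Project vectors onto the unit hypersphere (angular feature space).
        return x / np.linalg.norm(x, axis=axis, keepdims=True)

    def arcface_logits(feats, weights, labels, s=30.0, m=0.5):
        """ArcFace: add an angular margin m to the true-class angle, then scale by s."""
        cos = np.clip(l2norm(feats) @ l2norm(weights, axis=0), -1.0, 1.0)  # (N, C)
        theta = np.arccos(cos)
        theta[np.arange(len(labels)), labels] += m   # penalize the target angle
        return s * np.cos(theta)

    def cosine_triplet_center_loss(feats, centers, labels, margin=0.1):
        """Triplet-center loss with cosine distance d = 1 - cosine similarity."""
        d = 1.0 - l2norm(feats) @ l2norm(centers).T  # (N, C) distances to class centers
        n = np.arange(len(labels))
        pos = d[n, labels]                # distance to own class center
        d_other = d.copy()
        d_other[n, labels] = np.inf
        neg = d_other.min(axis=1)         # distance to nearest other center
        return np.maximum(0.0, pos - neg + margin).mean()

    # Toy check: orthogonal unit centers, features sitting exactly on them.
    labels = np.array([0, 1, 2, 3])
    centers = np.eye(4)
    feats = np.eye(4)
    loss = cosine_triplet_center_loss(feats, centers, labels)
    print(loss)  # 0.0 — each feature is closer to its own center by more than the margin
    ```

    Because both losses operate on angles/cosines, training and retrieval (which ranks by cosine distance) optimize the same quantity — the consistency the abstract highlights.
    
    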