  • TCNet: Multiscale Fusion of...
    Xiang, Xuyang; Gong, Wenping; Li, Shuailong; Chen, Jun; Ren, Tianhe

    IEEE journal of selected topics in applied earth observations and remote sensing, 01/2024, Volume: 17
    Journal Article

    Semantic segmentation of remote sensing images plays a critical role in areas such as urban change detection, environmental protection, and geohazard identification. Convolutional Neural Networks (CNNs) have been extensively employed for semantic segmentation in recent years; however, because convolution is a local operation, CNNs struggle to extract the global context of remote sensing images, which is vital for semantic segmentation. The recently developed Transformer, by contrast, offers powerful global modeling capabilities. This study proposes a network called TCNet, which adopts a parallel-in-branch architecture combining a Transformer and a CNN. TCNet thus takes advantage of both: global context and low-level spatial details can be captured with a much shallower network. In addition, a novel fusion technique called Interactive Self-attention (ISa) is proposed to fuse the multi-level features extracted from the two branches. To bridge the semantic gap between regions, a skip-connection module called Windowed Self-attention Gating (WSaG) is further developed and added to the progressive upsampling network. Experiments on three public datasets (i.e., the Bijie Landslide Dataset, the WHU Building Dataset, and the Massachusetts Buildings Dataset) show that TCNet outperforms state-of-the-art models: its IoU values on the three datasets are 75.34% (ranked first among the ten models compared), 91.16% (ranked first among thirteen), and 76.21% (ranked first among thirteen), respectively.
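
    The parallel-in-branch idea described in the abstract — a global (Transformer-like) branch and a local (CNN-like) branch run on the same input, with their features fused by attention — can be sketched in a toy form. Everything below is an illustrative assumption, not the paper's actual ISa or WSaG implementation: a 1-D moving average stands in for the convolutional branch, a single-head dot-product self-attention for the Transformer branch, and cross-attention over the concatenated branch outputs for the fusion step.

    ```python
    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax.
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def global_branch(x):
        # Toy single-head self-attention: every position attends to all
        # positions, capturing global context (Transformer stand-in).
        scores = x @ x.T / np.sqrt(x.shape[-1])
        return softmax(scores) @ x

    def local_branch(x, k=3):
        # Toy 1-D moving average over neighboring positions, standing in
        # for the locality of a convolutional branch.
        pad = k // 2
        xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
        return np.stack([xp[i:i + k].mean(axis=0) for i in range(x.shape[0])])

    def fuse(x):
        # Parallel-in-branch: run both branches on the same input, then let
        # the global features attend over the concatenation of both branch
        # outputs (a rough stand-in for attention-based feature fusion).
        g = global_branch(x)
        l = local_branch(x)
        both = np.concatenate([g, l], axis=0)
        scores = g @ both.T / np.sqrt(x.shape[-1])
        return softmax(scores) @ both

    # 8 spatial positions, 4 feature channels.
    x = np.random.default_rng(0).normal(size=(8, 4))
    out = fuse(x)
    print(out.shape)  # (8, 4): fused features, same shape as the input
    ```

    The sketch shows only why the two branches are complementary (one mixes all positions, one mixes neighbors) and how attention can merge them; the real TCNet fuses multi-level 2-D feature maps and adds WSaG gating in the upsampling path.
    
    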