  • LVM-StARS: Large Vision Mod...
    Yang, Bohan; Chen, Yushi; Ghamisi, Pedram

    IEEE Geoscience and Remote Sensing Letters, 01/2024, Volume: 21
    Journal Article

    Recently, both large language models and large vision models (LVMs) have attracted significant attention. Trained on large-scale datasets, these models have shown remarkable capabilities across a wide range of research domains. To improve the accuracy of remote sensing (RS) scene classification, this letter explores LVM-based methods. Because RS images differ from natural images, directly transferring LVMs to RS tasks is impractical. We therefore append learnable prompt tokens to the input tokens while freezing the backbone weights, which reduces the number of trainable parameters and makes the LVM weights easier to harness and transfer. To address the latent catastrophic forgetting induced by ordinary fine-tuning, as well as the inherent complexity and redundancy of RS images, we introduce soft adaptation mechanisms between backbone layers on top of prompt tuning and implement the first LVM tuning methods for this task, namely LVM-StARS-Deep and LVM-StARS-Shallow, making LVMs better suited to RS scene classification. The proposed methods are evaluated on two popular RS scene classification datasets. The experimental results show that they improve overall accuracy by 1.71% to 3.94% while updating only 0.1% to 0.5% of the parameters compared to full fine-tuning, and that they outperform existing state-of-the-art methods.
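
For context, the sketch below illustrates the general mechanism the abstract describes: learnable prompt tokens are prepended to the token sequence of a frozen Transformer backbone, so that only the prompts and a small classification head are trained. This is a minimal PyTorch sketch under stated assumptions, not the authors' LVM-StARS implementation; the class name, dimensions, and mean-pooling choice are illustrative.

```python
# Minimal sketch of shallow visual prompt tuning (illustrative, not the authors' code):
# learnable prompt tokens are prepended to the patch tokens of a frozen backbone,
# and only the prompts and the classifier head receive gradient updates.
import torch
import torch.nn as nn

class PromptTunedClassifier(nn.Module):
    def __init__(self, embed_dim=768, num_prompts=10, num_classes=45,
                 depth=12, num_heads=12):
        super().__init__()
        # Frozen "backbone": stands in for a pretrained large vision model.
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads,
                                           dim_feedforward=4 * embed_dim,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        for p in self.backbone.parameters():
            p.requires_grad = False  # backbone weights stay fixed

        # Learnable prompt tokens appended to the input token sequence.
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)

        # Lightweight classification head (also trainable).
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens):
        # patch_tokens: (batch, num_patches, embed_dim), e.g. from a patch embedding
        b = patch_tokens.shape[0]
        prompts = self.prompts.expand(b, -1, -1)
        tokens = torch.cat([prompts, patch_tokens], dim=1)
        feats = self.backbone(tokens)
        # Pool over the patch positions (prompts excluded) for classification.
        pooled = feats[:, prompts.shape[1]:].mean(dim=1)
        return self.head(pooled)

model = PromptTunedClassifier()
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.4%}")  # only prompts + head
```

With a backbone of this size, the trainable prompts and head amount to well under 1% of the total parameters, which is the order of magnitude of parameter savings the abstract reports; the deep variant described in the letter would additionally insert prompts (and soft adaptation) at intermediate layers rather than only at the input.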