Peer reviewed
  • Transound: Hyper-head atten...
    Tang, Quan; Xu, Liming; Zheng, Bochuan; He, Chunlin

    Ecological Informatics, July 2023, Volume: 75
    Journal Article

    Bird strikes in low-altitude areas can cause severe economic losses and endanger the lives of airline passengers. Thus, it is necessary to drive away the corresponding birds, which requires adequate and accurate identification of birds. In this paper, we propose an effective bird identification algorithm using a vision transformer (ViT) with hyper-head attention and a Mel frequency cepstral coefficient (MFCC) flow framework. The original sound signal is preprocessed using pre-emphasis, framing, and windowing. Then, the designed MFCC flow, which comprises discrete Fourier transform, Mel frequency filtering, and discrete cosine transform operations, extracts sound features. These features are normalized into a visual dataset that subsequent visual feature networks can identify. Next, the ViT with hyper-head attention encodes the visual features and identifies the birds. Extensive experiments on two public datasets show that the proposed method performs satisfactorily. Compared with five recent state-of-the-art approaches, the proposed Transound method achieves average improvements of 10.64%, 5.65%, 1.15%, 1.78%, and 1.51%, respectively.

    • We propose a vision Transformer with hyper-head attention to achieve visual encoding and accurate bird sound recognition.
    • We propose MFCC flow to describe the dynamic transformation relationships among patches.
    • We incorporate a hyper-head attention mechanism into the vision Transformer to measure vision and region similarity.
    • Our method achieves better performance than other state-of-the-art sound recognition approaches.
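
    The preprocessing and feature-extraction stages named in the abstract (pre-emphasis, framing, windowing, discrete Fourier transform, Mel filtering, discrete cosine transform, normalization) can be sketched with the Python standard library alone. This is a minimal illustration of a conventional MFCC pipeline, not the authors' implementation: the pre-emphasis coefficient (0.97), the Hamming window, the frame and hop sizes, the number of filters and coefficients, and the min-max normalization are all assumptions, and every function name is hypothetical. The ViT with hyper-head attention is not covered.

```python
import cmath
import math

def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha is an assumed, commonly used value."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_and_window(signal, frame_len, hop):
    """Split into overlapping frames and apply a Hamming window (assumed)."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[signal[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum for bins 0..N/2 (an FFT would be faster)."""
    n_fft = len(frame)
    spectrum = []
    for k in range(n_fft // 2 + 1):
        acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n in range(n_fft))
        spectrum.append(abs(acc) ** 2 / n_fft)
    return spectrum

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2)
    mel_points = [mel_max * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = [[0.0] * n_bins for _ in range(n_filters)]
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising edge of the triangle
            bank[m - 1][k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge of the triangle
            bank[m - 1][k] = (right - k) / (right - center)
    return bank

def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_coeffs)]

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one row of n_coeffs cepstral coefficients per frame."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for frame in frame_and_window(pre_emphasis(signal), frame_len, hop):
        power = power_spectrum(frame)
        # Small epsilon avoids log(0) for filters that capture no energy.
        log_energies = [math.log(sum(w * p for w, p in zip(f, power)) + 1e-10)
                        for f in bank]
        features.append(dct_ii(log_energies, n_coeffs))
    return features

def normalize(features):
    """Min-max scale to [0, 1] so the matrix can be treated as a grayscale image."""
    flat = [v for row in features for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in features]

# Usage: a synthetic 1 kHz tone stands in for a bird recording.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(1024)]
image = normalize(mfcc(tone, sample_rate))
```

    Each row of `image` corresponds to one frame and each column to one cepstral coefficient; a patch-based visual model could then consume this matrix as a single-channel image, assuming it is resized or tiled to the input shape that model expects.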