Peer reviewed Open access
  • Unsupervised Clustering and...
    Pu, Yunjiao; Cummer, Steven A.; Lyu, Fanchao; Zheng, Yu; Briggs, Michael S.; Lesage, Stephen; Mailyan, Bagrat; Roberts, Oliver J.

    Journal of geophysical research. Atmospheres, 16 May 2023, Volume: 128, Issue: 9
    Journal Article

    We developed a framework merging unsupervised and supervised machine learning to classify lightning radio signals, and applied it to the possible detection of terrestrial gamma‐ray flashes (TGFs). Recent studies have established a tight connection between energetic in‐cloud pulses (EIPs, >150 kA) and a subset of TGFs, enabling continuous and large‐scale ground‐based TGF detection. However, even with a high peak current threshold, it is time‐consuming to manually search for EIPs in a background of many non‐EIP events, and the search becomes even more difficult when a lower peak current threshold is used. Machine learning classifiers are an effective tool for this task. Beginning with unsupervised learning, spectral clustering is performed on the low‐dimensional features extracted by an autoencoder from raw radio waveforms, showing that +EIPs naturally constitute a distinct class of waveform and make up 6%–7% of the total population. The clustering results are used to form a labeled data set (∼10,000 events) to train a supervised convolutional neural network (CNN) that targets +EIPs. Our CNN models identify on average 95.2% of true +EIPs with accuracy up to 98.7%, representing a powerful tool for +EIP classification. The pretrained CNN classifier is further applied to identify lower peak current EIPs (LEIPs, >50 kA) from a larger data set (∼30,000 events). Among 10 LEIPs coincident with Fermi TGF observations, 2 previously reported TGFs and 2 unreported but suspected TGFs are found, while the majority are not associated with detectable TGFs. In addition, unsupervised clustering is found to reflect characteristics of the ionosphere reflection height and its effect on radio wave propagation.

    Plain Language Summary

    In this study, we developed a machine learning‐based method to classify lightning radio signals. The method uses unsupervised and supervised machine learning to distinguish different types of signals with high accuracy. The focus of the study is to identify energetic in‐cloud pulses (EIPs) that are associated with a subset of lightning‐related terrestrial gamma‐ray flashes (TGFs). The method successfully identified 95.2% of true EIPs with up to 98.7% accuracy, and discovered new TGF events that were not previously reported. Additionally, the method revealed insights into the ionosphere and radio wave propagation. This method can be useful for studying lightning and other related phenomena.

    Key Points

      • A framework merging unsupervised clustering and a supervised convolutional neural network (CNN) for lightning classification is developed
      • Clustering of positive polarity energetic lightning radio pulses (>150 kA) identifies three processes: +EIPs (6%–7%), +NBEs, and +CGs
      • CNNs detect 95.2% of manually identified +EIPs with up to 98.7% accuracy, enabling study of the EIP‐TGF link at lower peak currents (>50 kA)
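
The abstract reports two separate figures for the CNN: the share of true +EIPs it identifies (a recall-type measure) and its overall accuracy. Because +EIPs are a small minority of events, the two can differ considerably. The sketch below shows how each is computed from a confusion matrix. All counts are hypothetical values chosen for illustration, not data from the paper, and the names `Confusion`, `confusion`, `recall`, and `accuracy` are assumptions of this sketch rather than anything from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # +EIPs correctly flagged as +EIPs
    fp: int  # non-EIPs wrongly flagged as +EIPs
    fn: int  # +EIPs the classifier missed
    tn: int  # non-EIPs correctly rejected


def confusion(y_true, y_pred):
    """Count the four outcomes for binary labels (1 = +EIP, 0 = other)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif pred:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def recall(c):
    """Fraction of true +EIPs that were identified."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


def accuracy(c):
    """Fraction of all events that were classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Hypothetical data set: 1000 events, 70 of them +EIPs (7%).
y_true = [1] * 70 + [0] * 930

# Hypothetical classifier: finds 63 of the 70 +EIPs, raises 5 false alarms.
y_model = [1] * 63 + [0] * 7 + [1] * 5 + [0] * 925

# Trivial baseline that never predicts a +EIP.
y_never = [0] * 1000

model = confusion(y_true, y_model)
never = confusion(y_true, y_never)

print("model:    recall", recall(model), "accuracy", accuracy(model))
print("baseline: recall", recall(never), "accuracy", accuracy(never))
```

With these made-up counts the hypothetical classifier reaches a recall of 0.90 and an accuracy of 0.988, while the baseline that never flags a +EIP still reaches an accuracy of 0.93 with a recall of 0. This is why a detection rate for true +EIPs is informative alongside overall accuracy when the target class is rare.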