Reconstruction of the long‐term surface temperature history in Antarctica is important for a better understanding of human‐induced climate changes, especially since the Industrial Revolution. We present here a surface temperature history spanning the last century at Styx Glacier, located on the eastern coast of northern Victoria Land, which is reconstructed using borehole logging data. Our results indicate that surface temperatures in the 20th century were 1.7 ± 0.4 °C higher than the long‐term averages over 1600–1900 Common Era, indicating regional warming over the eastern coast of northern Victoria Land. However, we found no evidence for significant warming across northern Victoria Land since the mid‐20th century. A global reanalysis as well as the reconstruction of proxy records demonstrate that the climate in this region was more affected by changes in the Southern Hemisphere Annular Mode than in the Amundsen‐Bellingshausen Sea Low.
Plain Language Summary
The western coast of the Ross Sea, northern Victoria Land, is one of several regions around the world where the temperature history is highly uncertain. Here we provide a temperature reconstruction using borehole logging data from Styx Glacier. The reconstructed temperature history indicates that the surface temperature at Styx Glacier in the 20th century was higher than in previous centuries, although there has been no significant trend since the ~1950s. The lack of a recent warming trend off the western Ross Sea is in contrast to the warming in the Antarctic Peninsula and West Antarctica.
Key Points
The surface temperature in the 20th century at Styx Glacier (western coast of the Ross Sea) is higher by 1.7 ± 0.4 °C than before 1900 CE
No clear warming trend since the mid‐20th century was found in northern Victoria Land, Antarctica
The climate over the western coast of the Ross Sea might be affected by the Southern Hemisphere Annular Mode
Metric Learning for User-Defined Keyword Spotting
Jung, Jaemin; Kim, Youkyum; Park, Jihwan ...
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
2023-June-4
Conference Proceeding
Open access
The goal of this work is to detect new spoken terms defined by users. While most previous works address Keyword Spotting (KWS) as a closed-set classification problem, this limits their transferability to unseen terms. The ability to define custom keywords has advantages in terms of user experience. In this paper, we propose a metric learning-based training strategy for user-defined keyword spotting. In particular, we make the following contributions: (1) we construct a large-scale keyword dataset with an existing speech corpus and propose a filtering method to remove data that degrade model training; (2) we propose a metric learning-based two-stage training strategy, and demonstrate that the proposed method improves the performance on the user-defined keyword spotting task by enriching their representations; (3) to facilitate fair comparison in the user-defined KWS field, we propose a unified evaluation protocol and metrics. Our proposed system does not require incremental training on the user-defined keywords, and outperforms previous works by a significant margin on the Google Speech Commands dataset using the proposed as well as the existing metrics.
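One way to picture how a metric-learning system can spot a user-defined keyword without retraining is sketched below. This is only an illustration under stated assumptions, not the authors' method: the embedding vectors are taken as given (the encoder that would produce them is not shown), and `enroll`, `detect`, the averaging of enrollment embeddings, and the 0.7 threshold are all hypothetical choices made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def enroll(embeddings):
    """Hypothetical enrollment step: average a few example embeddings into one prototype."""
    count = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / count for i in range(dim)]


def detect(query, prototypes, threshold=0.7):
    """Return (keyword, score) for the closest prototype, or (None, score) below threshold.

    The threshold value is an assumption chosen for illustration only.
    """
    best_word, best_score = None, -1.0
    for word, prototype in prototypes.items():
        score = cosine_similarity(query, prototype)
        if score > best_score:
            best_word, best_score = word, score
    if best_score >= threshold:
        return best_word, best_score
    return None, best_score


# Toy 3-dimensional embeddings standing in for the output of a trained encoder.
prototypes = {
    "hello": enroll([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "stop": enroll([[0.0, 1.0, 0.0]]),
}
match, score = detect([1.0, 0.05, 0.0], prototypes)
rejected, _ = detect([0.0, 0.0, 1.0], prototypes)
```

Because new keywords only add prototypes to the dictionary, no incremental training is needed in this sketch; the quality of the result rests entirely on how well the embedding space separates unseen terms.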
The goal of this work is to develop a self-sufficient framework for Continuous Sign Language Recognition (CSLR) that addresses key issues of sign language recognition. These include the need for complex multi-scale features such as hands, face, and mouth for understanding, and the absence of frame-level annotations. To this end, we propose (1) Divide and Focus Convolution (DFConv), which extracts both manual and non-manual features without the need for additional networks or annotations, and (2) Dense Pseudo-Label Refinement (DPLR), which propagates non-spiky frame-level pseudo-labels by combining the ground truth gloss sequence labels with the predicted sequence. We demonstrate that our model achieves state-of-the-art performance among RGB-based methods on large-scale CSLR benchmarks, PHOENIX-2014 and PHOENIX-2014-T, while showing comparable results with better efficiency when compared to other approaches that use multi-modality or extra annotations.
Firn air provides plenty of old air from the near past, and can therefore be
useful for understanding human impact on the recent history of the
atmospheric composition. Most of the existing firn air records cover only
the last several decades (typically 40 to 55 years) and are insufficient to
understand the early part of anthropogenic impacts on the atmosphere. In
contrast, a few firn air records from inland sites, where temperatures and
snow accumulation rates are very low, go back in time about a century. In
this study, we report an unusually old firn air effective CO2 age of 93 years from Styx Glacier, near the Ross Sea coast in Antarctica. This is the
first report of such an old firn air age (>55 years) from a warm
coastal site. The lock-in zone thickness of 12.4 m is larger than at other
sites where snow accumulation rates and air temperature are similar.
High-resolution X-ray density measurements demonstrate a high variability of
the vertical snow density at Styx Glacier. The CH4 mole fraction and
total air content of the closed pores also indicate large variations in
centimeter-scale depth intervals, indicative of layering. We hypothesize that the
large density variations in the firn increase the thickness of the lock-in
zone and, consequently, increase the firn air ages because the age of firn
air increases more rapidly with depth in the lock-in zone than in the
diffusive zone. Our study demonstrates that all else being equal, sites
where weather conditions are favorable for the formation of large density
variations at the lock-in zone preserve older air within their open
porosity, making them ideal places for firn air sampling.
The objective of this work is the effective extraction of spatial and dynamic features for Continuous Sign Language Recognition (CSLR). To accomplish this, we utilise a two-pathway SlowFast network, where each pathway operates at distinct temporal resolutions to separately capture spatial (hand shapes, facial expressions) and dynamic (movements) information. In addition, we introduce two distinct feature fusion methods, carefully designed for the characteristics of CSLR: (1) Bi-directional Feature Fusion (BFF), which facilitates the transfer of dynamic semantics into spatial semantics and vice versa; and (2) Pathway Feature Enhancement (PFE), which enriches dynamic and spatial representations through auxiliary subnetworks, while avoiding the need for extra inference time. As a result, our model further strengthens spatial and dynamic representations in parallel. We demonstrate that the proposed framework outperforms the current state-of-the-art performance on popular CSLR datasets, including PHOENIX14, PHOENIX14-T, and CSL-Daily.
The objective of this work is to extract a target speaker's voice from a mixture of voices using visual cues. Existing works on audio-visual speech separation have demonstrated their performance with promising intelligibility, but maintaining naturalness remains a challenge. To address this issue, we propose AVDiffuSS, an audio-visual speech separation model based on a diffusion mechanism known for its capability in generating natural samples. For an effective fusion of the two modalities for diffusion, we also propose a cross-attention-based feature fusion mechanism. This mechanism is specifically tailored for the speech domain to integrate the phonetic information from audio-visual correspondence in speech generation. In this way, the fusion process maintains the high temporal resolution of the features, without excessive computational requirements. We demonstrate that the proposed framework achieves state-of-the-art results on two benchmarks, VoxCeleb2 and LRS3, producing speech with notably better naturalness.
The goal of this paper is to generate realistic audio with a lightweight and fast diffusion-based vocoder, named FreGrad. Our framework consists of the following three key components: (1) We employ a discrete wavelet transform that decomposes a complicated waveform into sub-band wavelets, which helps FreGrad to operate on a simple and concise feature space, (2) We design a frequency-aware dilated convolution that elevates frequency awareness, resulting in generating speech with accurate frequency information, and (3) We introduce a bag of tricks that boosts the generation quality of the proposed model. In our experiments, FreGrad achieves 3.7 times faster training time and 2.2 times faster inference speed compared to our baseline while reducing the model size by 0.6 times (only 1.78M parameters) without sacrificing the output quality. Audio samples are available at: https://mm.kaist.ac.kr/projects/FreGrad.
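To make the sub-band decomposition in component (1) concrete, the following is a minimal sketch of a single-level discrete wavelet transform and its inverse. It assumes the Haar wavelet purely for simplicity; the abstract does not say which wavelet FreGrad uses, and the function names `haar_dwt` and `haar_idwt` are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: split a waveform into two half-length sub-bands.

    Returns (low, high): the low-frequency approximation and high-frequency detail.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return low, high


def haar_idwt(low, high):
    """Inverse of haar_dwt: interleave the two sub-bands back into the waveform."""
    scale = 1.0 / math.sqrt(2.0)
    signal = []
    for approx, detail in zip(low, high):
        signal.append((approx + detail) * scale)
        signal.append((approx - detail) * scale)
    return signal


waveform = [1.0, 3.0, 2.0, 2.0]
low_band, high_band = haar_dwt(waveform)
reconstructed = haar_idwt(low_band, high_band)
```

Each sub-band is half the length of the input, and the transform is invertible, so a model operating on the sub-bands works on shorter sequences without losing information.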