We juxtapose 386 prominent contrarians with 386 expert scientists by tracking their digital footprints across ∼200,000 research publications and ∼100,000 English-language digital and print media articles on climate change. Projecting these individuals across the same backdrop facilitates quantifying disparities in media visibility and scientific authority, and identifying organizational patterns within their association networks. Here we show via direct comparison that contrarians are featured in 49% more media articles than scientists. Yet when comparing visibility in mainstream media sources only, we observe just a 1% excess visibility, which objectively demonstrates the crowding out of professional mainstream sources by the proliferation of new media sources, many of which contribute to the production and consumption of climate change disinformation at scale. These results demonstrate why climate scientists should increasingly exert their authority in scientific and public discourse, and why professional journalists and editors should adjust the disproportionate attention given to contrarians.
Speech enhancement and separation are core problems in audio signal processing, with commercial applications in devices as diverse as mobile phones, conference call systems, hands-free systems, and hearing aids. In addition, they are crucial preprocessing steps for noise-robust automatic speech and speaker recognition. Many devices now have two to eight microphones. The enhancement and separation capabilities offered by these multichannel interfaces are usually greater than those of single-channel interfaces. Research in speech enhancement and separation has followed two convergent paths, starting with microphone array processing and blind source separation, respectively. These communities are now strongly interrelated and routinely borrow ideas from each other. Yet, a comprehensive overview of the common foundations and the differences between these approaches is lacking at present. In this paper, we propose to fill this gap by analyzing a large number of established and recent techniques according to four transverse axes: 1) the acoustic impulse response model, 2) the spatial filter design criterion, 3) the parameter estimation algorithm, and 4) optional postfiltering. We conclude this overview paper by providing a list of software and data resources and by discussing perspectives and future trends in the field.
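As a concrete illustration of the spatial filter design criteria surveyed in this overview, the following minimal NumPy sketch implements the classical MVDR (minimum variance distortionless response) beamformer for a single frequency bin. The variable names and the toy covariance are illustrative assumptions, not code from the paper.

```python
import numpy as np

def mvdr_weights(noise_cov, steering_vec):
    """MVDR beamformer weights for one frequency bin.

    Minimizes the output noise power subject to a distortionless
    response in the target direction:
        w = R^{-1} d / (d^H R^{-1} d)
    """
    r_inv_d = np.linalg.solve(noise_cov, steering_vec)
    return r_inv_d / (steering_vec.conj() @ r_inv_d)

# Toy example: 4 microphones, one frequency bin (hypothetical values).
rng = np.random.default_rng(0)
d = rng.standard_normal(4) + 1j * rng.standard_normal(4)    # steering vector
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
R = A @ A.conj().T + 1e-3 * np.eye(4)                       # Hermitian PSD noise covariance
w = mvdr_weights(R, d)
print(np.allclose(w.conj() @ d, 1.0))                       # distortionless constraint holds
```

In practice, the noise covariance R is itself estimated from the data, which is precisely where the parameter estimation axis of the survey comes in.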
Localizing audio sources is challenging in real reverberant environments, especially when several sources are active. We propose to use a neural network built from stacked convolutional and recurrent layers in order to estimate the directions of arrival of multiple sources from a first-order Ambisonics recording. It returns the directions of arrival over a discrete grid for a known number of sources. We propose to use features derived from the acoustic intensity vector as inputs. We analyze the behavior of the neural network by means of a visualization technique called layerwise relevance propagation. This analysis highlights which parts of the input signal are relevant in a given situation. We also conduct experiments to evaluate the performance of our system in various environments, from simulated rooms to real recordings, with one or two speech sources. The results show that the proposed features significantly improve performance with respect to raw Ambisonics inputs.
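To make the intensity-based inputs more tangible, here is a minimal sketch of how an active acoustic intensity vector can be computed from the four first-order Ambisonics channels in the STFT domain. The function and variable names are our own assumptions, and this is not the authors' exact feature pipeline, which may involve further normalization and smoothing choices.

```python
import numpy as np
from scipy.signal import stft

def foa_intensity_features(w, x, y, z, fs=16000, nfft=1024):
    """Active intensity vector features from first-order Ambisonics channels.

    W is the omnidirectional channel; X, Y, Z are the figure-of-eight
    channels. The active intensity I = Re{conj(W) * [X, Y, Z]} points
    (up to a sign convention) towards the dominant source in each
    time-frequency bin.
    """
    W, X, Y, Z = (stft(c, fs=fs, nperseg=nfft)[2] for c in (w, x, y, z))
    I = np.stack([np.real(np.conj(W) * C) for C in (X, Y, Z)])  # shape (3, F, T)
    norm = np.linalg.norm(I, axis=0, keepdims=True) + 1e-8
    return I / norm  # one unit-norm 3-vector per time-frequency bin
```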
• An analysis of the impact of acoustic mismatches between training and test data on the performance of robust ASR.
• Including: environment, microphone and data simulation mismatches.
• Based on: a critical analysis of the results published on the CHiME-3 dataset and new experiments.
• Result: with the exception of MVDR beamforming, these mismatches have little effect on the ASR performance.
• Contribution: the CHiME-4 challenge, which revisits the CHiME-3 dataset and reduces the number of microphones available for testing.
Speech enhancement and automatic speech recognition (ASR) are most often evaluated in matched (or multi-condition) settings where the acoustic conditions of the training data match (or cover) those of the test data. Few studies have systematically assessed the impact of acoustic mismatches between training and test data, especially concerning recent speech enhancement and state-of-the-art ASR techniques. In this article, we study this issue in the context of the CHiME-3 dataset, which consists of sentences spoken by talkers situated in challenging noisy environments and recorded using a 6-channel tablet-based microphone array. We provide a critical analysis of the results published on this dataset for various signal enhancement, feature extraction, and ASR backend techniques and perform a number of new experiments in order to separately assess the impact of different noise environments, different numbers and positions of microphones, and simulated vs. real data on speech enhancement and ASR performance. We show that, with the exception of minimum variance distortionless response (MVDR) beamforming, most algorithms perform consistently on real and simulated data and can benefit from training on simulated data. We also find that training on different noise environments and different microphones barely affects the ASR performance, especially when several environments are present in the training data: only the number of microphones has a significant impact. Based on these results, we introduce the CHiME-4 Speech Separation and Recognition Challenge, which revisits the CHiME-3 dataset and makes it more challenging by reducing the number of microphones available for testing.
• The presentation of a unique multi-microphone speech recognition challenge with speech recorded in real environments.
• A detailed characterisation of the challenge audio using novel analyses to estimate key properties of the speakers, environments and noisy speech signals.
• An overview of 26 systems submitted to the challenge presenting a snapshot of the state-of-the-art in distant microphone ASR.
• A presentation of system performance identifying which signal processing and statistical modelling techniques are the most beneficial.
• A presentation of correlations between signal characteristics and system performances across utterances addressing the question, “What are the particular circumstances that lead to high word error rates?”
This paper presents the design and outcomes of the CHiME-3 challenge, the first open speech recognition evaluation designed to target the increasingly relevant multichannel, mobile-device speech recognition scenario. The paper serves two purposes. First, it provides a definitive reference for the challenge, including full descriptions of the task design, data capture and baseline systems, along with a description and evaluation of the 26 systems that were submitted. The best systems re-engineered every stage of the baseline, resulting in reductions in word error rate from 33.4% to as low as 5.8%. By comparing across systems, techniques that are essential for strong performance are identified. Second, the paper considers the problem of drawing conclusions from evaluations that use speech directly recorded in noisy environments. The degree of challenge presented by the resulting material is hard to control and hard to fully characterise. We attempt to dissect the various ‘axes of difficulty’ by correlating various estimated signal properties with typical system performance on a per-session and per-utterance basis. We find strong evidence of a dependence on signal-to-noise ratio and channel quality. Systems are less sensitive to variations in the degree of speaker motion. The paper concludes by discussing the outcomes of CHiME-3 in relation to the design of future mobile speech recognition evaluations.
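Word error rate, the metric behind the 33.4% to 5.8% figures above, is the word-level edit distance between the reference and the hypothesis divided by the reference length. A minimal self-contained implementation, for illustration only:

```python
def word_error_rate(ref, hyp):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via edit distance over word sequences."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(r)][len(h)] / len(r)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # ≈ 0.167
```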
Like many web platforms, Facebook is under pressure to regulate misinformation. According to the company, users who repeatedly share misinformation (‘repeat offenders’) will have their distribution reduced, but little is known about the implementation or the impacts of this measure. The first contribution of this paper is to offer a methodology to investigate the implementation and consequences of this measure, which relies on an analysis combining fact-checking and engagement metrics data. Using the Science Feedback and Social Science One (Condor) datasets, we identified a set of public accounts (groups and pages) that shared misinformation repeatedly during the 2019–2020 period. We find that the engagement per post decreased significantly for Facebook pages after they shared two or more pieces of ‘false news’. The median decrease for pages identified with the Science Feedback dataset is −43%, while this value reaches −62% for pages identified using the Condor dataset. In a different approach, we identified a set of pages claiming to be under ‘reduced distribution’ for repeatedly sharing misinformation and having received a notification from Facebook. For this set of pages, we observed a median decrease of −25% in engagement per post, averaged over the 30 days after receiving the notification minus the 30 days before. We show that this ‘repeat offenders’ penalty did not apply to Facebook groups. Instead, we discover that groups were affected in a different way, with a sudden drop in their average engagement per post that occurred around June 9, 2020. While this drop roughly halved the groups’ engagement per post, it was compensated by the fact that these accounts doubled their number of posts between early 2019 and summer 2020. The net result is that the total engagement on posts from ‘repeat offender’ accounts (including both pages and groups) returned to its early 2019 levels. Overall, Facebook’s policy thus appears able to contain the increase in misinformation shared by ‘repeat offenders’ rather than to decrease it.
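A minimal sketch of the before/after comparison described above: for each page, engagement per post is averaged over the 30 days following the notification and over the 30 days preceding it, and the median relative change is taken across pages. The DataFrame schema (`page`, `date`, `engagement`) and the function name are assumptions for illustration, not the paper's actual code.

```python
import pandas as pd

def median_engagement_change(posts, notification_dates, window_days=30):
    """Median relative change in mean engagement per post across pages,
    comparing the windows after vs. before each page's notification.

    posts: DataFrame with columns [page, date, engagement] (assumed schema).
    notification_dates: dict mapping page -> notification date (pd.Timestamp).
    """
    changes = []
    for page, t0 in notification_dates.items():
        p = posts[posts["page"] == page]
        before = p[(p["date"] >= t0 - pd.Timedelta(days=window_days)) & (p["date"] < t0)]
        after = p[(p["date"] >= t0) & (p["date"] < t0 + pd.Timedelta(days=window_days))]
        if len(before) and len(after):
            # Relative change in mean engagement per post, e.g. -0.25 for -25%.
            changes.append(after["engagement"].mean() / before["engagement"].mean() - 1)
    return pd.Series(changes).median()
```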
Most audio source separation methods are developed for a particular scenario characterized by the number of sources and channels and the characteristics of the sources and the mixing process. In this paper, we introduce a general audio source separation framework based on a library of structured source models that enable the incorporation of prior knowledge about each source via user-specifiable constraints. While this framework generalizes several existing audio source separation methods, it also makes it possible to imagine and implement new efficient methods not yet reported in the literature. We first introduce the framework by describing the model structure and constraints, explaining its generality, and summarizing its algorithmic implementation using a generalized expectation-maximization algorithm. Finally, we illustrate the above-mentioned capabilities of the framework by applying it in several new and existing configurations to different source separation problems. We have released a software tool named the Flexible Audio Source Separation Toolbox (FASST), implementing a baseline version of the framework in Matlab.
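The core operation of the local Gaussian modelling approach underlying frameworks of this kind can be illustrated in a few lines: given per-source time-frequency variance models (e.g. structured via NMF and user-specified constraints, and estimated by the EM algorithm), each source is recovered by Wiener filtering the mixture. This single-channel sketch with assumed variable names is a simplification of the full multichannel framework, not FASST itself.

```python
import numpy as np

def wiener_separate(mix_stft, source_vars):
    """Wiener-filter separation given per-source variance models.

    mix_stft: complex STFT of the mixture, shape (F, T).
    source_vars: list of nonnegative variance arrays v_j, each shape (F, T).
    The posterior mean of source j given the mixture is
        S_j = v_j / (sum_k v_k) * X.
    """
    total = np.sum(source_vars, axis=0) + 1e-12  # avoid division by zero
    return [(v / total) * mix_stft for v in source_vars]
```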
We aim to assess the perceived quality of estimated source signals in the context of audio source separation. These signals may involve one or more kinds of distortions, including distortion of the target source, interference from the other sources, or musical noise artifacts. We propose a subjective test protocol to assess the perceived quality with respect to each kind of distortion and collect the scores of 20 subjects over 80 sounds. We then propose a family of objective measures aiming to predict these subjective scores, based on the decomposition of the estimation error into several distortion components and on the use of the PEMO-Q perceptual salience measure to provide multiple features that are then combined. These measures increase the correlation with subjective scores by up to 0.5 compared to a nonlinear mapping of individual state-of-the-art source separation measures. Finally, we have released the data and code presented in this paper in a freely available toolkit called PEASS.
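The error decomposition mentioned above follows the spirit of the classical BSS Eval measures: the estimate is split by least-squares projection into a target component, interference from the other sources, and a residual artifact term, from which energy ratios are derived. Below is a minimal time-domain sketch with assumed names; the paper's measures additionally weight such components with the PEMO-Q auditory model, which is not reproduced here.

```python
import numpy as np

def project(y, basis):
    """Least-squares projection of y onto the span of the rows of basis."""
    coefs, *_ = np.linalg.lstsq(basis.T, y, rcond=None)
    return basis.T @ coefs

def decompose(est, target, interferers):
    """Split an estimated source into target / interference / artifact parts
    and return SDR, SIR, SAR energy ratios in dB."""
    s_target = (est @ target) / (target @ target) * target   # projection onto target
    p_all = project(est, np.stack([target] + list(interferers)))
    e_interf = p_all - s_target                              # interference component
    e_artif = est - p_all                                    # artifact residual
    sdr = 10 * np.log10(np.sum(s_target**2) / np.sum((e_interf + e_artif)**2))
    sir = 10 * np.log10(np.sum(s_target**2) / np.sum(e_interf**2))
    sar = 10 * np.log10(np.sum((s_target + e_interf)**2) / np.sum(e_artif**2))
    return sdr, sir, sar
```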
General circulation models frequently suffer from a substantial cold bias in equatorial Pacific sea surface temperatures (SSTs). For instance, the majority of the climate models participating in the Coupled Model Intercomparison Project Phase 5 (CMIP5) have this particular problem (17 out of the 26 models evaluated in the present study). Here, we investigate the extent to which these equatorial cold biases are related to mean climate biases generated in the extra-tropics and then communicated to the equator via the oceanic subtropical cells (STCs). With an evident relationship across the CMIP5 models between equatorial SSTs and upper ocean temperatures in the extra-tropical subduction regions, our analysis suggests that cold SST biases within the extra-tropical Pacific indeed translate into a cold equatorial bias via the STCs. An assessment of the relationship between these extra-tropical SST biases and local surface heat flux components indicates a link to biases in the simulated shortwave fluxes. Further sensitivity studies with a climate model (CESM) in which extra-tropical cloud albedo is systematically varied illustrate the influence of cloud albedo perturbations, not only directly above the oceanic subduction regions but across the extra-tropics, on the equatorial bias. The CESM experiments reveal a quadratic relationship between extra-tropical Pacific albedo and the root-mean-square error in equatorial SSTs, a relationship with which the CMIP5 models generally agree. Thus, our study suggests that one way to reduce the equatorial cold bias in the models is to improve the representation of subtropical and mid-latitude cloud albedo.
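A quadratic relationship of the kind reported above can be checked with an ordinary least-squares polynomial fit. The numbers below are purely hypothetical placeholders for illustration, not values from the CESM experiments or the CMIP5 models.

```python
import numpy as np

# Hypothetical values: perturbations to extra-tropical cloud albedo (x)
# and the resulting RMSE in equatorial SSTs in kelvin (y).
albedo_perturbation = np.array([-0.06, -0.03, 0.0, 0.03, 0.06])
sst_rmse = np.array([1.4, 0.9, 0.7, 0.9, 1.5])

# Fit RMSE as a quadratic function of the albedo perturbation.
a, b, c = np.polyfit(albedo_perturbation, sst_rmse, deg=2)
print(f"RMSE ≈ {a:.1f}·x² + {b:.2f}·x + {c:.2f}")
```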
► The paper reviews a recent distant microphone speech recognition evaluation that attracted participation from 13 entrants.
► The paper presents a comparative analysis of the recognition systems that were entered.
► Results of the automatic systems are presented and compared to human performance. Common features of successful systems are identified.
► The paper concludes with a brief discussion of possible directions for future challenges.
Distant microphone speech recognition systems that operate with human-like robustness remain a distant goal. The key difficulty is that operating in everyday listening conditions entails processing a speech signal that is reverberantly mixed into a noise background composed of multiple competing sound sources. This paper describes a recent speech recognition evaluation that was designed to bring together researchers from multiple communities in order to foster novel approaches to this problem. The task was to identify keywords from sentences reverberantly mixed into audio backgrounds binaurally recorded in a busy domestic environment. The challenge was designed to model the essential difficulties of the multisource environment problem while remaining on a scale that would make it accessible to a wide audience. Compared to previous ASR evaluations, a particular novelty of the task is that the utterances to be recognised were provided in a continuous audio background rather than as pre-segmented utterances, thus allowing a range of background modelling techniques to be employed. The challenge attracted thirteen submissions. This paper describes the challenge problem, provides an overview of the systems that were entered, and compares them against both a baseline recognition system and human performance. The paper discusses insights gained from the challenge and lessons learnt for the design of future such evaluations.