The Coronavirus Disease 2019 (COVID-19) pandemic continues to have a devastating effect on the health and well-being of the global population. A critical step in the fight against COVID-19 is effective screening of infected patients, with one of the key screening approaches being radiology examination using chest radiography. It was found in early studies that patients infected with COVID-19 present characteristic abnormalities in chest radiography images. Motivated by this and inspired by the open source efforts of the research community, in this study we introduce COVID-Net, a deep convolutional neural network design tailored for the detection of COVID-19 cases from chest X-ray (CXR) images that is open source and available to the general public. To the best of the authors' knowledge, COVID-Net is one of the first open source network designs for COVID-19 detection from CXR images at the time of initial release. We also introduce COVIDx, an open access benchmark dataset that we generated comprising 13,975 CXR images across 13,870 patient cases, with the largest number of publicly available COVID-19 positive cases to the best of the authors' knowledge. Furthermore, we investigate how COVID-Net makes predictions using an explainability method in an attempt to not only gain deeper insights into critical factors associated with COVID-19 cases, which can aid clinicians in improved screening, but also audit COVID-Net in a responsible and transparent manner to validate that it is making decisions based on relevant information from the CXR images.
While COVID-Net is by no means a production-ready solution, our hope is that the open access COVID-Net, along with the description of how to construct the open source COVIDx dataset, will be leveraged and built upon by researchers and citizen data scientists alike to accelerate the development of highly accurate yet practical deep learning solutions for detecting COVID-19 cases and to accelerate treatment of those who need it the most.
This study tightly integrates complementary spectral and spatial features for deep learning based multi-channel speaker separation in reverberant environments. The key idea is to localize individual speakers so that an enhancement network can be trained on spatial as well as spectral features to extract the speaker from an estimated direction and with specific spectral structures. The spatial and spectral features are designed in a way such that the trained models are blind to the number of microphones and microphone geometry. To determine the direction of the speaker of interest, we identify time-frequency (T-F) units dominated by that speaker and only use them for direction estimation. The T-F unit level speaker dominance is determined by a two-channel chimera++ network, which combines deep clustering and permutation invariant training at the objective function level, and integrates spectral and interchannel phase patterns at the input feature level. In addition, T-F masking based beamforming is tightly integrated into the system by leveraging the magnitudes and phases produced by beamforming. Strong separation performance has been observed on reverberant talker-independent speaker separation, which separates reverberant speaker mixtures using a random number of microphones arranged in an arbitrary linear-array geometry.
This study proposes a complex spectral mapping approach for single- and multi-channel speech enhancement, where deep neural networks (DNNs) are used to predict the real and imaginary (RI) components of the direct-path signal from noisy and reverberant ones. The proposed system contains two DNNs. The first one performs single-channel complex spectral mapping. The estimated complex spectra are used to compute a minimum variance distortionless response (MVDR) beamformer. The RI components of the beamforming results, which encode spatial information, are then combined with the RI components of the mixture to train the second DNN for multi-channel complex spectral mapping. With the estimated complex spectra, we also propose a novel method of time-varying beamforming. State-of-the-art performance is obtained on the speech enhancement and recognition tasks of the CHiME-4 corpus. More specifically, our system obtains 6.82%, 3.19% and 1.99% word error rates (WER) respectively on the single-, two-, and six-microphone tasks of CHiME-4, significantly surpassing the previous best results of 9.15%, 3.91% and 2.24% WER.
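The MVDR step described above can be sketched in NumPy as follows. This is a minimal illustration under common simplifying assumptions, not the paper's exact implementation: the noise estimate is taken as the residual `mixture − estimate`, the steering vector as the principal eigenvector of the target covariance, and all shapes, names, and the regularization constant are hypothetical.

```python
import numpy as np

def mvdr_weights(spec_est, spec_mix, eps=1e-8):
    """Per-frequency MVDR weights from an estimated target spectrogram.
    Hypothetical shapes: [mics, frames, freqs], complex STFT values."""
    noise_est = spec_mix - spec_est                      # residual as noise estimate
    M, T, F = spec_mix.shape
    w = np.zeros((M, F), dtype=complex)
    for f in range(F):
        S = spec_est[:, :, f]                            # [M, T]
        N = noise_est[:, :, f]
        phi_s = S @ S.conj().T / T                       # target spatial covariance
        phi_n = N @ N.conj().T / T + eps * np.eye(M)     # noise covariance (regularized)
        # steering vector: principal eigenvector of the target covariance
        d = np.linalg.eigh(phi_s)[1][:, -1]
        num = np.linalg.solve(phi_n, d)                  # phi_n^{-1} d
        w[:, f] = num / (d.conj() @ num + eps)           # w = phi_n^{-1} d / (d^H phi_n^{-1} d)
    return w

def beamform(w, spec_mix):
    """Apply the beamformer: output[t, f] = w[:, f]^H @ spec_mix[:, t, f]."""
    return np.einsum('mf,mtf->tf', w.conj(), spec_mix)
```

The beamformed RI components would then be stacked with the mixture RI components as input to the second DNN.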
Deep neural network (DNN) based end-to-end optimization in the complex time-frequency (T-F) domain or time domain has shown considerable potential in monaural speech separation. Many recent studies optimize loss functions defined solely in the time or complex domain, without including a loss on magnitude. Although such loss functions typically produce better scores on objective time-domain metrics, they produce worse scores on speech quality and intelligibility metrics and usually lead to worse speech recognition performance than training that includes a loss on magnitude. While this phenomenon has been experimentally observed in many studies, it is often not accurately explained, and a thorough understanding of its fundamental cause is lacking. This letter provides a novel view from the perspective of the implicit compensation between estimated magnitude and phase. Analytical results based on monaural speech separation and robust automatic speech recognition (ASR) tasks in noisy-reverberant conditions support the validity of our view.
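The compensation effect can be seen in a one-point toy example (hypothetical values, not drawn from the letter): if the reference T-F value is 1+0j but the model's phase estimate is stuck at 90 degrees, a purely complex-domain loss is minimized by shrinking the estimated magnitude toward zero rather than matching the reference magnitude of 1.

```python
import numpy as np

# Reference T-F unit: magnitude 1, phase 0.
ref = 1.0 + 0.0j

# Suppose the estimated phase is fixed at 90 degrees; sweep the magnitude.
mags = np.linspace(0.0, 2.0, 201)
est = mags * np.exp(1j * np.pi / 2)

# Loss on RI components only (no explicit magnitude term):
# |est - ref|^2 = mags^2 + 1, which is minimized at magnitude 0.
complex_loss = np.abs(est - ref) ** 2
best_mag = mags[np.argmin(complex_loss)]
# The magnitude is sacrificed to "compensate" for the unfixable phase error,
# which is why adding an explicit magnitude loss changes the optimum.
```

With an added magnitude term such as `| |est| - |ref| |`, the optimum moves back toward the true magnitude, at the cost of a larger complex-domain error.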
Robustness against noise and reverberation is critical for ASR systems deployed in real-world environments. In robust ASR, corrupted speech is normally enhanced using speech separation or enhancement algorithms before recognition. This paper presents a novel joint training framework for speech separation and recognition. The key idea is to concatenate a deep neural network (DNN) based speech separation frontend and a DNN-based acoustic model to build a larger neural network, and jointly adjust the weights in each module. This way, the separation frontend is able to provide enhanced speech desired by the acoustic model and the acoustic model can guide the separation frontend to produce more discriminative enhancement. In addition, we apply sequence training to the jointly trained DNN so that the linguistic information contained in the acoustic and language models can be back-propagated to influence the separation frontend at the training stage. To further improve the robustness, we add more noise- and reverberation-robust features for acoustic modeling. At the test stage, utterance-level unsupervised adaptation is performed to adapt the jointly trained network by learning a linear transformation of the input of the separation frontend. The resulting sequence-discriminative jointly-trained multistream system with run-time adaptation achieves 10.63% average word error rate (WER) on the test set of the reverberant and noisy CHiME-2 dataset (task-2), which represents the best performance on this dataset and a 22.75% error reduction over the best existing method.
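The core mechanism, back-propagating the recognition loss through the concatenated frontend, can be sketched on toy data. Everything here is hypothetical (linear modules standing in for the DNNs, random features and targets, a plain mean-squared loss); the point is only that the gradient of the recognizer's loss reaches and updates the frontend weights.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5))          # "noisy" input features [batch, dim]
t = rng.standard_normal((8, 3))          # acoustic-model targets
W1 = 0.1 * rng.standard_normal((5, 5))   # separation-frontend weights
W2 = 0.1 * rng.standard_normal((5, 3))   # acoustic-model weights

def loss(W1, W2):
    return 0.5 * np.mean((x @ W1 @ W2 - t) ** 2)

loss_before = loss(W1, W2)
for _ in range(200):
    h = x @ W1                            # enhanced features from the frontend
    err = (h @ W2 - t) / t.size           # dL/dy for the mean-squared loss
    gW2 = h.T @ err                       # gradient for the acoustic model
    gW1 = x.T @ (err @ W2.T)              # gradient flows back into the frontend
    W2 -= 0.5 * gW2
    W1 -= 0.5 * gW1                       # the frontend adapts to the recognizer's loss
loss_after = loss(W1, W2)
```

In the actual system the modules are deep networks and the loss is sequence-discriminative, but the chain-rule structure is the same.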
Background and Aims
Cancer‐associated fibroblasts (CAFs) are key players in multicellular, stromal‐dependent alterations leading to HCC pathogenesis. However, the intricate crosstalk between CAFs and other components in the tumor microenvironment (TME) remains unclear. This study aimed to investigate the cellular crosstalk among CAFs, tumor cells, and tumor‐associated neutrophils (TANs) during different stages of HCC pathogenesis.
Approach and Results
In the HCC‐TME, CAF‐derived cardiotrophin‐like cytokine factor 1 (CLCF1) increased chemokine (C‐X‐C motif) ligand 6 (CXCL6) and TGF‐β secretion in tumor cells, which subsequently promoted tumor cell stemness in an autocrine manner and TAN infiltration and polarization in a paracrine manner. Moreover, CXCL6 and TGF‐β secreted by HCC cells activated extracellular signal‐regulated kinase (ERK) 1/2 signaling of CAFs to produce more CLCF1, thus forming a positive feedback loop to accelerate HCC progression. Inhibition of ERK1/2 or CLCF1/ciliary neurotrophic factor receptor signaling efficiently impaired CLCF1‐mediated crosstalk among CAFs, tumor cells, and TANs both in vitro and in vivo. In clinical samples, up‐regulation of the CLCF1−CXCL6/TGF‐β axis exhibited a marked correlation with increased cancer stem cells, “N2”‐polarized TANs, tumor stage, and poor prognosis.
Conclusions
This study reveals a cytokine‐mediated cellular crosstalk and clinical network involving the CLCF1−CXCL6/TGF‐β axis, which regulates the positive feedback loop among CAFs, tumor stemness, and TANs, as well as HCC progression and patient prognosis. These results may support the CLCF1 cascade as a potential prognostic biomarker and suggest that selective blockade of CLCF1/ciliary neurotrophic factor receptor or ERK1/2 signaling could provide an effective therapeutic target for patients with HCC.
Deep learning-based time-frequency (T-F) masking has dramatically advanced monaural (single-channel) speech separation and enhancement. This study investigates its potential for direction of arrival (DOA) estimation in noisy and reverberant environments. We explore ways of combining T-F masking and conventional localization algorithms, such as generalized cross correlation with phase transform, as well as newly proposed algorithms based on steered-response SNR and steering vectors. The key idea is to utilize deep neural networks (DNNs) to identify speech dominant T-F units containing relatively clean phase for DOA estimation. Our DNN is trained using only monaural spectral information, and this makes the trained model directly applicable to arrays with various numbers of microphones arranged in diverse geometries. Although only monaural information is used for training, experimental results show strong robustness of the proposed approach in new environments with intense noise and room reverberation, outperforming traditional DOA estimation methods by large margins. Our study also suggests that the ideal ratio mask and its variants remain effective training targets for robust speaker localization.
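A mask-weighted variant of GCC-PHAT, in the spirit of the combination described above, can be sketched as follows for a two-microphone pair. The interface and shapes are hypothetical, and the mask would in practice come from the trained DNN; here it simply down-weights noise-dominant T-F units before the delay search.

```python
import numpy as np

def masked_gcc_phat_tdoa(X1, X2, mask, fs=16000, max_tau=1e-3, n_taus=200):
    """Mask-weighted GCC-PHAT time-delay estimation (hypothetical interface).
    X1, X2: [frames, freqs] one-sided STFTs of two microphones; mask in [0, 1]
    marks speech-dominant T-F units. Returns the estimated TDOA in seconds."""
    T, F = X1.shape
    freqs = np.arange(F) * fs / (2 * (F - 1))            # one-sided STFT bins
    cross = X1 * X2.conj()
    cross = cross / np.maximum(np.abs(cross), 1e-8)      # PHAT weighting
    taus = np.linspace(-max_tau, max_tau, n_taus)
    # Steer over candidate delays, scoring mostly the speech-dominant units.
    steer = np.exp(1j * 2 * np.pi * np.outer(taus, freqs))       # [n_taus, F]
    scores = np.real((mask * cross) @ steer.conj().T).sum(axis=0)  # [n_taus]
    return taus[np.argmax(scores)]
```

With the TDOA and the microphone spacing, the DOA follows from simple far-field geometry.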
The development of circularly polarized electroluminescence (CPEL) is currently hampered by the high difficulty and cost in the syntheses of suitable chiral materials and the notorious chirality diminishment issue in electrical devices. Herein, diastereomeric IrIII and RuII complexes with chiral (±)‐camphorsulfonate counteranions are readily synthesized and used as the active materials in circularly polarized light‐emitting electrochemical cells to generate promising CPELs. The addition of the chiral ionic liquid (±)‐1‐butyl‐3‐methylimidazole camphorsulfonate into the active layer significantly improves the device performance and the electroluminescence dissymmetry factors (≈10⁻³), in stark contrast to the very weak circularly polarized photoluminescence of the spin‐coated films of these diastereomeric complexes. Control experiments with enantiopure IrIII complexes suggest that the chiral anions play a dominant role in the electrically‐induced amplification of CPELs.
By using a chiral anion strategy, electrically amplified circularly polarized luminescence (CPL) is realized in light‐emitting electrochemical cells of readily obtained diastereomeric ionic transition‐metal complexes. The addition of a chiral ionic liquid (IL) into the active layer significantly improves the device performance and the electroluminescence dissymmetry factors.
We propose mixture to mixture (M2M) training, a weakly-supervised neural speech separation algorithm that leverages close-talk mixtures as a weak supervision for training discriminative models to separate far-field mixtures. Our idea is that, for a target speaker, its close-talk mixture has a much higher signal-to-noise ratio (SNR) of the target speaker than any far-field mixture, and hence can be utilized to design a weak supervision for separation. To realize this, at each training step we feed a far-field mixture to a deep neural network (DNN) to produce an intermediate estimate for each speaker, and, for each of the considered close-talk and far-field microphones, we linearly filter the DNN estimates and optimize a loss so that the filtered estimates of all the speakers sum up to the mixture captured by that microphone. Evaluation results on a 2-speaker separation task in simulated reverberant conditions show that M2M can effectively leverage close-talk mixtures as a weak supervision for separating far-field mixtures.
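The filter-and-sum constraint at the heart of the loss can be sketched as a least-squares problem per microphone: find short FIR filters for the DNN estimates so their filtered sum reconstructs that microphone's mixture, and score the residual. Filter length, shapes, and the closed-form least-squares solve are illustrative assumptions, not the paper's exact training procedure.

```python
import numpy as np

def conv_matrix(s, L):
    """Toeplitz matrix A with A[n, k] = s[n - k], so that
    A @ h == np.convolve(s, h)[:len(s)] for a length-L filter h."""
    N = len(s)
    A = np.zeros((N, L))
    for k in range(L):
        A[k:, k] = s[:N - k]
    return A

def m2m_style_loss(estimates, mixture, filt_len=8):
    """Filter each speaker estimate by a least-squares FIR filter and measure
    how well the filtered estimates sum to the observed mixture."""
    A = np.hstack([conv_matrix(s, filt_len) for s in estimates])
    h, *_ = np.linalg.lstsq(A, mixture, rcond=None)   # stacked per-speaker filters
    resid = mixture - A @ h
    return np.mean(resid ** 2)                        # mixture-reconstruction loss
```

Evaluating this loss at both close-talk and far-field microphones is what lets the high-SNR close-talk mixtures act as the weak supervision.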
In real-world situations, speech reaching our ears is commonly corrupted by both room reverberation and background noise. These distortions are detrimental to speech intelligibility and quality, and also pose a serious problem to many speech-related applications, including automatic speech and speaker recognition. In order to deal with the combined effects of noise and reverberation, we propose a two-stage strategy to enhance corrupted speech, where denoising and dereverberation are conducted sequentially using deep neural networks. In addition, we design a new objective function that incorporates clean phase during model training to better estimate spectral magnitudes, which would in turn yield better phase estimates when combined with iterative phase reconstruction. The two-stage model is then jointly trained to optimize the proposed objective function. Systematic evaluations and comparisons show that the proposed algorithm improves objective metrics of speech intelligibility and quality substantially, and significantly outperforms previous one-stage enhancement systems.
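One plausible form of a magnitude objective that incorporates clean phase, sketched here as an assumption rather than the paper's exact definition, pairs the estimated magnitude with the mixture phase and compares the result against the clean complex spectrum. A useful property of this form is that its per-unit optimum is the phase-sensitive target |S|·cos(θ_s − θ_y), so magnitude estimation is automatically discounted where the phase is unreliable.

```python
import numpy as np

def phase_aware_mag_loss(mag_est, mix_phase, clean_spec):
    """Hypothetical phase-aware magnitude loss: the estimated magnitude is
    paired with the mixture phase and compared, in the complex plane, against
    the clean spectrum (which carries the clean phase)."""
    est_spec = mag_est * np.exp(1j * mix_phase)
    return np.mean(np.abs(est_spec - clean_spec) ** 2)
```

For a single T-F unit with clean value |S|e^{jθ_s} and mixture phase θ_y, sweeping the magnitude shows the minimizer sits at |S|·cos(θ_s − θ_y) rather than at |S|.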