Currently available compilation techniques target general-purpose computing and are not optimized for physical-layer computing in 5G micro base stations. In such systems, the foreseeable data sizes and small code size are application-specific opportunities for optimizing baseband algorithms. These opportunities deserve special attention; for example, register allocation algorithms tailored to this setting have not been studied so far. This work focuses on compilation for the kernel subroutines of baseband processing in 5G micro base stations. For applications with known and fixed data sizes, we propose a compilation scheme with parallel data access in which operands are mainly allocated and stored in registers. Based on a small register file (48×32b), the scheme targets baseband algorithms that operate on 4×4 or smaller matrices, maximizing register file utilization and eliminating extra data exchange between registers. Once data is allocated to the register file, we use a VLIW (Very Long Instruction Word) machine to hide data access time and minimize data access cost, thereby minimizing the total execution time. Experiments indicate that, for algorithms with small data sizes, the cost of data access and extra addressing can be minimized.
The performance of automatic speech recognition (ASR) systems is unsatisfactory in low-resource environments. In this paper, we investigate the effectiveness of three approaches for improving acoustic models in such environments: Mono-and-triphone Learning, Soft One-hot Label, and Feature Combinations. We apply these three methods to the network architecture and compare the results with baselines. Our proposal achieves a remarkable improvement on Mandarin speech recognition with the hybrid hidden Markov model - neural network approach at the phoneme level. To verify the generalization ability of the proposed method, we conducted comparative experiments on DNN, RNN, LSTM, and other network structures. The experimental results show that the method is applicable to almost all currently widely used network structures. Compared to the baselines, our proposals achieve an average relative Character Error Rate (CER) reduction of 8.0%. In our experiments, the training set contains ~10 hours of data, and we did not use data augmentation or transfer learning, which means that no additional data was used.
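The reported metric can be illustrated with a minimal sketch. It assumes the usual definition of CER as the character-level edit distance divided by the reference length, and a relative reduction computed against the baseline error rate. The function names and the example values are illustrative and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: edit distance divided by the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Relative reduction of an error rate with respect to a baseline error rate."""
    return (baseline - improved) / baseline
```

Under these definitions, a hypothetical baseline CER of 25% that drops to 23% corresponds to a relative reduction of 8%, although the absolute reduction is only 2 points.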
In this paper, we extend our previous design with polar decoding and propose a flexible quad-mode forward error correction application-specific instruction-set processor (QFEC ASIP) that supports polar, low-density parity-check (LDPC), turbo, and convolutional code (CC) decoding with multiple code lengths and code rates. A unified polar/LDPC/turbo/CC quad-mode algorithm framework is presented. The top-level architecture of the QFEC ASIP and the polar data path are designed on the basis of this algorithm framework. A quad-mode conflict-free global memory system is proposed. Hardware sharing saves 65.2% of the global memory banks, 48.9% of the global memory bits, and 29.7% of the global memory area. Specially accelerated FEC decoding instructions make the decoding procedure fully programmable and ensure high throughput. Synthesis using 65-nm technology shows that the total area of the QFEC ASIP is 4.26 mm². At a clock frequency of 344 MHz, the QFEC ASIP provides a maximum throughput of 1345 Mb/s for polar, 917 Mb/s for LDPC (WiMAX), 320 Mb/s for turbo, and 387 Mb/s for CC (64 states). The QFEC ASIP occupies much less silicon area than the combined area of four single-mode FEC decoders that together provide a similar function range.
In Automatic Speech Recognition (ASR), transcribed data takes substantial effort to obtain. It is therefore worthwhile to explore how to select the most informative samples from an un-transcribed data pool so as to obtain a better model at a limited cost. This has made active learning in ASR a research topic. In this manuscript, we propose two new active learning methods: the Signal-Model Committee Approach (SMCA) and the LM-based Certainty Approach (LMCA). The two methods evaluate the information content of samples from different angles and can be applied together for joint sampling in some scenarios. We conducted comparative experiments on the Listen, Attend and Spell (LAS) model under different requirements. In the experiments, we compared our approach with random sampling and with another state-of-the-art committee-based approach, the heterogeneous neural networks (HNN) based approach. We evaluated our approach in terms of CER on a Mandarin Chinese speech recognition task. The results show that the proposed approach is not only simple to use but also achieves the best performance.
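The general idea behind committee-based sampling can be sketched as follows. This is a generic query-by-committee illustration, not the SMCA or LMCA method itself, whose details are not given in the abstract. It assumes that two recognizers have each produced a hypothesis for every un-transcribed utterance and that disagreement is measured with the similarity ratio from Python's `difflib`. The function names and the pool layout are hypothetical.

```python
from difflib import SequenceMatcher


def disagreement(hyp_a, hyp_b):
    """1 - similarity ratio between two hypotheses; 0.0 means they are identical."""
    return 1.0 - SequenceMatcher(None, hyp_a, hyp_b).ratio()


def select_for_transcription(pool, budget):
    """Return the `budget` utterance ids whose two hypotheses disagree the most.

    pool maps an utterance id to a (hyp_a, hyp_b) pair of hypothesis strings.
    """
    ranked = sorted(pool, key=lambda utt: disagreement(*pool[utt]), reverse=True)
    return ranked[:budget]
```

Utterances on which the committee members agree are assumed to add little information, so the transcription budget is spent on the ones where they diverge.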
A high-throughput programmable fast Fourier transform (FFT) processor is designed that supports 16- to 4096-point FFTs and 12- to 2400-point discrete Fourier transforms (DFTs) for 4G, wireless local area networks, and future 5G. A 16-path data-parallel memory-based architecture is selected as a tradeoff between throughput and cost. Several improvements are made to implement a hardware-efficient high-speed processor. To maximize hardware reuse, a reconfigurable butterfly unit is proposed that computes eight radix-2 butterflies in parallel, four radix-3/4 butterflies in parallel, two radix-5/8 butterflies in parallel, or one radix-16 butterfly in a single clock cycle. Twiddle factor multipliers using different schemes are optimized and compared, and a modified coordinate rotation digital computer (CORDIC) scheme is finally implemented to minimize the hardware cost while supporting both FFTs and DFTs. An optimized conflict-free data access scheme is also proposed to support multiple butterflies at any radix. The processor is designed as a general IP and can be implemented using a processor synthesizer (application-specific instruction-set processor designer). The electronic design automation synthesis result based on a 65-nm technology shows that the processor area is 1.46 mm². The processor supports a 972 MS/s 4096-point FFT at 250 MHz with a power consumption of 68.64 mW and a signal-to-quantization-noise ratio of 66.1 dB. The proposed processor has better normalized throughput per area unit than the state-of-the-art designs available.
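A transform size can be processed by a mixed-radix butterfly unit only if it factors into the supported radices. The sketch below checks this with a greedy, largest-radix-first decomposition. The greedy order is an assumption made for illustration and does not claim to reproduce the stage scheduling used by the processor. Only the radix set (2, 3, 4, 5, 8, 16) comes from the abstract.

```python
SUPPORTED_RADICES = (16, 8, 5, 4, 3, 2)  # radices named in the abstract, largest first


def radix_stages(n):
    """Greedily split n into supported radices; return None if that is impossible."""
    if n < 2:
        return None
    stages = []
    while n > 1:
        for r in SUPPORTED_RADICES:
            if n % r == 0:
                stages.append(r)
                n //= r
                break
        else:
            return None  # the remaining factor is not a product of 2, 3 and 5
    return stages
```

For example, `radix_stages(4096)` gives three radix-16 stages, while `radix_stages(2400)` gives `[16, 5, 5, 3, 2]`. A size with any other prime factor, such as 14, is rejected.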
Speaker verification models have achieved good results on single‐genre data, but their performance degrades when training and testing data are not from the same domain. Adversarial training has been proposed to solve this problem by minimizing the differences between domain distributions. However, adversarial training ignores domain‐specific information in favor of domain‐invariant speaker representations. In this paper, an improved collaborative adversarial network for domain adaptation in speaker verification is proposed. Compared to adversarial training, a collaborative discriminator is newly incorporated that learns domain‐specific information at the lower layers. Further, a projection block is added to the collaborative discriminator to reduce the noise that the discriminator introduces. Experiments are conducted in different mismatch scenarios and with different speaker encoders. All the experimental results show that this method performs better than the baseline and than previous work using adversarial training.
This work can extract better speaker representations that are both domain‐invariant and domain‐specific. The proposed collaborative discriminator enables the speaker encoder to learn domain‐specific information, which is beneficial for adversarial training. Further, the projection block is designed to reduce the noise introduced by the collaborative discriminator.
A design space exploration methodology for 1-D FFT processors is proposed to find the best hardware architecture quantitatively during early design. The methodology includes architecture candidate collection, coarse-grained architecture selection, and circuit-level design optimizations. We show how to select a better architecture from candidates that include different architectures (SDF, SDC, MDF, MDC, and memory-based) with different degrees of parallelism at different radices. The sub-level designs, including the rotator and the data scaling module, are introduced for further optimization. As a proof of concept, an FFT processor for 4G, WLAN, and future 5G is designed that supports 16- to 4096-point and 12- to 2400-point FFTs. A memory-based architecture with a 16-datapath mixed-radix butterfly unit is selected to satisfy the demand for 1 GS/s throughput (4096 points). The synthesis result based on 65-nm technology shows that the silicon cost and power consumption are 1.46 mm² and 68.64 mW, respectively. The proposed processor has better normalized throughput per area unit and normalized FFTs per energy unit than the state-of-the-art designs available.
Image recognition and object detection of colorectal cancer cells are significant in the medical community. They can help doctors diagnose the disease, assess the patient's recovery status, and choose a suitable treatment plan. At present, object detection methods based on neural networks are widely used. However, when applied to images of colorectal cancer cells, they fall short because of the complex background and the small size of the cells. To address this problem, this paper proposes an improved Faster R-CNN algorithm with multi-scale detection and a multi-loss function. In our algorithm, during feature extraction, multi-scale detection retains not only the semantic information but also the details of the edges and textures of the image. During recognition, the multi-loss function allows the discriminative features of cells to be learned against the complex background. The experimental results show that the accuracy of the improved Faster R-CNN is 2.4% higher than that of the original Faster R-CNN.
Transfer Learning for Air Traffic Control LVCSR System. Jiawen Wang; Shaohan Liu; Qun Yang
2017 Second International Conference on Mechanical, Control and Computer Engineering (ICMCCE),
2017-Dec.
Conference Proceeding
To reduce accidents caused by errors in Air Traffic Control (ATC) directives and to support responsibility investigations, it is necessary to transcribe the audio of ATC directives into text. However, the existing Automatic Speech Recognition (ASR) systems are designed for isolated word recognition and cannot be applied to Large Vocabulary Continuous Speech Recognition (LVCSR). We therefore analyze the characteristics of ATC directives and develop an LVCSR system for them. In addition, to address the scarcity of ATC directive data, we propose a new crosslingual knowledge transfer learning method, the semi-shared-hidden-layer crosslingual DNN (Semi-SHL-CDNN). We demonstrate that the Semi-SHL-CDNN reduces errors by 16.76% relative to monolingual DNNs. Compared with SHL-MDNN, the WER is reduced by a further 1.38%.
Currently, most speech enhancement methods cannot address the performance degradation caused by low signal-to-noise ratios (SNR) and non-stationary noise. For better speech enhancement in these scenarios, this paper proposes a two-stage method that fuses DCCRN and SubNet. Compared with single-stage networks, two-stage networks have more powerful mapping capabilities. This paper uses the complex-valued spectrogram as the training target. In the first stage, the DCCRN takes the magnitude and phase as input and estimates the corresponding target of the clean speech. By simulating complex-valued operations, the DCCRN can train on the complex target effectively. However, it still has difficulty handling low SNR and non-stationary noise. This paper therefore uses the SubNet as the second-stage network. In the second stage, the SubNet further refines the magnitude of the target frequency by exploiting the context frequencies. Its input consists of the magnitude of the target frequency and of several context frequencies. Its output is the estimate of the clean speech magnitude target for the corresponding frequency. The experimental results show that the proposed method performs better than the baseline models in terms of PESQ, STOI, and SI-SDR.
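The second-stage input described above, a target frequency together with its context frequencies, can be sketched for a single magnitude frame as follows. The symmetric context window, its width, and the replication of edge bins are assumptions made for illustration. The abstract does not specify how the SubNet input is assembled, and `subband_inputs` is a hypothetical helper.

```python
def subband_inputs(frame, context=2):
    """Build one input vector per frequency bin of a magnitude frame.

    Each vector holds bins f-context .. f+context around target bin f.
    Neighbours outside the frame are replaced by the nearest edge bin
    (an assumption, chosen only to keep every vector the same length).
    """
    last = len(frame) - 1
    return [
        [frame[min(max(f + d, 0), last)] for d in range(-context, context + 1)]
        for f in range(len(frame))
    ]
```

With `context=1`, the frame `[1.0, 2.0, 3.0, 4.0]` yields the vectors `[1.0, 1.0, 2.0]`, `[1.0, 2.0, 3.0]`, `[2.0, 3.0, 4.0]`, and `[3.0, 4.0, 4.0]`, so each target bin is refined from the same number of inputs.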