In recent years, the size of pre-trained language models (PLMs) has grown by leaps and bounds. However, the efficiency issues of these large-scale PLMs limit their use in real-world scenarios. We present a suite of cost-effective techniques that address the efficiency issues of PLMs in pre-training, fine-tuning, and inference. (1) We introduce knowledge inheritance, which accelerates pre-training by exploiting existing PLMs instead of training models from scratch. (2) We explore best practices for prompt tuning with large-scale PLMs. Compared with conventional fine-tuning, prompt tuning significantly reduces the number of task-specific parameters. (3) We implement a new inference toolkit, InfMoE, for using large-scale PLMs with limited computational resources. Based on this cost-effective pipeline, we pre-train two models: an encoder-decoder bilingual model with 11 billion parameters (CPM-2) and its corresponding MoE version with 198 billion parameters. In our experiments, we compare CPM-2 with mT5 on downstream tasks. Experimental results show that CPM-2 has excellent general language intelligence. Moreover, we validate the efficiency of InfMoE when conducting inference of large-scale models with tens of billions of parameters on a single GPU. All source code and model parameters are available at https://github.com/TsinghuaAI/CPM.
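As a rough illustration of why prompt tuning needs so few task-specific parameters, the sketch below counts them for vanilla soft-prompt tuning, where only a matrix of prompt embeddings (prompt length × hidden size) is trained and the backbone stays frozen. The prompt length (100) and hidden size (4096) are hypothetical values chosen for illustration, not settings reported in the abstract; only the 11-billion total for CPM-2 comes from the text.

```python
# Hypothetical illustration: parameter count of vanilla soft-prompt tuning,
# where only the prompt embedding matrix (num_prompt_tokens x hidden_size)
# is trained and all backbone weights stay frozen.

CPM2_TOTAL_PARAMS = 11_000_000_000  # from the abstract: 11 billion parameters


def prompt_tuning_params(num_prompt_tokens: int, hidden_size: int) -> int:
    """Number of trainable (task-specific) parameters in a soft prompt."""
    return num_prompt_tokens * hidden_size


def trainable_fraction(num_prompt_tokens: int, hidden_size: int,
                       total_params: int = CPM2_TOTAL_PARAMS) -> float:
    """Task-specific parameters as a fraction of the full model size."""
    return prompt_tuning_params(num_prompt_tokens, hidden_size) / total_params


if __name__ == "__main__":
    # Hypothetical settings: 100 prompt tokens, hidden size 4096.
    p = prompt_tuning_params(100, 4096)
    print(f"soft-prompt parameters: {p:,}")                               # 409,600
    print(f"fraction of 11B model:  {trainable_fraction(100, 4096):.2e}")  # 3.72e-05
```

Under these assumptions, each downstream task stores roughly 0.4M prompt parameters instead of a full 11B-parameter fine-tuned copy of the model.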
Toroidal Alfvén eigenmodes (TAEs) excited in purely ohmically heated plasmas, without any auxiliary heating, have been identified for the first time in the SUNIST spherical tokamak. The TAEs are observed during minor disruptions and have a frequency range of 150-500 kHz. Mode structure analysis indicates the existence of both m/n = -3/-1 and -4/-1 harmonics, propagating in the electron diamagnetic direction in the laboratory frame of reference. These TAEs appear simultaneously with the generation of runaway electrons in the current quench phase, accompanied by density sweeping during the minor disruption. Possible driving mechanisms and potential applications of these TAEs are discussed.
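For context on what sets a TAE frequency, the sketch below uses the standard large-aspect-ratio estimate, which is only a rough guide for a low-aspect-ratio spherical tokamak: poloidal harmonics m and m+1 couple at the gap surface q = (2m+1)/(2n), and the mode frequency is roughly f ≈ v_A/(4π q R0), with Alfvén speed v_A = B/sqrt(μ0 n_i m_i). It assumes a pure hydrogen plasma and uses mode-number magnitudes (|m| = 3 and 4 with |n| = 1, as in the abstract, give q = 3.5). The field, density, and major-radius values are hypothetical placeholders, not SUNIST measurements, so the output is not meant to reproduce the observed 150-500 kHz. It does show one relevant scaling: because v_A ∝ n_i^(-1/2), the gap frequency shifts as the density sweeps.

```python
import math

MU0 = 4e-7 * math.pi          # vacuum permeability [H/m]
PROTON_MASS = 1.67262192e-27  # [kg]; assumes a pure hydrogen plasma


def alfven_speed(b_tesla: float, n_i: float, ion_mass: float = PROTON_MASS) -> float:
    """Alfven speed v_A = B / sqrt(mu0 * n_i * m_i) in m/s (n_i in m^-3)."""
    return b_tesla / math.sqrt(MU0 * n_i * ion_mass)


def tae_gap_q(m: int, n: int) -> float:
    """Safety factor where harmonics m and m+1 couple: q = (2m+1)/(2n)."""
    return (2 * m + 1) / (2 * n)


def tae_frequency(b_tesla: float, n_i: float, r0: float, m: int, n: int,
                  ion_mass: float = PROTON_MASS) -> float:
    """Approximate TAE frequency f = v_A / (4*pi*q*R0) in Hz."""
    q = tae_gap_q(m, n)
    return alfven_speed(b_tesla, n_i, ion_mass) / (4 * math.pi * q * r0)


if __name__ == "__main__":
    # Hypothetical placeholder parameters (not SUNIST data).
    B, R0 = 0.1, 0.3             # toroidal field [T], major radius [m]
    for n_i in (1e19, 4e19):     # ion density [m^-3]
        f = tae_frequency(B, n_i, R0, m=3, n=1)
        print(f"n_i = {n_i:.0e} m^-3 -> f_TAE ~ {f / 1e3:.0f} kHz")
```

With everything else fixed, quadrupling the density halves the estimated frequency, which is why a density sweep during a disruption can move the mode across a wide frequency band.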