A search for the rare decays $B^0_s \to\mu^+\mu^-$ and $B^0 \to\mu^+\mu^-$ is performed at the LHCb experiment. The data analysed correspond to an integrated luminosity of 1 fb$^{-1}$ of $pp$ collisions at a centre-of-mass energy of 7 TeV and 2 fb$^{-1}$ at 8 TeV. An excess of $B^0_s \to\mu^+\mu^-$ signal candidates with respect to the background expectation is seen with a significance of 4.0 standard deviations. A time-integrated branching fraction of ${\cal B}(B^0_s \to\mu^+\mu^-) = (2.9^{+1.1}_{-1.0})\times 10^{-9}$ is obtained and an upper limit of ${\cal B}(B^0 \to\mu^+\mu^-) < 7.4\times 10^{-10}$ at 95% confidence level is set. These results are consistent with the Standard Model expectations.
This paper aims to understand the impacts of various data combinations (e.g., web text, Wikipedia, GitHub, books) on the training of large language models using SlimPajama. SlimPajama is a rigorously deduplicated, multi-source dataset, which has been refined and further deduplicated to 627B tokens from the extensive 1.2T-token RedPajama dataset contributed by Together. We term our research SlimPajama-DC, an empirical analysis designed to uncover fundamental characteristics and best practices associated with employing SlimPajama in the training of large language models. During our research with SlimPajama, two pivotal observations emerged: (1) Global deduplication vs. local deduplication. We analyze and discuss how global (across different dataset sources) and local (within a single dataset source) deduplication affect the performance of trained models. (2) Proportions of high-quality/highly-deduplicated multi-source datasets in the combination. To study this, we construct six configurations of the SlimPajama dataset and train a separate model on each using a 1.3B Cerebras-GPT model with ALiBi and SwiGLU. Our best configuration outperforms the 1.3B model trained on RedPajama using the same number of training tokens by a significant margin. All our 1.3B models are trained on a Cerebras 16\(\times\) CS-2 cluster with a total of 80 PFLOP/s in bf16 mixed precision. We further extend our discoveries (such as the finding that increasing data diversity is crucial after global deduplication) to a 7B model with large batch-size training. Our models and the separate SlimPajama-DC datasets are available at: https://huggingface.co/MBZUAI-LLM and https://huggingface.co/datasets/cerebras/SlimPajama-627B.
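The distinction between local and global deduplication drawn above can be illustrated with a minimal sketch. It assumes exact matching on a normalised SHA-256 fingerprint, which is a deliberate simplification and not the deduplication procedure used to build SlimPajama; the function names and the toy corpus are hypothetical.

```python
import hashlib


def _fingerprint(text: str) -> str:
    # Assumption: lower-casing and whitespace collapsing stand in for
    # whatever near-duplicate detection a real pipeline would use.
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def local_dedup(corpus):
    """Drop duplicates within each source; sources are treated independently."""
    result = {}
    for source, docs in corpus.items():
        seen = set()  # reset for every source
        kept = []
        for doc in docs:
            fp = _fingerprint(doc)
            if fp not in seen:
                seen.add(fp)
                kept.append(doc)
        result[source] = kept
    return result


def global_dedup(corpus):
    """Drop duplicates across all sources; the first occurrence is kept."""
    seen = set()  # shared by all sources
    result = {}
    for source, docs in corpus.items():
        kept = []
        for doc in docs:
            fp = _fingerprint(doc)
            if fp not in seen:
                seen.add(fp)
                kept.append(doc)
        result[source] = kept
    return result


# Hypothetical toy corpus: "Hello world" appears in both sources.
corpus = {
    "web": ["Hello world", "hello   WORLD", "unique web page"],
    "wiki": ["Hello world", "unique wiki article"],
}
local_result = local_dedup(corpus)
global_result = global_dedup(corpus)
```

In this toy case local deduplication leaves the cross-source copy of "Hello world" in the `wiki` source, while global deduplication removes it, so global deduplication changes the effective proportion of each source in the mix.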
This paper proposes an efficient federated distillation learning system (EFDLS) for multi-task time series classification (TSC). EFDLS consists of a central server and multiple mobile users, where different users may run different TSC tasks. EFDLS has two novel components, namely a feature-based student-teacher (FBST) framework and a distance-based weights matching (DBWM) scheme. Within each user, the FBST framework transfers knowledge from its teacher's hidden layers to its student's hidden layers via knowledge distillation, with the teacher and student having identical network structures. For each connected user, its student model's hidden layers' weights are uploaded to the EFDLS server periodically. The DBWM scheme is deployed on the server, with the least-squares distance used to measure the similarity between the weights of two given models. This scheme finds a partner for each connected user such that the user's and its partner's weights are the closest among all the weights uploaded. The server exchanges the user's and its partner's weights and sends them back to these two users, which then load the received weights into their teachers' hidden layers. Experimental results show that the proposed EFDLS achieves excellent performance on a set of selected UCR2018 datasets regarding top-1 accuracy.
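The server-side matching step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: each user's hidden-layer weights are assumed to be flattened into a plain list of floats, the distance is taken to be the sum of squared differences, and each user is simply assigned its nearest other user (which need not be a symmetric pairing). All names are hypothetical.

```python
def squared_distance(a, b):
    # Sum of squared element-wise differences between two weight vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_partners(weights):
    """Map each user id to the other user whose weights are closest.

    `weights` maps a user id to a flat list of floats. At least two users
    are required; with fewer, min() raises ValueError.
    """
    partners = {}
    for user, w in weights.items():
        candidates = [
            (squared_distance(w, other_w), other)
            for other, other_w in weights.items()
            if other != user
        ]
        # Ties are broken by the ordering of the user ids.
        partners[user] = min(candidates)[1]
    return partners


def exchange(weights, partners):
    """Return, for each user, a copy of its partner's weights to load."""
    return {user: list(weights[partner]) for user, partner in partners.items()}


# Hypothetical uploads from three users.
uploaded = {
    "a": [0.0, 0.0],
    "b": [0.1, 0.0],
    "c": [5.0, 5.0],
}
partners = find_partners(uploaded)
received = exchange(uploaded, partners)
```

Here users `a` and `b` are matched with each other, while `c` is matched with `b` as its nearest neighbour; how such asymmetric cases are resolved in EFDLS is not specified in the abstract.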
The decay $B_c\rightarrow J/\psi K^+ K^- \pi^+$ is observed for the first time, using proton-proton collisions collected with the LHCb detector corresponding to an integrated luminosity of 3 fb$^{-1}$. A signal yield of $78\pm14$ decays is reported with a significance of 6.2 standard deviations. The ratio of the branching fraction of $B_c \rightarrow J/\psi K^+ K^- \pi^+$ decays to that of $B_c \rightarrow J/\psi \pi^+$ decays is measured to be $0.53\pm 0.10\pm0.05$, where the first uncertainty is statistical and the second is systematic.
The differential branching fraction of the decay $\Lambda_b^0\rightarrow\Lambda\mu^+\mu^-$ is measured as a function of the square of the dimuon invariant mass, $q^2$. A yield of $78\pm12$ $\Lambda_b^0\rightarrow\Lambda\mu^+\mu^-$ decays is observed using data, corresponding to an integrated luminosity of 1.0\,fb$^{-1}$, collected by the LHCb experiment at a centre-of-mass energy of 7\,TeV. A significant signal is found in the $q^2$ region above the square of the $J/\psi$ mass, while at lower-$q^2$ values upper limits are set on the differential branching fraction. Integrating the differential branching fraction over $q^2$, while excluding the $J/\psi$ and $\psi(2S)$ regions, gives a branching fraction of ${\cal B}(\Lambda_b^0\rightarrow\Lambda\mu^+\mu^-)=(0.96\pm 0.16\,(\mathrm{stat})\pm 0.13\,(\mathrm{syst})\pm 0.21\,(\mathrm{norm}))\times 10^{-6}$, where the uncertainties are statistical, systematic and due to the normalisation mode, $\Lambda_b^0\rightarrow J/\psi\Lambda$, respectively.
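For a rough sense of the overall precision of the branching fraction quoted above, the three uncertainty components can be combined in quadrature. This is a reader's sketch that assumes the components are uncorrelated and symmetric, which the abstract does not state; the abstract itself quotes them separately, and the helper name is hypothetical.

```python
import math


def combine_in_quadrature(*uncertainties):
    # Assumption: the components are independent, so their variances add.
    return math.sqrt(sum(u * u for u in uncertainties))


# Components in units of 10^-6: statistical, systematic, normalisation.
central = 0.96
total = combine_in_quadrature(0.16, 0.13, 0.21)
relative = total / central

print(f"total uncertainty: {total:.2f} x 10^-6")
print(f"relative uncertainty: {relative:.0%}")
```

Under these assumptions the combined uncertainty is about 0.29 in units of $10^{-6}$, roughly 31% of the central value, with the normalisation mode as the largest single contribution.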
Charged particle multiplicities are studied in proton-proton collisions in the forward region at a centre-of-mass energy of [formula: see text] TeV with data collected by the LHCb detector. The forward spectrometer allows access to a kinematic range of [formula: see text] in pseudorapidity, momenta greater than [formula: see text] and transverse momenta greater than [formula: see text]. The measurements are performed using events with at least one charged particle in the kinematic acceptance. The results are presented as functions of pseudorapidity and transverse momentum and are compared to predictions from several Monte Carlo event generators.
The lifetime of the [formula: see text] meson is measured using semileptonic decays having a [formula: see text] meson and a muon in the final state. The data, corresponding to an integrated luminosity of [formula: see text], are collected by the LHCb detector in [formula: see text] collisions at a centre-of-mass energy of 8 TeV. The measured lifetime is [formula: see text], where the first uncertainty is statistical and the second is systematic.
The first observation of the decay $B^0_s\rightarrow\chi_{c1}\phi$ and a study of $B^0\rightarrow\chi_{c1,2}K^{*0}$ decays are presented. The analysis is performed using a dataset, corresponding to an integrated luminosity of 1.0 fb$^{-1}$, collected by the LHCb experiment in $pp$ collisions at a centre-of-mass energy of 7 TeV. The following ratios of branching fractions are measured: \begin{equation*} \begin{array}{lll} \dfrac{{\cal B}(B^0_s\rightarrow\chi_{c1}\phi)}{{\cal B}(B^0_s\rightarrow J/\psi\phi)} &=& (18.9 \pm1.8\,(\mathrm{stat})\pm1.3\,(\mathrm{syst})\pm0.8\,({\cal B})) \times 10^{-2}, \\ \noalign{\vskip 3pt} \dfrac{{\cal B}(B^0\rightarrow\chi_{c1}K^{*0})}{{\cal B}(B^0\rightarrow J/\psi K^{*0})} &=& (19.8 \pm1.1\,(\mathrm{stat})\pm1.2\,(\mathrm{syst})\pm0.9\,({\cal B})) \times 10^{-2}, \\ \noalign{\vskip 3pt} \dfrac{{\cal B}(B^0\rightarrow\chi_{c2}K^{*0})}{{\cal B}(B^0\rightarrow\chi_{c1}K^{*0})} &=& (17.1 \pm5.0\,(\mathrm{stat})\pm1.7\,(\mathrm{syst})\pm1.1\,({\cal B})) \times 10^{-2}, \end{array} \end{equation*} \noindent where the third uncertainty is due to the limited knowledge of the branching fractions of ${\chi_{c}\rightarrow J/\psi \gamma}$ modes.
The results of a search for the rare two-body charmless baryonic decays $B^0 \to p \bar{p}$ and $B_s^0 \to p \bar{p}$ are reported. The analysis uses a data sample, corresponding to an integrated luminosity of 0.9 fb$^{-1}$, of $pp$ collision data collected by the LHCb experiment at a centre-of-mass energy of 7 TeV. An excess of $B^0 \to p \bar{p}$ candidates with respect to background expectations is seen with a statistical significance of 3.3 standard deviations. This is the first evidence for a two-body charmless baryonic $B^0$ decay. No significant $B_s^0 \to p \bar{p}$ signal is observed, leading to an improvement of three orders of magnitude over previous bounds. If the excess events are interpreted as signal, the 68.3% confidence level intervals on the branching fractions are \begin{eqnarray} {\cal B}(B^0 \to p \bar{p}) & = & (1.47 \,^{+0.62}_{-0.51} \,^{+0.35}_{-0.14}) \times 10^{-8} \,, \\[0.3cm] {\cal B}(B_s^0 \to p \bar{p}) & = & (2.84 \,^{+2.03}_{-1.68} \,^{+0.85}_{-0.18}) \times 10^{-8} \,, \end{eqnarray} where the first uncertainty is statistical and the second is systematic.
Fusing a high-spatial-resolution (HR) multi-spectral image (MSI) with a low-spatial-resolution (LR) hyperspectral image (HSI) is an effective approach to HSI super-resolution (SR). Although recent deep neural network-based methods have shown pleasing fusion performance, most of them assume that both input images are clean, without any noise corruption. When random noise exists in real applications, their performance drops greatly. To mitigate this problem, we present a U-shape spectral transformer for robust fusion-based HSI SR, which makes three main contributions. 1) A two-stage network is established to denoise both input images and fuse them for SR in an end-to-end manner. 2) A U-shape spectral transformer is constructed to simultaneously exploit the multi-scale spatial information and the long-range correlation in the spectral domain, which enables sufficient fusion of the supplementary spatial-spectral information in both input images for accurate HSI SR. 3) A mutual-information-maximization-based loss is combined with the conventional reconstruction loss to supervise the training process more accurately, thus further enhancing the performance. Experimental results on two datasets demonstrate the efficacy of the proposed method in terms of HSI SR under different levels of noise corruption.