Abstract
Gamma-ray bursts (GRBs), due to their high luminosities, are detected up to a redshift of 10, and thus have the potential to be vital cosmological probes of early processes in the Universe. Fulfilling this potential requires a large sample of GRBs with known redshifts, but due to observational limitations, only 11% have known redshifts ($z$). There have been numerous attempts to estimate redshifts via correlation studies, most of which have led to inaccurate predictions. To overcome this, we estimated GRB redshifts via an ensemble supervised machine-learning (ML) model that uses X-ray afterglows of long-duration GRBs observed by the Neil Gehrels Swift Observatory. The estimated redshifts are strongly correlated with the observed ones (a Pearson coefficient of 0.93) and have a root-mean-square error, namely the square root of the average squared error $\langle\Delta z^2\rangle$, of 0.46, showing the reliability of this method. The addition of GRB afterglow parameters improves the predictions considerably, by 63% compared to previous results in the peer-reviewed literature. Finally, we use our ML model to infer the redshifts of 154 GRBs, which increases the sample of long GRBs with plateaus and known redshifts by 94%, a significant milestone for enhancing GRB population studies that require large samples with redshift.
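The two quoted metrics are standard and easy to reproduce. The sketch below uses hypothetical redshift values (not the paper's data) to show how the Pearson coefficient and the rms error $\sqrt{\langle\Delta z^2\rangle}$ between predicted and observed redshifts are computed:

```python
import numpy as np

# Hypothetical observed and ML-predicted redshifts (illustrative values only).
z_obs = np.array([1.2, 2.5, 0.9, 3.1, 1.8, 4.0])
z_pred = np.array([1.0, 2.7, 1.1, 2.8, 1.9, 4.3])

# Pearson correlation coefficient between predictions and observations.
r = np.corrcoef(z_obs, z_pred)[0, 1]

# Root-mean-square error: the square root of the mean squared residual <dz^2>.
rmse = np.sqrt(np.mean((z_pred - z_obs) ** 2))

print(f"Pearson r = {r:.2f}, rms error = {rmse:.2f}")
```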
Recent alignment algorithms such as direct preference optimization (DPO) have been developed to improve the safety of large language models (LLMs) by training these models to match human behaviors exemplified by preference data. However, these methods are both computationally intensive and lacking in controllability and transparency, making them prone to jailbreaking and inhibiting their widespread use. Furthermore, these tuning-based methods require large-scale preference data for training and are susceptible to noisy preference data. In this paper, we introduce a tuning-free alignment alternative (DeTox) and demonstrate its effectiveness on the use case of toxicity reduction. Grounded in theory from factor analysis, DeTox is a sample-efficient model editing approach that identifies a toxic subspace in the model parameter space and reduces model toxicity by projecting away the detected subspace. The toxic subspace is identified by extracting preference data embeddings from the language model and removing non-toxic information from these embeddings. We show that DeTox is more sample-efficient than DPO, while also showing greater robustness to noisy data. Finally, we establish both theoretical and empirical connections between DeTox and DPO, showing that DeTox can be interpreted as a denoised version of a single DPO step.
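The identify-and-project idea can be illustrated with a minimal numpy sketch. Everything below is a hypothetical stand-in (random embedding differences, a made-up subspace rank, a random weight matrix), not the actual DeTox implementation; it only shows the generic pattern of finding a low-rank subspace and projecting model weights away from it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding differences between toxic and non-toxic preference
# pairs (n_samples x hidden_dim); in DeTox these come from the LM itself.
D = rng.normal(size=(64, 32))

# Center the differences: in the factor-analysis view, the mean captures
# shared (non-toxic) content, while the deviations carry the toxic signal.
D_centered = D - D.mean(axis=0, keepdims=True)

# Top-k right singular vectors span the estimated toxic subspace.
k = 2
_, _, Vt = np.linalg.svd(D_centered, full_matrices=False)
V = Vt[:k].T                      # (hidden_dim x k) orthonormal basis

# Edit a (hypothetical) weight matrix by projecting away that subspace.
W = rng.normal(size=(32, 32))
P = V @ V.T                       # projector onto the toxic subspace
W_edited = W - W @ P              # rows now orthogonal to toxic directions

# Sanity check: the edited weights annihilate the toxic directions.
print(np.abs(W_edited @ V).max())
```

Because `V` has orthonormal columns, `W_edited @ V` is exactly zero up to floating-point error, which is the sense in which the toxic directions are removed.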
Suppose \(X\) and \(Y\) are \(p\times n\) matrices whose entries have mean \(0\) and variance \(1\), and all moments of any order are uniformly bounded as \(p,n \to \infty\). Moreover, the entries \((X_{ij}, Y_{ij})\) are independent across \(i,j\) with a common correlation \(\rho\). Let \(C=n^{-1}XY^*\) be the sample cross-covariance matrix. We show that if \(n, p\to \infty, p/n\to y\neq 0\), then \(C\) converges in the algebraic sense and the limit moments depend only on \(\rho\). Independent copies of such matrices with the same \(p\) but different \(n\), say \(\{n_l\}\), different correlations \(\{\rho_l\}\), and different non-zero \(y\)'s, say \(\{y_l\}\), also converge jointly and are asymptotically free. When \(y=0\), the matrix \(\sqrt{np^{-1}}(C-\rho I_p)\) converges to an elliptic variable with parameter \(\rho^2\). In particular, this elliptic variable is circular when \(\rho=0\) and semi-circular when \(\rho=1\). If we take independent \(C_l\), then the matrices \(\{\sqrt{n_lp^{-1}}(C_l-\rho_l I_p)\}\) converge jointly and are also asymptotically free. As a consequence, the limiting spectral distribution of any symmetric matrix polynomial exists and has compact support.
New satellite sensors will soon make it possible to estimate field-level crop yields, showing great potential for agricultural index insurance. This paper identifies an important threat to better insurance from these new technologies: data with many fields and few years can yield downward-biased estimates of basis risk, a fundamental metric in index insurance. To demonstrate this bias, we use state-of-the-art satellite-based data on agricultural yields in the US and in Kenya to estimate and simulate basis risk. We find a substantive downward bias leading to a systematic overestimation of insurance quality. In this paper, we argue that big data in crop insurance can lead to a new situation where the number of variables \(N\) largely exceeds the number of observations \(T\). In such a situation, where \(T\ll N\), conventional asymptotics break down, as evidenced by the large bias we find in simulations. We show how high-dimension, low-sample-size (HDLSS) asymptotics, together with the spiked covariance model, provide a more relevant framework for the \(T\ll N\) case encountered in index insurance. More precisely, we derive the asymptotic distribution of the relative share of the first eigenvalue of the covariance matrix, a measure of systematic risk in index insurance. Our formula accurately approximates the empirical bias simulated from the satellite data, and provides a useful tool for practitioners to quantify bias in insurance quality.
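The eigenvalue-share bias can be illustrated with a toy spiked-covariance simulation. All sizes and the spike strength below are hypothetical, not calibrated to the satellite data: with \(T\ll N\), the sample share of the first eigenvalue (the systematic-risk measure) comes out far above its population value, which is what translates into a downward-biased estimate of basis risk:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 500, 10                    # many fields (N), few years (T): T << N

# Spiked covariance: one common (systemic) factor on top of unit
# idiosyncratic noise, Sigma = I + (spike - 1) u u^T.
spike = 5.0
u = np.ones(N) / np.sqrt(N)

# Draw T years of field-level yields from N(0, Sigma), using
# Sigma^{1/2} = I + (sqrt(spike) - 1) u u^T.
Z = rng.standard_normal((T, N))
X = Z + (np.sqrt(spike) - 1.0) * (Z @ u)[:, None] * u[None, :]

# Relative share of the first eigenvalue of the sample covariance matrix.
eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))
sample_share = eig[-1] / eig.sum()

# Population counterpart: eigenvalues are (spike, 1, ..., 1).
pop_share = spike / (spike + N - 1)
print(f"sample share = {sample_share:.3f}, population share = {pop_share:.3f}")
```

With only \(T = 10\) years, the sample covariance has rank at most \(T - 1\), so its spectrum is heavily concentrated on a few directions and the first-eigenvalue share is mechanically inflated relative to the population share.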
The proliferation of generative models, combined with pretraining on web-scale data, raises a timely question: what happens when these models are trained on their own generated outputs? Recent investigations into model-data feedback loops proposed that such loops would lead to a phenomenon termed model collapse, under which performance progressively degrades with each model-data feedback iteration until fitted models become useless. However, those studies largely assumed that new data replace old data over time, whereas an arguably more realistic assumption is that data accumulate over time. In this paper, we ask: what effect does accumulating data have on model collapse? We empirically study this question by pretraining sequences of language models on text corpora. We confirm that replacing the original real data with each generation's synthetic data does indeed tend towards model collapse, then demonstrate that accumulating the successive generations of synthetic data alongside the original real data avoids model collapse; these results hold across a range of model sizes, architectures, and hyperparameters. We obtain similar results for deep generative models on other types of real data: diffusion models for molecule conformation generation and variational autoencoders for image generation. To understand why accumulating data can avoid model collapse, we use an analytically tractable framework introduced by prior work in which a sequence of linear models is fit to the previous models' outputs. Previous work used this framework to show that if data are replaced, the test error increases with the number of model-fitting iterations; we extend this argument to prove that if data instead accumulate, the test error has a finite upper bound independent of the number of iterations, meaning model collapse no longer occurs.
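The replace-versus-accumulate distinction can be sketched in an even simpler setting than the linear-regression framework: a Gaussian-mean toy model (an illustrative analogue chosen here, not the exact framework of the prior work). Each generation fits a mean, then samples a new synthetic corpus from it; the only difference between the two regimes is whether the fitting pool is the latest generation or everything so far:

```python
import numpy as np

rng = np.random.default_rng(3)
n, iters = 100, 30                # samples per generation, feedback rounds
mu = 0.0                          # ground-truth mean of the real data

def run(accumulate: bool) -> list:
    """Iterate the model-data feedback loop; return squared errors of mu_hat."""
    data = rng.normal(mu, 1.0, size=n)       # generation 0: real data
    pool = data
    errs = []
    for _ in range(iters):
        mu_hat = pool.mean()                 # "model fitting"
        errs.append((mu_hat - mu) ** 2)
        data = rng.normal(mu_hat, 1.0, size=n)           # synthetic data
        pool = np.concatenate([pool, data]) if accumulate else data
    return errs

err_replace = run(accumulate=False)   # fit only to the latest generation
err_accum = run(accumulate=True)      # fit to real + all synthetic data
print(f"final error, replace: {err_replace[-1]:.4f}, "
      f"accumulate: {err_accum[-1]:.4f}")
```

Under replacement the estimate performs a random walk, so its error grows with the number of iterations on average; under accumulation the early (real) data keep anchoring the fit, which mirrors the bounded-error result proved in the paper's linear setting.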
Abstract
We employ a hybrid approach in determining the anomalous dimensions and OPE coefficients of higher spin operators in the Wilson-Fisher theory. First we do a large spin analysis for CFT data where we use results obtained from the usual and the Mellin bootstrap and also from the Feynman diagram literature. This gives new predictions at $O(\epsilon^4)$ and $O(\epsilon^5)$ for anomalous dimensions and OPE coefficients, and also provides a cross-check for the results from the Mellin bootstrap. These higher orders get contributions from all higher spin operators in the crossed channel. We also use the bootstrap in Mellin space method for $\phi^3$ theory in the $d = 6 - \epsilon$ CFT, where we calculate general higher spin OPE data. We demonstrate a higher loop order calculation in this approach by summing over contributions from higher spin operators of the crossed channel in the same spirit as before.
More on analytic bootstrap for O(N) models
Dey, Parijat; Kaviraj, Apratim; Sen, Kallol
The Journal of High Energy Physics, 06/2016, Volume: 2016, Issue: 6
Journal Article
Peer-reviewed
Open access
Abstract
This note is an extension of a recent work on the analytical bootstrapping of O($N$) models. An additional feature of the O($N$) model is that the OPE contains trace and antisymmetric operators, apart from the symmetric-traceless objects appearing in the OPE of the singlet sector. Thus, in addition to the stress tensor ($T_{\mu\nu}$) and the $\phi_i\phi_i$ scalar, we also have other minimal twist operators, such as the spin-1 current $J_\mu$ and the symmetric-traceless scalar in the case of O($N$). We determine the effect of these additional objects on the anomalous dimensions of the corresponding trace, symmetric-traceless and antisymmetric operators in the large spin sector of the O($N$) model, in the limit when the spin is much larger than the twist. As an observation, we also verified that the leading order results for the large spin sector from the $\epsilon$-expansion are an exact match with our $n = 0$ case. A plausible holographic setup for the special case when $N = 2$ is also mentioned, which mimics the calculation in the CFT.