Eliciting Latent Knowledge (ELK) aims to find patterns in a capable neural network's activations that robustly track the true state of the world, especially in hard-to-verify cases where the model's output is untrusted. To further ELK research, we introduce 12 datasets and a corresponding suite of "quirky" language models (LMs) that are finetuned to make systematic errors when answering questions if and only if the keyword "Bob" is present in the prompt. We find that, especially in middle layers, linear probes usually report an LM's knowledge independently of what the LM outputs, enabling us to elicit the correct answer despite the model's untruthful output. The best probing method (logistic regression on contrast pairs) recovers 89% of the gap in AUROC between truthful and untruthful contexts, and 75% for questions harder than those used to train the probe. We also find that a mechanistic anomaly detection approach can flag untruthful behavior with 0.95 AUROC. Our results show promise for eliciting reliable knowledge from capable but untrusted models, and facilitate future research empirically investigating ELK methods.
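As a concrete illustration of the probing setup described above, here is a minimal sketch of logistic regression on contrast pairs, with synthetic activations standing in for real LM hidden states; the array names, dimensions, and data are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of contrast-pair probing. All data is synthetic; in the
# paper's setting the features would be middle-layer LM activations for a
# statement and its negation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, d = 512, 64

truth_direction = rng.normal(size=d)        # assumed latent "truth" feature
labels = rng.integers(0, 2, size=n)         # 1 = statement is true
acts_pos = rng.normal(size=(n, d)) + np.outer(labels, truth_direction)
acts_neg = rng.normal(size=(n, d)) - np.outer(labels, truth_direction)

# Probe the *difference* of the pair, as in contrast-pair probing.
features = acts_pos - acts_neg
probe = LogisticRegression(max_iter=1000).fit(features[:256], labels[:256])

scores = probe.predict_proba(features[256:])[:, 1]
print("held-out AUROC:", roc_auc_score(labels[256:], scores))
```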
The distributional simplicity bias (DSB) posits that neural networks learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we present compelling new evidence for the DSB by showing that networks automatically learn to perform well on maximum-entropy distributions whose low-order statistics match those of the training set early in training, then lose this ability later. We also extend the DSB to discrete domains by proving an equivalence between token \(n\)-gram frequencies and the moments of embedding vectors, and by finding empirical evidence for the bias in LLMs. Finally, we use optimal transport methods to surgically edit the low-order statistics of one class to match those of another, and show that early-training networks treat the edited samples as if they were drawn from the target class. Code is available at https://github.com/EleutherAI/features-across-time.
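To make the moment-editing step concrete: under a Gaussian approximation, the optimal transport map matching one class's first two moments to another's has a closed form. The sketch below applies it to synthetic data; the function name and shapes are illustrative assumptions, not the released code at the repository above.

```python
# Match one class's mean/covariance to another's via the closed-form OT map
# between Gaussians: T(x) = mu_b + A (x - mu_a), where
#   A = S_a^{-1/2} (S_a^{1/2} S_b S_a^{1/2})^{1/2} S_a^{-1/2}
import numpy as np
from scipy.linalg import sqrtm

def gaussian_ot_map(x_a, x_b):
    """Map samples x_a so their first two moments match those of x_b."""
    mu_a, mu_b = x_a.mean(0), x_b.mean(0)
    s_a = np.cov(x_a, rowvar=False)
    s_b = np.cov(x_b, rowvar=False)
    s_a_half = np.real(sqrtm(s_a))
    s_a_inv_half = np.linalg.inv(s_a_half)
    a = s_a_inv_half @ np.real(sqrtm(s_a_half @ s_b @ s_a_half)) @ s_a_inv_half
    return mu_b + (x_a - mu_a) @ a.T

rng = np.random.default_rng(0)
cls_a = rng.normal(0.0, 1.0, size=(1000, 8))
cls_b = rng.normal(2.0, 0.5, size=(1000, 8))
edited = gaussian_ot_map(cls_a, cls_b)
print(np.allclose(edited.mean(0), cls_b.mean(0)))  # low-order stats now match
```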
Despite their impressive performance on diverse tasks, large language models (LMs) still struggle with tasks requiring rich world knowledge, implying the limitations of relying solely on their parameters to encode a wealth of world knowledge. This paper aims to understand LMs' strengths and limitations in memorizing factual knowledge, by conducting large-scale knowledge probing experiments of 10 models and 4 augmentation methods on PopQA, our new open-domain QA dataset with 14k questions. We find that LMs struggle with less popular factual knowledge, and that scaling fails to appreciably improve memorization of factual knowledge in the long tail. We then show that retrieval-augmented LMs largely outperform LMs that are orders of magnitude larger, while unassisted LMs remain competitive on questions about high-popularity entities. Based on those findings, we devise a simple, yet effective, method for powerful and efficient retrieval-augmented LMs, which retrieves non-parametric memories only when necessary. Experimental results show that this significantly improves models' performance while reducing the inference costs.
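A hedged sketch of what such popularity-gated adaptive retrieval could look like; `lm_answer`, `retrieve`, and the threshold below are hypothetical stubs standing in for a real LM and retriever, not the paper's implementation.

```python
# Sketch of adaptive retrieval: call the retriever only for long-tail entities.
POPULARITY_THRESHOLD = 10_000  # assumed cutoff, e.g., Wikipedia page views

def lm_answer(prompt: str) -> str:
    return f"<LM answer for: {prompt[:40]}>"          # stand-in for an LM call

def retrieve(query: str, k: int = 5) -> list[str]:
    return [f"<passage {i} about {query}>" for i in range(k)]  # stand-in retriever

def answer(question: str, entity_popularity: int) -> str:
    if entity_popularity >= POPULARITY_THRESHOLD:
        return lm_answer(question)                    # trust parametric memory
    passages = retrieve(question)                     # long tail: add evidence
    return lm_answer("\n".join(passages) + "\n\nQ: " + question + "\nA:")

print(answer("What is George Rankin's occupation?", entity_popularity=120))
```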
Probabilistic forecasting of complex phenomena is paramount to various scientific disciplines and applications. Despite the generality and importance of the problem, general mathematical techniques that allow for stable long-term forecasts with calibrated uncertainty measures are lacking. For most time series models, the difficulty of obtaining accurate probabilistic future time step predictions increases with the prediction horizon. In this paper, we introduce a surprisingly simple approach that characterizes time-varying distributions and enables reasonably accurate predictions thousands of timesteps into the future. This technique, which we call Deep Probabilistic Koopman (DPK), is based on recent advances in linear Koopman operator theory, and does not require time stepping for future time predictions. Koopman models also tend to have a small parameter footprint (often less than 10,000 parameters). We demonstrate the long-term forecasting performance of these models on a diversity of domains, including electricity demand forecasting, atmospheric chemistry, and neuroscience. For electricity demand modeling, our domain-agnostic technique outperforms all of 177 domain-specific competitors in the most recent Global Energy Forecasting Competition.
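Because DPK evaluates the forecast distribution directly as a function of absolute time, any horizon costs the same as the next step. A minimal sketch of the idea, with assumed daily/weekly frequencies and hand-set readout weights standing in for learned parameters:

```python
# Sketch of the DPK idea under simplifying assumptions: distribution
# parameters are linear readouts of fixed-frequency oscillator features of
# time t, so forecasts need no recurrent time stepping.
import numpy as np

freqs = 2 * np.pi / np.array([24.0, 168.0])   # assumed daily/weekly periods (hours)

def features(t):
    """Koopman-style oscillator features of absolute time t."""
    t = np.atleast_1d(t)[:, None].astype(float)
    return np.hstack([np.sin(freqs * t), np.cos(freqs * t), np.ones_like(t)])

# Hypothetical learned readout weights for the mean and log-scale.
w_mu = np.array([3.0, 1.0, 0.5, -0.2, 10.0])
w_logsig = np.array([0.1, 0.0, 0.05, 0.0, -1.0])

def forecast(t):
    """Return (mean, stddev) of the predicted distribution at time t."""
    phi = features(t)
    return phi @ w_mu, np.exp(phi @ w_logsig)

mu, sig = forecast(np.arange(0, 5000, 1000))  # thousands of steps ahead, O(1) each
print(mu, sig)
```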
In many scenarios, it is necessary to monitor a complex system via a time-series of observations and determine when anomalous exogenous events have occurred so that relevant actions can be taken. Determining whether current observations are abnormal is challenging. It requires learning an extrapolative probabilistic model of the dynamics from historical data, and using a limited number of current observations to make a classification. We leverage recent advances in long-term probabilistic forecasting, namely Deep Probabilistic Koopman, to build a general method for classifying anomalies in multi-dimensional time-series data. We also show how to utilize models with domain knowledge of the dynamics to reduce type I and type II error. We demonstrate our proposed method on the important real-world task of global atmospheric pollution monitoring, integrating it with NASA's Global Earth Observing System Model. The system successfully detects localized anomalies in air quality due to events such as COVID-19 lockdowns and wildfires.
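A minimal sketch of the resulting anomaly test, under the assumption that the forecaster emits a Gaussian predictive distribution per time step; the threshold and data here are illustrative, not the paper's configuration.

```python
# Flag observations that fall in the far tails of the forecast distribution.
import numpy as np
from scipy.stats import norm

def flag_anomalies(obs, mu, sigma, alpha=1e-3):
    """Two-sided tail test: flag points whose tail probability is below alpha."""
    p_tail = 2 * norm.sf(np.abs(obs - mu) / sigma)
    return p_tail < alpha

mu, sigma = np.zeros(100), np.ones(100)        # stand-in forecast parameters
obs = np.random.default_rng(0).normal(0, 1, 100)
obs[40] = 6.0                                  # inject an anomaly
print(np.flatnonzero(flag_anomalies(obs, mu, sigma)))  # flagged indices, incl. 40
```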
In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive ...neuroscience. RepE places population-level representations, rather than neurons or circuits, at the center of analysis, equipping us with novel methods for monitoring and manipulating high-level cognitive phenomena in deep neural networks (DNNs). We provide baselines and an initial analysis of RepE techniques, showing that they offer simple yet effective solutions for improving our understanding and control of large language models. We showcase how these methods can provide traction on a wide range of safety-relevant problems, including honesty, harmlessness, power-seeking, and more, demonstrating the promise of top-down transparency research. We hope that this work catalyzes further exploration of RepE and fosters advancements in the transparency and safety of AI systems.
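One simple baseline in this family is a difference-of-means "reading vector" over contrastive stimuli; the sketch below uses synthetic activations, and the honest/dishonest framing is only an illustrative stand-in for the stimulus sets used in such work.

```python
# Extract a population-level direction as the difference of mean activations
# between two contrastive stimulus sets, then score new activations against it.
import numpy as np

rng = np.random.default_rng(0)
d = 128
concept = rng.normal(size=d)                       # assumed latent direction

acts_honest = rng.normal(size=(200, d)) + concept  # activations on one stimulus set
acts_dishonest = rng.normal(size=(200, d)) - concept

reading_vec = acts_honest.mean(0) - acts_dishonest.mean(0)
reading_vec /= np.linalg.norm(reading_vec)

new_act = rng.normal(size=d) + concept
print("concept score:", new_act @ reading_vec)     # positive -> "honest" side
```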
To determine whether blur adaptation influences blur sensitivity and blur discrimination thresholds in young adult myopes and emmetropes. In addition, to determine whether there is a differential effect of blur adaptation on blur sensitivity and discrimination between refractive error groups.
Proximal and distal blur sensitivity thresholds and blur discrimination thresholds were measured under cycloplegia with a Badal optometer in 24 young adult subjects: 8 emmetropes (EMM), 8 early-onset myopes (EOM), and 8 late-onset myopes (LOM). Adaptation to 1 D of myopic refractive blur was then undertaken for 30 minutes. Blur sensitivity and discrimination thresholds were then remeasured.
After blur adaptation, blur sensitivity and blur discrimination thresholds were found to be elevated. Blur adaptation had a significant effect on the distal blur sensitivity threshold, with the largest effect observed in the EOMs. Mean changes in distal blur sensitivity thresholds were EMMs +0.03 ± 0.14 D, EOMs +0.30 ± 0.21 D, and LOMs +0.08 ± 0.13 D.
Adaptation to a degraded stimulus modifies the blur detection mechanisms of the visual system in young adults. Depth of focus is expanded by prolonged exposure to defocus. EOMs are more susceptible to this phenomenon than are LOMs and EMMs.
We discuss our preliminary results in building a configurable accelerator for differential equation time stepping and iterative methods for algebraic equations. Relative to prior efforts in building hardware accelerators for numerical methods, our focus is on the following: 1) demonstrating the higher orders of numerical convergence needed to support existing numerical algorithms; 2) providing capacity for wide vectors of variables by keeping the hardware design components as simple as possible; and 3) demonstrating configurable hardware support for a variety of numerical algorithms that form the core of scientific computation libraries. These efforts are toward the goal of making the accelerator democratically accessible to computational scientists.
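For reference, the kind of kernel such an accelerator targets, and the higher order of numerical convergence at stake, can be illustrated in software with a classical fourth-order Runge-Kutta step (a generic textbook method shown here as a sketch, not the accelerator's actual design):

```python
# Classical RK4 step: O(h^4) global convergence, the "higher order" referred
# to above. Hardware would evaluate this across wide vectors of state variables.
import numpy as np

def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one step of size h with 4th-order accuracy."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

# Example: decay ODE y' = -y, exact solution exp(-t).
y, t, h = np.array([1.0]), 0.0, 0.1
for _ in range(10):
    y = rk4_step(lambda t, y: -y, t, y, h)
    t += h
print(y, np.exp(-1.0))  # agree to about 1e-6
```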