Palladium has a number of important applications in energy and catalysis in which there is evidence that surface modification leads to enhanced properties. A strategy for preparing such materials is needed that combines (i) scalability (especially on high-surface-area substrates, e.g. powders); (ii) uniform deposition, even on substrates with complex, three-dimensional features; and (iii) low-temperature processing conditions that preserve nanopores and other nanostructures. Presented herein is a method that exhibits these properties, uses benign reagents, and requires no specialized equipment. By exposing Pd powder to dilute hydrogen in nitrogen gas, sacrificial surface PdH is formed along with a controlled amount of dilute interstitial hydride. The lattice expansion that occurs in Pd under higher H2 partial pressures is avoided. Once the flow of reagent gas is terminated, addition of metal salts facilitates controlled, electroless deposition of an overlayer of subnanometer thickness. This process can be cycled to create thicker layers. The approach is carried out under ambient processing conditions, which is an advantage over some forms of atomic layer deposition. The hydride-mediated reaction is electroless in that it requires no connection to an external source of electrical current and is thus amenable to deposition on high-surface-area substrates having rich, nanoscale topography as well as on insulator-supported catalyst particles. STEM-EDS measurements show that conformal Rh and Pt surface layers can be formed on Pd powder with this method. A growth model based on energy-resolved XPS depth profiling of Rh-modified Pd powder is in general agreement. After two cycles, deposits are consistent with 70–80% coverage and a surface layer 4–8 Å thick.
Many real-world applications require artificial agents to compete and coordinate with other agents in complex environments. As a stepping stone to this goal, the domain of StarCraft has emerged as an important challenge for artificial intelligence research, owing to its iconic and enduring status among the most difficult professional esports and its relevance to the real world in terms of its raw complexity and multi-agent challenges. Over the course of a decade and numerous competitions, the strongest agents have simplified important aspects of the game, utilized superhuman capabilities, or employed hand-crafted sub-systems. Despite these advantages, no previous agent has come close to matching the overall skill of top StarCraft players. We chose to address the challenge of StarCraft using general-purpose learning methods that are in principle applicable to other complex domains: a multi-agent reinforcement learning algorithm that uses data from both human and agent games within a diverse league of continually adapting strategies and counter-strategies, each represented by deep neural networks. We evaluated our agent, AlphaStar, in the full game of StarCraft II through a series of online games against human players. AlphaStar was rated at Grandmaster level for all three StarCraft races and above 99.8% of officially ranked human players.
Gaseous mixtures of diatomic hydrogen isotopologues and helium are often encountered in the nuclear energy industry and in analytical chemistry. Compositions of stored mixtures can vary due to interactions with storage and handling materials. When tritium is present, it decays to form ions and helium-3, both of which can lead to further compositional variation. Monitoring of composition is typically achieved by mass spectrometry, a method that is bulky and energy-intensive. Mass spectrometers disperse sample material through vacuum pumps, which is especially troublesome if tritium is present. Our ultimate goal is to create a compact, fast, low-power sensor that can determine composition with minimal gas consumption and waste generation, as a complement to mass spectrometry that can be instantiated more widely. We propose calorimetry of metal hydrides as an approach to this, due to the strong isotope effect on gas absorption, and demonstrate the sensitivity of measured heat flow to atomic composition of the gas. Peak shifts are discernible when mole fractions change by at least 1%. A mass flow restriction results in a unique dependence of the measurement on helium concentration. A mathematical model is presented as a first step toward prediction of the peak shapes and positions. The model includes a useful method to compute estimates of phase diagrams for palladium in the presence of arbitrary mixtures of hydrogen isotopologues. We expect that this approach can be used to deduce unknown atomic compositions from measured calorimetric data over a useful range of partial pressures of each component.
• Calorimetry of palladium can determine isotopic fractions in hydrogen gas mixtures.
• Palladium contacts the gas through a flow restriction.
• The flow restriction reveals the partial pressure of inert components like helium.
• An independent measurement of total pressure is needed.
• The method is best suited to mixtures with similar quantities of two isotopes.
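The isotope effect that underlies this sensing approach can be illustrated with a van 't Hoff estimate of hydride plateau pressures. The sketch below is illustrative only: the ΔH and ΔS values are assumed, literature-magnitude numbers for Pd–H and Pd–D absorption, not constants reported in this paper.

```python
import numpy as np

R = 8.314  # J/(mol K)

# Van 't Hoff estimate of the alpha->beta plateau pressure of a metal
# hydride: ln(p) = dH/(R*T) - dS/R (p in bar, absorption convention).
# The dH/dS values used below are rough, assumed literature-magnitude
# numbers chosen only to illustrate the isotope effect; they are NOT
# taken from this paper.
def plateau_pressure_bar(dH, dS, T):
    return np.exp(dH / (R * T) - dS / R)

T = 298.0
p_H = plateau_pressure_bar(dH=-37_000.0, dS=-92.5, T=T)  # Pd-H (assumed)
p_D = plateau_pressure_bar(dH=-35_000.0, dS=-93.0, T=T)  # Pd-D (assumed)
print(f"Pd-H plateau ~ {p_H*1000:.1f} mbar, Pd-D plateau ~ {p_D*1000:.1f} mbar at {T} K")
```

Because the two isotopes absorb at measurably different plateau pressures (and with different enthalpies), the heat-flow peaks recorded by the calorimeter shift with the isotopic makeup of the gas, which is the effect the measurement exploits.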
Language Models (LMs) often cannot be deployed because of their potential to harm users in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using human annotators to hand-write test cases. However, human annotation is expensive, limiting the number and diversity of test cases. In this work, we automatically find cases where a target LM behaves in a harmful way, by generating test cases ("red teaming") using another LM. We evaluate the target LM's replies to generated test questions using a classifier trained to detect offensive content, uncovering tens of thousands of offensive replies in a 280B parameter LM chatbot. We explore several methods, from zero-shot generation to reinforcement learning, for generating test cases with varying levels of diversity and difficulty. Furthermore, we use prompt engineering to control LM-generated test cases to uncover a variety of other harms, automatically finding groups of people that the chatbot discusses in offensive ways, personal and hospital phone numbers generated as the chatbot's own contact info, leakage of private training data in generated text, and harms that occur over the course of a conversation. Overall, LM-based red teaming is one promising tool (among many needed) for finding and fixing diverse, undesirable LM behaviors before they impact users.
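The red-teaming loop described above is straightforward to sketch. The following is a minimal illustration under stated assumptions, not the authors' code: `red_lm_generate`, `target_lm_reply`, and `offensiveness_score` are hypothetical stand-ins for the red-team LM sampler, the target chatbot, and the trained offensive-content classifier.

```python
import random

# Hypothetical stand-ins: in the paper these would be a red-team LM
# (sampled zero-shot or tuned with RL), the 280B-parameter target
# chatbot, and a classifier trained to detect offensive content.
def red_lm_generate(prompt: str, n: int) -> list[str]:
    return [f"Test question #{i} for: {prompt}" for i in range(n)]

def target_lm_reply(question: str) -> str:
    return f"Reply to: {question}"

def offensiveness_score(reply: str) -> float:
    return random.random()  # placeholder for the classifier's probability

# Zero-shot red teaming: sample many test questions, collect the target
# model's replies, and keep the ones the classifier flags as offensive.
def red_team(prompt: str, n_cases: int, threshold: float = 0.5):
    failures = []
    for question in red_lm_generate(prompt, n_cases):
        reply = target_lm_reply(question)
        score = offensiveness_score(reply)
        if score >= threshold:
            failures.append((question, reply, score))
    return failures

if __name__ == "__main__":
    cases = red_team("List of questions to ask someone:", n_cases=100)
    print(f"{len(cases)} flagged replies out of 100 test cases")
```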
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.
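The predictable-scaling claim can be illustrated with a toy power-law extrapolation. This sketch uses synthetic numbers and an assumed loss form \(L(C) = aC^{-b} + c\); it is not OpenAI's data or method, only the kind of small-to-large extrapolation the abstract describes.

```python
import numpy as np

# Synthetic (compute, loss) points standing in for small training runs.
# Assumed form: loss = a * C**(-b) + c, with c an irreducible-loss term.
a_true, b_true, c_true = 50.0, 0.15, 1.2
compute = np.logspace(18, 21, 8)              # FLOPs of the small runs
loss = a_true * compute**(-b_true) + c_true

# Fit the reducible loss as a straight line in log-log space.
# (Here the irreducible term c is assumed known; in practice it would
# be fitted jointly, e.g. with scipy.optimize.curve_fit.)
coeffs = np.polyfit(np.log(compute), np.log(loss - c_true), 1)
b_fit, log_a_fit = -coeffs[0], coeffs[1]

# Extrapolate to a run with 1,000x the largest fitted compute budget.
big_c = compute[-1] * 1000
predicted = np.exp(log_a_fit) * big_c**(-b_fit) + c_true
print(f"fitted exponent b = {b_fit:.3f}; "
      f"predicted loss at {big_c:.1e} FLOPs = {predicted:.3f}")
```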
We investigate the optimal model size and number of tokens for training a transformer language model under a given compute budget. We find that current large language models are significantly undertrained, a consequence of the recent focus on scaling language models whilst keeping the amount of training data constant. By training over 400 language models ranging from 70 million to over 16 billion parameters on 5 to 500 billion tokens, we find that for compute-optimal training, the model size and the number of training tokens should be scaled equally: for every doubling of model size the number of training tokens should also be doubled. We test this hypothesis by training a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4\(\times\) more data. Chinchilla uniformly and significantly outperforms Gopher (280B), GPT-3 (175B), Jurassic-1 (178B), and Megatron-Turing NLG (530B) on a large range of downstream evaluation tasks. This also means that Chinchilla uses substantially less compute for fine-tuning and inference, greatly facilitating downstream usage. As a highlight, Chinchilla reaches a state-of-the-art average accuracy of 67.5% on the MMLU benchmark, greater than a 7% improvement over Gopher.
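The "scale model size and tokens equally" finding implies a simple recipe for splitting a training budget. The sketch below assumes the common \(C \approx 6ND\) FLOPs approximation and a \(D/N \approx 20\) tokens-per-parameter ratio (consistent with Chinchilla's 70B parameters and 1.4T tokens); neither constant is stated in this abstract.

```python
# Compute-optimal split of a training budget under the equal-scaling rule:
# with C ~ 6*N*D FLOPs and an assumed ratio D/N ~ 20 tokens per parameter,
# both model size N and token count D grow as the square root of C.
def compute_optimal(c_flops: float, tokens_per_param: float = 20.0):
    n_params = (c_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Roughly Gopher's training budget: 6 * 280e9 params * 300e9 tokens.
c = 6 * 280e9 * 300e9
n, d = compute_optimal(c)
print(f"budget {c:.2e} FLOPs -> ~{n/1e9:.0f}B params, ~{d/1e12:.2f}T tokens")
```

For Gopher's budget this yields roughly 65B parameters and 1.3T tokens, close to Chinchilla's actual configuration, whereas Gopher itself spent the same compute on a model four times larger trained on far fewer tokens.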
The performance of a language model has been shown to be effectively modeled as a power-law in its parameter count. Here we study the scaling behaviors of Routing Networks: architectures that conditionally use only a subset of their parameters while processing an input. For these models, parameter count and computational requirement form two independent axes along which an increase leads to better performance. In this work we derive and justify scaling laws defined on these two variables which generalize those known for standard language models and describe the performance of a wide range of routing architectures trained via three different techniques. Afterwards we provide two applications of these laws: first deriving an Effective Parameter Count along which all models scale at the same rate, and then using the scaling coefficients to give a quantitative comparison of the three routing techniques considered. Our analysis derives from an extensive evaluation of Routing Networks across five orders of magnitude of size, including models with hundreds of experts and hundreds of billions of parameters.
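A scaling law over two independent axes can be pictured as a log-linear fit with an interaction term. The sketch below is a generic illustration on synthetic data, not the paper's exact parameterization; the assumed form \(\log L = a + b\log N + c\log E + d\log N \log E\) merely shows how such bivariate laws are fitted.

```python
import numpy as np

# Synthetic grid of (dense-model params N, expert count E) -> loss,
# generated from an assumed bilinear law in log space:
#   log L = a + b*log N + c*log E + d*log N*log E
# (an illustrative form; the paper derives its own parameterization).
rng = np.random.default_rng(0)
a, b, c, d = 3.0, -0.08, -0.02, 0.001
logN = np.log(10 ** rng.uniform(7, 11, 200))   # 10M .. 100B params
logE = np.log(2 ** rng.uniform(0, 9, 200))     # 1 .. 512 experts
logL = a + b * logN + c * logE + d * logN * logE

# Recover the coefficients by least squares on [1, logN, logE, logN*logE].
X = np.column_stack([np.ones_like(logN), logN, logE, logN * logE])
coef, *_ = np.linalg.lstsq(X, logL, rcond=None)
print("fitted (a, b, c, d):", np.round(coef, 4))
```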
We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a \(2\) trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25\(\times\) fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a frozen Bert retriever, a differentiable encoder and a chunked cross-attention mechanism to predict tokens based on an order of magnitude more data than what is typically consumed during training. We typically train RETRO from scratch, yet can also rapidly RETROfit pre-trained transformers with retrieval and still achieve good performance. Our work opens up new avenues for improving language models through explicit memory at unprecedented scale.
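A minimal sketch of chunked cross-attention may help fix ideas. The numpy code below is a simplification, not RETRO's implementation: it splits the hidden states into chunks and lets each chunk cross-attend to the encoded neighbours retrieved for the previous chunk. RETRO's actual scheme offsets attention by one token within chunks to preserve causality, and the retrieved representations come from a frozen Bert retriever and a trained encoder rather than random arrays.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def chunked_cross_attention(h, retrieved, wq, wk, wv):
    """Single-head sketch. h: (n_chunks, chunk_len, d) hidden states;
    retrieved: (n_chunks, m, d) encoded neighbour tokens per chunk.
    Each chunk attends to the neighbours of the previous chunk, so no
    position can see retrievals conditioned on its own future."""
    n_chunks, chunk_len, d = h.shape
    out = h.copy()
    for i in range(1, n_chunks):
        q = h[i] @ wq                       # queries     (chunk_len, d)
        k = retrieved[i - 1] @ wk           # keys        (m, d)
        v = retrieved[i - 1] @ wv           # values      (m, d)
        att = softmax(q @ k.T / np.sqrt(d)) # attention   (chunk_len, m)
        out[i] = h[i] + att @ v             # residual connection
    return out

rng = np.random.default_rng(0)
d, n_chunks, chunk_len, m = 16, 4, 8, 10
h = rng.normal(size=(n_chunks, chunk_len, d))
retrieved = rng.normal(size=(n_chunks, m, d))
wq, wk, wv = (rng.normal(size=(d, d)) * d**-0.5 for _ in range(3))
print(chunked_cross_attention(h, retrieved, wq, wk, wv).shape)
```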
Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and the model's behaviour, covering the intersection of model scale with bias and toxicity. Finally, we discuss the application of language models to AI safety and the mitigation of downstream harms.