Live speech portraits
Lu, Yuanxun; Chai, Jinxiang; Cao, Xun
ACM Transactions on Graphics, 12/2021, Volume: 40, Issue: 6
Journal Article
Peer reviewed
Open access
To the best of our knowledge, we present the first live system that generates personalized photorealistic talking-head animation driven only by audio signals at over 30 fps. Our system contains three stages. The first stage is a deep neural network that extracts deep audio features along with a manifold projection to project the features to the target person's speech space. In the second stage, we learn facial dynamics and motions from the projected audio features. The predicted motions include head poses and upper body motions, where the former is generated by an autoregressive probabilistic model which models the head pose distribution of the target person. Upper body motions are deduced from head poses. In the final stage, we generate conditional feature maps from previous predictions and send them with a candidate image set to an image-to-image translation network to synthesize photorealistic renderings. Our method generalizes well to wild audio and successfully synthesizes high-fidelity personalized facial details, e.g., wrinkles, teeth. Our method also allows explicit control of head poses. Extensive qualitative and quantitative evaluations, along with user studies, demonstrate the superiority of our method over state-of-the-art techniques.
Training a bipedal character to play basketball and interact with objects, or a quadruped character to move in various locomotion modes, are difficult tasks due to the fast and complex contacts happening during the motion. In this paper, we propose a novel framework to learn fast and dynamic character interactions that involve multiple contacts between the body and an object, another character and the environment, from a rich, unstructured motion capture database. We use one-on-one basketball play and character interactions with the environment as examples. To achieve this task, we propose a novel feature called local motion phase, that can help neural networks to learn asynchronous movements of each bone and its interaction with external objects such as a ball or an environment. We also propose a novel generative scheme to reproduce a wide variation of movements from abstract control signals given by a gamepad, which can be useful for changing the style of the motion under the same context. Our scheme is useful for animating contact-rich, complex interactions for real-time applications such as computer games.
Aviation produces a net climate warming contribution that comprises multiple forcing terms of mixed sign. Aircraft NOₓ emissions are associated with both warming and cooling terms, with the short-term increase in O₃ induced by NOₓ emissions being the dominant warming effect. The uncertainty associated with the magnitude of this climate forcer is amongst the highest out of all contributors from aviation and is owed to the nonlinearity of the NOₓ-O₃ chemistry and the large dependency of the response on space and time, i.e., on the meteorological condition and background atmospheric composition. This study addresses how transport patterns of emitted NOₓ and their climate effects vary with respect to regions (North America, South America, Africa, Eurasia and Australasia) and seasons (January-March and July-September in 2014) by employing global-scale simulations. We quantify the climate effects from NOₓ emissions released at a representative aircraft cruise altitude of 250 hPa (∼10 400 m) in terms of radiative forcing resulting from their induced short-term contributions to O₃. The emitted NOₓ is transported with Lagrangian air parcels within the ECHAM5/MESSy Atmospheric Chemistry (EMAC) model. To identify the main global transport patterns and associated climate impacts of the 14 000 simulated air parcel trajectories, the unsupervised QuickBundles clustering algorithm is adapted and applied. Results reveal a strong seasonal dependence of the contribution of NOₓ emissions to O₃. For most regions, an inverse relationship is found between an air parcel's downward transport and its mean contribution to O₃. NOₓ emitted in the northern regions (North America and Eurasia) experience the longest residence times in the upper midlatitudes (40 %-45 % of their lifetime), while those beginning in the south (South America, Africa and Australasia) remain mostly in the Tropics (45 %-50 % of their lifetime). 
Due to elevated O₃ sensitivities, emissions in Australasia induce the highest overall radiative forcing, attaining values that are larger by factors of 2.7 and 1.2 relative to Eurasia during January and July, respectively. The location of the emissions does not necessarily correspond to the region that will be most affected - for instance, NOₓ over North America in July will induce the largest radiative forcing in Europe. Overall, this study highlights the spatially and temporally heterogeneous nature of the NOₓ-O₃ chemistry from a global perspective, which needs to be accounted for in efforts to minimize aviation's climate impact, given the sector's resilient growth.
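The clustering step named in the abstract above can be illustrated with a simplified, QuickBundles-style single pass. This is only a sketch under stated assumptions, not the adapted algorithm used in the study. It assumes every trajectory is already resampled to the same number of points, and it uses only the direct (non-flipped) average pointwise distance. The names `mean_direct_distance` and `quick_cluster`, the threshold value and the toy trajectories are all hypothetical.

```python
import math


def mean_direct_distance(a, b):
    """Average pointwise Euclidean distance between two equal-length trajectories."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def quick_cluster(trajectories, threshold):
    """Greedy single-pass clustering in the spirit of QuickBundles (simplified).

    Each trajectory joins the nearest existing centroid if it lies within
    `threshold`; otherwise it starts a new cluster. Returns member indices.
    """
    clusters = []  # each cluster keeps a running pointwise sum and its members
    for i, traj in enumerate(trajectories):
        best, best_d = None, float("inf")
        for c in clusters:
            n = len(c["members"])
            centroid = [[v / n for v in point] for point in c["sum"]]
            d = mean_direct_distance(traj, centroid)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d < threshold:
            best["members"].append(i)
            best["sum"] = [
                [s + v for s, v in zip(ps, pt)] for ps, pt in zip(best["sum"], traj)
            ]
        else:
            clusters.append({"sum": [list(p) for p in traj], "members": [i]})
    return [c["members"] for c in clusters]


# Toy 2-D trajectories (made-up numbers): two nearly identical paths and one far away.
toy = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(0.0, 0.1), (1.0, 0.1)],
    [(0.0, 5.0), (1.0, 5.0)],
]
print(quick_cluster(toy, threshold=1.0))
```

Because each trajectory is compared only against cluster centroids, the cost grows with the number of clusters rather than the number of trajectory pairs, which is what makes this style of clustering practical for thousands of trajectories.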
Many modern sequential recommender systems use deep neural networks, which can effectively estimate the relevance of items, but require a lot of time to train. Slow training increases the costs of training, hinders product development timescales and prevents the model from being regularly updated to adapt to changing user preferences. The training of such sequential models involves appropriately sampling past user interactions to create a realistic training objective. The existing training objectives have limitations. For instance, next item prediction never uses the beginning of the sequence as a learning target, thereby potentially discarding valuable data. On the other hand, the item masking used by the state-of-the-art BERT4Rec recommender model is only weakly related to the goal of the sequential recommendation; therefore, it requires much more time to obtain an effective model. Hence, we propose a novel Recency-based Sampling of Sequences (RSS) training objective (which is parameterized by a choice of recency importance function) that addresses both limitations. We apply our method to various recent and state-of-the-art model architectures – such as GRU4Rec, Caser, and SASRec. We show that the models enhanced with our method can achieve performances exceeding or very close to the effective BERT4Rec, but with much less training time. For example, on the MovieLens-20M dataset, RSS applied to the SASRec model can result in a 60% improvement in NDCG over a vanilla SASRec, and a 16% improvement over a fully-trained BERT4Rec model, despite taking 93% less training time than BERT4Rec. We also experiment with two families of recency importance functions and show that they perform similarly. We further empirically demonstrate that RSS-enhanced SASRec successfully learns to distinguish differences between recent and older interactions – a property that the original SASRec model does not exhibit. 
Overall, we show that RSS is a viable (and frequently better) alternative to the existing training objectives, which is both effective and efficient for training sequential recommender models when the computational resources for training are limited.
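The idea of sampling training targets according to a recency importance function, as described in the abstract above, might be sketched as follows. The abstract does not give the functional form, so the exponential weighting, the parameter `alpha`, and the helpers `recency_weights` and `sample_target` are assumptions made for illustration only. Removing the sampled target from the model input is likewise an assumption.

```python
import random


def recency_weights(n, alpha=0.8):
    """Assumed exponential recency importance: position k gets alpha**(n - 1 - k),
    so the most recent interaction (k = n - 1) has weight 1."""
    return [alpha ** (n - 1 - k) for k in range(n)]


def sample_target(sequence, alpha=0.8, rng=random):
    """Pick one interaction as the learning target, favouring recent positions.

    Returns (inputs, target), where inputs is the sequence without the target.
    """
    weights = recency_weights(len(sequence), alpha)
    idx = rng.choices(range(len(sequence)), weights=weights, k=1)[0]
    return sequence[:idx] + sequence[idx + 1:], sequence[idx]


# Hypothetical interaction history of item ids, oldest first.
history = [10, 20, 30, 40]
inputs, target = sample_target(history, alpha=0.5, rng=random.Random(0))
print(inputs, target)
```

Unlike next item prediction, every position has a non-zero chance of becoming a target, so early interactions are not discarded, while recent ones still dominate the objective.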
Ab initio thermodynamics of liquid and solid water
Cheng, Bingqing; Engel, Edgar A.; Behler, Jörg ...
Proceedings of the National Academy of Sciences - PNAS, 01/2019, Volume: 116, Issue: 4
Journal Article
Peer reviewed
Open access
Thermodynamic properties of liquid water as well as hexagonal (Ih) and cubic (Ic) ice are predicted based on density functional theory at the hybrid-functional level, rigorously taking into account quantum nuclear motion, anharmonic fluctuations, and proton disorder. This is made possible by combining advanced free-energy methods and state-of-the-art machine-learning techniques. The ab initio description leads to structural properties in excellent agreement with experiments and reliable estimates of the melting points of light and heavy water. We observe that nuclear-quantum effects contribute a crucial 0.2 meV/H₂O to the stability of ice Ih, making it more stable than ice Ic. Our computational approach is general and transferable, providing a comprehensive framework for quantitative predictions of ab initio thermodynamic properties using machine-learning potentials as an intermediate step.
It is common wisdom that gathering a variety of views and inputs improves the process of decision making, and, indeed, underpins a democratic society. Dubbed 'ensemble learning' by researchers in computational intelligence and machine learning, it is known to improve a decision system's robustness and accuracy. Now, fresh developments are allowing researchers to unleash the power of ensemble learning in an increasing range of real-world applications. Ensemble learning algorithms such as 'boosting' and 'random forest' facilitate solutions to key computational issues such as face recognition and are now being applied in areas as diverse as object tracking and bioinformatics. Responding to a shortage of literature dedicated to the topic, this volume offers comprehensive coverage of state-of-the-art ensemble learning techniques, including the random forest skeleton tracking algorithm in the Xbox Kinect sensor, which bypasses the need for game controllers. At once a solid theoretical study and a practical guide, the volume is a windfall for researchers and practitioners alike. Dr. Zhang works for Microsoft. Dr. Ma works for Honeywell.
Numerical simulations of fluid dynamics problems primarily rely on spatial and/or temporal discretization of the governing equations using polynomials into a finite-dimensional algebraic system. Due to the multi-scale nature of the physics and the sensitivity to meshing a complicated geometry, such a process can be computationally prohibitive for most real-time applications (e.g., clinical diagnosis and surgery planning) and many-query analyses (e.g., optimization design and uncertainty quantification). Therefore, developing a cost-effective surrogate model is of great practical significance. Deep learning (DL) has shown new promise for surrogate modeling due to its capability of handling strong nonlinearity and high dimensionality. However, off-the-shelf DL architectures, whose success relies heavily on a large amount of training data and the interpolatory nature of the problem, fail to operate when the data become sparse. Unfortunately, data are often insufficient in most parametric fluid dynamics problems since each data point in the parameter space requires an expensive numerical simulation based on first principles, e.g., the Navier–Stokes equations. In this paper, we provide a physics-constrained DL approach for surrogate modeling of fluid flows without relying on any simulation data. Specifically, a structured deep neural network (DNN) architecture is devised to enforce the initial and boundary conditions, and the governing partial differential equations (i.e., Navier–Stokes equations) are incorporated into the loss of the DNN to drive the training. Numerical experiments are conducted on a number of internal flows relevant to hemodynamics applications, and the forward propagation of uncertainties in fluid properties and domain geometry is studied as well. The results show excellent agreement on the flow field and forward-propagated uncertainties between the DL surrogate approximations and the first-principle numerical simulations.
• Proposed simulation-free, physics-constrained deep learning for surrogate CFD modeling.
• A boundary-encoded neural network outperforms one with soft boundary constraints.
• Demonstrated the effectiveness of label-free learning on a few vascular flows.
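The two ingredients named in this record, boundary conditions enforced by the network structure and a loss built from the governing equation rather than from labeled data, can be illustrated on a toy problem. The sketch below swaps the Navier–Stokes equations for a 1-D equation u''(x) = f(x) on [0, 1] and replaces automatic differentiation with central finite differences. The function names, the form of the ansatz and the stand-in `net` callables are all assumptions made for illustration, not the paper's implementation.

```python
def trial_solution(net, x, a=0.0, b=0.0):
    """Boundary-encoded ansatz: u(0) = a and u(1) = b hold for any `net`,
    because the factor x * (1 - x) vanishes at both ends of the domain."""
    return a * (1.0 - x) + b * x + x * (1.0 - x) * net(x)


def residual_loss(net, f, points, a=0.0, b=0.0, h=1e-3):
    """Label-free loss: mean squared residual of u'' = f at collocation points,
    with u'' approximated by a central finite difference."""
    total = 0.0
    for x in points:
        u_plus = trial_solution(net, x + h, a, b)
        u_mid = trial_solution(net, x, a, b)
        u_minus = trial_solution(net, x - h, a, b)
        u_xx = (u_plus - 2.0 * u_mid + u_minus) / (h * h)
        total += (u_xx - f(x)) ** 2
    return total / len(points)


def source(x):
    """Right-hand side f of the toy equation u'' = f."""
    return -2.0


def exact_net(x):
    """Stand-in 'network' for which u = x(1 - x), the exact solution."""
    return 1.0


def zero_net(x):
    """Stand-in 'network' for which u = 0, which violates the equation."""
    return 0.0


points = [0.1 * i for i in range(1, 10)]  # interior collocation points
print(residual_loss(exact_net, source, points))  # close to zero
print(residual_loss(zero_net, source, points))   # clearly non-zero
```

No simulation data appear anywhere in the loss: training would adjust the parameters of `net` to drive the residual toward zero, and the boundary values never need a penalty term because the ansatz satisfies them exactly.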