What's Hidden in a Randomly Weighted Neural Network? Ramanujan, Vivek; Wortsman, Mitchell; Kembhavi, Aniruddha ...
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Conference Proceeding
Open access
Training a neural network is synonymous with learning the values of the weights. By contrast, we demonstrate that randomly weighted neural networks contain subnetworks which achieve impressive performance without ever training the weight values. Hidden in a randomly weighted Wide ResNet-50 is a subnetwork (with random weights) that is smaller than, but matches the performance of, a ResNet-34 trained on ImageNet. Not only do these "untrained subnetworks" exist, but we provide an algorithm to effectively find them. We empirically show that as randomly weighted neural networks with fixed weights grow wider and deeper, an "untrained subnetwork" approaches a network with learned weights in accuracy.
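The paper's subnetwork-finding algorithm (edge-popup) keeps the weights frozen and instead learns a score per weight, retaining only the top-scoring fraction of edges. A minimal numpy sketch of that masking mechanic (the layer size and keep fraction below are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def topk_mask(scores, k):
    """Binary mask that keeps the k highest-scoring entries."""
    flat = scores.ravel()
    keep = np.argpartition(flat, -k)[-k:]
    mask = np.zeros_like(flat)
    mask[keep] = 1.0
    return mask.reshape(scores.shape)

W = rng.standard_normal((64, 32))   # frozen random weights, never trained
S = rng.standard_normal(W.shape)    # learnable per-weight scores
k = W.size // 2                     # keep the top 50% of edges (illustrative)

W_eff = W * topk_mask(S, k)         # the "untrained subnetwork"
```

In training, only `S` would receive gradients (via a straight-through estimator), while `W` stays at its random initialization.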
Neuromorphic networks of artificial neurons and synapses can solve computationally hard problems with energy efficiencies unattainable for von Neumann architectures. For image processing, silicon neuromorphic processors outperform graphic processing units in energy efficiency by a large margin, but deliver much lower chip-scale throughput. The performance-efficiency dilemma for silicon processors may not be overcome by Moore's law scaling of silicon transistors. Scalable and biomimetic active memristor neurons and passive memristor synapses form a self-sufficient basis for a transistorless neural network. However, previous demonstrations of memristor neurons only showed simple integrate-and-fire behaviors and did not reveal the rich dynamics and computational complexity of biological neurons. Here we report that neurons built with nanoscale vanadium dioxide active memristors possess all three classes of excitability and most of the known biological neuronal dynamics, and are intrinsically stochastic. With the favorable size and power scaling, there is a path toward an all-memristor neuromorphic cortical computer.
We present CoverNet, a new method for multimodal, probabilistic trajectory prediction for urban driving. Previous work has employed a variety of methods, including multimodal regression, occupancy maps, and 1-step stochastic policies. We instead frame the trajectory prediction problem as classification over a diverse set of trajectories. The size of this set remains manageable due to the limited number of distinct actions that can be taken over a reasonable prediction horizon. We structure the trajectory set to a) ensure a desired level of coverage of the state space, and b) eliminate physically impossible trajectories. By dynamically generating trajectory sets based on the agent's current state, we can further improve our method's efficiency. We demonstrate our approach on public, real world self-driving datasets, and show that it outperforms state-of-the-art methods.
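The coverage requirement in (a) can be met with a simple greedy epsilon-cover over candidate trajectories. The sketch below is an illustration of that idea, not the paper's exact construction: the distance metric, trajectory parameterization, and eps are all assumptions. A candidate is kept only if it is farther than eps from every trajectory already kept, so every candidate ends up within eps of the final set.

```python
import numpy as np

def traj_dist(a, b):
    """Max pointwise Euclidean distance between two trajectories."""
    return np.linalg.norm(a - b, axis=-1).max()

def greedy_cover(candidates, eps):
    """Greedy epsilon-cover: every candidate is within eps of a kept one."""
    kept = [candidates[0]]
    for t in candidates[1:]:
        if min(traj_dist(t, k) for k in kept) > eps:
            kept.append(t)
    return kept

# Illustrative candidates: constant-heading trajectories of 10 unit steps.
angles = np.linspace(-np.pi / 2, np.pi / 2, 100)
steps = np.arange(1, 11)[:, None]
candidates = [steps * np.array([np.cos(a), np.sin(a)]) for a in angles]

trajectory_set = greedy_cover(candidates, eps=1.0)
```

Physically impossible candidates (criterion (b)) would simply be filtered out of `candidates` before this step.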
We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces ImageNet-P mean flip rate from 27.8 to 12.2. To achieve this result, we first train an EfficientNet model on labeled ImageNet images and use it as a teacher to generate pseudo labels on 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo labeled images. We iterate this process by putting back the student as the teacher. During the generation of the pseudo labels, the teacher is not noised so that the pseudo labels are as accurate as possible. However, during the learning of the student, we inject noise such as dropout, stochastic depth and data augmentation via RandAugment to the student so that the student generalizes better than the teacher.
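The teacher–student iteration can be sketched end to end with a toy classifier standing in for EfficientNet: here a nearest-centroid model on synthetic 2-D data, with Gaussian input jitter standing in for dropout/stochastic depth/RandAugment. Everything below is an illustrative reduction of the loop, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(X, y):
    """Toy stand-in for training a model: nearest-centroid classifier."""
    return np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])

def predict(centroids, X):
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

# Small labeled set and a larger unlabeled pool (two synthetic classes).
X_lab = rng.normal([[0, 0]] * 50 + [[4, 4]] * 50, 1.0)
y_lab = np.array([0] * 50 + [1] * 50)
X_unl = rng.normal([[0, 0]] * 200 + [[4, 4]] * 200, 1.0)

teacher = fit(X_lab, y_lab)
for _ in range(3):                                     # put student back as teacher
    pseudo = predict(teacher, X_unl)                   # teacher is NOT noised
    X_noisy = X_unl + rng.normal(0, 0.3, X_unl.shape)  # noise only the student's inputs
    student = fit(np.vstack([X_lab, X_noisy]),
                  np.concatenate([y_lab, pseudo]))
    teacher = student
```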
We consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern, namely the set of variables is partitioned into prescribed groups, only a few of which are relevant in the estimation process. This group sparsity assumption suggests considering the Group Lasso method as a means to estimate β*. We establish oracle inequalities for the prediction and ℓ2 estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger condition, we derive bounds for the estimation error for mixed (2, p)-norms with 1 ≤ p ≤ ∞. When p = ∞, this result implies that a thresholded version of the Group Lasso estimator selects the sparsity pattern of β* with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, for all estimators over a class of group sparse vectors. Furthermore, we establish lower bounds for the prediction and ℓ2 estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation errors as compared to the Lasso. An important application of our results is the problem of estimating multiple regression equations simultaneously, or multi-task learning. In this case, we obtain refinements of the results in Proc. of the 22nd Annual Conference on Learning Theory (COLT) (2009), which allow us to establish a quantitative advantage of the Group Lasso over the usual Lasso in the multi-task setting. Finally, within the same setting, we show how our results can be extended to more general noise distributions, of which we only require the fourth moment to be finite. To obtain this extension, we establish a new maximal moment inequality, which may be of independent interest.
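The blockwise shrinkage at the heart of the Group Lasso can be illustrated with its proximal operator, which either shrinks a whole group toward zero or zeroes it out as a block (the group layout and threshold below are illustrative):

```python
import numpy as np

def group_soft_threshold(beta, groups, tau):
    """Proximal operator of the Group Lasso penalty: each group is shrunk
    by a factor depending on its Euclidean norm, or zeroed entirely."""
    out = np.zeros_like(beta)
    for g in groups:
        norm = np.linalg.norm(beta[g])
        if norm > tau:
            out[g] = (1.0 - tau / norm) * beta[g]
    return out

beta = np.array([3.0, 4.0, 0.1, -0.1])
groups = [slice(0, 2), slice(2, 4)]
shrunk = group_soft_threshold(beta, groups, tau=1.0)
# first group (norm 5) is shrunk to [2.4, 3.2]; second (norm ~0.14) is zeroed
```

Zeroing whole groups at once is exactly the behavior the sparsity-pattern selection result above refers to.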
In this paper, we introduce a notion of strongly generalized convex functions, called strongly η-convex stochastic processes. We prove Hermite–Hadamard and Ostrowski type inequalities and obtain several further inequalities for these processes. Some previous results are special cases of the results obtained in this paper.
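For reference, the classical Hermite–Hadamard inequality that such results extend states that, for a convex stochastic process X(t, ·) that is mean-square integrable on [u, v],

```latex
X\!\left(\frac{u+v}{2},\cdot\right)
  \;\le\; \frac{1}{v-u}\int_{u}^{v} X(t,\cdot)\,\mathrm{d}t
  \;\le\; \frac{X(u,\cdot)+X(v,\cdot)}{2}
  \qquad \text{(almost everywhere)},
```

and the paper's strongly η-convex versions refine both bounds.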
In the classical domain, it is well known that divisibility does not imply that a stochastic process is Markovian. However, for quantum processes, divisibility is often considered to be synonymous with Markovianity. We show that completely positive divisible quantum processes can still involve non-Markovian temporal correlations, which we then fully classify using the recently developed process tensor formalism, which generalizes the theory of stochastic processes to the quantum domain.
Gaussian predictive process models for large spatial data sets Banerjee, Sudipto; Gelfand, Alan E.; Finley, Andrew O. ...
Journal of the Royal Statistical Society: Series B (Statistical Methodology), September 2008, Volume 70, Issue 4
Journal Article
Peer reviewed
Open access
With scientific data available at geocoded locations, investigators are increasingly turning to spatial process models for carrying out statistical inference. Over the last decade, hierarchical models implemented through Markov chain Monte Carlo methods have become especially popular for spatial modelling, given their flexibility and power to fit models that would be infeasible with classical methods as well as their avoidance of possibly inappropriate asymptotics. However, fitting hierarchical spatial models often involves expensive matrix decompositions whose computational complexity increases in cubic order with the number of spatial locations, rendering such models infeasible for large spatial data sets. This computational burden is exacerbated in multivariate settings with several spatially dependent response variables. It is also aggravated when data are collected at frequent time points and spatiotemporal process models are used. With regard to this challenge, our contribution is to work with what we call predictive process models for spatial and spatiotemporal data. Every spatial (or spatiotemporal) process induces a predictive process model (in fact, arbitrarily many of them). The latter models project process realizations of the former to a lower dimensional subspace, thereby reducing the computational burden. Hence, we achieve the flexibility to accommodate non-stationary, non-Gaussian, possibly multivariate, possibly spatiotemporal processes in the context of large data sets. We discuss attractive theoretical properties of these predictive processes. We also provide a computational template encompassing these diverse settings. Finally, we illustrate the approach with simulated and real data sets.
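The projection onto a lower-dimensional subspace can be sketched in numpy: realize the parent process at m knots, then interpolate it to all n locations through the cross-covariance, so all heavy factorizations are m×m rather than n×n. The 1-D locations, exponential covariance, and knot count below are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def exp_cov(a, b, phi=1.0, sigma2=1.0):
    """Exponential covariance between two sets of 1-D locations."""
    return sigma2 * np.exp(-phi * np.abs(a[:, None] - b[None, :]))

n, m = 2000, 50                          # n data locations, m << n knots
s = rng.uniform(0, 10, n)                # observation locations
knots = np.linspace(0, 10, m)

C_star = exp_cov(knots, knots) + 1e-10 * np.eye(m)   # m x m, cheap to factor
C_cross = exp_cov(s, knots)                          # n x m

# Parent process realized at the knots only ...
w_star = np.linalg.cholesky(C_star) @ rng.standard_normal(m)
# ... then projected to every location: the predictive process.
w_tilde = C_cross @ np.linalg.solve(C_star, w_star)
```

The pointwise variance of `w_tilde` never exceeds the parent's, reflecting that the predictive process is a conditional expectation of the parent process given its values at the knots.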
In the past decade, blockchain has shown a promising vision for building trust without any powerful third party in a secure, decentralized and scalable manner. However, due to its wide application and future development, from cryptocurrency to the Internet of Things, blockchain is an extremely complex system integrating mathematics, computer science, communication and network engineering, etc. Engineers, experts and researchers rarely understand the blockchain process fully and systematically from top to bottom; revealing the intrinsic relationship between blockchain and communication, networking and computing from a methodological perspective offers a view onto this challenge. In this article we first introduce how blockchain works, survey the research activities and challenges, and illustrate a roadmap relating the classic methodologies to typical blockchain use cases and topics. Second, we discuss in detail how stochastic processes, game theory, optimization theory, and machine learning can be adopted in blockchain systems to study the running processes and design the protocols/algorithms. Moreover, the advantages and limitations of using these methods are summarized as a guide for future work. Finally, some remaining problems from technical, commercial and political views are discussed as open issues. The main findings of this article provide a methodological survey for studying theoretical models of blockchain fundamentals, designing network services for blockchain-based mechanisms and algorithms, and applying blockchain to the Internet of Things, etc.
The recently noticed ability of restart to reduce the expected completion time of first-passage processes allows appealing opportunities for performance improvement in a variety of settings. However, complex stochastic processes often exhibit several possible scenarios of completion which are not equally desirable in terms of efficiency. Here we show that restart may have profound consequences on the splitting probabilities of a Bernoulli-like first-passage process, i.e., of a process which can end with one of two outcomes. Particularly intriguing, in this respect, is the class of problems where a carefully adjusted restart mechanism maximizes the probability that the process will complete in a desired way. We reveal the universal aspects of this kind of optimal behavior by applying the general approach recently proposed for the problem of first-passage under restart.
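The effect on splitting probabilities can be seen in a Monte Carlo sketch of a Bernoulli-like first passage. The symmetric random walk, boundaries, and restart probability below are illustrative assumptions: a walk started at x0 between two absorbing boundaries is reset to x0 with some probability per step, which shifts the probability of exiting through the right boundary.

```python
import random

def splitting_prob(x0=2, left=0, right=5, restart=0.0, trials=20000, seed=1):
    """Probability that a symmetric random walk from x0, restarted to x0
    with probability `restart` per step, hits `right` before `left`."""
    rng = random.Random(seed)
    right_hits = 0
    for _ in range(trials):
        x = x0
        while left < x < right:
            if rng.random() < restart:
                x = x0                          # stochastic restart
            else:
                x += 1 if rng.random() < 0.5 else -1
        right_hits += x >= right
    return right_hits / trials

p_no_restart = splitting_prob(restart=0.0)  # gambler's-ruin answer: x0/right = 0.4
p_restart = splitting_prob(restart=0.2)     # restart lowers it in this setup
```

Biasing the walk or moving the restart point changes which outcome restart favors, which is the tunability the abstract refers to.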