Robust control and management of the grid rely on accurate data. Both phasor measurement units and remote terminal units are prone to false data injection attacks. Thus, it is crucial to have a mechanism for fast and accurate detection of tampered data, both for preventing attacks that may lead to blackouts and for routine monitoring and control of current and future grids. We propose a decentralized false data injection detection scheme based on the Markov graph of the bus phase angles. We utilize the conditional covariance test (CMIT) to learn the structure of the grid. Using the DC power flow model, we show that, under normal circumstances, the Markov graph of the voltage angles is consistent with the power grid graph; therefore, a discrepancy between the calculated Markov graph and the learned structure should trigger an alarm. Our method can detect the most recent stealthy deception attack on the power grid, which assumes knowledge of the bus-branch model of the system and is capable of deceiving the state estimator, thereby damaging power network control, monitoring, demand response, and pricing schemes. Specifically, under the stealthy deception attack, the Markov graph of the phase angles changes. In addition to detecting a state of attack, our method can identify the set of attacked nodes. To the best of our knowledge, our remedy is the first to comprehensively detect this sophisticated attack, and it requires no additional hardware. Moreover, it succeeds regardless of the size of the attacked subset. Simulations on various power networks confirm our claims.
We analyze the joint probability distribution of the lengths of the vectors of hidden variables in different layers of a fully connected deep network, when the weights and biases are chosen randomly according to Gaussian distributions. We show that if the activation function satisfies a minimal set of assumptions, satisfied by all activation functions that we know are used in practice, then, as the width of the network gets large, the "length process" converges in probability to a length map that is determined as a simple function of the variances of the random weights and biases and the activation function. We also show that this convergence may fail for activation functions that violate our assumptions. We show how to use this analysis to choose the variance of weight initialization, depending on the activation function, so that hidden variables maintain a consistent scale throughout the network.
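The length map described above can be checked numerically. The sketch below (a Monte Carlo illustration under assumed notation, not the paper's construction) iterates the map q_{l+1} = σ_w² E_z[φ(√q_l · z)²] + σ_b² and compares it against the per-layer mean squared pre-activations of an actual wide random network; the variance values and the ReLU choice are arbitrary illustrations.

```python
import numpy as np

def length_map_step(q, phi, sw2, sb2, n_mc=400_000, seed=0):
    # q_{l+1} = sw2 * E_z[ phi(sqrt(q_l) z)^2 ] + sb2, with z ~ N(0, 1)
    z = np.random.default_rng(seed).standard_normal(n_mc)
    return sw2 * np.mean(phi(np.sqrt(q) * z) ** 2) + sb2

def simulate_preactivation_lengths(x, depth, width, phi, sw2, sb2, seed=1):
    # mean squared pre-activation (the "length" per unit) at each layer
    # of a random fully connected net with Gaussian weights and biases
    rng = np.random.default_rng(seed)
    h, qs = x, []
    for _ in range(depth):
        W = rng.standard_normal((width, h.size)) * np.sqrt(sw2 / h.size)
        b = rng.standard_normal(width) * np.sqrt(sb2)
        pre = W @ h + b
        qs.append(np.mean(pre ** 2))
        h = phi(pre)
    return qs
```

For ReLU with σ_w² = 2 and σ_b² = 0 (He-style initialization) the map is the identity, which matches the abstract's prescription for keeping hidden variables at a consistent scale.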
Given a knowledge base or KB containing (noisy) facts about common nouns or generics, such as “all trees produce oxygen” or “some animals live in forests”, we consider the problem of inferring additional such facts at a precision similar to that of the starting KB. Such KBs capture general knowledge about the world, and are crucial for various applications such as question answering. Different from commonly studied named entity KBs such as Freebase, generics KBs involve quantification, have more complex underlying regularities, tend to be more incomplete, and violate the commonly used locally closed world assumption (LCWA). We show that existing KB completion methods struggle with this new task, and present the first approach that is successful. Our results demonstrate that external information, such as relation schemas and entity taxonomies, if used appropriately, can be a surprisingly powerful tool in this setting. First, our simple yet effective knowledge guided tensor factorization approach achieves state-of-the-art results on two generics KBs (80% precise) for science, doubling their size at 74%–86% precision. Second, our novel taxonomy guided, submodular, active learning method for collecting annotations about rare entities (e.g., oriole, a bird) is 6x more effective at inferring further new facts about them than multiple active learning baselines.
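The tensor factorization idea behind such KB completion can be sketched in a few lines. The code below is a minimal DistMult-style factorization trained by SGD on toy generics triples; it is not the authors' knowledge-guided model (no relation schemas or taxonomies), and the entity and relation names are purely illustrative.

```python
import numpy as np

def train_distmult(triples, labels, n_ent, n_rel, dim=8, lr=0.1, epochs=1000, seed=0):
    """Fit entity/relation embeddings so sum(e_s * w_r * e_o) scores a triple."""
    rng = np.random.default_rng(seed)
    E = 0.5 * rng.standard_normal((n_ent, dim))   # entity embeddings
    R = 0.5 * rng.standard_normal((n_rel, dim))   # relation embeddings
    for _ in range(epochs):
        for (s, r, o), y in zip(triples, labels):
            raw = np.clip(np.sum(E[s] * R[r] * E[o]), -30, 30)
            g = 1.0 / (1.0 + np.exp(-raw)) - y    # logistic-loss gradient w.r.t. score
            gs, gr, go = g * R[r] * E[o], g * E[s] * E[o], g * E[s] * R[r]
            E[s] -= lr * gs; R[r] -= lr * gr; E[o] -= lr * go
    return E, R

def score(E, R, s, r, o):
    """Estimated probability that the triple (s, r, o) holds."""
    raw = np.clip(np.sum(E[s] * R[r] * E[o]), -30, 30)
    return 1.0 / (1.0 + np.exp(-raw))
```

After training on a handful of labeled triples, held-in positives score near 1 and negatives near 0; completion then ranks unseen triples by this score.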
Although synchronous PMUs are being deployed across the grid, it is not economical to place them at every node; therefore, state estimators will be used at some nodes in the system. Both PMUs and state estimators are prone to false data injection attacks. Thus, it is crucial to have a mechanism for fast and accurate detection of malicious tampering, both for preventing attacks that may lead to blackouts and for routine monitoring and control tasks of the smart grid. We propose a decentralized false data injection detection scheme based on the Markov graph of the bus phase angles. We utilize the Conditional Covariance Test (CCT) to learn the structure of the smart grid. Using the DC power flow model, we show that, under normal circumstances, and because of the walk-summability of the grid graph, the Markov graph of the voltage angles matches the power grid graph; otherwise, a discrepancy should trigger an alarm. The local grid topology is available online from the protection system, and we exploit it to check for a mismatch. Our method can detect the most recent stealthy deception attack on the power grid, which assumes knowledge of the bus-branch model of the system and is capable of deceiving the state estimator. Specifically, under the stealthy deception attack, the Markov graph of the phase angles changes. To the best of our knowledge, our remedy is the first to comprehensively detect this sophisticated attack, and it does not need additional hardware. Moreover, our detection scheme succeeds regardless of the size of the attacked subset. Simulations on various power networks confirm our claims.
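The detection idea can be sketched on a toy grid. The code below is a simplified illustration, not the paper's CCT implementation: phase angles are drawn from a Gauss-Markov field whose precision matrix has the grid's sparsity, the Markov graph is estimated by thresholding the empirical precision matrix (a crude stand-in for the conditional covariance test), and buses whose learned neighborhood disagrees with the known local topology raise the alarm. The 6-bus path grid and the threshold are arbitrary choices.

```python
import numpy as np

def grid_laplacian(n, edges):
    """Unweighted Laplacian of the bus-branch graph (DC power flow proxy)."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    return L

def sample_angles(J, n_samples, seed=0):
    """Phase-angle samples from a GMRF whose precision J has the grid's sparsity."""
    rng = np.random.default_rng(seed)
    C = np.linalg.cholesky(np.linalg.inv(J))
    return rng.standard_normal((n_samples, J.shape[0])) @ C.T

def learned_markov_graph(samples, rel_thresh=0.2):
    """Markov graph estimated by thresholding the empirical precision matrix."""
    K = np.linalg.inv(np.cov(samples, rowvar=False))
    A = np.abs(K) > rel_thresh * np.abs(np.diag(K)).min()
    np.fill_diagonal(A, False)
    return A

def alarm_buses(A_learned, edges, n):
    """Buses whose learned neighborhood disagrees with the known local topology."""
    A_grid = np.zeros((n, n), bool)
    for i, j in edges:
        A_grid[i, j] = A_grid[j, i] = True
    return set(np.where((A_learned != A_grid).any(axis=1))[0])
```

Overwriting one bus's angles (a crude injection) removes its edges from the learned Markov graph and, via marginalization, creates spurious edges between its former neighbors, so the mismatch localizes the attacked region.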
In this thesis, we consider two main problems in learning with big data: data integrity and high dimension. We specifically consider the problem of data integrity in the smart grid, as it is of paramount importance for grid maintenance and control. In addition, data manipulation can lead to catastrophic events. Inspired by this problem, we then expand the horizon to designing a general framework for stochastic optimization in high dimension for any loss function and any underlying low dimensional structure. We propose Regularized Epoch-based Admm for Stochastic Optimization in high-dimensioN (REASON). Our ADMM method is based on epoch-based annealing and consists of inexpensive steps, which involve projections onto simple norm balls. We provide explicit bounds for the sparse optimization problem and the noisy matrix decomposition problem and show that our convergence rate in both cases matches the minimax lower bound. For matrix decomposition into sparse and low rank components, we provide the first guarantees for any online method. Experiments show that for both sparse optimization and matrix decomposition problems, our algorithm outperforms the state-of-the-art methods. In particular, we reach higher accuracy with the same time complexity.
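The "inexpensive projections onto simple norm balls" mentioned above can be made concrete for the sparse (l1) case. Below is a standard sort-and-threshold Euclidean projection onto the l1 ball, offered as a sketch of the kind of step involved, not as REASON itself.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {x : ||x||_1 <= radius}.

    Sort-based O(d log d) scheme: find the soft-threshold theta such that
    sum(max(|v_i| - theta, 0)) == radius, then shrink each coordinate.
    """
    if np.abs(v).sum() <= radius:
        return v.copy()                     # already inside the ball
    u = np.sort(np.abs(v))[::-1]            # magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

Each call costs a sort of the iterate, which is what keeps per-step work cheap in high dimension compared with a full proximal solve.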
Cooperative communication exploits the wireless broadcast advantage to combat the severe fading effects in wireless communications. Proper power allocation can play an important role in the performance of cooperative communication. In this paper, we propose a distributed game-theoretical method for power allocation in bidirectional cooperative communication networks. In this work, we consider two nodes as data sources that want to cooperate in sending data to the destination. In addition to being a data source, each source node has to relay the other's data. We answer the question: how much power should each node contribute to relay the other node's data? We use a Stackelberg game, which is an extensive-form game, to find a solution to this problem. The proposed method reaches equilibrium in only one stage. It is shown that there are more benefits when bidirectional cooperation is done between node pairs that are closer to each other. Simulation results show that the proposed method leads to a fair solution and that nodes farther from the destination should contribute more power to cooperate with others.
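The one-stage Stackelberg structure can be sketched with a toy backward-induction solver: the leader commits to a relay power anticipating the follower's best response. The utility functions below are purely illustrative (each node gains from the other's relaying, pays a quadratic cost for its own, and the follower's gain scales with the leader's commitment); they are not the paper's model.

```python
import numpy as np

def best_response(p_leader, u_follower, grid):
    """Follower's relay power maximizing its utility, given the leader's choice."""
    return grid[np.argmax([u_follower(p, p_leader) for p in grid])]

def stackelberg_equilibrium(u_leader, u_follower, grid):
    """Leader optimizes anticipating the follower's best response (one stage)."""
    p_l = max(grid, key=lambda p: u_leader(p, best_response(p, u_follower, grid)))
    return p_l, best_response(p_l, u_follower, grid)

# Illustrative utilities (hypothetical, not the paper's):
u_leader   = lambda p_l, p_f: 4.0 * p_f - p_l ** 2        # gain from follower's relaying
u_follower = lambda p_f, p_l: p_l * p_f - p_f ** 2        # reciprocity-scaled gain, quadratic cost
```

Because the follower's response is computed inside the leader's optimization, the game is solved by backward induction in a single stage, mirroring the abstract's claim that equilibrium is reached in one stage.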
We prove bounds on the generalization error of convolutional networks. The bounds are in terms of the training loss, the number of parameters, the Lipschitz constant of the loss, and the distance from the weights to the initial weights. They are independent of the number of pixels in the input, and the height and width of hidden feature maps. We present experiments using CIFAR-10 with varying hyperparameters of a deep convolutional network, comparing our bounds with practical generalization gaps.
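Two of the bound's ingredients are easy to compute directly, and doing so makes the pixel-independence concrete: a convolutional layer's parameter count depends only on its kernel shape, never on the input resolution, and the distance from initialization is a plain Euclidean norm over all parameters. The helpers below are illustrative bookkeeping, not the bound itself.

```python
import numpy as np

def conv_param_count(layers):
    """Parameters of conv layers given (out_ch, in_ch, kernel) triples.

    Note the count involves only kernel shapes: it is the same whether the
    input is 32x32 or 1024x1024, mirroring the bound's independence of the
    number of pixels and of hidden feature-map sizes.
    """
    return sum(o * i * k * k + o for (o, i, k) in layers)  # weights + biases

def distance_from_init(weights, init_weights):
    """||w - w0||_2 over all parameter tensors: the distance the bound scales with."""
    return float(np.sqrt(sum(np.sum((w - w0) ** 2)
                             for w, w0 in zip(weights, init_weights))))
```

In the experimental setting the abstract describes, one would snapshot the weights at initialization and evaluate `distance_from_init` after training to plot the bound against the observed generalization gap.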
Deep neural networks may easily memorize noisy labels present in real-world data, which degrades their ability to generalize. It is therefore important to track and evaluate the robustness of models against noisy label memorization. We propose a metric, called susceptibility, to gauge such memorization for neural networks. Susceptibility is simple and easy to compute during training. Moreover, it does not require access to ground-truth labels and it only uses unlabeled data. We empirically show the effectiveness of our metric in tracking memorization on various architectures and datasets and provide theoretical insights into the design of the susceptibility metric. Finally, we show through extensive experiments on datasets with synthetic and real-world label noise that one can utilize susceptibility and the overall training accuracy to distinguish models that maintain low memorization on the training set and generalize well to unseen clean data.
We propose a new framework for reasoning about generalization in deep learning. The core idea is to couple the Real World, where optimizers take stochastic gradient steps on the empirical loss, to an Ideal World, where optimizers take steps on the population loss. This leads to an alternate decomposition of test error into: (1) the Ideal World test error plus (2) the gap between the two worlds. If the gap (2) is universally small, this reduces the problem of generalization in offline learning to the problem of optimization in online learning. We then give empirical evidence that this gap between worlds can be small in realistic deep learning settings, in particular supervised image classification. For example, CNNs generalize better than MLPs on image distributions in the Real World, but this is "because" they optimize faster on the population loss in the Ideal World. This suggests our framework is a useful tool for understanding generalization in deep learning, and lays a foundation for future research in the area.