Designing Network Design Spaces Radosavovic, Ilija; Kosaraju, Raj Prateek; Girshick, Ross ...
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Conference Proceeding
Open access
In this work, we present a new network design paradigm. Our goal is to help advance the understanding of network design and discover design principles that generalize across settings. Instead of focusing on designing individual network instances, we design network design spaces that parametrize populations of networks. The overall process is analogous to classic manual design of networks, but elevated to the design space level. Using our methodology we explore the structure aspect of network design and arrive at a low-dimensional design space consisting of simple, regular networks that we call RegNet. The core insight of the RegNet parametrization is surprisingly simple: widths and depths of good networks can be explained by a quantized linear function. We analyze the RegNet design space and arrive at interesting findings that do not match the current practice of network design. The RegNet design space provides simple and fast networks that work well across a wide range of flop regimes. Under comparable training settings and flops, the RegNet models outperform the popular EfficientNet models while being up to 5x faster on GPUs.
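The quantized linear rule mentioned in the abstract can be made concrete. Below is a minimal Python sketch of that parametrization as the paper describes it: per-block widths follow a linear function u_j = w_0 + w_a * j, which is then quantized in log space to a small set of stage widths. The function name and the example parameter values are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def regnet_widths(d, w_0, w_a, w_m, q=8):
    """Sketch of RegNet's quantized linear width parametrization.

    u_j = w_0 + w_a * j gives a linear width per block j; quantizing the
    exponent s_j = log_{w_m}(u_j / w_0) yields piecewise-constant stage
    widths, rounded to multiples of q channels.
    """
    j = np.arange(d)
    u = w_0 + w_a * j                              # linear widths per block
    s = np.round(np.log(u / w_0) / np.log(w_m))    # quantize in log space
    w = w_0 * np.power(w_m, s)                     # piecewise-constant widths
    return (q * np.round(w / q)).astype(int)       # snap to multiples of q

# Illustrative values only; runs of equal widths form the network's stages.
print(regnet_widths(d=16, w_0=48, w_a=36.0, w_m=2.5))
```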
Recently, the channel attention mechanism has been shown to offer great potential for improving the performance of deep convolutional neural networks (CNNs). However, most existing methods are dedicated to developing more sophisticated attention modules for better performance, which inevitably increases model complexity. To overcome this trade-off between performance and complexity, this paper proposes an Efficient Channel Attention (ECA) module, which involves only a handful of parameters while bringing a clear performance gain. By dissecting the channel attention module in SENet, we empirically show that avoiding dimensionality reduction is important for learning channel attention, and that appropriate cross-channel interaction can preserve performance while significantly decreasing model complexity. We therefore propose a local cross-channel interaction strategy without dimensionality reduction, which can be efficiently implemented via 1D convolution. Furthermore, we develop a method to adaptively select the kernel size of the 1D convolution, determining the coverage of local cross-channel interaction. The proposed ECA module is both efficient and effective; e.g., against a ResNet50 backbone, our module adds 80 parameters (vs. 24.37M) and 4.7e-4 GFLOPs (vs. 3.86 GFLOPs), while boosting Top-1 accuracy by more than 2%. We extensively evaluate our ECA module on image classification, object detection, and instance segmentation with ResNet and MobileNetV2 backbones. The experimental results show that our module is more efficient while performing favorably against its counterparts.
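As a concrete illustration of the abstract's two ingredients (no dimensionality reduction, local cross-channel interaction via 1D convolution), here is a minimal PyTorch sketch of an ECA-style module. The adaptive kernel-size rule follows the form described in the paper, k ≈ |log2(C)/γ + b/γ| rounded to an odd integer with γ=2 and b=1; treat the class name and defaults as assumptions rather than the authors' reference code.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Sketch of an ECA-style module: GAP + 1D conv across channels."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1                  # force an odd kernel size
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                          # x: (N, C, H, W)
        y = self.pool(x)                           # (N, C, 1, 1) descriptor
        y = y.squeeze(-1).transpose(1, 2)          # (N, 1, C): channels as seq
        y = self.conv(y)                           # local cross-channel mixing
        y = torch.sigmoid(y).transpose(1, 2).unsqueeze(-1)  # (N, C, 1, 1)
        return x * y                               # rescale input channels
```

For a 256-channel feature map this rule yields k = 5, i.e., each channel's attention weight is predicted from itself and four neighboring channels.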
High-performance lithium-ion batteries are commonly built with heterogeneous composite electrodes that combine multiple active components serving various electrochemical and structural functions. Engineering these heterogeneous composite electrodes toward drastically improved battery performance hinges on a fundamental understanding of the mechanisms of the multiple active components and their synergy or trade-off effects. Herein, we report the rational design, fabrication, and understanding of a yolk@shell Bi₂S₃@N-doped mesoporous carbon (C) composite anode, consisting of a Bi₂S₃ nanowire (NW) core within a hollow space surrounded by a thin shell of N-doped mesoporous C. This composite anode exhibits desirable rate performance and long cycle stability (700 cycles, 501 mAh g⁻¹ at 1.0 A g⁻¹, 85% capacity retention). Through in situ transmission electron microscopy (TEM), X-ray diffraction, and NMR experiments and computational modeling, we elucidate the dominant mechanisms of the phase transformation, structural evolution, and lithiation kinetics of the Bi₂S₃ NW anode. Our combined in situ TEM experiments and finite element simulations reveal that the hollow space between the Bi₂S₃ NW core and the carbon shell can effectively accommodate the lithiation-induced expansion of the Bi₂S₃ NWs without cracking the C shell. This work demonstrates an effective strategy for engineering yolk@shell-architectured anodes and also sheds light on harnessing the complex multistep reactions in metal sulfides to enable high-performance lithium-ion batteries.
Light-weight convolutional neural networks (CNNs) suffer performance degradation because their low computational budgets constrain both the depth (number of convolution layers) and the width (number of channels) of the network, resulting in limited representation capability. To address this issue, we present dynamic convolution, a new design that increases model complexity without increasing the network depth or width. Instead of using a single convolution kernel per layer, dynamic convolution aggregates multiple parallel convolution kernels dynamically based on their attentions, which are input dependent. Assembling multiple kernels is not only computationally efficient due to the small kernel size, but also yields more representation power, since the kernels are aggregated in a non-linear way via attention. By simply using dynamic convolution in the state-of-the-art MobileNetV3-Small architecture, top-1 accuracy on ImageNet classification is boosted by 2.9% with only 4% additional FLOPs, and a 2.9 AP gain is achieved on COCO keypoint detection.
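To make the aggregation concrete, the sketch below shows one way to implement an input-dependent mixture of K kernels in PyTorch: a small attention branch produces softmax weights π_k(x), the effective kernel is Σ_k π_k(x)·W_k, and the per-sample convolution is realized with the grouped-convolution trick. This is a minimal sketch under assumed shapes and defaults, not the paper's exact module (which, for example, also anneals a softmax temperature during training).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Sketch of dynamic convolution: K parallel kernels mixed per input."""
    def __init__(self, in_ch, out_ch, kernel_size=3, K=4, reduction=4):
        super().__init__()
        self.K, self.ks = K, kernel_size
        self.weight = nn.Parameter(
            0.02 * torch.randn(K, out_ch, in_ch, kernel_size, kernel_size))
        self.attn = nn.Sequential(             # pi_k(x): pool + two FC layers
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, in_ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(in_ch // reduction, K))

    def forward(self, x):                      # x: (N, C_in, H, W)
        N = x.size(0)
        pi = F.softmax(self.attn(x), dim=1)    # (N, K) input-dependent weights
        w = torch.einsum('nk,koihw->noihw', pi, self.weight)  # mixed kernels
        x = x.reshape(1, -1, *x.shape[2:])     # fold batch into channels
        w = w.reshape(-1, *w.shape[2:])        # (N*out_ch, in_ch, k, k)
        y = F.conv2d(x, w, padding=self.ks // 2, groups=N)  # per-sample conv
        return y.reshape(N, -1, *y.shape[2:])

# Example: m = DynamicConv2d(32, 64); m(torch.randn(2, 32, 28, 28)) -> (2, 64, 28, 28)
```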
Exploring Self-Attention for Image Recognition Zhao, Hengshuang; Jia, Jiaya; Koltun, Vladlen
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 06/2020
Conference Proceeding
Open access
Recent work has shown that self-attention can serve as a basic building block for image recognition models. We explore variations of self-attention and assess their effectiveness for image recognition. We consider two forms of self-attention. One is pairwise self-attention, which generalizes standard dot-product attention and is fundamentally a set operator. The other is patchwise self-attention, which is strictly more powerful than convolution. Our pairwise self-attention networks match or outperform their convolutional counterparts, and the patchwise models substantially outperform the convolutional baselines. We also conduct experiments that probe the robustness of learned representations and conclude that self-attention networks may have significant benefits in terms of robustness and generalization.
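Of the two forms, pairwise self-attention is the easier to sketch. Below is a rough PyTorch illustration of one pairwise variant (the subtraction relation over a local footprint): attention weights are computed per channel from feature differences and normalized over the footprint, so the operator aggregates a set of neighbors rather than fixed kernel positions. This is a simplified sketch; the paper's actual SAN blocks additionally share attention weights across channel groups and incorporate position encodings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairwiseSelfAttention2d(nn.Module):
    """Rough sketch of pairwise self-attention with a subtraction relation."""
    def __init__(self, channels, footprint=3):
        super().__init__()
        self.fp = footprint
        self.query = nn.Conv2d(channels, channels, 1)
        self.key = nn.Conv2d(channels, channels, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Sequential(            # maps the relation to weights
            nn.Conv2d(channels, channels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1))

    def forward(self, x):                      # x: (N, C, H, W)
        N, C, H, W = x.shape
        q = self.query(x)
        pad = self.fp // 2
        # Gather each position's footprint: (N, C, k*k, H, W)
        k = F.unfold(self.key(x), self.fp, padding=pad).view(N, C, -1, H, W)
        v = F.unfold(self.value(x), self.fp, padding=pad).view(N, C, -1, H, W)
        rel = q.unsqueeze(2) - k               # subtraction relation
        kk = rel.size(2)
        r = rel.permute(0, 2, 1, 3, 4).reshape(N * kk, C, H, W)
        a = self.gamma(r).reshape(N, kk, C, H, W).permute(0, 2, 1, 3, 4)
        a = F.softmax(a, dim=2)                # normalize over the footprint
        return (a * v).sum(dim=2)              # weighted set aggregation
```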
With the breakthroughs in deep learning, recent years have witnessed a booming of artificial intelligence (AI) applications and services, spanning from personal assistants to recommendation systems to video/audio surveillance. More recently, with the proliferation of mobile computing and the Internet of Things (IoT), billions of mobile and IoT devices are connected to the Internet, generating zillions of bytes of data at the network edge. Driven by this trend, there is an urgent need to push the AI frontiers to the network edge so as to fully unleash the potential of edge big data. To meet this demand, edge computing, an emerging paradigm that pushes computing tasks and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new interdiscipline, edge AI or edge intelligence (EI), is beginning to receive a tremendous amount of interest. However, research on EI is still in its infancy, and a dedicated venue for exchanging the recent advances of EI is highly desired by both the computer systems and AI communities. To this end, we conduct a comprehensive survey of the recent research efforts on EI. Specifically, we first review the background and motivation for AI running at the network edge. We then provide an overview of the overarching architectures, frameworks, and emerging key technologies for deep learning training and inference at the network edge. Finally, we discuss future research opportunities on EI. We believe that this survey will elicit escalating attention, stimulate fruitful discussions, and inspire further research ideas on EI.
Blockchained On-Device Federated Learning Kim, Hyesung; Park, Jihong; Bennis, Mehdi ...
IEEE Communications Letters, 06/2020, Volume 24, Issue 6
Journal Article
Peer reviewed
Open access
By leveraging blockchain, this letter proposes a blockchained federated learning (BlockFL) architecture where local learning model updates are exchanged and verified. This enables on-device machine learning without any centralized training data or coordination by utilizing a consensus mechanism in blockchain. Moreover, we analyze an end-to-end latency model of BlockFL and characterize the optimal block generation rate by considering communication, computation, and consensus delays.
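For intuition about what is exchanged and verified, here is an illustrative Python sketch of the sample-weighted aggregation of local updates that each device can compute once a block of verified updates is available. The function name, weighting scheme, and data layout are assumptions for illustration; the letter's contribution is the blockchain exchange/verification architecture and its latency analysis, not this aggregation rule.

```python
import numpy as np

def federated_average(updates, sample_counts):
    """Illustrative FedAvg-style aggregation of verified local updates.

    In BlockFL the exchange and verification happen via miners and a
    blockchain consensus rather than a central server; this helper only
    shows the sample-weighted averaging a device could run on a block of
    verified updates. Names and layout here are assumptions.
    """
    w = np.asarray(sample_counts, dtype=float)
    w /= w.sum()                            # weight by local dataset size
    return sum(wi * u for wi, u in zip(w, updates))

# Example: three devices contribute update vectors of the same shape.
updates = [np.random.randn(10) for _ in range(3)]
print(federated_average(updates, sample_counts=[100, 50, 150]))
```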
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state-of-the-art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3, which we call Mobile DeepLabv3. MobileNetV2 is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity. Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide the intuition that led to this design. Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet [1] classification, COCO object detection [2], and VOC image segmentation [3]. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdds), as well as actual latency and the number of parameters.
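The inverted residual with a linear bottleneck described above maps naturally to code. Here is a minimal PyTorch sketch of such a block: a 1x1 expansion, a 3x3 depthwise convolution in the expanded space, and a 1x1 projection back to a thin output with no activation (the removed non-linearity the abstract argues for), with the shortcut connecting the thin ends when stride and channel counts allow. Layer choices such as BatchNorm and ReLU6 follow common MobileNetV2 implementations; treat the class as a sketch rather than the authors' reference code.

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Sketch of an inverted residual block with a linear bottleneck."""
    def __init__(self, in_ch, out_ch, stride=1, expand=6):
        super().__init__()
        hidden = in_ch * expand
        self.use_shortcut = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),          # 1x1 expand
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride, 1,
                      groups=hidden, bias=False),             # 3x3 depthwise
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),         # linear project
            nn.BatchNorm2d(out_ch))                           # no activation

    def forward(self, x):
        y = self.block(x)
        # Shortcut between the thin bottleneck ends when shapes match
        return x + y if self.use_shortcut else y
```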