Atomic force microscopy (AFM) is one of the most popular imaging and characterization methods applicable to a wide range of nanoscale material systems. However, high‐resolution imaging using AFM generally suffers from a low scanning yield due to its raster-scanning method. Here, a systematic method of data acquisition and preparation, combined with deep‐learning‐based image super‐resolution, is proposed to enable rapid yet accurate AFM characterization. Its application to measuring the geometrical and mechanical properties of structured DNA assemblies reveals that around a tenfold reduction in AFM imaging time can be achieved without significant loss of accuracy. Through a transfer learning strategy, the method can be efficiently customized for a specific target sample on demand.
A simple and practical way of accelerating atomic force microscopy (AFM) characterization enabled by a deep‐learning‐based image super‐resolution method combined with the data acquisition and preparation process is developed. Its application to measuring the geometrical and mechanical properties of DNA assemblies reveals that around a tenfold reduction of time in AFM characterization can be achieved without significant loss of accuracy.
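The data-preparation step described above can be illustrated with a minimal sketch: simulating a fast, low-resolution AFM scan from a high-resolution one by dropping slow-axis scan lines, yielding a supervised training pair for a super-resolution model. The function name and subsampling scheme are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def make_training_pair(hr_image, factor=4):
    """Simulate a fast (low-resolution) AFM scan from a high-resolution one
    by keeping every `factor`-th scan line, then pair the two images for
    supervised super-resolution training.
    Illustrative sketch only; the paper's exact preparation may differ."""
    lr_image = hr_image[::factor, :]  # fewer slow-axis scan lines -> faster scan
    return lr_image, hr_image
```

A 4x line subsampling corresponds roughly to the tenfold-class speedup regime discussed above, since AFM scan time scales with the number of slow-axis lines.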
Recently, on-device training has become crucial for the success of edge intelligence. However, frequent data movement between computing units and memory during training has been a major problem for battery-powered edge devices. Processing-in-memory (PIM) is a novel computing paradigm that merges computing logic into memory and can address the data movement problem with excellent power efficiency. However, previous PIM accelerators cannot support the entire training process on chip due to its computational complexity. This article presents T-PIM, a PIM accelerator for end-to-end on-device training and the first PIM realization that enables end-to-end on-device training as well as high-speed inference. Its full-custom PIM macro contains 8T-SRAM cells that perform energy-efficient in-cell AND operations, and its bit-serial computation logic enables fully variable bit-precision for input data. The macro supports various data mapping methods and computational paths for both fully connected and convolutional layers in order to handle the complex training process. An efficient tiling scheme is also proposed to enable T-PIM to compute deep neural networks of any size with the implemented hardware. In addition, configurable arithmetic units in the forward propagation path allow T-PIM to handle power-of-two bit-precision for weight data, enabling a significant performance boost during inference. Moreover, T-PIM efficiently handles sparsity in both operands by skipping the computation of zeros in the input data and by gating off computing units when the weight data are zero. Finally, we fabricate the T-PIM chip in 28-nm CMOS technology, occupying a die area of 5.04 mm², including five T-PIM cores. It dissipates 5.25-51.23 mW at 50-280 MHz operating frequency with a 0.75-1.05-V supply voltage.
We successfully demonstrate that T-PIM can run the end-to-end training of the VGG16 model on the CIFAR10 and CIFAR100 datasets, achieving 0.13-161.08- and 0.25-7.59-TOPS/W power efficiency during inference and training, respectively. The results show that T-PIM is 2.02× more energy-efficient than the state-of-the-art PIM chip that supports only backward propagation rather than the whole training process. Furthermore, we conduct an architectural experiment using a cycle-level simulator based on actual measurement results, which suggests that the T-PIM architecture is scalable and that its scaled-up version provides up to 203.26× higher power efficiency than a comparable GPU.
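The bit-serial computation logic mentioned above can be sketched functionally as follows. This toy model (names and structure are assumptions, not T-PIM's actual circuit) processes one input bit-plane per cycle, ANDs it against the weights, and restores bit significance with shifts; precision is variable simply by changing how many bit positions are iterated.

```python
def bit_serial_dot(inputs, weights, in_bits=8):
    """Bit-serial dot product: one input bit-plane per 'cycle'.
    Each cycle needs only AND + accumulation, mirroring in-cell AND
    operations; input precision is set by in_bits alone."""
    acc = 0
    for b in range(in_bits):  # iterate over input bit positions (LSB first)
        bit_plane = [(x >> b) & 1 for x in inputs]  # extract one bit-plane
        # AND each input bit with its (multi-bit) weight, then accumulate
        partial = sum(w * xb for w, xb in zip(weights, bit_plane))
        acc += partial << b  # shift restores the bit's significance
    return acc
```

For non-negative integer inputs that fit in `in_bits`, this reproduces the ordinary dot product exactly while keeping each cycle's arithmetic trivially simple.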
Existing inefficient traffic signal plans are causing traffic congestion in many urban areas. In recent years, many deep reinforcement learning (RL) methods have been proposed to control traffic signals in real time by interacting with the environment. However, most existing state-of-the-art RL methods use complex state definitions and reward functions and/or neglect real-world constraints such as cyclic phase order and minimum/maximum duration for each traffic phase. These issues make existing methods infeasible for real-world deployment. In this paper, we propose an RL-based multi-intersection traffic light control model with a simple yet effective combination of state, reward, and action definitions. The proposed model uses a novel pressure method called Biased Pressure (BP). We use a state-of-the-art advantage actor-critic learning mechanism in our model. Due to the decentralized nature of our state, reward, and action definitions, we achieve a scalable model. The performance of the proposed method is compared with related methods using both synthetic and real-world datasets. Experimental results show that our method outperforms existing cyclic phase control methods by a significant margin in terms of throughput and average travel time. Moreover, we conduct ablation studies to justify the superiority of the BP method over existing pressure methods.
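For context, the classic pressure quantity that BP builds on is the queue length on incoming lanes minus that on outgoing lanes of a movement. The abstract does not give BP's exact formula, so the biased variant below is a hypothetical illustration (the bias term is an assumption), not the paper's definition.

```python
def pressure(incoming, outgoing):
    """Classic (unbiased) pressure of a traffic movement:
    total queue on incoming lanes minus total queue on outgoing lanes."""
    return sum(incoming) - sum(outgoing)

def biased_pressure(incoming, outgoing, bias=1.0):
    """Hypothetical biased variant: a constant bias shifts the value
    so that, e.g., empty downstream links do not zero out the signal.
    Illustrative only; the paper's actual BP definition may differ."""
    return sum(incoming) - sum(outgoing) + bias
```

A controller would then prioritize the phase (within the fixed cyclic order and duration bounds) serving the movements with the largest pressure.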
We propose a software/hardware co-design framework called Agamotto for the complete design automation and performance optimization of row-stationary-based CNN accelerators. We design a scalable accelerator template whose critical design parameters can be configured. Based on the hardware template, Agamotto uses a latency modeling tool to estimate the performance of the numerous possible hardware implementations for the target FPGA device and CNN model. It then chooses the best hardware design and generates the instructions and optimal runtime variables for each target CNN layer. As a result, Agamotto can generate the best hardware design within 61.67 seconds, achieving up to 2.8× higher hardware utilization than the original accelerator. In addition, experimental results show that the performance estimation is accurate, with only a 4.8% difference from the measured FPGA runtime for end-to-end CNN model execution. The accelerator implemented on the Xilinx VCU118 evaluation board achieves 402 giga operations per second (GOPS) at 200 MHz, resulting in 13 frames per second (FPS) for the end-to-end execution of VGG-16. It is flexible enough to run more complex CNN models such as ResNet-50 and DarkNet-53, achieving 29.3 FPS and 16.9 FPS, respectively.
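The estimate-then-select loop described above can be caricatured as an exhaustive search over template parameters scored by a latency model. The toy model and parameter names below are assumptions for illustration, not Agamotto's actual cost model or design space.

```python
from itertools import product

def estimate_latency(pe_rows, pe_cols, layer):
    """Toy latency model (illustrative, not Agamotto's real model):
    cycles ~ total MACs / active PEs, plus a fixed tiling overhead."""
    macs = (layer["out_ch"] * layer["in_ch"]
            * layer["k"] ** 2 * layer["out_hw"] ** 2)
    return macs / (pe_rows * pe_cols) + 1000  # 1000 = assumed overhead

def search_best_design(layers, row_opts=(8, 16, 32), col_opts=(8, 16, 32)):
    """Score every (rows, cols) configuration over all target layers
    and return the one with the lowest total estimated latency."""
    return min(product(row_opts, col_opts),
               key=lambda cfg: sum(estimate_latency(*cfg, l) for l in layers))
```

A real framework would additionally reject configurations exceeding the FPGA's DSP/BRAM budget before scoring, which is what makes the search device-specific.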
We present an energy-efficient processing-in-memory (PIM) architecture named Z-PIM that supports both sparsity handling and fully variable bit-precision in weight data for energy-efficient deep neural networks. Z-PIM adopts bit-serial arithmetic, which performs a multiplication bit by bit over multiple cycles, to reduce the complexity of the operation in a single cycle and to provide flexibility in bit-precision. To this end, it employs a zero-skipping convolution SRAM, which performs in-memory AND operations based on custom 8T-SRAM cells and channel-wise accumulations, and a diagonal accumulation SRAM, which performs bit- and spatial-wise accumulation on the channel-wise accumulation results using diagonal logic and adders to produce the final convolution outputs. We propose a hierarchical bitline structure for energy-efficient weight-bit pre-charging and computational readout that reduces the parasitic capacitances of the bitlines. Its charge reuse scheme reduces the switching rate by 95.42% for the convolution layers of the VGG-16 model. In addition, Z-PIM's channel-wise data mapping enables sparsity handling by skip-reading the input channels with zero weight. Its read-operation pipelining, enabled by read-sequence scheduling, improves the throughput by 66.1%. The Z-PIM chip is fabricated in a 65-nm CMOS process on a 7.568-mm² die and consumes 5.294 mW on average at 1.0-V supply voltage and 200-MHz frequency. It achieves 0.31-49.12-TOPS/W energy efficiency for convolution operations as the weight sparsity and bit-precision vary from 0.1 to 0.9 and from 1 to 16 bits, respectively. For a figure of merit considering input bit-width, weight bit-width, and energy efficiency, Z-PIM shows more than a 2.1-fold improvement over state-of-the-art PIM implementations.
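The skip-read idea above is easy to state functionally: channels whose weight is zero are never fetched at all, so the saving shows up as avoided memory reads, not just avoided multiplies. The sketch below is an illustrative model (names are assumptions), counting reads to make that distinction visible.

```python
def sparse_channel_accumulate(input_channels, channel_weights):
    """Channel-wise accumulation with zero-skipping: a channel whose
    weight is zero is skipped before any read happens, modeling the
    skip-read behavior. Returns (accumulated sum, channels read).
    Illustrative sketch, not Z-PIM's actual datapath."""
    acc = 0.0
    reads = 0
    for x, w in zip(input_channels, channel_weights):
        if w == 0:
            continue  # skip-read: no SRAM access for this channel at all
        reads += 1    # only non-zero-weight channels cost a read
        acc += w * x
    return acc, reads
```

At 0.9 weight sparsity, roughly nine in ten channel reads are elided this way, which is where the upper end of the reported energy-efficiency range comes from.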
Purpose
We examined regression patterns in pediatric optic pathway gliomas (OPGs) after proton beam therapy (PBT) and evaluated local control and visual outcomes.
Methods
A total of 42 brain magnetic resonance imaging (MRI) scans from seven consecutive patients with sporadic OPGs who were initially treated with chemotherapy and received PBT between June 2007 and September 2016 at the National Cancer Center, Korea, were analyzed. Patients underwent brain MRI regularly before and after PBT. Delineation and volume calculation of the total tumor, cystic lesions, and solid enhancing lesions were performed on gadolinium‐enhanced T1‐weighted MRI using Eclipse version 13 (Varian).
Results
The median follow‐up period after PBT was 70 months (range 47–88). The median age at the time of PBT was 7 years (range 4–16), and the median duration of chemotherapy before referral to the PBT center was 25 months (range 3–70). The median time to the greatest increase in cystic volume was 32 months (range 12–43) after PBT. Solid enhancing lesion volume gradually decreased throughout the follow‐up period. On an individual basis, total volume change varied; on average, however, total volume regressed, although at a slower rate than solid enhancing lesion volume. The local control rate was 85.7% (5‐year progression‐free survival rate, 80%; 5‐year overall survival rate, 100%), and the rate of vision preservation was 71.4% (five of seven patients).
Conclusion
The regression patterns of pediatric OPGs after PBT involve significant cystic change; therefore, total volume alone is not an appropriate measure of treatment response. Care by a multidisciplinary team is necessary to manage clinical symptoms related to radiologic changes.
Over the past few years, on-device learning (ODL) has become an integral aspect of the success of edge devices that embrace machine learning (ML), since it plays a crucial role in restoring ML model accuracy when the edge environment changes. However, implementing ODL on battery-limited edge devices poses significant challenges due to the generation of large intermediate data during ML training and the frequent data movement between the processor and memory, resulting in substantial power consumption. To address this limitation, certain ML accelerators in edge devices have adopted a processing-in-memory (PIM) paradigm, integrating computing logic into memory. Nevertheless, these accelerators still face hurdles such as long latency caused by the lack of a pipelined approach in the training process, notable power and area overheads related to floating-point arithmetic, and incomplete handling of data sparsity during training. This article presents a high-throughput super-pipelined PIM accelerator, named SP-PIM, designed to overcome the limitations of existing PIM-based ODL accelerators. To this end, SP-PIM implements a holistic multi-level pipelining scheme based on local error prediction (EP), enhancing training speed by 7.31×. In addition, SP-PIM introduces a local EP unit (LEPU), a lightweight circuit that performs accurate EP leveraging power-of-two (PoT) random weights. This strategy significantly reduces power-hungry external memory access (EMA) by 59.09%. Moreover, SP-PIM fully exploits sparsity in both activation and error data during training, facilitated by a highly optimized PIM macro design. Finally, the SP-PIM chip, fabricated in 28-nm CMOS technology, achieves a training speed of 8.81 epochs/s. It occupies a die area of 5.76 mm² and consumes between 6.91 and 433.25 mW at operating frequencies of 20-450 MHz with a supply voltage of 0.56-1.05 V.
We demonstrate that it can successfully execute end-to-end ODL for the CIFAR10 and CIFAR100 datasets. Consequently, it achieves state-of-the-art area efficiency (560.6 GFLOPS/mm²) and competitive power efficiency (22.4 TFLOPS/W), marking a 3.95× higher figure of merit (area efficiency × power efficiency × capacity) than previous work. Furthermore, we implemented a cycle-level simulator in Python to investigate and validate the scalability of SP-PIM. Through architectural experiments in various hardware configurations, we verified that the core computing unit within SP-PIM possesses both scale-up and scale-out capabilities.
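The power-of-two weight idea behind the LEPU can be sketched in isolation: rounding a weight to the nearest signed power of two turns every multiplication into a shift, which is why such a predictor can stay lightweight. The function below is an illustrative assumption, not SP-PIM's actual quantizer.

```python
import math

def quantize_pot(w):
    """Round a weight to the nearest signed power of two (PoT).
    With PoT weights, multiply-by-weight reduces to a bit shift,
    keeping the error-prediction hardware small. Illustrative only."""
    if w == 0:
        return 0.0
    sign = 1.0 if w > 0 else -1.0
    exp = round(math.log2(abs(w)))  # nearest exponent in log domain
    return sign * (2.0 ** exp)
```

Note the rounding is done in the log domain, so the relative (not absolute) quantization error is bounded, which is the usual choice for PoT weight schemes.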
This study examined how a higher body mass index (BMI) affects the work hours of men and women and how the impact varies by gender and BMI level. Using a longitudinal dataset of 1603 British adults (men: n = 775; women: n = 828) and a panel threshold regression model, this study estimated that BMI has significant impacts on work hours, but the pattern differs by gender and BMI group. BMI is positively associated with work hours up to the estimated BMI threshold of 30, which corresponds to the clinical cutoff point for obesity; above this point, additional increases in BMI are associated with reduced work hours. The asymmetric nonlinear relationship between BMI and work hours was more evident among women, particularly female low-skilled workers. The results imply reduced work capacity and lower labor income for women with a BMI above the obesity threshold, highlighting a practical role for BMI's obesity cutoff value. The findings of this study provide a new perspective on the economic burden of workplace obesity and point to the need to design gender-specific and BMI-based strategies to tackle productivity loss from obesity.
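The threshold relationship described above can be illustrated with a simple piecewise-linear regression kinked at BMI 30. This is a deliberately simplified cross-sectional sketch (function names are assumptions): a true panel threshold model estimates the threshold itself and accounts for individual fixed effects.

```python
import numpy as np

def fit_threshold_regression(bmi, hours, threshold=30.0):
    """Piecewise-linear regression of work hours on BMI with a kink at
    `threshold`, fit by ordinary least squares. The value 30 mirrors the
    clinical obesity cutoff; illustrative sketch, not the paper's model."""
    bmi = np.asarray(bmi, dtype=float)
    below = np.minimum(bmi, threshold)        # BMI contribution below the kink
    above = np.maximum(bmi - threshold, 0.0)  # excess BMI above the kink
    X = np.column_stack([np.ones_like(bmi), below, above])
    coef, *_ = np.linalg.lstsq(X, np.asarray(hours, dtype=float), rcond=None)
    return coef  # [intercept, slope_below, slope_above]
```

A positive `slope_below` with a negative `slope_above` reproduces the asymmetric pattern reported above: work hours rise with BMI up to the obesity cutoff and fall beyond it.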
The ultrasensitive threshold response is ubiquitous in biochemical systems. In contrast, achieving ultrasensitivity in synthetic molecular structures in a controllable way is challenging. Here, we propose a chemomechanical approach inspired by Michell's instability to realize such a response. A sudden reconfiguration of topologically constrained rings results when the torsional stress inside them reaches a critical value. We use DNA origami to construct molecular rings and then DNA intercalators to induce torsional stress. Michell's instability is achieved when the critical concentration of intercalators is applied. Both the critical point and the sensitivity of this ultrasensitive threshold reconfiguration can be controlled by rationally designing the cross-sectional shape and mechanical properties of the DNA rings.
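For reference, the canonical model of a biochemical ultrasensitive threshold response is the Hill function, where a larger Hill coefficient gives a sharper switch around the threshold. The sketch below illustrates that benchmark behavior only; it is not a model of the DNA-ring mechanics.

```python
def hill_response(x, k, n):
    """Hill-type ultrasensitive response: output switches sharply
    around x = k as the Hill coefficient n grows. Used here only to
    illustrate what 'ultrasensitive threshold response' means."""
    return x**n / (k**n + x**n)
```

At n = 8, doubling the input across the threshold k moves the output from near 0 to near 1, the kind of switch-like behavior the mechanical ring system reproduces at a critical intercalator concentration.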
In higher plants, gravitropism proceeds through three sequential steps in the responding organs: perception of gravity signals, signal transduction, and asymmetric cell elongation. Light and temperature also influence the gravitropic orientation of plant organs. A series of Arabidopsis shoot gravitropism (sgr) mutants has been shown to exhibit disturbed shoot gravitropism. SGR5 is functionally distinct from other SGR members in that it mediates the early events of gravitropic responses in inflorescence stems. Here, we demonstrate that alternative splicing of SGR5 produces two protein variants (SGR5α and SGR5β) that modulate the gravitropic response of inflorescence stems at high temperatures. SGR5β inhibits SGR5α function by forming non‐DNA‐binding heterodimers. Transgenic plants overexpressing SGR5β (35S:SGR5β) exhibit reduced gravitropic growth of inflorescence stems, as observed in the SGR5‐deficient sgr5‐5 mutant. Interestingly, SGR5 alternative splicing is accelerated at high temperatures, resulting in high‐level accumulation of SGR5β transcripts. When plants were exposed to high temperatures, gravitropic curvature was reduced in Col‐0 inflorescence stems but unaffected in those of 35S:SGR5β transgenic plants and the sgr5‐5 mutant. We propose that the thermoresponsive alternative splicing of SGR5 provides an adaptation strategy by which plants protect their shoots from hot air under high‐temperature stress in natural habitats.