The ability to quickly learn from a small quantity of training data widens the range of machine learning applications. In this paper, we propose a data-efficient image captioning model, VisualGPT, which leverages the linguistic knowledge from a large pretrained language model (LM). A crucial challenge is to balance the use of visual information in the image against the prior linguistic knowledge acquired from pretraining. We design a novel self-resurrecting encoder-decoder attention mechanism to quickly adapt the pretrained LM as the language decoder on a small amount of in-domain training data. The proposed self-resurrecting activation unit produces sparse activations but has reduced susceptibility to zero gradients. We train the proposed model, VisualGPT, on 0.1%, 0.5% and 1% of the MS COCO and Conceptual Captions training data. Under these conditions, we outperform the best baseline model by up to 10.8% CIDEr on MS COCO and up to 5.4% CIDEr on Conceptual Captions. Further, VisualGPT achieves the state-of-the-art result on IU X-ray, a medical report generation dataset. To the best of our knowledge, this is the first work that improves the data efficiency of image captioning by utilizing an LM pretrained on unimodal data. Our code is available at: https://github.com/Vision-CAIR/VisualGPT.
Pointly Supervised Object Detection (PSOD) has attracted considerable interest due to its lower labeling cost compared to box-level supervised object detection. However, the complex scenes and the densely packed, dynamic-scale objects in Remote Sensing (RS) images hinder the development of PSOD methods in the RS field. In this paper, we make the first attempt to achieve RS object detection with single point supervision, and propose a PSOD method tailored for RS images. Specifically, we design a point label upgrader (PLUG) to generate pseudo box labels from single point labels, and then use the pseudo boxes to supervise the optimization of existing detectors. Moreover, to handle the challenge of densely packed objects in RS images, we propose a sparse feature guided semantic prediction module which can generate high-quality semantic maps by fully exploiting informative cues from sparse objects. Extensive ablation studies on the DOTA dataset validate the effectiveness of our method. Our method achieves significantly better performance than state-of-the-art image-level and point-level supervised detection methods, and reduces the performance gap between PSOD and box-level supervised object detection. Code is available at https://github.com/heshitian/PLUG.
The progress of deep learning (DL), especially the recent development of automatic network design, has brought unprecedented performance gains at heavy computational cost. On the other hand, blockchain systems routinely perform a huge amount of computation that serves no practical purpose other than building Proof-of-Work (PoW) consensus among decentralized participants. In this paper, we propose a new consensus mechanism, Proof of Learning (PoLe), which directs the computation spent on consensus toward the optimization of neural networks (NN). In our mechanism, the training/testing data are released to the entire blockchain network (BCN) and the consensus nodes train NN models on the data, which serves as the proof of learning. When the consensus on the BCN considers an NN model to be valid, a new block is appended to the blockchain. We experimentally compare the PoLe protocol with PoW and show that PoLe can achieve a more stable block generation rate, which leads to more efficient transaction processing. We also introduce a novel cheating prevention mechanism, Secure Mapping Layer (SML), which can be straightforwardly implemented as a linear NN layer. Empirical evaluation shows that SML can detect cheating nodes at a small cost to the predictive performance.
In many recent blockchain consensus mechanisms, the deep learning training procedure becomes the task by which miners prove their workload, so the computation power of miners is no longer spent purely on the hash puzzle. The hardware and energy therefore support the blockchain service and deep learning training at the same time. The incentive of miners is to earn tokens, and individual miners will find that mining pools are more competitive. To the best of our knowledge, we are the first to demonstrate a mining pool solution for novel consensus mechanisms based on deep learning. This work adopts the existing Proof-of-Deep-Learning (PoDL) as the consensus and Neural Architecture Search (NAS) as the workload. The mining pool manager partitions the full search space into subspaces, and all miners contribute to the NAS task within their assigned subspaces. The strong miners are assigned to exploration and the weak miners are assigned to exploitation. Section IV shows that this mining pool is more competitive than an individual miner in conducting NAS as the workload.
In L3 autonomous driving, the driver needs to keep supervising the road conditions and take over the vehicle in time when the system fails. According to related research, the transfer of vehicle control between the autonomous driving system and the driver is risky. Therefore, it is necessary to design a safety assistance system for the driver's takeover action, so as to improve the safety of L3 autonomous driving. Current research on driving assistance systems cannot respond adequately to dangerous situations caused by the driver's behavior and is not satisfactory in terms of response speed. Owing to the development of brain-computer interface technology, it is feasible to study the driver's electroencephalogram (EEG) that precedes the moment at which the brake pedal is activated, in order to perform an early detection of emergency braking. However, the signal-to-noise ratio (SNR) of EEG collected by a non-invasive EEG analyzer is very low. It is a challenge to extract effective features from EEG signals with low SNR and realize online detection of braking intention. To overcome this problem, this paper proposes an online braking intention detection algorithm based on regularized linear discriminant analysis (RLDA). In offline analysis, our algorithm achieves an accuracy rate of nearly 90%, and the advancement reaches 189 ms. In online analysis, the accuracy rate is close to 80%, and the advancement reaches 140 ms.
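As an illustration of the RLDA classifier named above, the following is a minimal sketch of a two-class linear discriminant with a shrinkage-regularized covariance. The shrinkage form, the parameter `lam`, the toy feature vectors and all function names are assumptions made for this example; the abstract does not specify the regularization or the EEG features actually used.

```python
from statistics import fmean


def mean_vector(rows):
    """Per-feature mean of a list of feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def pooled_covariance(class0, class1):
    """Pooled within-class covariance matrix of two classes."""
    d = len(class0[0])
    cov = [[0.0] * d for _ in range(d)]
    for rows in (class0, class1):
        mu = mean_vector(rows)
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                for j in range(d):
                    cov[i][j] += diff[i] * diff[j]
    n = len(class0) + len(class1) - 2
    return [[c / n for c in row] for row in cov]


def shrink(cov, lam):
    """Assumed regularizer: blend the covariance with a scaled identity."""
    d = len(cov)
    nu = sum(cov[i][i] for i in range(d)) / d  # average variance
    return [[(1 - lam) * cov[i][j] + (lam * nu if i == j else 0.0)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_rlda(class0, class1, lam=0.1):
    """Return (weights, bias) of the regularized linear discriminant."""
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    cov = shrink(pooled_covariance(class0, class1), lam)
    w = solve(cov, [b - a for a, b in zip(mu0, mu1)])
    bias = -sum(wi * (a + b) / 2 for wi, a, b in zip(w, mu0, mu1))
    return w, bias


def predict(w, bias, x):
    """Class 1 (e.g. braking intention) if the discriminant is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else 0


# Hypothetical 2-D features: class 0 = normal driving, class 1 = braking intention.
normal = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
braking = [[4.0, 4.0], [5.0, 4.0], [4.0, 5.0], [5.0, 5.0]]
w, bias = fit_rlda(normal, braking, lam=0.1)
```

The regularization matters when the number of EEG features approaches the number of training epochs, since the unregularized covariance estimate then becomes ill-conditioned; larger `lam` pulls the estimate toward a spherical covariance.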
In L3 autonomous driving, the driver needs to keep supervising the road conditions and take over the vehicle in time when the system fails. However, the driver may not be able to quickly recover his driving ability due to fatigue or distraction. Vigilance is the ability to maintain attention and alertness. It is necessary to detect the driver's vigilance for the safety of L3 autonomous driving. Research on driving vigilance is mostly aimed at manual driving scenarios, and the methods in related work are not feasible in L3 autonomous driving. Reaction time, which is closely related to the electroencephalogram (EEG) rhythm, can be used as a measure of vigilance. To detect vigilance, this paper uses a wearable EEG device to collect the driver's EEG signal and predict the reaction time. However, the signal-to-noise ratio (SNR) of the EEG signal collected by the wearable EEG analyzer is very low. It is a challenge to extract effective features from EEG signals with low SNR and predict reaction time with low error. To solve this problem, this paper first extracts the power spectrum characteristics and time-frequency characteristics from the EEG signal. Second, the extracted characteristics are fed into Support Vector Regression (SVR) and Random Forest (RF) models to predict the reaction time. In the simulation experiment, the prediction accuracy of our algorithm is nearly 90%, and the optimal RMSE reaches 100.19 ms.
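For reference, the RMSE reported above is the root of the mean squared difference between predicted and measured reaction times. The sketch below shows only that metric; the function name and the example values (in milliseconds) are hypothetical and are not data from the paper.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical reaction times in milliseconds.
predicted_ms = [500.0, 600.0]
measured_ms = [400.0, 700.0]
error_ms = rmse(predicted_ms, measured_ms)
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, which suits a safety setting where badly underestimated reaction times are the costly case.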
In this work, we propose a dynamic landing solution that requires neither onboard exteroceptive sensors nor an expensive computation unit: all localization and control modules run on the ground in a non-inertial frame. Our system starts with a relative state estimator of the aerial robot from the perspective of the landing platform, where the state of the UAV is tracked through a set of onboard LED markers and an on-ground camera; the state is expressed geometrically on a manifold and is returned by an Iterated Extended Kalman Filter (IEKF). Subsequently, a motion planning module is developed to guide the landing process, formulating it as a minimum jerk trajectory by applying the differential flatness property. Considering visibility and dynamic constraints, the problem is solved using quadratic programming, and the final motion primitive is expressed through piecewise polynomials. Through a series of experiments, the applicability of this approach is validated by successfully landing an 18 cm x 18 cm quadrotor on a 43 cm x 43 cm platform, exhibiting performance comparable to conventional methods. Finally, we provide comprehensive hardware and software details to the research community for future reference.
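To make the minimum jerk idea concrete, the sketch below evaluates the well-known closed-form quintic for a single rest-to-rest segment along one axis. It is a simplification under stated assumptions: zero velocity and acceleration at both ends, no visibility or dynamic constraints, and no quadratic program, so it is not the planner described above. The function name and arguments are hypothetical.

```python
def min_jerk(p0, pf, duration, t):
    """Position and velocity at time t on a rest-to-rest minimum jerk segment.

    Assumes zero velocity and acceleration at both endpoints, for which the
    optimal profile is s(tau) = 10 tau^3 - 15 tau^4 + 6 tau^5.
    """
    tau = min(max(t / duration, 0.0), 1.0)  # normalized time in [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5
    ds = (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / duration
    position = p0 + (pf - p0) * s
    velocity = (pf - p0) * ds
    return position, velocity


# Hypothetical descent along one axis: from 1.0 m to 0.0 m in 2.0 s.
samples = [min_jerk(1.0, 0.0, 2.0, 0.5 * k) for k in range(5)]
```

In a constrained setting such as the one described, each piecewise polynomial segment would instead have free coefficients chosen by the quadratic program; the closed form here is only the unconstrained special case.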
Control invariant sets are crucial for various methods that aim to design safe control policies for systems whose state constraints must be satisfied over an indefinite time horizon. In this article, we explore the connections among reachability, control invariance, and Control Barrier Functions (CBFs). Unlike prior formulations based on backward reachability concepts, by examining a forward reachability problem, we are able to establish a strong link between these three concepts. First, our findings show that the inevitable Forward Reachable Tube (FRT), which is the set of states such that every trajectory reaching the FRT must have passed through a given initial set of states, is precisely this initial set of states itself if it is a robust control invariant set with a differentiable boundary. We highlight that this statement may not hold if the boundary is not differentiable. Next, we formulate a differential game between the control and disturbance, where the inevitable FRT is characterized by the zero-superlevel set of the value function. By incorporating a discount factor in the cost function of the game, the barrier constraint of the CBF naturally arises as the constraint that is imposed on the optimal control policy. Combining these results, the value function of our FRT formulation serves as a CBF-like function, and conversely, any valid CBF is also a forward reachability value function inside the control invariant set, thereby revealing the inverse optimality of the CBF. This strong link we establish between the reachability problem and the barrier constraint, while guaranteeing the continuity of the value function, is not achievable by previous backward reachability-based formulations. As such, our work fills a crucial gap in the existing literature that is vital for constructing valid CBFs to ensure safety.
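For readers unfamiliar with the barrier constraint mentioned above, it is commonly written in the following form. The notation is an assumption made for this illustration (dynamics f, control set U, disturbance set D, barrier function h, rate γ) and may differ from the paper's, including the order in which the control and disturbance act.

```latex
% Assumed notation: dynamics \dot{x} = f(x, u, d), control u \in U, disturbance d \in D.
% h is the candidate barrier function, with safe set C = \{ x : h(x) \ge 0 \}.
\[
  \max_{u \in U} \, \min_{d \in D} \; \nabla h(x)^{\top} f(x, u, d) + \gamma \, h(x) \;\ge\; 0
  \qquad \text{for all } x \in C, \quad \gamma > 0 .
\]
```

In this form, γ bounds how quickly h may decay toward zero along trajectories, which is the role the abstract attributes to the discount factor in the cost function of the game.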
We propose an improved support vector machine (SVM) classifier that introduces a new offset for solving real-world unbalanced classification problems. The new offset is calculated from the unbalanced support vectors that result from the unbalanced training data. We develop a weighted harmonic mean (WHM) algorithm to further reduce the effects of noise on the offset calculation. We apply the proposed approach to classify real-world data. Simulation results demonstrate the effectiveness of our proposed approach.
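The abstract does not say how the WHM enters the offset calculation, so the sketch below shows only the generic definition of a weighted harmonic mean, the total weight divided by the weighted sum of reciprocals. The function name and the example numbers are hypothetical.

```python
def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean: sum(w) / sum(w / v), for positive values."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal length")
    if any(v <= 0 for v in values):
        raise ValueError("values must be strictly positive")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    return total_weight / sum(w / v for v, w in zip(values, weights))


# Hypothetical positive quantities with equal weights.
whm = weighted_harmonic_mean([1.0, 2.0], [1.0, 1.0])
```

Compared with an arithmetic mean, the harmonic mean is dominated by the smaller values, so a single large outlier shifts it much less. That property is one plausible reason for choosing it to damp noise.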
The power level of an on-board charger (OBC) is limited by its cost and the space it occupies. Although the power density of an OBC can be improved by system integration or by applying wide bandgap power semiconductor devices, the power level is still limited to 22 kW. Using the drivetrain of the EV to constitute the AC/DC rectifier stage of the OBC can save "half" the space of the OBC. The OBC then needs only an isolated DC/DC converter stage, so the power level can be increased substantially. The crux of an integrated motor drive and charger system (IMDCS) is how to eliminate the motor torque in charging mode. This paper reviews and compares the existing torque cancellation methods. Furthermore, the essences of those methods are analyzed and classified. Finally, simulation and experimental results are used to verify the conclusions of this paper.