Image-text retrieval is a fundamental cross-modal task whose main idea is to learn image-text matching. Generally, according to whether there exist interactions during the retrieval process, existing image-text retrieval methods can be classified into independent representation matching methods and cross-interaction matching methods. The independent representation matching methods generate the embeddings of images and sentences independently and thus are convenient for retrieval with hand-crafted matching measures (e.g., cosine or Euclidean distance). The cross-interaction matching methods achieve improvement by introducing interaction-based networks for inter-relation reasoning, yet suffer from low retrieval efficiency. This article aims to develop a method that takes advantage of the cross-modal inter-relation reasoning of cross-interaction methods while being as efficient as the independent methods. To this end, we propose a graph-based Cross-modal Graph Matching Network (CGMN), which explores both intra- and inter-relations without introducing network interaction. In CGMN, graphs are used for both visual and textual representation to achieve intra-relation reasoning across regions and words, respectively. Furthermore, we propose a novel graph node matching loss to learn fine-grained cross-modal correspondence and to achieve inter-relation reasoning. Experiments on benchmark datasets MS-COCO, Flickr8K, and Flickr30K show that CGMN outperforms state-of-the-art methods in image retrieval. Moreover, CGMN is much more efficient than state-of-the-art methods using interactive matching. The code is available at https://github.com/cyh-sj/CGMN.
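The independent-representation retrieval described in this abstract can be illustrated with a minimal sketch: embeddings are computed once per item, and a query is ranked against them with cosine similarity. The function names and the toy 3-D vectors are hypothetical and are not taken from the CGMN code base.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_images(text_embedding, image_embeddings):
    """Return image indices sorted from best to worst match."""
    scores = [cosine_similarity(text_embedding, img) for img in image_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical toy embeddings; real models use hundreds of dimensions.
images = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query = [0.0, 1.0, 0.0]
ranking = rank_images(query, images)
```

Because the image embeddings do not depend on the query, they can be precomputed and indexed, which is where the efficiency of independent methods comes from.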
The paper presents a hybrid indoor positioning solution based on a pedestrian dead reckoning (PDR) approach using built-in sensors on a smartphone. To address the challenges of flexible and complex contexts of carrying a phone while walking, a robust step detection algorithm based on motion-awareness has been proposed. Given the fact that step length is influenced by different motion states, an adaptive step length estimation algorithm based on motion recognition is developed. Heading estimation is carried out by an attitude acquisition algorithm, which contains a two-phase filter to mitigate the distortion of magnetic anomalies. In order to estimate the heading for an unconstrained smartphone, principal component analysis (PCA) of acceleration is applied to determine the offset between the orientation of the smartphone and the actual heading of a pedestrian. Moreover, a particle filter with vector graph assisted particle weighting is introduced to correct the deviation in step length and heading estimation. Extensive field tests, including four contexts of carrying a phone, have been conducted in an office building to verify the performance of the proposed algorithm. Test results show that the proposed algorithm can achieve sub-meter mean error in all contexts.
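The PCA step mentioned in this abstract can be sketched as follows, assuming the acceleration has already been rotated into a horizontal east/north frame (an assumption, since the abstract does not give the details). The function name is hypothetical. For two dimensions, the first principal component has a closed-form orientation, so no eigen-solver is needed.

```python
import math

def principal_axis_angle(acc_east, acc_north):
    """Angle (radians, counter-clockwise from east) of the dominant axis
    of horizontal acceleration, via PCA of the 2x2 covariance matrix."""
    n = len(acc_east)
    mean_e = sum(acc_east) / n
    mean_n = sum(acc_north) / n
    s_ee = sum((e - mean_e) ** 2 for e in acc_east) / n
    s_nn = sum((v - mean_n) ** 2 for v in acc_north) / n
    s_en = sum((e - mean_e) * (v - mean_n)
               for e, v in zip(acc_east, acc_north)) / n
    # Closed-form orientation of the first principal component in 2-D.
    return 0.5 * math.atan2(2.0 * s_en, s_ee - s_nn)

# Synthetic samples oscillating along a 30-degree axis.
direction = math.radians(30.0)
amplitudes = [-2.0, -1.0, 0.0, 1.0, 2.0]
east = [a * math.cos(direction) for a in amplitudes]
north = [a * math.sin(direction) for a in amplitudes]
axis_angle = principal_axis_angle(east, north)
```

PCA yields an axis, not a direction, so the result is ambiguous by 180 degrees; a real system would need an extra cue to resolve the sign. The offset to the phone's orientation would then be the difference between this axis angle and the device yaw.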
Precise pedestrian positioning based on smartphone-grade sensors has been a research hotspot for several years. Due to the poor performance of the mass-market Micro-Electro-Mechanical Systems (MEMS) Magnetic, Angular Rate, and Gravity (MARG) sensors, the standalone pedestrian dead reckoning (PDR) module cannot avoid long-time heading drift, which leads to the failure of the entire positioning system. In outdoor scenes, the Global Navigation Satellite System (GNSS) is one of the most popular positioning systems, and smartphone users can use it to acquire absolute coordinates. However, the smartphone's ultra-low-cost GNSS module is limited by some components such as the antenna, and so it is susceptible to serious interference from the multipath effect, which is a main error source of smartphone-based GNSS positioning. In this paper, we propose a multi-phase GNSS/PDR fusion framework to overcome the limitations of standalone modules. The first phase is to build a pseudorange double-difference based on smartphone and reference stations, the second phase proposes a novel multipath mitigation method based on multipath partial parameters estimation (MPPE) and a Double-Difference Code-Minus-Carrier (DDCMC) filter, and the third phase is to propose the joint stride lengths and heading estimations of the two standalone modules, to reduce the long-time drift and noise. The experimental results demonstrate that the proposed multipath error estimation can effectively suppress the double-difference multipath error exceeding 4 m, and compared to other methods, our fusion method achieves a minimum error RMSE of 1.63 m in positioning accuracy, and a minimum error RMSE of 4.71 m in long-time robustness for 20 min of continuous walking.
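The pseudorange double difference used in the first phase can be sketched with the standard textbook definition: difference the two receivers per satellite, then difference the two satellites. The function name, satellite IDs, and values below are made up for illustration and are not from the paper.

```python
def double_difference(rover, base, sat, ref_sat):
    """Pseudorange double difference between two receivers and two satellites.

    rover, base: dicts mapping satellite ID -> pseudorange in metres.
    """
    single_diff_sat = rover[sat] - base[sat]          # cancels satellite clock error
    single_diff_ref = rover[ref_sat] - base[ref_sat]
    return single_diff_sat - single_diff_ref          # cancels receiver clock error

# Hypothetical pseudoranges (metres) for a smartphone and a reference station.
base = {"G01": 20000000.0, "G07": 22000000.0}
rover = {"G01": 20000105.0, "G07": 22000103.0}
dd = double_difference(rover, base, sat="G07", ref_sat="G01")

# A common receiver clock bias on the rover drops out of the result.
biased_rover = {sat: rng + 50.0 for sat, rng in rover.items()}
dd_biased = double_difference(biased_rover, base, sat="G07", ref_sat="G01")
```

What remains after differencing is mainly geometry, noise, and multipath, which is why multipath is treated as a dominant error source once the clock terms are removed.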
State-of-the-art human detection methods focus on deep network architectures to achieve higher recognition performance, at the expense of huge computation. However, computational efficiency and real-time performance are also important evaluation indicators. This paper presents a fast real-time human detection and flow estimation method using depth images captured by a top-view TOF camera. The proposed algorithm mainly consists of head detection based on local pooling and searching, classification refinement based on human morphological features, and tracking assignment filter based on dynamic multi-dimensional feature. A depth image dataset record with more than 10k entries and departure events with detailed human location annotations is established. Taking full advantage of the distance information implied in the depth image, we achieve high-accuracy human detection and people counting with accuracy of 97.73% and significantly reduce the running time. Experiments demonstrate that our algorithm can run at 23.10 ms per frame on a CPU platform. In addition, the proposed robust approach is effective in complex situations such as fast walking, occlusion, crowded scenes, etc.
Indoor pedestrian localization measurement is a hot topic and is widely used in indoor navigation and unmanned devices. PDR (Pedestrian Dead Reckoning) is a low-cost and independent indoor localization method that estimates the position of pedestrians independently and continuously. PDR fuses the accelerometer, gyroscope and magnetometer to calculate the relative distance from the starting point, and is mainly composed of three modules: step detection, stride length estimation and heading calculation. However, PDR is affected by cumulative error and can only work in two-dimensional planes, which limits it in practical applications. In this paper, a novel localization method, V-PDR, is presented, which combines VPR (Visual Place Recognition) and PDR in a loosely coupled way. When there is an error between the localization results of PDR and VPR, the algorithm corrects the localization of PDR, which significantly reduces the cumulative error. In addition, VPR recognizes scenes on different floors to correct floor localization due to vertical movement, which extends the application scope of PDR from two-dimensional planes to three-dimensional spaces. Extensive experiments were conducted in our laboratory building to verify the performance of the proposed method. The results demonstrate that the proposed method outperforms the general PDR method in accuracy and can work in three-dimensional space.
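The way the three PDR modules combine into a position can be sketched as a simple per-step update: each detected step advances the position by the estimated stride length along the estimated heading. This is the generic dead-reckoning update, not the V-PDR implementation; the function names and the convention of heading measured clockwise from north are assumptions.

```python
import math

def pdr_step(position, step_length, heading_rad):
    """Advance an (east, north) position by one step.

    heading_rad is measured clockwise from north (assumed convention).
    """
    east, north = position
    return (east + step_length * math.sin(heading_rad),
            north + step_length * math.cos(heading_rad))

def pdr_track(start, steps):
    """Accumulate a trajectory from (step_length, heading_rad) pairs."""
    track = [start]
    for length, heading in steps:
        track.append(pdr_step(track[-1], length, heading))
    return track

# Two hypothetical 0.7 m steps: one heading north, one heading east.
track = pdr_track((0.0, 0.0), [(0.7, 0.0), (0.7, math.pi / 2)])
```

Because every step is added to the previous position, any error in stride length or heading accumulates, which is the cumulative error that an external correction such as VPR is meant to bound.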
An optical flow-based pedestrian gait modeling method integrated with attitude acquisition is proposed. The proposed method accomplishes online training of the gait model with displacement and frequency information whenever steps are detected. The displacement information inferred from optical flow is assigned adaptive weight to suppress outliers that arise from the pedestrian's feet and legs in the images. Moreover, a self-pruning linear regression mechanism is presented in the gait modeling process to attenuate the adverse effects of abnormal samples. The experimental results demonstrate that the proposed method can achieve better performance compared with the existing methods in terms of accuracy and efficiency. Furthermore, complex scenario experiments in which the textures of the ground changed were also conducted, and the results verified the adaptability of our proposed method.
As the demand for embedded vision grows, solving large optimization problems in real time within an energy and cost budget is a challenge. We present BAX, a hardware accelerator of bundle adjustment (BA), which solves the least-squares problem of state estimation in visual odometry (VO). BAX consists of a frontend and a backend for control and computation, respectively. The frontend generates instructions on-the-fly executed at the backend to perform the BA algorithm. The backend adopts decoupled access/execute (DAE) architecture, which separates the memory access unit (MAU) from the pipeline. The MAU can prefetch vectors and matrices ahead of computations. To further reduce the latency of data reorganization, three transpose-free dataflows are proposed for matrix multiplication operations on the vector processing unit (VPU). Besides, a unified architecture for both forward and backward substitution is designed for matrix decomposition in the linear solver. All the data are stored in 442 kB on-chip memory, and the local map is maintained efficiently by the hierarchical graph memory. Compared with the baseline architecture, the processing time is reduced by 53.9% through the above techniques. BAX is implemented in 32-bit floating-point precision with data normalization on FPGA. It completes a full BA in about 63.44 ms at 200 MHz, consuming 1.12 W power. BAX is 1.73× and 22.38× faster than the desktop and embedded CPUs, respectively, and achieves 90% performance of the GPU at much less power consumption.
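As a software analogy for sharing one datapath between forward and backward substitution, the sketch below uses a single loop body for both and changes only the traversal order. It is an illustration under that assumption and says nothing about how BAX implements the idea in hardware; `solve_triangular` here is a hypothetical helper, not a library call.

```python
def solve_triangular(matrix, rhs, lower=True):
    """Solve T x = rhs for a triangular matrix T (list of rows).

    lower=True runs forward substitution, lower=False runs backward
    substitution. Both share the same loop body and differ only in
    the order in which rows are visited.
    """
    n = len(rhs)
    order = range(n) if lower else range(n - 1, -1, -1)
    x = [0.0] * n
    for i in order:
        # Unsolved entries of x are still 0.0, so they add nothing here.
        acc = sum(matrix[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (rhs[i] - acc) / matrix[i][i]
    return x

# Hypothetical 2x2 systems with exact answers.
lower_factor = [[2.0, 0.0], [1.0, 3.0]]
upper_factor = [[2.0, 1.0], [0.0, 3.0]]
y = solve_triangular(lower_factor, [4.0, 11.0], lower=True)
x = solve_triangular(upper_factor, [7.0, 9.0], lower=False)
```

In a factorization-based linear solver, the two calls are typically chained: a forward pass with the lower factor followed by a backward pass with the upper factor.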
Global Navigation Satellite System (GNSS) is the most reliable navigation system for location-based applications where accuracy and consistency are essential requirements. The LSE (least squares estimator) has been used since the start of GNSS for position estimation. However, LSE is affected by outliers and errors in GNSS measurements, which results in a wrong user position. In this paper, we propose a novel three-phase estimator for enhancing GNSS positioning accuracy in the presence of outliers and errors, relying upon the robust MM estimation theory. In the first phase, a subsampling process is proposed on available observations. IRWLS (iterative reweighted LS) is applied to all subsamples up to a predefined number of observations to obtain a positioning estimate and a scale factor. Secondly, IRWLS is applied up to the convergence point on a set of selected subsamples. The third phase involves the selection of the optimum positioning solution having the minimum scale factor. An outlier detection and exclusion process is applied on a probabilistic set of outlying observations to maintain the integrity and reliability of the position. Multiple simulated and real scenarios are tested. Results show high accuracy and reliability of the proposed algorithm in challenging environments.
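The reweighting idea behind IRWLS can be shown on a much simpler problem than GNSS positioning: a robust one-dimensional location estimate. The sketch below is a deliberate simplification and not the paper's three-phase MM estimator. It assumes Huber weights, a fixed scale from the median absolute deviation, and hypothetical names and data.

```python
import statistics

def irwls_location(samples, tuning=1.345, iterations=50, tol=1e-9):
    """Robust location estimate by iteratively reweighted least squares."""
    estimate = statistics.median(samples)
    # Robust scale from the median absolute deviation, kept fixed here.
    mad = statistics.median([abs(s - estimate) for s in samples])
    scale = 1.4826 * mad
    if scale == 0.0:
        return estimate
    threshold = tuning * scale
    for _ in range(iterations):
        # Huber weights: full weight for small residuals, reduced otherwise.
        weights = []
        for s in samples:
            residual = abs(s - estimate)
            weights.append(1.0 if residual <= threshold else threshold / residual)
        updated = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
        if abs(updated - estimate) < tol:
            return updated
        estimate = updated
    return estimate

# Hypothetical measurements with one gross outlier.
measurements = [10.0, 10.2, 9.8, 10.1, 9.9, 50.0]
robust = irwls_location(measurements)
plain = statistics.mean(measurements)
```

The plain mean is pulled far toward the outlier, while the reweighted estimate stays close to the cluster of consistent measurements; this is the behaviour that motivates replacing LSE with a robust estimator.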
Deep neural networks dominate in the machine learning field. However, deploying deep neural networks on mobile devices requires aggressive compression of models due to huge amounts of parameters. An extreme case is to restrict weights to binary values {+1/-1} without much loss of accuracy. This promising method not only reduces hardware overhead of memory and computation, but also improves the performance of network inference. In this work, a flexible architecture for binary weight network acceleration is proposed. The architecture fully exploits the inherent multi-level parallelism of neural networks, resulting in utilization of processing elements over 80% for different layers. In addition, we present efficient data placement and transmission methods in coordination with multi-level parallel processing. The accelerator is implemented using SMIC 40 nm technology. It operates at 1.2 V and achieves up to 974 GOPS/W power efficiency.
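Why binary weights reduce computation can be seen in a minimal sketch: with weights restricted to +1 and -1, each multiply-accumulate becomes an addition or a subtraction. Binarizing by sign is a common convention and is assumed here; the abstract does not say how the weights are obtained, and the function names are hypothetical.

```python
def binarize(weights):
    """Map real-valued weights to +1 or -1 by sign (zero maps to +1)."""
    return [1 if w >= 0 else -1 for w in weights]

def binary_dot(activations, binary_weights):
    """Dot product with +1/-1 weights using only additions and subtractions."""
    total = 0.0
    for a, w in zip(activations, binary_weights):
        total += a if w == 1 else -a
    return total

# Hypothetical weights and activations.
signs = binarize([0.3, -0.7, 0.0, -0.1])
result = binary_dot([1.0, 2.0, 3.0, 4.0], signs)
```

Each weight also needs only one bit of storage, which accounts for the memory savings the abstract mentions.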
• We propose a global single-pass method for content-based image retrieval.
• A novel trainable pooling method using graph-based reasoning and attention.
• Structural relations can effectively help emphasize the key features.
• Adopting curriculum design to modify the network training.
• The results show superiority over other methods in retrieval efficiency and accuracy on popular benchmarks.
Global single-pass methods have shown superior efficiency over local aggregation methods on content-based image retrieval. However, they tend to fail under challenging environments since the structural relations among regions are not exploited. To address this issue, we propose a novel Graph-based Reasoning Attention Pooling with Curriculum Design (GRAP-CD) to improve the network capability through training modification and trainable pooling. GRAP-CD can not only explore relations among salient regions but also gradually train the network to achieve better local minima. The graph-based reasoning layers regard the feature map from the last convolution layer as a graph and construct the structural relations. Then the graph-based attention layer enhances the key information guided by the relations. Besides, a front-end curriculum design is introduced to split the training dataset from simple to complex and train the model step by step, which helps the GRAP first learn the basic feature information from simple samples and then learn to extract more representative features from hard positive samples. Experimental results on the popular ROxford and RParis benchmarks show improvement over state-of-the-art global single-pass methods and competitive results with local aggregation methods.