Digital video stabilization aims to remove camera motion jitter in software. The first step of the classical video stabilization methodology, called camera motion estimation, is usually performed using only the RGB frames of the unstable video. Despite recent advances in camera motion estimation strategies, methods classified as two-dimensional are still not properly evaluated, even though motion estimation is well known to be a crucial step in classical approaches to video stabilization. The main purpose of this work is to draw attention to the assessment of two-dimensional camera motion estimation and to reinforce its importance for progress in video stabilization. We proposed a new approach that performs this evaluation using camera motion fields in a pixel-by-pixel comparison, and we demonstrated through experimental results, by comparing our metrics against image similarity metrics, that they are reliable across diverse scenarios. In addition, we showed and analyzed the results of our metrics for a global and a local camera motion estimation method. We believe that the assessment and study presented in this work are an important starting point for a more rigorous analysis of this task and a foundation for future 2D camera motion estimation methods based on deep learning.
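As a concrete illustration of the kind of pixel-by-pixel motion-field comparison described above, the sketch below computes the average endpoint error between a ground-truth and an estimated dense camera motion field. The abstract does not specify the exact metrics, so this is a minimal, assumed formulation rather than the paper's method.

```python
import numpy as np

def motion_field_epe(gt_field: np.ndarray, est_field: np.ndarray) -> float:
    """Average endpoint error between dense (H, W, 2) camera motion fields.

    Each pixel stores a 2-D motion vector; the score is the mean Euclidean
    distance between ground-truth and estimated vectors over all pixels.
    """
    return float(np.mean(np.linalg.norm(gt_field - est_field, axis=-1)))
```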
For long-term vision-based cable force monitoring, it is essential to fully consider different types of challenges, such as complex backgrounds in the field of view, illumination changes, occlusion, camera motion, and real-time operation. To address these challenges, this study proposes a vision-based, robust, real-time cable force measurement method with three innovative aspects. First, to solve the difficulty of cable edge recognition caused by complex backgrounds, a linear feature detection operator called edge drawing lines was optimized with a screening strategy based on slope and line-segment length. Second, the average of the bilateral edge displacements was used as the final value to offset the displacement deviation caused by illumination changes. Third, to overcome the obstacle that camera motion poses to identifying the cable frequency, an interference frequency elimination method requiring neither reference targets nor additional sensors was proposed. Furthermore, the integrated cable force calculation method was developed into real-time measurement software. Finally, the proposed method was applied to a long-span cable-stayed bridge, where the camera vibrated significantly because of bridge deck motion and wind. The reliability of the proposed method was verified by comparing its results with those obtained by accelerometers.
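A minimal sketch of the first innovation, assuming the screening keeps Edge Drawing (EDLines-style) segments whose orientation and length match the expected cable; the thresholds and segment representation are illustrative assumptions, not the paper's exact criteria.

```python
import numpy as np

def screen_segments(segments, min_length_px, cable_slope_deg, slope_tol_deg):
    """Keep line segments whose length and orientation match the cable.

    segments: iterable of ((x1, y1), (x2, y2)) endpoints, e.g. from an
    Edge Drawing (EDLines) detector. The expected cable slope and both
    thresholds are illustrative assumptions.
    """
    kept = []
    for (x1, y1), (x2, y2) in segments:
        length = np.hypot(x2 - x1, y2 - y1)
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1)) % 180.0
        # Undirected angular difference, wrapped into [0, 90] degrees.
        d = abs(angle - cable_slope_deg % 180.0)
        d = min(d, 180.0 - d)
        if length >= min_length_px and d <= slope_tol_deg:
            kept.append(((x1, y1), (x2, y2)))
    return kept
```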
•A novel 6-DOF camera motion correction method is proposed using only IMU sensors.
•The accuracy and robustness are verified under different distances and focal lengths.
•The motion correction ratio is statistically analysed and reaches approximately 80%.
•Pixel movements are caused mainly by camera rotation, especially at long distances.
•Translation-induced pixel movement is inversely correlated with the object distance.
Environmental conditions such as wind and ground traffic introduce motion in camera measurement systems and affect measurement accuracy. Conventional camera motion correction methods track static reference points with one or multiple cameras, which limits their applicability. This study proposes a novel 6-degree-of-freedom (DOF) camera motion correction method using only an inertial measurement unit (IMU) sensor. A Kalman filter is adopted as the data fusion method to estimate camera orientation and translation from IMU data. Six pinhole camera models are built to evaluate and correct the 6-DOF camera motions. The motion correction efficiency and robustness are tested for different object distances and optical lens focal lengths. The motion correction ratio is statistically analysed and reaches approximately 80%. The object distance has little effect on the motion correction ratio. The rotation-induced pixel movement is independent of the object distance. More than 90% of the pixel movement noise is caused by camera rotation. The translation-induced pixel movement is inversely correlated with the object distance.
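For intuition on why rotation-induced pixel movement is independent of object distance, the sketch below removes rotation-induced pixel motion using the infinite homography of a pinhole camera. This is a standard geometric result, not the paper's six-model formulation, and the rotation convention is an assumption.

```python
import numpy as np

def undo_rotation_motion(pts, K, R):
    """Remove rotation-induced pixel motion via the infinite homography.

    pts: (N, 2) pixel coordinates observed by the rotated camera; K: 3x3
    intrinsics; R: 3x3 rotation of the camera frame (assumed estimated
    from IMU data, e.g. by a Kalman filter). For a pure rotation, pixels
    map by H = K @ R @ inv(K) regardless of object distance, so applying
    inv(H) restores them; translation correction would also need the
    object distance, as the abstract notes.
    """
    H = K @ R @ np.linalg.inv(K)
    pts_h = np.column_stack([pts, np.ones(len(pts))])
    back = pts_h @ np.linalg.inv(H).T
    return back[:, :2] / back[:, 2:3]
```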
The vibration data are quite important for structural health monitoring (SHM). This paper proposes a novel method to adaptively estimate video motions of a structure with subpixel accuracy, without attaching any targets. The proposed method comprises three steps. In the first step, to remove outliers while preserving feature points, a Gaussian range kernel is used along with a Gaussian spatial kernel, computed via polynomial fitting and recursive integral calculations. In the second step, to calculate video pixel motions that vary with spatial coordinates within the region of interest (ROI), the ROI is divided into multiple grid cells. Motions in each grid cell are modeled as local spatially variant homography matrices, and their spatial consistency is enhanced by a shape-preserving constraint. The third step enhances both the spatial and temporal correlations of the calculated homography matrices through a data term and a smoothness term in both the space and time domains. The superiority of the proposed method over traditional methods was validated in several case studies analyzing structural motions. In these comparisons, the proposed method produced denoised images, camera motions, structural motions, and structural modal information with subpixel accuracy, achieving the best accuracy among the methods compared.
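The first step combines a Gaussian spatial kernel with a Gaussian range kernel, i.e., bilateral filtering. Below is a brute-force reference version for grayscale images; the paper instead accelerates this with polynomial fitting and recursive integral computation, which is not reproduced here.

```python
import numpy as np

def bilateral_filter(img, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Joint Gaussian spatial / Gaussian range filtering of a float image.

    Brute-force reference implementation for a 2-D grayscale array;
    smooths noise while preserving edges and feature points.
    """
    H, W = img.shape
    out = np.empty_like(img, dtype=float)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_s**2))
    pad = np.pad(img, radius, mode="reflect")
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            weights = spatial * np.exp(-(patch - img[i, j])**2 / (2 * sigma_r**2))
            out[i, j] = np.sum(weights * patch) / np.sum(weights)
    return out
```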
•Noncontact cable tension force estimation using an integrated vision and inertial measurement system placed on a vibrating reference point.
•Development of a contour-based algorithm to estimate cable displacement.
•Improvement of the accuracy of cable tension force estimation by compensating the reference point vibration.
In this study, a noncontact cable tension force estimation technique was developed using an integrated vision and inertial measurement system (VIS) installed at a reference point, with the movement of the VIS explicitly considered. Cable displacement was first estimated by applying a proposed contour-based algorithm to the vision measurements. Thereafter, the movement of the VIS at the reference point was estimated from the inertial measurement system and used to compensate for the error in the previously estimated cable displacement. Finally, the cable tension force was estimated from the compensated cable displacement. The feasibility of the proposed technique was validated through a laboratory test on a full-scale pedestrian bridge cable and a field test on a single-pylon cable-stayed pedestrian bridge. Overall, the proposed technique estimated the cable tension forces reliably, with less than 1.3% discrepancy compared to those estimated by accelerometers.
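The final step, converting a measured cable vibration frequency into a tension force, is commonly based on taut-string theory. The sketch below implements that standard baseline formula; the paper's actual model may include corrections for sag and bending stiffness.

```python
def cable_tension(freq_hz, mode_n, length_m, mass_per_m):
    """Taut-string estimate of cable tension from a measured frequency.

    T = 4 * m * L**2 * (f_n / n)**2, where m is mass per unit length (kg/m),
    L the cable length (m), and f_n the n-th natural frequency (Hz). Valid
    when sag and bending stiffness are negligible; practical methods add
    corrections for those effects.
    """
    return 4.0 * mass_per_m * length_m**2 * (freq_hz / mode_n)**2
```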
Planes and edges are attractive features for simultaneous localization and mapping (SLAM) in indoor environments because they can be reliably extracted and are robust to illumination changes. However, seamlessly fusing these two different kinds of features to avoid degeneracy and accurately estimate the camera motion remains a challenging problem. In this article, a plane-edge-SLAM system using an RGB-D sensor is developed to address the seamless fusion of planes and edges. Constraint analysis is first performed to obtain a quantitative measure of how well the planes constrain the camera motion estimation. Then, using the results of the constraint analysis, an adaptive weighting algorithm is carefully designed to achieve seamless fusion. Through the fusion of planes and edges, the solution to motion estimation is fully constrained, and the problem remains well posed in all circumstances. In addition, a probabilistic plane fitting algorithm is proposed to fit a plane model to noisy 3-D points. By exploiting the error model of the depth sensor, the proposed plane fitting adapts to the varying measurement noise associated with different depth measurements. As a result, the estimated plane parameters are more accurate and robust to points with large uncertainties. Compared with existing plane fitting methods, the proposed method clearly benefits the performance of motion estimation. The results of extensive experiments on public data sets and in real-world indoor scenes demonstrate that the plane-edge-SLAM system achieves high accuracy and robustness. Note to Practitioners: This article is motivated by robust localization and mapping for mobile robots. We propose a novel simultaneous localization and mapping (SLAM) approach fusing plane and edge features in indoor scenes (plane-edge-SLAM). This newly proposed approach works well in textureless or dark scenes and is robust to sensor noise. Experiments were carried out in various indoor scenes for mobile robots, and the results demonstrate the robustness and effectiveness of the proposed framework. In future work, we will address the fusion of other high-level features (for example, 3-D lines) and the active exploration of environments.
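A minimal sketch of depth-adaptive plane fitting in the spirit described above: a weighted least-squares fit in which each 3-D point is down-weighted according to a common RGB-D noise model where depth uncertainty grows roughly quadratically with distance. The noise coefficients and the exact probabilistic formulation are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def weighted_plane_fit(points, sigma0=0.0012, k=0.0019):
    """Fit a plane (n, d) with n . x = d to noisy (N, 3) points.

    Each point is weighted by 1 / sigma_z**2 using the common RGB-D noise
    model sigma_z ~ sigma0 + k * z**2, so distant, uncertain points pull
    the fit less. The coefficient values here are placeholders.
    """
    z = points[:, 2]
    w = 1.0 / (sigma0 + k * z**2) ** 2
    centroid = (w[:, None] * points).sum(axis=0) / w.sum()
    centered = points - centroid
    cov = (w[:, None] * centered).T @ centered
    # The plane normal is the eigenvector of the smallest eigenvalue.
    normal = np.linalg.eigh(cov)[1][:, 0]
    return normal, float(normal @ centroid)
```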
•The influence of camera motion on stereo-DIC is investigated, and the effects of pair motion and relative motion are examined separately using precisely controllable simulated experiments.
•By integrating into the stereo matching, the speckle-based compensation method for relative motion is proposed, and the accuracy of the compensation methods is analyzed in detail.
•After relative motion compensation, most of the systematic errors of in-plane displacements are less than 0.01 mm, and the systematic errors of strain are less than 50 microstrain.
Stereo-digital image correlation (stereo-DIC) is now a standard technique for determining the mechanical properties of materials and structures. In stereo-DIC, cameras are assumed to be motionless after camera calibration, so three-dimensional (3D) reconstruction can be performed using the pre-calibrated parameters. However, this assumption does not hold in some situations, such as drop tests, seismic shaking table tests, and non-laboratory environments. Because of ground shaking or wind, it is almost impossible to avoid camera motion in these experiments even when mechanical fixing is adopted, and camera motion during an experiment can introduce significant errors into the measured results. Generally, camera motion can be divided into pair motion and relative motion. These two kinds of motion influence stereo-DIC measurements differently and are worthy of separate study. With this in mind, this paper investigates the influence of camera motion on stereo-DIC, examining the effects of pair motion and relative motion separately using precisely controllable simulated experiments. Specifically, a speckle-based compensation method for relative motion, integrated into the stereo matching, is proposed, and the accuracy of the compensation method is analyzed in detail. Reducing the camera motion-induced systematic errors will support further applications of stereo-DIC in non-laboratory environments and engineering fields.
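To make the pair-motion versus relative-motion distinction concrete: stereo reconstruction depends only on the left-to-right relative transform, sketched below. Pair motion (both cameras moving rigidly together) leaves this transform unchanged, while relative motion alters it and invalidates the pre-calibrated parameters. This is standard stereo geometry, not the paper's compensation method.

```python
import numpy as np

def relative_extrinsics(R_l, t_l, R_r, t_r):
    """Left-to-right transform from world-to-camera extrinsics.

    With x_cam = R @ x_world + t for each camera, a point expressed in
    the left camera frame maps into the right one by (R_rel, t_rel).
    Pair motion changes (R_l, t_l) and (R_r, t_r) identically and leaves
    this transform intact; relative motion changes it, breaking the
    pre-calibrated stereo geometry.
    """
    R_rel = R_r @ R_l.T
    t_rel = t_r - R_rel @ t_l
    return R_rel, t_rel
```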
Event cameras respond to scene dynamics and provide signals naturally suited to motion estimation, with advantages such as high dynamic range. The emerging field of event-based vision motivates a revisit of fundamental computer vision tasks related to motion, such as optical flow and depth estimation. However, state-of-the-art event-based optical flow methods tend to originate in frame-based deep-learning methods, which require several adaptations (data conversion, loss function, etc.) because event data have very different properties. We develop a principled method that extends the Contrast Maximization framework to estimate dense optical flow, depth, and ego-motion from events alone. The proposed method sensibly models the space-time properties of event data and tackles the event alignment problem. Its objective function is designed to prevent overfitting, handle occlusions better, and improve convergence through a multi-scale approach. With these key elements, our method ranks first among unsupervised methods on the MVSEC benchmark and is competitive on the DSEC benchmark. Moreover, it allows us to simultaneously estimate dense depth and ego-motion, exposes the limitations of current flow benchmarks, and produces remarkable results when transferred to unsupervised learning settings. Along with the various downstream applications shown, we hope the proposed method becomes a cornerstone of event-based motion-related tasks. Code is available at https://github.com/tub-rip/event_based_optical_flow
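The core of Contrast Maximization can be sketched compactly: warp events to a reference time with a candidate motion, accumulate them into an image of warped events (IWE), and score alignment by the image contrast. The toy version below uses a single global flow vector and nearest-bin accumulation, whereas the paper estimates a dense flow field with a multi-scale, overfitting-aware objective.

```python
import numpy as np

def iwe_contrast(events, flow, sensor_hw=(180, 240)):
    """Contrast (variance) of the image of warped events for a flow guess.

    events: (N, 3) array of (x, y, t); flow: (vx, vy) in pixels/second.
    Events are warped to the reference time t = 0 and accumulated into a
    2-D histogram; sharper alignment yields higher variance. An optimizer
    maximizing this score over the flow recovers the motion.
    """
    H, W = sensor_hw
    x = events[:, 0] - flow[0] * events[:, 2]
    y = events[:, 1] - flow[1] * events[:, 2]
    iwe, _, _ = np.histogram2d(y, x, bins=(H, W), range=[[0, H], [0, W]])
    return iwe.var()
```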
Multi-object tracking (MOT) detects multiple targets in an image and assigns a unique identifier to each target. However, challenges such as rapid motion, occlusion, and camera motion in the tracking scene may lead to identity switches and missing trajectories, which degrade the performance of the tracker. To address these issues, this paper presents an MOT algorithm based on an interactive attention network and adaptive trajectory reconnection. First, an interactive attention network was created to learn features for the two different tasks of detection and tracking, alleviating feature conflicts and extracting sufficient feature information. A new cost matrix was then designed to fuse motion and feature information, thereby reducing the number of identity switches. Meanwhile, an extreme gradient boosting reconnection module was used to achieve adaptive trajectory reconnection and reduce missing trajectories. The proposed algorithm achieved 61.5% and 55.4% HOTA on the standard MOT17 and MOT20 datasets, respectively, improvements of 3% and 1.6% over FairMOT. Furthermore, compared with state-of-the-art algorithms, the proposed algorithm demonstrated superior tracking performance.
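A generic sketch of fusing motion and appearance costs for data association, followed by Hungarian matching; the linear weighting and thresholds are illustrative assumptions, not the paper's cost matrix.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(motion_cost, appearance_cost, alpha=0.7, max_cost=0.9):
    """Fuse motion and appearance costs, then solve the assignment.

    motion_cost / appearance_cost: (num_tracks, num_detections) matrices,
    e.g. 1 - IoU and cosine embedding distance. Pairs whose fused cost
    exceeds max_cost are rejected as unreliable matches.
    """
    cost = alpha * motion_cost + (1.0 - alpha) * appearance_cost
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cost]
```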
Implicit scene representations have recently shown promising results in photo-realistic 3D reconstruction and view synthesis based on calibrated views. However, their applications face several challenges, including unknown camera pose, boundary ambiguity, and observation noise. This paper proposes a novel online scene representation method that simultaneously learns to represent the target scene and estimate the camera poses from an RGB-D stream. An implicit scene representation function built with scale-encoded cascaded grids is proposed to represent scenes online from incremental observations. This implicit function is optimized in a reparameterized domain that provides defined boundaries. In this reparameterized domain, the cascaded grids are progressively distilled under geometric and photometric supervision to improve their model capacity and geometric accuracy. A radiance field deblurring module based on the physical imaging process is further proposed to restore a photo-realistic reconstruction despite camera motion blur, which is the main component of the observation noise. The proposed method can produce sharp and photo-realistic representations of scenes under various shooting conditions without known camera poses. Experiments on multiple datasets demonstrate the effectiveness of the proposed method in improving view synthesis and camera tracking results for online scene representation tasks.
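The deblurring module builds on the standard physical model of motion blur: the observed image is the average of sharp renderings over the camera trajectory during exposure. A minimal sketch, assuming a generic render function and sampled in-exposure poses (both hypothetical names, not the paper's API):

```python
import numpy as np

def blurred_render(render_fn, exposure_poses):
    """Physical motion-blur model: average sharp renderings over exposure.

    render_fn(pose) -> (H, W, 3) image rendered from the radiance field;
    exposure_poses: camera poses sampled within the exposure window.
    Matching this average to the blurry observation lets the underlying
    field stay sharp while still explaining the blur.
    """
    return np.mean([render_fn(pose) for pose in exposure_poses], axis=0)
```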