Purpose
Four-dimensional cone-beam computed tomography (4D CBCT) reconstructs a sequence of phase-resolved images, which can assist in verifying the patient's position and offer information for cancer treatment planning. However, 4D CBCT images suffer from severe streaking artifacts and noise, because the reconstruction of each phase is an extremely sparse-view CT problem; this in turn degrades the accuracy of treatment estimation. The purpose of this paper was to develop a new 4D CBCT reconstruction method that generates a series of 4D CBCT images with high spatiotemporal resolution.
Methods
Considering the advantage of dictionary learning (DL) in effectively representing structural features and the correlation between neighboring pixels, we construct a novel DL-based method for 4D CBCT reconstruction. In this study, both a motion-aware dictionary and a spatially structural 2D dictionary are trained for 4D CBCT, by exploiting the spatiotemporal correlation among the ten phase-resolved images and the spatial information within each image, respectively. Specifically, two reconstruction models are produced. The first is the motion-aware dictionary learning-based 4D CBCT algorithm (MaDL). The second is MaDL equipped with a prior knowledge constraint (pMaDL). Qualitative and quantitative evaluations are performed using a 4D extended cardiac torso (XCAT) phantom, simulated patient data, and two patient data sets. Several state-of-the-art 4D CBCT algorithms, including the McKinnon-Bates (MKB) algorithm, prior image constrained compressed sensing (PICCS), and the high-quality initial image-guided 4D CBCT reconstruction method (HQI-4DCBCT), are applied for comparison to validate the performance of the proposed MaDL and pMaDL reconstruction frameworks.
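As a hedged illustration of the sparse-coding step at the core of any dictionary-learning reconstruction, the following sketch runs orthogonal matching pursuit over a toy dictionary; the dictionary, signal, and sparsity level are made up here, and the abstract does not specify MaDL/pMaDL's actual motion-aware dictionaries or iterative scheme:

```python
import numpy as np

def omp(D, y, n_nonzero=2):
    """Greedy orthogonal matching pursuit: sparse-code y over dictionary D."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit coefficients on the chosen support by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = y - D @ coef
    return coef

# Toy dictionary with unit-norm atoms (columns).
D = np.array([[1.0, 0.0, 0.7],
              [0.0, 1.0, 0.7],
              [0.0, 0.0, 0.14]])
D /= np.linalg.norm(D, axis=0)
y = 2.0 * D[:, 0] + 0.5 * D[:, 1]        # signal built from two atoms
code = omp(D, y, n_nonzero=2)
recon = D @ code
print(np.allclose(recon, y, atol=1e-8))  # True: exact 2-sparse recovery
```

In a full iterative reconstruction, a sparse-coding step like this would alternate with a data-fidelity update against the measured projections.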
Results
Experimental results show that the proposed MaDL can output reconstructions with few streaking artifacts, although some structural information, such as tumors and blood vessels, may still be missed. The results of the proposed pMaDL demonstrate an improved spatiotemporal resolution of the reconstructed 4D CBCT images: streaking artifacts are largely suppressed and detailed structures are restored. For the XCAT phantom, quantitative evaluations indicate that the proposed pMaDL achieves average decreases of 58.70%, 45.25%, and 40.10% in root-mean-square error (RMSE) and average improvements of 2.10, 1.37, and 1.37 times in structural similarity index (SSIM) when compared with PICCS, MaDL(2D), and MaDL, respectively. Moreover, the proposed pMaDL achieves performance comparable to the HQI-4DCBCT algorithm in terms of the RMSE and SSIM metrics, while suppressing streaking artifacts better than HQI-4DCBCT.
Conclusions
The proposed algorithm can reconstruct a set of 4D CBCT images with both high spatiotemporal resolution and good preservation of detailed features. Moreover, by incorporating the motion-aware dictionary and a prior constraint into the proposed 4D CBCT iterative framework, pMaDL effectively suppresses the streaking artifacts in the resulting reconstructions while achieving an overall improved spatiotemporal resolution.
Sign language recognition (SLR) has long been plagued by insufficient model representation capability. Although current pre-training approaches have alleviated this dilemma to some extent and yielded promising performance by employing various pretext tasks on sign pose data, these methods still suffer from two primary limitations: (i) explicit motion information is usually disregarded in previous pretext tasks, leading to partial information loss and limited representation capability; (ii) previous methods focus on the local context of a sign pose sequence without incorporating the guidance of the global meaning of lexical signs. To this end, we propose a Motion-Aware masked autoencoder with Semantic Alignment (MASA) that integrates rich motion cues and global semantic information in a self-supervised learning paradigm for SLR. Our framework contains two crucial components: a motion-aware masked autoencoder (MA) and a momentum semantic alignment module (SA). Specifically, in MA, we introduce an autoencoder architecture with a motion-aware masking strategy to reconstruct the motion residuals of masked frames, thereby explicitly exploring dynamic motion cues in sign pose sequences. Moreover, in SA, we embed our framework with global semantic awareness by aligning the embeddings of differently augmented samples from the input sequence in a shared latent space. In this way, our framework can simultaneously learn local motion cues and global semantic features for comprehensive sign language representation. Furthermore, extensive experiments validate the effectiveness of our method, which achieves new state-of-the-art performance on four public benchmarks. The source code is publicly available at https://github.com/sakura/MASA.
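The "motion residual" reconstruction target in MA can be illustrated with a minimal sketch, assuming residuals are frame-to-frame pose differences and that the masking strategy targets the most dynamic transitions (both are assumptions; the paper's exact definitions may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sign pose sequence: T frames of J joints with (x, y) coordinates.
T, J = 8, 5
poses = rng.normal(size=(T, J, 2))

# Motion residuals assumed here to be first-order differences between
# consecutive frames; MASA's exact residual definition may differ.
motion = poses[1:] - poses[:-1]                  # shape (T-1, J, 2)

# Motion-aware masking: mask the transitions with the largest motion
# magnitude, forcing the autoencoder to reconstruct the dynamic parts.
magnitude = np.linalg.norm(motion, axis=(1, 2))  # per-transition motion
mask_ratio = 0.5
n_mask = int(mask_ratio * len(magnitude))
masked_idx = np.argsort(magnitude)[-n_mask:]     # most-dynamic transitions

# Reconstruction target: the motion residuals at the masked positions.
target = motion[masked_idx]
print(target.shape)  # (3, 5, 2)
```

An encoder would see only the unmasked frames and be trained to regress `target`; that regression loss is the pretext task.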
Exposure Trajectory Recovery From Motion Blur
Zhang, Youjian; Wang, Chaoyue; Maybank, Stephen J.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1 November 2022, Volume 44, Issue 11
Journal Article
Peer reviewed
Open access
Motion blur in dynamic scenes is an important yet challenging research topic. Recently, deep learning methods have achieved impressive performance for dynamic scene deblurring. However, the motion information contained in a blurry image has yet to be fully explored and accurately formulated, because: (i) the ground truth of dynamic motion is difficult to obtain; (ii) the temporal ordering is destroyed during the exposure; and (iii) motion estimation from a blurry image is highly ill-posed. Revisiting the principle of camera exposure shows that motion blur can be described by the relative motion of sharp content with respect to each exposed position. In this paper, we define exposure trajectories, which represent the motion information contained in a blurry image and explain the causes of motion blur. A novel motion offset estimation framework is proposed to model pixel-wise displacements of the latent sharp image at multiple timepoints. Under mild constraints, our method can recover dense, (non-)linear exposure trajectories, which significantly reduce temporal disorder and the ill-posedness of the problem. Finally, experiments demonstrate that the recovered exposure trajectories not only capture accurate and interpretable motion information from a blurry image, but also benefit motion-aware image deblurring and warping-based video extraction tasks. Code is available at https://github.com/yjzhang96/Motion-ETR.
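A minimal sketch of the exposure-trajectory view of blur: a blurry image is the average of the sharp content displaced along the trajectory. Integer `np.roll` shifts stand in for the dense, pixel-wise (possibly non-linear) displacements the paper actually estimates:

```python
import numpy as np

def synthesize_blur(sharp, offsets):
    """Average the sharp image warped along per-timepoint offsets.

    `offsets` is a list of integer (dy, dx) displacements, a crude
    stand-in for dense exposure trajectories.
    """
    acc = np.zeros_like(sharp, dtype=float)
    for dy, dx in offsets:
        acc += np.roll(sharp, shift=(dy, dx), axis=(0, 1))
    return acc / len(offsets)

# A bright dot on a dark background, blurred along a horizontal trajectory.
sharp = np.zeros((5, 5))
sharp[2, 2] = 1.0
trajectory = [(0, -1), (0, 0), (0, 1)]  # three exposure timepoints
blurry = synthesize_blur(sharp, trajectory)
print(blurry[2, 1:4])  # energy spread evenly (1/3 each) along x
```

The paper's recovery problem runs the other way: given `blurry`, estimate a per-pixel `trajectory` such that this forward model reproduces the observation.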
Moving object segmentation in real-world scenes is of critical significance for many computer vision applications. However, moving object segmentation faces several challenges: it is difficult to distinguish objects affected by motion degeneracy, and complex scenes and noisy 2D optical flows also affect the segmentation results. In this paper, to address the difficulties caused by motion degeneracy, we analyze classic motion degeneracy from a new geometric perspective. To identify objects with motion degeneracy, we propose a reprojection cost and an optical flow contrast cost, which are fed into the network to enrich the motion features. Furthermore, a novel geometric constraint called the bidirectional motion constraint is proposed to detect moving objects with weak motion features. To tackle more complex scenes, we also introduce a motion-aware architecture to predict instance masks of moving objects. Extensive experiments are conducted on the KITTI dataset, the JNU-UISEE dataset, and the KittiMoSeg dataset, and our proposed method achieves excellent performance.
•We design different motion costs to deal with problems caused by motion degeneracy.
•We propose a bidirectional motion constraint to identify objects with weak motions.
•We introduce a geometric-analysis-based motion-aware architecture.
Recently, the growing demand for autonomous driving has generated considerable interest in 3D object detection, resulting in many excellent 3D object detection algorithms. However, most 3D object detectors focus only on a single set of LiDAR points, ignoring the potential to improve performance by leveraging the information provided by consecutive sets of LiDAR points. In this paper, we propose a novel 3D object detection method called temporal motion-aware 3D object detection (TM3DOD), which utilizes temporal LiDAR data. In the proposed TM3DOD method, we aggregate LiDAR voxels over time and enhance the current BEV features by generating motion features from consecutive BEV feature maps. First, we present the temporal voxel encoder (TVE), which generates voxel representations by capturing the temporal relationships among the point sets within a voxel. Next, we design a motion-aware feature aggregation network (MFANet), which aims to enhance the current BEV feature representation by quantifying the temporal variation between two consecutive BEV feature maps. By analyzing the differences and changes in the BEV feature maps over time, MFANet captures motion information and integrates it into the current feature representation, enabling more robust and accurate detection of 3D objects. Experimental evaluations on the nuScenes benchmark dataset demonstrate that the proposed TM3DOD method achieves significant improvements in 3D detection performance compared with the baseline methods, as well as performance comparable to state-of-the-art approaches.
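The idea of deriving motion features from consecutive BEV maps can be sketched as a per-cell temporal difference; the fusion weight `alpha` and the additive fusion below are illustrative stand-ins for the aggregation MFANet learns:

```python
import numpy as np

def motion_aware_fusion(bev_prev, bev_curr, alpha=0.5):
    """Augment current BEV features with temporal-variation (motion) cues.

    A minimal stand-in for MFANet: motion features are taken to be the
    per-cell difference between consecutive BEV maps, and `alpha` is a
    made-up fusion weight (the real network learns this aggregation).
    """
    motion = bev_curr - bev_prev      # temporal variation per BEV cell
    return bev_curr + alpha * motion  # motion-enhanced representation

# Two toy 4x4 BEV feature maps; one cell changes between frames.
prev = np.zeros((4, 4))
curr = np.zeros((4, 4))
curr[1, 2] = 2.0                      # a moving object appears here
fused = motion_aware_fusion(prev, curr)
print(fused[1, 2])  # 3.0 -- the changed cell is amplified
```

Static cells pass through unchanged, so the fused map emphasizes exactly the regions where the scene moved between sweeps.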
Lossy compression introduces artifacts, and many conventional in-loop filters have been adopted in the AV1 standard to reduce them. Researchers have explored deep learning-based filters to remove artifacts in the compression loop, but the high computational complexity of CNN-based filters remains a challenge. In this paper, a Texture- and Motion-Aware Perception (TMAP) in-loop filter is proposed to address this issue by selectively applying CNNs to texture-rich and high-motion regions, while utilizing non-learning methods to detect these regions. The proposed method introduces a new CNN structure, the Dense-Dual-Field Network (DDFN), which leverages a larger receptive field to enhance the quality of reconstructed frames by incorporating more contextual information. Furthermore, to improve perceptual quality, a novel loss function integrating wavelet-based perceptual information is presented. Experimental results demonstrate the superiority of the proposed models over other lightweight CNN models, and the effectiveness of the perceptual loss function is validated using the VMAF metric.
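A non-learning texture detector of the kind TMAP could use for region selection might look like the following; the block size, gradient-energy criterion, and threshold are all illustrative assumptions, not the paper's detector:

```python
import numpy as np

def texture_mask(frame, block=4, thresh=0.1):
    """Flag texture-rich blocks by local gradient energy (non-learning).

    A crude stand-in for a region selector: only blocks whose mean
    squared gradient exceeds `thresh` would be routed to the CNN filter.
    Block size and threshold are illustrative, not from the paper.
    """
    gy, gx = np.gradient(frame.astype(float))
    energy = gx ** 2 + gy ** 2
    h, w = frame.shape
    mask = np.zeros((h // block, w // block), dtype=bool)
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            patch = energy[i * block:(i + 1) * block,
                           j * block:(j + 1) * block]
            mask[i, j] = patch.mean() > thresh
    return mask

# Flat background with one textured (checkerboard) region.
frame = np.zeros((8, 8))
frame[0:4, 0:4] = np.indices((4, 4)).sum(axis=0) % 2  # checkerboard
mask = texture_mask(frame)
print(mask)  # only the top-left block is marked texture-rich
```

The smooth blocks would then be handled by a cheap conventional filter, reserving the CNN's cost for the flagged regions.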
Discriminative correlation filter-based methods struggle to deal with fast motion and heavy occlusion, problems that can severely degrade tracker performance and ultimately lead to tracking failures. In this paper, a novel Motion-Aware Correlation Filters (MACF) framework is proposed for online visual object tracking, in which a motion-aware strategy based on joint instantaneous motion estimation Kalman filters is integrated into the Discriminative Correlation Filters (DCFs). The motion-aware strategy predicts the likely region and scale of the target in the current frame from the previously estimated 3D motion information, which prevents model drift caused by fast motion. Based on the predicted region and scale, MACF detects the position and scale of the target in the current frame using the DCF-based method. Furthermore, an adaptive model updating strategy is proposed to address model corruption caused by occlusions, where the learning rate is determined by the confidence of the response map. Extensive experiments on the popular object tracking benchmarks OTB-100 and OTB-50 and on unmanned aerial vehicle (UAV) videos demonstrate that the proposed MACF tracker outperforms most state-of-the-art trackers while running in real time. In addition, the proposed approach can be integrated easily and flexibly into other visual tracking algorithms.
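The motion-aware prediction step can be sketched as a constant-velocity Kalman filter over the target centre; this 2D toy (with made-up noise covariances) only illustrates the predict/correct cycle, whereas MACF estimates 3D motion with joint instantaneous motion estimation:

```python
import numpy as np

# Constant-velocity Kalman filter for the target centre.
dt = 1.0
F = np.array([[1, 0, dt, 0],    # state: [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],     # only the position is observed
              [0, 1, 0, 0]], dtype=float)
Q = 1e-4 * np.eye(4)            # process noise (illustrative value)
R = 1e-2 * np.eye(2)            # measurement noise (illustrative value)

x = np.array([0.0, 0.0, 1.0, 0.5])  # start at origin, moving right/up
P = np.eye(4)

for z in [np.array([1.0, 0.5]), np.array([2.0, 1.0]), np.array([3.0, 1.5])]:
    # Predict the search region for the current frame...
    x = F @ x
    P = F @ P @ F.T + Q
    # ...then correct with the DCF detection treated as the measurement z.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P

predicted_next = (F @ x)[:2]  # predicted target centre for the next frame
print(predicted_next)         # close to [4.0, 2.0]
```

The predicted centre defines where the correlation filter searches next, which is what keeps a fast-moving target inside the search window.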
Person re-identification (ReID) aims to identify the same pedestrians captured by various cameras from different viewpoints in multiple scenarios, and occlusion is the toughest problem for practical applications. In video-based ReID tasks, motion information can be easily obtained from the sampled frames and provides discriminative human-part representations. However, most motion-based methodologies are designed for video frames and are not suitable for processing a single static image as input. In this paper, we propose a Motion-Aware Fusion (MAF) network that acquires motion information from static images in order to improve the performance of ReID tasks. Specifically, a visual adapter is introduced to enable visual feature extraction from either image or video data. We design a motion consistency task to guide the motion-aware transformer to learn representative human-part motion information, which greatly improves the quality of the learned features for occluded pedestrians. Extensive experiments on popular holistic, occluded, and video datasets demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art approaches by 1.5% in mean average precision (mAP) and 1.2% in rank-1 accuracy on the challenging Occluded-REID dataset, and surpasses other methods on the MARS dataset by 0.2% in mAP and 0.1% in rank-1 accuracy.
In a volleyball game, estimating the 3D pose of the spiker is very valuable for training and analysis, because the spiker's technical level determines whether a round is scored. The development of computer vision makes acquisition of the 3D pose possible. Most conventional pose estimation works are data-dependent methods, which mainly focus on reaching a high level on datasets with controllable scenes but fail to obtain good results in real, in-the-wild volleyball competition scenes because of the lack of large labelled data sets, abnormal poses, occlusion, and overlap. To refine the inaccurately estimated poses, this paper proposes a motion-aware and data-independent method based on a calibrated multi-camera system for a real volleyball competition scene. The proposed method consists of three key components: 1) By utilizing the relationships among multiple views, an irrelevant-projection-based potential joint restore approach is proposed, which refines a wrong pose in one view using information projected from the other three views, reducing the influence of occlusion and overlap. 2) Instead of training with a large amount of labelled data, the proposed motion-aware method utilizes the similarity of specific motions in sports to construct a spike model. Based on the spike model, joint and trajectory matching is proposed for coarse refinement. 3) For fine refinement, a point-distribution-based posterior decision network is proposed. While expanding the receptive field, the pose estimation task is decomposed into a classification decision problem, which greatly reduces the dependence on large amounts of labelled data. The experimental videos, with four synchronized camera views, are from a real game, the 2014 Japan Inter High School Men's Volleyball tournament. The experimental results achieve 76.25%, 81.89%, and 86.13% success rates at the 30 mm, 50 mm, and 70 mm error ranges, respectively.
Since the proposed refinement framework is based on a real volleyball competition, it is expected to be applicable to practical volleyball analysis.
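The multi-view restore step in component 1) can be illustrated with standard projective geometry: triangulate a joint from the other views, then reproject it into the suspect view. The camera matrices below are hypothetical, not the paper's calibration:

```python
import numpy as np

def project(P, X):
    """Project a 3D point X through a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(Ps, pts):
    """Linear (DLT) triangulation of one 3D point from several views."""
    A = []
    for P, (u, v) in zip(Ps, pts):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]

# Four toy cameras (hypothetical matrices, not the paper's calibration).
Ps = [np.hstack([np.eye(3), t.reshape(3, 1)])
      for t in [np.array([0., 0., 5.]), np.array([1., 0., 5.]),
                np.array([0., 1., 5.]), np.array([1., 1., 5.])]]
joint = np.array([0.2, -0.1, 1.0])               # true 3D joint position

# Suppose view 0's 2D estimate is wrong; restore it from the other 3 views.
obs = [project(P, joint) for P in Ps[1:]]
X = triangulate(Ps[1:], obs)
restored = project(Ps[0], X)
print(np.allclose(restored, project(Ps[0], joint)))  # True
```

With noise-free toy observations the reprojection is exact; with real detections the reprojected point serves as a corrected candidate for the occluded or overlapped joint.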