Information about the 3D shape and motion of tissue surfaces at the surgical site during minimally invasive surgery is important for providing metric measurements that enable the deployment of image guidance and enhanced robotic control. This article presents a scene flow algorithm that recovers the deformation and 3D structure of the surgical field-of-view from stereoscopic images by propagating information starting from a sparse set of candidate seed matches. By imposing spatial and temporal constraints, the proposed algorithm reconstructs dense 3D scene flow accurately and efficiently. Validation is performed on simulation data to evaluate the method under varying levels of image noise, and results are also presented for benchmark phantom model data. The practical value of the proposed method is shown by qualitative results on in vivo videos from robotic-assisted procedures.
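The seed-propagation idea can be illustrated with a generic best-first stereo matcher: confident seed matches are grown outwards into neighbouring pixels, always expanding the highest-scoring candidate first. This is not the article's algorithm. It handles only one rectified stereo pair (no temporal constraint and no 3D flow), and the ZNCC window radius, the score threshold and the plus-or-minus one pixel disparity smoothness rule are assumptions chosen for this sketch.

```python
import heapq

import numpy as np


def zncc(left, right, y, xl, xr, r):
    """Zero-mean normalised cross-correlation of two (2r+1)x(2r+1) windows."""
    a = left[y - r:y + r + 1, xl - r:xl + r + 1].astype(float)
    b = right[y - r:y + r + 1, xr - r:xr + r + 1].astype(float)
    a -= a.mean()
    b -= b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else -1.0


def propagate_disparity(left, right, seeds, r=2, threshold=0.8):
    """Best-first growth of a disparity map from seed matches (y, x, d).

    Assumes rectified grayscale images, so left pixel (y, x) with disparity d
    matches right pixel (y, x - d). Pixels that are never matched stay NaN.
    """
    h, w = left.shape
    disp = np.full((h, w), np.nan)
    heap = []  # max-heap on ZNCC score, implemented by negating the score

    def push(y, x, d):
        if r <= y < h - r and r <= x < w - r and r <= x - d < w - r:
            heapq.heappush(heap, (-zncc(left, right, y, x, x - d, r), y, x, d))

    for y, x, d in seeds:
        push(y, x, d)
    while heap:
        neg_score, y, x, d = heapq.heappop(heap)
        if -neg_score < threshold or not np.isnan(disp[y, x]):
            continue  # weak match, or pixel already claimed by a better match
        disp[y, x] = d
        for ny, nx in ((y, x + 1), (y, x - 1), (y + 1, x), (y - 1, x)):
            if r <= ny < h - r and r <= nx < w - r and np.isnan(disp[ny, nx]):
                # Spatial smoothness: a neighbour may differ by at most 1 pixel
                for nd in (d - 1, d, d + 1):
                    push(ny, nx, nd)
    return disp


# Synthetic check: random texture with a constant true disparity of 3 pixels
rng = np.random.default_rng(0)
left = rng.random((20, 30))
right = np.roll(left, -3, axis=1)
disp = propagate_disparity(left, right, seeds=[(10, 15, 3)])
```

Because candidates are expanded in order of score, a single correct seed is enough to fill the textured region here; real seeds would come from sparse feature matching.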
Recovering 3D geometry from cameras in underwater applications involves the Refractive Structure-from-Motion problem, where the non-linear distortion of light induced by a change of medium density invalidates the single viewpoint assumption. The pinhole-plus-distortion camera projection model suffers from a systematic geometric bias, since refractive distortion depends on object distance. This leads to inaccurate camera pose and 3D shape estimation. To account for refraction, it is possible to use the axial camera model or to explicitly consider one or multiple parallel refractive interfaces whose orientations and positions with respect to the camera can be calibrated. Although it has been demonstrated that the refractive camera model is well-suited for underwater imaging, Refractive Structure-from-Motion remains particularly difficult to use in practice when considering the seldom-studied case of a camera with a flat refractive interface. Our method applies to underwater imaging systems whose entrance lens is in direct contact with the external medium. By adopting the refractive camera model, we provide a succinct derivation and expression for the refractive fundamental matrix and use this as the basis for a novel two-view reconstruction method for underwater imaging. For validation, we use synthetic data to show the numerical properties of our method, and we provide results on real data to demonstrate its practical application within laboratory settings and for medical applications in fluid-immersed endoscopy. We demonstrate that our approach outperforms classic two-view Structure-from-Motion methods relying on the pinhole-plus-distortion camera model.
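Why refraction breaks the single viewpoint assumption can be seen with the vector form of Snell's law. The sketch below is an illustration under stated assumptions, not the article's refractive fundamental matrix: it places a flat interface at a hypothetical distance h in front of the projection centre (a general flat-port setup, not the lens-in-contact configuration studied in the article), refracts rays leaving the camera at different angles, and traces each refracted ray backwards to where it crosses the optical axis. For a pinhole camera all rays would cross at one point; here the crossing moves with the ray angle. The refractive indices are approximate values for air and water.

```python
import math


def refract(d, n, eta):
    """Vector form of Snell's law.

    d: unit direction of the incoming ray; n: unit interface normal pointing
    back towards the incoming ray (dot(n, d) < 0); eta = n1 / n2.
    Returns the unit refracted direction, or None on total internal reflection.
    """
    cos_i = -sum(di * ni for di, ni in zip(d, n))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * di + (eta * cos_i - cos_t) * ni for di, ni in zip(d, n))


def axis_crossing(theta, h, eta):
    """z where the refracted ray, traced backwards, crosses the optical axis.

    Camera centre at the origin looking along +z, flat interface at z = h,
    ray leaving the camera at angle theta > 0 in the x-z plane.
    """
    d = (math.sin(theta), 0.0, math.cos(theta))
    t = refract(d, (0.0, 0.0, -1.0), eta)
    px, pz = h * math.tan(theta), h  # point where the ray hits the interface
    return pz - px * t[2] / t[0]


eta_air_water = 1.0 / 1.333  # approximate refractive indices of air and water
for theta in (0.1, 0.3, 0.6):
    print(f"theta={theta:.1f} rad -> axis crossing z={axis_crossing(theta, 1.0, eta_air_water):.3f}")
```

The spread of crossing points is the distance-dependent bias that a fixed pinhole-plus-distortion model cannot absorb.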
Purpose
Concentric tube robots are composed of multiple concentric, pre-curved, super-elastic, telescopic tubes that are compliant and have a small diameter suitable for interventions that must be minimally invasive, such as fetal surgery. Combinations of rotation and extension of the tubes can alter the robot’s shape, but the inverse kinematics are complex to model due to the challenge of incorporating friction, other tube interactions and manufacturing imperfections. We propose a model-free reinforcement learning approach to form the inverse kinematics solution and directly obtain a control policy.
Method
Three exploration strategies are investigated for deep deterministic policy gradient with hindsight experience replay for concentric tube robots in simulation environments. The aim is to overcome the joint-to-Cartesian sampling bias and to scale with the number of robotic tubes. To compare strategies, the trained policy network is evaluated on selected Cartesian goals and the associated errors are analyzed. The learned control policy is demonstrated on trajectory-following tasks.
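Hindsight experience replay lets a goal-conditioned policy learn from episodes that miss the desired tip position by relabelling transitions with goals that were actually reached. A minimal sketch of the "future" relabelling strategy from the original HER formulation, assuming the achieved goal is the robot tip position and a sparse reward with a hypothetical 1 mm tolerance; the article's actual reward, tolerance and number of relabelled goals are not given here.

```python
import math
import random


def sparse_reward(achieved, goal, tol=1.0):
    """0 if the tip is within tol (assumed mm) of the goal, else -1."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def her_future(episode, reward_fn=sparse_reward, k=4, rng=None):
    """Hindsight experience replay relabelling with the 'future' strategy.

    episode: list of (obs, action, achieved_goal_after_step, desired_goal).
    Returns (obs, action, reward, goal) transitions: the original one per step
    plus k copies whose goal is replaced by a goal achieved later on.
    """
    rng = rng or random.Random()
    T = len(episode)
    out = []
    for t, (obs, action, achieved, goal) in enumerate(episode):
        out.append((obs, action, reward_fn(achieved, goal), goal))
        for _ in range(k):
            future = rng.randint(t, T - 1)  # inclusive of the current step
            new_goal = episode[future][2]
            out.append((obs, action, reward_fn(achieved, new_goal), new_goal))
    return out


# Toy episode: the tip never reaches the desired goal, so every original
# transition has reward -1, but relabelled ones can carry reward 0.
goal = (50.0, 0.0, 100.0)
tips = [(0.0, 0.0, 10.0), (5.0, 0.0, 20.0), (10.0, 1.0, 30.0)]
episode = [(f"obs{t}", f"act{t}", tip, goal) for t, tip in enumerate(tips)]
transitions = her_future(episode, k=4, rng=random.Random(0))
```

The relabelled transitions would be added to the DDPG replay buffer alongside the originals; observations are placeholders here.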
Results
Separation of extension and rotation joints for Gaussian exploration is required to overcome Cartesian sampling bias. Parameter noise and Ornstein–Uhlenbeck noise were found to be the optimal strategies, with less than 1 mm error in all simulation environments. With the policy learned using the optimal exploration strategy, various trajectories can be followed at high joint extension values. In evaluation, our inverse kinematics solver has an extension error of 0.44 mm and a rotation error of 0.3°.
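Ornstein–Uhlenbeck exploration adds temporally correlated noise to the policy's actions, and giving rotation and extension joints their own noise scale is one simple way to keep the two joint types separated. A minimal sketch under assumptions: mean-reverting noise with mu = 0, theta = 0.15 (a value commonly used with DDPG), one hypothetical sigma per joint, and a hypothetical action ordering for a three-tube robot; the article's actual noise parameters are not stated here.

```python
import math
import random


class OUNoise:
    """Ornstein-Uhlenbeck exploration noise with one process per joint.

    x <- x + theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1), with mu = 0.
    A separate sigma per joint lets rotation (rad) and extension (mm)
    joints be explored on their own scales.
    """

    def __init__(self, sigmas, theta=0.15, dt=1.0, seed=None):
        self.sigmas = list(sigmas)
        self.theta = theta
        self.dt = dt
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Restart every process at zero, e.g. at the start of an episode."""
        self.x = [0.0] * len(self.sigmas)

    def sample(self):
        for i, sigma in enumerate(self.sigmas):
            drift = -self.theta * self.x[i] * self.dt
            diffusion = sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
            self.x[i] += drift + diffusion
        return list(self.x)


# Hypothetical three-tube robot: actions ordered [rot1, rot2, rot3, ext1, ext2, ext3]
noise = OUNoise(sigmas=[0.05] * 3 + [0.5] * 3, seed=0)
action_noise = noise.sample()
```

Each call continues the previous noise state, which is what makes the exploration smoother than independent Gaussian samples.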
Conclusion
We demonstrate the feasibility of effective model-free control for concentric tube robots. Using the control policy directly, the robot can follow arbitrary trajectories, which is an important step towards overcoming the challenge of concentric tube robot control for clinical use in minimally invasive interventions.
Colonoscopy is the gold standard for colon cancer screening, though some polyps are still missed, thus preventing early disease detection and treatment. Several computational systems have been proposed to assist polyp detection during colonoscopy, but so far without consistent evaluation. The lack of publicly available annotated databases has made it difficult to compare methods and to assess whether they achieve performance levels acceptable for clinical use. The Automatic Polyp Detection sub-challenge, conducted as part of the Endoscopic Vision Challenge (http://endovis.grand-challenge.org) at the international conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2015, was an effort to address this need. In this paper, we report the results of this comparative evaluation of polyp detection methods and describe additional experiments to further explore differences between methods. We define performance metrics and provide evaluation databases that allow comparison of multiple methodologies. Results show that convolutional neural networks are the state of the art. Nevertheless, it is also demonstrated that combining different methodologies can lead to improved overall performance.
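The abstract refers to defined performance metrics without listing them. A minimal sketch of typical point-detection scoring, assuming each method outputs one point per detected polyp, that a point counts as a true positive when it lands inside a not-yet-detected ground-truth polyp region (rectangles stand in for annotated masks), and that repeated hits on the same polyp are false positives; the challenge's exact counting rules may differ.

```python
def detection_counts(detections, polyps):
    """Count TP, FP and FN for point detections against polyp regions.

    detections: list of (x, y) points; polyps: list of boxes (x0, y0, x1, y1).
    A detection is a TP if it falls inside a polyp not yet detected; any other
    detection, including a repeat hit on the same polyp, is counted as an FP.
    """
    detected = [False] * len(polyps)
    tp = fp = 0
    for x, y in detections:
        for i, (x0, y0, x1, y1) in enumerate(polyps):
            if not detected[i] and x0 <= x <= x1 and y0 <= y <= y1:
                detected[i] = True
                tp += 1
                break
        else:  # no undetected polyp contains this point
            fp += 1
    fn = detected.count(False)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


polyps = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(5, 5), (6, 6), (50, 50)]  # one hit, one duplicate, one false alarm
counts = detection_counts(detections, polyps)
scores = precision_recall_f1(*counts)
```

Here the second polyp is missed, so the counts are one TP, two FPs and one FN, giving precision 1/3, recall 1/2 and F1 0.4.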
Instrument detection, pose estimation, and tracking in surgical videos are an important vision component for computer-assisted interventions. While significant advances have been made in recent years, articulation detection is still a major challenge. In this paper, we propose a deep neural network for articulated multi-instrument 2-D pose estimation, which is trained on detailed annotations of endoscopic and microscopic data sets. Our model is formed by a fully convolutional detection-regression network. Joints and associations between joint pairs in our instrument model are located by the detection subnetwork and are subsequently refined through a regression subnetwork. Based on the output from the model, the poses of the instruments are inferred using maximum bipartite graph matching. Our estimation framework is powered by deep learning techniques without any direct kinematic information from a robot. Our framework is tested on single-instrument RMIT data, and also on multi-instrument EndoVis and in vivo data with promising results. In addition, the data set annotations are publicly released along with our code and model.
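The matching step can be pictured as choosing which detected joint candidates belong to the same instrument so that the total association score is maximal. A minimal sketch, assuming a small matrix of hypothetical association scores between candidates of two joint types; exhaustive search is only practical for a handful of instruments, and larger problems would normally use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment` with `maximize=True`). How the article builds its scores is not reproduced here.

```python
from itertools import permutations


def max_weight_matching(scores, min_score=0.0):
    """Exhaustive maximum-weight bipartite matching for small score matrices.

    scores[i][j] is the association score between candidate i of one joint type
    and candidate j of another. Pairs scoring <= min_score are never kept.
    Returns a list of (i, j) index pairs.
    """
    n_rows = len(scores)
    n_cols = len(scores[0]) if n_rows else 0
    if n_rows > n_cols:
        # Solve on the transpose so that rows <= columns, then swap indices back
        transposed = [list(col) for col in zip(*scores)]
        return sorted((i, j) for j, i in max_weight_matching(transposed, min_score))
    best_total, best_pairs = 0.0, []
    for cols in permutations(range(n_cols), n_rows):
        pairs = [(i, j) for i, j in enumerate(cols) if scores[i][j] > min_score]
        total = sum(scores[i][j] for i, j in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    return best_pairs


# Greedily taking the single best score (0.9) would pair 0-0 and then 1-1 (total 1.0);
# the maximum-weight matching instead pairs 0-1 and 1-0 (total 1.65).
association = [[0.9, 0.8],
               [0.85, 0.1]]
matches = max_weight_matching(association)
```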
Purpose:
We tackle the problem of online surgical phase recognition in laparoscopic procedures, which is key in developing context-aware supporting systems. We propose a novel approach to take temporal context in surgical videos into account by precise modeling of temporal neighborhoods.
Methods:
We propose a two-stage model to perform phase recognition. A CNN model is used as a feature extractor to project RGB frames into a high-dimensional feature space. We introduce a novel paradigm for surgical phase recognition which utilizes graph neural networks to incorporate temporal information. Unlike recurrent neural networks and temporal convolution networks, our graph-based approach offers a more generic and flexible way of modeling temporal relationships. Each frame is a node in the graph, and the edges in the graph define temporal connections among the nodes. The flexible configuration of temporal neighborhoods comes at the price of losing temporal order. To mitigate this, our approach takes temporal order into account by encoding frame positions, which is important for reliably predicting surgical phases.
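The frame-graph formulation can be made concrete with a few lines of NumPy. A minimal sketch under assumptions, not the authors' architecture: each frame is a node, edges link a frame only to a few past frames (a hypothetical choice of offsets consistent with online recognition), frame positions are added with the sinusoidal encoding known from Transformers (one plausible encoding), and a single mean-aggregation layer with a random weight matrix stands in for the trained graph network. Random vectors stand in for the CNN features.

```python
import numpy as np


def causal_temporal_edges(num_frames, offsets=(1, 2, 4)):
    """Directed edges (src, dst) linking each frame to a few past frames only."""
    return [(t - k, t) for t in range(num_frames) for k in offsets if t - k >= 0]


def sinusoidal_positions(num_frames, dim):
    """Transformer-style positional encoding (dim must be even)."""
    pos = np.arange(num_frames)[:, None]
    i = np.arange(dim // 2)[None, :]
    angles = pos / (10000.0 ** (2 * i / dim))
    pe = np.zeros((num_frames, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe


def graph_layer(x, edges, weight):
    """One mean-aggregation layer: h_t = ReLU([x_t, mean of x_s over edges s->t] @ W)."""
    agg = np.zeros_like(x)
    count = np.zeros(len(x))
    for s, t in edges:
        agg[t] += x[s]
        count[t] += 1
    agg /= np.maximum(count, 1)[:, None]  # frames without neighbours keep zeros
    return np.maximum(0.0, np.concatenate([x, agg], axis=1) @ weight)


rng = np.random.default_rng(0)
num_frames, feat_dim, out_dim = 10, 16, 8
features = rng.normal(size=(num_frames, feat_dim))  # stand-in for per-frame CNN features
x = features + sinusoidal_positions(num_frames, feat_dim)
edges = causal_temporal_edges(num_frames)
h = graph_layer(x, edges, rng.normal(size=(2 * feat_dim, out_dim)))
```

Mean aggregation by itself is order-invariant, which is why the positional term is added to the node features before message passing.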
Results:
Experiments are carried out on the public Cholec80 dataset that contains 80 annotated videos. The experimental results highlight the superior performance of the proposed approach compared to the state-of-the-art models on this dataset.
Conclusion:
A novel approach for formulating video-based surgical phase recognition is presented. The results indicate that temporal information can be incorporated using graph-based models, and positional encoding is important to efficiently utilize temporal information. Graph networks open possibilities to use evidence theory for uncertainty analysis in surgical phase recognition.
Introduction
Robot-assisted surgery is increasingly being adopted by multiple surgical specialties. There is evidence of inherent risks in using unfamiliar new technologies early in the learning curve. The development of standardised and validated training programmes is crucial for their safe introduction. In this review, we aim to evaluate the current evidence and opportunities to integrate novel technologies into modern digitalised robotic training curricula.
Methods
A systematic literature review of the current evidence for novel technologies in surgical training was conducted online, and relevant publications and information were identified. We evaluated how these technologies could further enable the digitalisation of training.
Results
Overall, the quality of available studies was found to be low, with current evidence consisting largely of expert opinion, consensus statements and small qualitative studies. The review identified several novel technologies that are already being utilised in robotic surgery training. There is also a trend towards standardised, validated robotic training curricula. Currently, the majority of validated curricula do not incorporate novel technologies, and training is delivered with more traditional methods that include centralisation of training services with wet laboratories that have access to cadavers and dedicated training robots.
Conclusions
Improvements to training standards and a better understanding of performance data have good potential to significantly reduce complications in patients. Digitalisation automates data collection and brings data together for analysis. Machine learning has the potential to provide automated performance feedback for trainees. Digitalised training aims to build on the current gold standards and to further improve the ‘continuum of training’ by integrating PBP training, 3D-printed models, telementoring, telemetry and machine learning.
Purpose
Bile duct injury is a significant problem in laparoscopic cholecystectomy and can have grave consequences for patient outcomes. Automatic identification of the critical structures (cystic duct and cystic artery) could potentially reduce complications during surgery by helping the surgeon establish the Critical View of Safety, or may eventually even provide real-time intra-operative guidance.
Methods
A computer vision model was trained to identify the critical structures. Label relaxation enabled the model to cope with ambiguous spatial extent and high annotation variability. Pseudo-label self-supervision allowed the model to use unlabelled data, which can be particularly beneficial when labelled training data is scarce. Intrinsic variability in annotations was assessed across several annotators, quantifying the extent of annotation ambiguity and setting a baseline for model accuracy.
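Pseudo-label self-supervision usually means that the model's own confident predictions on unlabelled frames are turned into training targets, while uncertain pixels are ignored. A minimal sketch of that general idea, assuming a fixed confidence threshold and a hypothetical ignore value of 255; the article's selection rule is not given here, and its label relaxation loss is not shown. In a real training loop the loss would be computed on a fresh prediction (for example on an augmented view), not on the same probabilities used to make the labels.

```python
import numpy as np

IGNORE = 255  # hypothetical label value meaning "no supervision for this pixel"


def make_pseudo_labels(probs, threshold=0.9):
    """Keep the model's argmax prediction only where it is confident.

    probs: (C, H, W) softmax output of the current model on an unlabelled frame.
    """
    confidence = probs.max(axis=0)
    labels = probs.argmax(axis=0)
    return np.where(confidence >= threshold, labels, IGNORE)


def masked_cross_entropy(probs, labels, eps=1e-12):
    """Mean pixel-wise cross-entropy, skipping pixels labelled IGNORE."""
    valid = labels != IGNORE
    if not valid.any():
        return 0.0
    safe = np.where(valid, labels, 0)  # placeholder class index for ignored pixels
    p_true = np.take_along_axis(probs, safe[None], axis=0)[0]
    return float(-np.log(p_true[valid] + eps).mean())


# Two classes (background, cystic duct) on a 1x2 frame: one confident pixel, one not
probs = np.array([[[0.95, 0.6]], [[0.05, 0.4]]])
pseudo = make_pseudo_labels(probs)
loss = masked_cross_entropy(probs, pseudo)
```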
Results
Using 3050 labelled and 3682 unlabelled cholecystectomy frames, the model achieved an IoU of 65% and presence detection F1 score of 75%. Inter-annotator IoU agreement was 70%, demonstrating the model was near human-level agreement on average in this dataset. The model’s outputs were validated by three expert surgeons, who confirmed that its outputs were accurate and promising for future usage.
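Inter-annotator agreement of this kind can be computed as the mean IoU over every pair of annotators for a frame, and the model can be scored the same way against each annotator in turn. A minimal sketch, assuming binary masks per structure and treating two empty masks as perfect agreement (a convention, not necessarily the article's); how scores are aggregated across frames and classes is left out.

```python
from itertools import combinations

import numpy as np


def iou(a, b):
    """IoU of two boolean masks; taken as 1.0 when both are empty (a convention)."""
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)


def inter_annotator_iou(masks):
    """Mean IoU over every pair of annotators' masks for the same frame."""
    pairs = list(combinations(masks, 2))
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


def model_vs_annotators_iou(pred, masks):
    """Mean IoU of a model prediction against each annotator in turn."""
    return sum(iou(pred, m) for m in masks) / len(masks)


annotators = [np.array([1, 1, 0, 0], dtype=bool),
              np.array([1, 1, 1, 0], dtype=bool),
              np.array([1, 0, 0, 0], dtype=bool)]
agreement = inter_annotator_iou(annotators)
model_score = model_vs_annotators_iou(np.array([1, 1, 0, 0], dtype=bool), annotators)
```

In this toy frame the pairwise IoUs are 2/3, 1/2 and 1/3, so the annotators agree at 0.5, and a prediction matching the first annotator scores 13/18 against the panel.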
Conclusion
Identification of critical structures can achieve high accuracy and is a promising step towards computer-assisted intervention, in addition to potential applications in analytics and education. High accuracy and surgeon approval are maintained when detecting the structures separately as distinct classes. Future work will focus on guaranteeing safe identification of critical anatomy, including the bile duct, and validating the performance of automated approaches.
Purpose
Surgical workflow estimation techniques aim to divide a surgical video into temporal segments based on predefined surgical actions or objectives, which can be of different granularity, such as steps or phases. Potential applications range from real-time intra-operative feedback to automatic post-operative reports and analysis. A common approach in the literature for automatic surgical phase estimation is to decouple the problem into two stages: feature extraction from single frames and temporal feature fusion. The problem is split in this way because of the computational restrictions of processing large spatio-temporal sequences.
Methods
The majority of existing works focus on pushing performance solely through temporal model development. In contrast, we follow a data-centric approach and propose a training pipeline that enables models to maximise the usage of existing datasets, which are generally used in isolation. Specifically, we use the dense phase annotations available in Cholec80 and the sparse scene (i.e., instrument and anatomy) segmentation annotations available in CholecSeg8k for less than 5% of the overlapping frames. We propose a simple multi-task encoder that effectively fuses both streams, when available, based on their importance, and jointly optimises them to perform accurate phase prediction.
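Because segmentation masks exist for only a small subset of frames, a joint objective has to apply the segmentation term only where a mask is available. A minimal sketch of such a masked multi-task loss, assuming a phase cross-entropy on every frame, a per-pixel segmentation cross-entropy on annotated frames, and a hypothetical fixed weight of 0.5; the article's importance-based fusion of the two streams is not reproduced. The toy sizes use 7 phase classes as in Cholec80 and 3 hypothetical segmentation classes.

```python
import numpy as np


def cross_entropy(logits, target):
    """Mean cross-entropy for logits (N, C) and integer targets (N,)."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(target)), target].mean())


def multitask_loss(phase_logits, phase_target, seg_logits, seg_target, has_seg, seg_weight=0.5):
    """Phase loss on every frame; segmentation loss only where has_seg is True.

    seg_logits: (N, P, C) flattened per-pixel logits; seg_target: (N, P).
    """
    loss = cross_entropy(phase_logits, phase_target)
    if has_seg.any():
        s_logits = seg_logits[has_seg].reshape(-1, seg_logits.shape[-1])
        s_target = seg_target[has_seg].reshape(-1)
        loss += seg_weight * cross_entropy(s_logits, s_target)
    return loss


rng = np.random.default_rng(0)
n, num_phases, pixels, num_classes = 4, 7, 6, 3
phase_logits = np.zeros((n, num_phases))
phase_target = rng.integers(0, num_phases, size=n)
seg_logits = np.zeros((n, pixels, num_classes))
seg_target = rng.integers(0, num_classes, size=(n, pixels))
has_seg = np.array([True, False, False, False])  # only one frame has a segmentation mask
loss = multitask_loss(phase_logits, phase_target, seg_logits, seg_target, has_seg)
```

With all-zero logits the phase term is log 7 and the segmentation term adds 0.5 log 3, which makes the masking easy to check.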
Results and conclusion
We show that with a small fraction of scene segmentation annotations, a relatively simple model can obtain results comparable to previous state-of-the-art and more complex architectures when evaluated in similar settings. We hope that this data-centric approach can encourage new research directions where data, and how to use it, play an important role alongside model development.