The aim of the presented review is to summarize the literature data on the accuracy and clinical applicability of artificial intelligence (AI) models as a valuable alternative to the current guidelines in predicting cardiac resynchronization therapy (CRT) response and phenotyping of patients eligible for CRT implantation. This systematic review was performed according to the PRISMA guidelines. After a search of Scopus, PubMed, Cochrane Library, and Embase databases, 675 records were identified. Twenty supervised (prediction of CRT response) and 9 unsupervised (clustering and phenotyping) AI models were analyzed qualitatively (22 studies, 14,258 patients). Fifty-five percent of AI models were based on retrospective studies. Unsupervised AI models were able to identify clusters of patients with significantly different rates of primary outcome events (death, heart failure event). In comparison to the guideline-based CRT response prediction accuracy of 70%, supervised AI models trained on cohorts with > 100 patients achieved up to 85% accuracy and an AUC of 0.86 in their prediction of response to CRT for echocardiographic and clinical outcomes, respectively. AI models seem to be an accurate and clinically applicable tool in phenotyping of patients eligible for CRT implantation and predicting potential responders. In the future, AI may help to increase CRT response rates to over 80% and improve clinical decision-making and prognosis of the patients, including reduction of mortality rates. However, these findings must be validated in randomized controlled trials.
The aim of this work is to detect and automatically generate high-level explanations of anomalous events in video. Understanding the cause of an anomalous event is crucial as the required response is dependent on its nature and severity. Recent works typically use object or action classifiers to detect and provide labels for anomalous events. However, this constrains detection systems to a finite set of known classes and prevents generalisation to unknown objects or behaviours. Here we show how to robustly detect anomalies without the use of object or action classifiers yet still recover the high-level reason behind the event. We make the following contributions: (1) a method using saliency maps to decouple the explanation of anomalous events from object and action classifiers, (2) show how to improve the quality of saliency maps using a novel neural architecture for learning discrete representations of video by predicting future frames and (3) beat the state-of-the-art anomaly explanation methods by 60% on a subset of the public benchmark X-MAN dataset.
Abstract The recent increase in popularity of volumetric representations for scene reconstruction and novel view synthesis has put renewed focus on animating volumetric content at high visual quality and in real‐time. While implicit deformation methods based on learned functions can produce impressive results, they are ‘black boxes’ to artists and content creators, they require large amounts of training data to generalize meaningfully, and they do not produce realistic extrapolations outside of this data. In this work, we solve these issues by introducing a volume deformation method which is real‐time even for complex deformations, easy to edit with off‐the‐shelf software and can extrapolate convincingly. To demonstrate the versatility of our method, we apply it in two scenarios: physics‐based object deformation and telepresence where avatars are controlled using blendshapes. We also perform thorough experiments showing that our method compares favourably to both volumetric approaches combined with implicit deformation and methods based on mesh deformation.
We introduce the Splatter Image, an ultra-efficient approach for monocular 3D object reconstruction. Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images. We apply Gaussian Splatting to monocular reconstruction by learning a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS. Our main innovation is the surprisingly straightforward design of this network, which, using 2D operators, maps the input image to one 3D Gaussian per pixel. The resulting set of Gaussians thus has the form of an image, the Splatter Image. We further extend the method to take several images as input via cross-view attention. Owing to the speed of the renderer (588 FPS), we use a single GPU for training while generating entire images at each iteration to optimize perceptual metrics like LPIPS. On several synthetic, real, multi-category and large-scale benchmark datasets, we achieve better results in terms of PSNR, LPIPS, and other metrics while training and evaluating much faster than prior works. Code, models, demo and more results are available at https://szymanowiczs.github.io/splatter-image.
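The per-pixel mapping described in the abstract can be illustrated at the shape level. The sketch below is an assumption-laden toy, not the authors' implementation: the "network" is reduced to a hypothetical per-pixel linear map (a 1x1 convolution), and the parameter layout (mean, log-scale, rotation quaternion, opacity) is an illustrative choice.

```python
import numpy as np

# Illustrative sketch (NOT the paper's code): a Splatter-Image-style mapping
# where a 2D operator predicts one 3D Gaussian per input pixel. The "network"
# here is a single hypothetical 1x1 convolution, i.e. a per-pixel linear map
# from RGB values to Gaussian parameters.

H, W = 4, 4        # tiny image for illustration
PARAMS = 11        # assumed layout: 3 mean + 3 log-scale + 4 quaternion + 1 opacity

rng = np.random.default_rng(0)
image = rng.random((H, W, 3))                     # input RGB image in [0, 1]
weight = rng.standard_normal((3, PARAMS)) * 0.1   # stand-in for learned weights
bias = np.zeros(PARAMS)

# Apply the per-pixel map: every pixel yields one Gaussian's parameters,
# so the output keeps the image layout -- the "Splatter Image".
splatter_image = image @ weight + bias            # shape (H, W, PARAMS)

# Flattening gives the set of 3D Gaussians to be rendered by splatting.
gaussians = splatter_image.reshape(H * W, PARAMS)
means, log_scales = gaussians[:, :3], gaussians[:, 3:6]
quats, opacity = gaussians[:, 6:10], gaussians[:, 10]

print(gaussians.shape)   # (16, 11): one 3D Gaussian per pixel
```

Because the output is just another image-shaped tensor, standard 2D architectures and batched image pipelines apply unchanged, which is what makes the design so efficient.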
X-MAN: Explaining multiple sources of anomalies in video Szymanowicz, Stanislaw; Charles, James; Cipolla, Roberto
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW),
2021-June
Conference Proceeding
Open Access
Our objective is to detect anomalies in video while also automatically explaining the reason behind the detector's response. In a practical sense, explainability is crucial for this task as the required response to an anomaly depends on its nature and severity. However, most leading methods (based on deep neural networks) are not interpretable and hide the decision-making process in uninterpretable feature representations. In an effort to tackle this problem we make the following contributions: (1) we show how to build interpretable feature representations suitable for detecting anomalies with state-of-the-art performance, (2) we propose an interpretable probabilistic anomaly detector which can describe the reason behind its response using high-level concepts, (3) we are the first to directly consider object interactions for anomaly detection and (4) we propose a new task of explaining anomalies and release a large dataset for evaluating methods on this task. Our method competes well with the state of the art on public datasets while also providing anomaly explanation based on objects and their interactions.
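The idea of an interpretable probabilistic detector over high-level concepts can be sketched minimally. Everything below is an assumption for illustration, not the paper's model: the concept names, the independent-Gaussian density per concept, and the rule that the concept with the largest negative log-likelihood is reported as the explanation.

```python
import numpy as np

# Minimal sketch of an interpretable probabilistic anomaly detector over
# high-level concepts (feature names and the per-concept Gaussian model are
# illustrative assumptions, not the paper's actual formulation).

concepts = ["object_speed", "object_size", "num_interactions"]

# "Training": fit an independent Gaussian to each concept from normal videos.
rng = np.random.default_rng(1)
normal_data = rng.normal(loc=[1.0, 2.0, 0.5], scale=[0.2, 0.4, 0.3], size=(500, 3))
mu, sigma = normal_data.mean(axis=0), normal_data.std(axis=0)

def explain_anomaly(x):
    """Score a test sample; the per-concept negative log-likelihoods say
    WHICH concept drives the anomaly, yielding a high-level explanation."""
    z = (np.asarray(x) - mu) / sigma
    nll = 0.5 * z ** 2 + np.log(sigma)   # per-concept Gaussian NLL (up to a constant)
    total = float(nll.sum())
    culprit = concepts[int(np.argmax(nll))]
    return total, culprit

# An abnormally fast object: the speed term dominates the score.
score, reason = explain_anomaly([5.0, 2.1, 0.4])
print(reason)   # object_speed
```

Because each term of the score is attached to a named concept, the detector's output is a sentence-sized explanation ("anomalous because of object speed") rather than an opaque activation.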
We present Viewset Diffusion, a diffusion-based generator that outputs 3D objects while only using multi-view 2D data for supervision. We note that there exists a one-to-one mapping between viewsets, i.e., collections of several 2D views of an object, and 3D models. Hence, we train a diffusion model to generate viewsets, but design the neural network generator to reconstruct internally corresponding 3D models, thus generating those too. We fit a diffusion model to a large number of viewsets for a given category of objects. The resulting generator can be conditioned on zero, one or more input views. Conditioned on a single view, it performs 3D reconstruction accounting for the ambiguity of the task and allowing multiple solutions compatible with the input to be sampled. The model performs reconstruction efficiently, in a feed-forward manner, and is trained using only rendering losses, with as few as three views per viewset. Project page: szymanowiczs.github.io/viewset-diffusion.
Delivering immersive, 3D experiences for human communication requires a method to obtain 360 degree photo-realistic avatars of humans. To make these experiences accessible to all, only commodity hardware, like mobile phone cameras, should be necessary to capture the data needed for avatar creation. For avatars to be rendered realistically from any viewpoint, we require training images and camera poses from all angles. However, we cannot rely on there being trackable features in the foreground or background of all images for use in estimating poses, especially from the side or back of the head. To overcome this, we propose a novel landmark detector trained on synthetic data to estimate camera poses from 360 degree mobile phone videos of a human head for use in a multi-stage optimization process which creates a photo-realistic avatar. We perform validation experiments with synthetic data and showcase our method on 360 degree avatars trained from mobile phone videos.
In this paper, we propose Flash3D, a method for scene reconstruction and
novel view synthesis from a single image which is both very generalisable and
efficient. For generalisability, we start from a "foundation" model for
monocular depth estimation and extend it to a full 3D shape and appearance
reconstructor. For efficiency, we base this extension on feed-forward Gaussian
Splatting. Specifically, we predict a first layer of 3D Gaussians at the
predicted depth, and then add additional layers of Gaussians that are offset in
space, allowing the model to complete the reconstruction behind occlusions and
truncations. Flash3D is very efficient, trainable on a single GPU in a day, and
thus accessible to most researchers. It achieves state-of-the-art results when
trained and tested on RealEstate10k. When transferred to unseen datasets like
NYU it outperforms competitors by a large margin. More impressively, when
transferred to KITTI, Flash3D achieves better PSNR than methods trained
specifically on that dataset. In some instances, it even outperforms recent
methods that use multiple views as input. Code, models, demo, and more results
are available at https://www.robots.ox.ac.uk/~vgg/research/flash3d/.
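The layered construction described above can be sketched schematically. The code below is an illustrative assumption, not Flash3D's actual pipeline: the pinhole camera, focal length, depth map, and per-pixel offsets are all toy stand-ins, used only to show how a first Gaussian layer at the predicted depth is complemented by an offset layer behind it.

```python
import numpy as np

# Schematic sketch (assumptions, NOT Flash3D's code) of layered per-pixel
# Gaussians: layer 1 unprojects each pixel to its predicted depth; layer 2
# is offset in space so the model can represent geometry behind occlusions.

H, W, f = 4, 4, 2.0               # tiny image and a toy focal length
depth = np.full((H, W), 3.0)      # stand-in for a monocular depth network's output

# Pixel grid -> camera rays (simple pinhole model, principal point at center).
u, v = np.meshgrid(np.arange(W) - W / 2, np.arange(H) - H / 2)
rays = np.stack([u / f, v / f, np.ones_like(depth)], axis=-1)   # (H, W, 3)

layer1 = rays * depth[..., None]  # first layer: Gaussian means at predicted depth

# Hypothetical predicted per-pixel offsets push the second layer behind layer 1.
offsets = np.zeros((H, W, 3))
offsets[..., 2] = 0.5             # constant 0.5 m along the viewing axis, for illustration
layer2 = layer1 + offsets

# The union of both layers is the scene's Gaussian set.
means = np.concatenate([layer1.reshape(-1, 3), layer2.reshape(-1, 3)])
print(means.shape)                # 2 layers x H x W Gaussians, each a 3D mean
```

In the real method the offsets are predicted by the network rather than fixed, which is what lets the extra layers fill in content behind occlusions and image truncations.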