Synthetic datasets, for which we propose the term synthsets, are not a novelty but have become a necessity. Although they have been used in computer vision since 1989, helping to solve the problem of collecting a sufficient amount of annotated data for supervised machine learning, the intensive development of methods and techniques for their generation belongs to the last decade. Nowadays, the question shifts from whether you should use synthetic datasets to how you should optimally create them. Motivated by the idea of discovering best practices for building synthetic datasets that represent dynamic environments (such as traffic, crowds, and sports), this study provides an overview of existing synthsets in the computer vision domain. We have analyzed the methods and techniques of synthetic dataset generation: from the first low-resolution generators to the latest generative adversarial training methods, and from simple techniques for improving realism by adding global noise to those meant for closing domain and distribution gaps. The analysis extracts nine unique but potentially intertwined methods and yields a synthset generation diagram consisting of 17 individual processes that synthset creators should follow and choose from, depending on the specific requirements of their task.
Human Action Recognition (HAR) is a challenging task used in sports such as volleyball, basketball, soccer, and tennis to detect players and recognize their actions and teams' activities during training, matches, warm-ups, or competitions. HAR aims to detect the person performing an action in an unknown video sequence, determine the action's duration, and identify the action type. The main idea of HAR in sports is to monitor a player's performance: to detect the player, track their movements, recognize the performed action, compare various actions, compare different kinds and skill levels of performance, or produce automatic statistical analysis.
Since an action in sports refers to a set of physical movements performed by a player to complete a task, using their body or interacting with objects or other persons, actions can vary in complexity. Because of that, a novel systematization of actions based on complexity and the level of performance and interaction is proposed.
Existing overviews of HAR research focus on various methods evaluated on publicly available datasets, mostly covering everyday activities. That is a good starting point; however, HAR is increasingly applied in sports and is becoming more directed towards recognizing similar actions within a particular sports domain. Therefore, this paper presents an overview of HAR applications in sports, primarily based on computer vision, as its main contribution, along with popular publicly available datasets for this purpose.
Machine Learning, Human Action Recognition, Action systematization, Sports Dataset, Human Action Recognition in Sports, Sport.
Due to the growing number of people who pursue adrenaline activities or adventure tourism and stay in the mountains and other inaccessible places, there is an increasing need to organize search and rescue (SAR) operations to provide assistance and health care to the injured. The goal of a SAR operation is to search the largest possible area in the shortest time and find a lost or injured person. Today, unmanned aerial vehicles (UAVs, or drones) are increasingly involved in search operations, as they can capture a large, controlled area in a short amount of time. However, a detailed examination of the large amount of recorded material remains a problem. Even for an expert, it is not easy to find the people being searched for: they appear relatively small against the surrounding area, are often sheltered by vegetation or merged with the ground, and may be in unusual positions due to falls, injuries, or exhaustion. Therefore, the automatic detection of persons and objects in images and videos taken by drones in these operations is very significant. In this paper, the reliability of existing state-of-the-art detectors such as Faster R-CNN, YOLOv4, RetinaNet, and Cascade R-CNN was investigated on the VisDrone benchmark and the custom-made SARD dataset, built to simulate rescue scenes. After training the models on the selected datasets, detection results were compared. Because of its high speed and accuracy and its small number of false detections, the YOLOv4 detector was chosen for further examination. YOLOv4 results for different network sizes, detection accuracies, and transfer learning settings were analyzed. The model's robustness to weather conditions and motion blur was also investigated. The paper proposes a model that can be used in SAR operations because of its excellent results in detecting people in search and rescue scenarios.
In this paper, we present automatic, deep-learning methods for pipeline detection in underwater environments. Seafloor pipelines are critical infrastructure for oil and gas transport, and their inspection is required to verify their integrity and determine the need for maintenance. Underwater conditions are harsh and challenging for image recognition due to light refraction and absorption, poor visibility, scattering, and attenuation, often causing poor image quality. Modern machine-learning object detectors utilize Convolutional Neural Networks (CNNs), which require a training dataset of sufficient quality. In the paper, six deep-learning CNN detectors for underwater object detection were trained and tested: five based on You Only Look Once (YOLO) architectures (YOLOv4, YOLOv4-Tiny, CSP-YOLOv4, YOLOv4@Resnet, YOLOv4@DenseNet) and one on the Faster Region-based CNN (R-CNN) architecture. The models' performance was evaluated in terms of detection accuracy, mean average precision (mAP), and processing speed measured in frames per second (FPS) on a custom dataset containing underwater pipeline images. In the study, YOLOv4 outperformed the other models for underwater pipeline object detection, achieving an mAP of 94.21% with the ability to detect objects in real time. Based on the literature review, this is one of the pioneering works in this field.
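The abstract above reports detection quality as mean average precision (mAP). As a reminder of what that metric measures, here is a minimal sketch (an illustrative implementation, not the authors' evaluation code) of computing average precision for a single class from scored detections that have already been matched against ground truth:

```python
import numpy as np

def average_precision(scores, matches, n_gt):
    """Compute AP for one class from detection confidence scores and
    ground-truth match flags (1 = true positive, 0 = false positive),
    using all-point interpolation as in common detection benchmarks."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(matches, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(fp)
    recall = tp_cum / n_gt
    precision = tp_cum / (tp_cum + fp_cum)
    # make the precision envelope monotonically decreasing
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # integrate precision over recall steps
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recall, precision):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

The mAP reported in the abstract is then simply this quantity averaged over all object classes.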
NANOG is an important stem cell transcription factor involved in human development and cancerogenesis. Its expression is complex and regulated on different levels. Moreover, the NANOG protein may regulate hundreds of target genes at the same time. NANOG is crucial for the preimplantation development phase and progressively decreases during embryonic stem cell differentiation, thus regulating embryonic and fetal development. Postnatally, NANOG is undetectable or expressed in very low amounts in the majority of human tissues. NANOG re-expression can be detected during cancerogenesis, already in precancerous lesions, with increasing levels of NANOG in high-grade dysplasia. NANOG is believed to enable cancer cells to obtain stem-cell-like properties, which are thought to be the source of expanding growth, tumor maintenance, metastasis formation, and tumor relapse. High NANOG expression in cancer is frequently associated with advanced stage, poor differentiation, worse overall survival, and resistance to treatment, and it is therefore a promising prognostic and predictive marker. We summarize the current knowledge on the role of NANOG in cancerogenesis and development, including our own experience. We provide a critical overview of NANOG as a prognostic and diagnostic factor, including problems regarding its regulation and detection.
Impact statement
NANOG has emerged as a key stem cell transcription factor in normal development and cancerogenesis. It is generally regarded as a useful prognostic and predictive factor in various human cancers. It is less well known that it is expressed already at precancerous stages in various organs, suggesting that an ideal candidate diagnostic marker has finally been discovered, one that makes it possible to distinguish between true dysplasia and reactive atypia. NANOG regulation is complex, and new insights into its regulation might provide important information for future developments in two entirely different processes, i.e., normal development and cancerogenesis, showing how a physiologic mechanism can be used and abused, transforming itself into a key mechanism of disease development and progression.
Global terrorist threats and illegal migration have intensified concerns for the security of citizens, and every effort is made to exploit all available technological advances to prevent adverse events and protect people and their property. Because they can be used at night and in weather conditions where RGB cameras do not perform well, thermal cameras have become an important component of sophisticated video surveillance systems. In this paper, we investigate the task of automatic person detection in thermal images using convolutional neural network models originally intended for detection in RGB images. We compare the performance of standard state-of-the-art object detectors such as Faster R-CNN, SSD, Cascade R-CNN, and YOLOv3 that were retrained on a dataset of thermal images extracted from videos simulating illegal movements around the border and in protected areas. The videos were recorded at night in clear weather, rain, and fog, at different ranges, and with different movement types. YOLOv3 was significantly faster than the other detectors while achieving performance comparable with the best, so it was used in further experiments. We experimented with different training dataset settings to determine the minimum number of images needed to achieve good detection results on the test datasets. We achieved excellent detection results with respect to average accuracy for all test scenarios, although a modest set of thermal images was used for training. We also test our trained model on well-known and widely used thermal imaging datasets. In addition, we present the results of recognizing humans and animals in thermal images, which is particularly important in cases of sneaking around objects and illegal border crossings. Finally, we present our original thermal dataset used for the experiments, which contains surveillance videos recorded under different weather and shooting conditions.
Player pose estimation is particularly important for sports because it provides more accurate monitoring of athlete movements and performance, recognition of player actions, analysis of techniques, and evaluation of action execution accuracy. All of these tasks are extremely demanding in sports that involve rapid movements of athletes with inconsistent speed and position changes, at varying distances from the camera and with frequent occlusions, especially in team sports when there are many players on the field. A prerequisite for recognizing a player's actions in video footage and comparing their poses during the execution of an action is the detection of the player's pose in each element of an action or technique. First, the player's 2D pose is determined in each video frame and converted into a 3D pose; then, using a tracking method, all the player's poses are grouped into a sequence to construct a series of elements of a particular action. Considering that action recognition and comparison depend significantly on the accuracy of the methods used to estimate and track player pose in real-world conditions, the paper provides an overview and analysis of the methods that can be used for player pose estimation and tracking with a monocular camera, along with evaluation metrics, on the example of handball scenarios. We have evaluated the applicability and robustness of 12 selected two-stage deep learning methods for 3D pose estimation on a public dataset and a custom dataset of handball jump shots on which they have not been trained and where never-before-seen poses may occur. Furthermore, this paper proposes methods for retargeting and smoothing the 3D sequence of poses that have experimentally shown a performance improvement for all tested models. Additionally, we evaluated the applicability and robustness of five state-of-the-art tracking methods on a public dataset and a custom dataset of handball training recorded with a monocular camera.
The paper ends with a discussion addressing the shortcomings of the pose estimation and tracking methods, reflected in problems with locating key skeletal points and in generating poses that do not correspond to anatomically possible human structures, which consequently reduces the overall accuracy of action recognition.
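One simple form the pose-sequence smoothing mentioned above can take is a centered moving average over the estimated 3D joint positions. The sketch below is an illustrative stand-in, not the paper's exact method, and assumes poses are stored as a (frames, joints, 3) array:

```python
import numpy as np

def smooth_pose_sequence(poses, window=5):
    """Temporally smooth a 3D pose sequence with a centered moving
    average. `poses` has shape (T, J, 3): T frames, J joints, xyz.
    The window is clipped at sequence boundaries."""
    T = poses.shape[0]
    half = window // 2
    out = np.empty_like(poses, dtype=float)
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        out[t] = poses[lo:hi].mean(axis=0)
    return out
```

Such smoothing trades a small amount of responsiveness for the suppression of frame-to-frame jitter in the estimated joints, which is one plausible reason it improved results for all tested models.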
In team sports training scenes, it is common to have many players on the court, each with their own ball, performing different actions. Our goal is to detect all players on the handball court and determine the most active player, the one performing the given handball technique. This is a very challenging task: apart from an accurate object detector able to deal with complex, cluttered scenes, additional information is needed to determine the active player. We propose an active player detection method that combines the YOLO object detector, activity measures, and tracking methods to detect and track active players over time. Different ways of computing player activity were considered, and three activity measures are proposed based on optical flow, spatiotemporal interest points, and convolutional neural networks. For tracking, we consider the Hungarian assignment algorithm and the more complex Deep SORT tracker, which uses additional visual appearance features to assist the assignment process. We also propose an evaluation measure to assess the performance of the active player detection method. The method is successfully tested on a custom handball video dataset acquired in the wild and on basketball video sequences. The results are discussed, and some typical cases and issues are shown.
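The frame-to-frame assignment step mentioned above, matching existing player tracks to new detections, can be illustrated with a minimal sketch. The cost values (e.g. 1 − IoU between boxes, a hypothetical choice here) are minimized over all one-to-one assignments; real systems use the Hungarian algorithm (e.g. scipy.optimize.linear_sum_assignment), while the brute-force version below is for illustration only:

```python
from itertools import permutations

def hungarian_like_assign(cost):
    """Optimal one-to-one assignment of n tracks to n detections,
    minimizing total cost. `cost[i][j]` is the cost of matching
    track i to detection j (e.g. 1 - IoU of their boxes).
    Brute force over permutations, so only usable for small n."""
    n = len(cost)
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best:
            best, best_perm = total, perm
    return list(best_perm), best
```

The Hungarian algorithm computes the same optimal assignment in polynomial time, which is what makes it practical for per-frame tracking.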
In recent years, multi-person pose forecasting has gained significant attention due to its potential applications in various fields such as computer vision, robotics, sports analysis, and human-robot interaction. In this paper, we propose a novel deep learning model for multi-person pose forecasting called MPFSIR (multi-person pose forecasting and social interaction recognition) that achieves results comparable to state-of-the-art models but with up to 30 times fewer parameters. In addition, the model includes a social interaction prediction component to model and predict interactions between individuals. We evaluate our model on three benchmark datasets: 3DPW, CMU-Mocap, and MuPoTS-3D, compare it with state-of-the-art methods, and provide an ablation study to analyze the impact of the different model components. Experimental results show the effectiveness of MPFSIR in accurately predicting future poses and capturing social interactions. Furthermore, we introduce the metric MW-MPJPE to evaluate the performance of pose forecasting, which focuses on motion dynamics. Overall, our results highlight the potential of MPFSIR for predicting the poses of multiple people and understanding social dynamics in complex scenes and in various practical applications, especially where computational resources are limited. The code is available at https://github.com/RomeoSajina/MPFSIR.
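MW-MPJPE, introduced above, builds on the standard Mean Per-Joint Position Error (MPJPE). The exact motion weighting is defined in the paper; the base metric it extends can be sketched as follows:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: the average Euclidean distance
    (in the poses' units, e.g. mm) between predicted and ground-truth
    joints. Both arrays have shape (frames, joints, 3)."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```

A motion-weighted variant would reweight these per-joint errors by how much each joint moves, emphasizing frames and joints with strong dynamics rather than near-static ones.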
This paper focuses on image and video content analysis of handball scenes, applying deep learning methods to detect and track the players and recognize their activities. Handball is a team sport played indoors by two teams with a ball, with well-defined goals and rules. The game is dynamic, with fourteen players moving quickly throughout the field in different directions, changing positions and roles from defensive to offensive, and performing different techniques and actions. Such dynamic team sports present challenging and demanding scenarios for object detectors, tracking algorithms, and other computer vision tasks such as action recognition and localization, with much room for improvement of existing algorithms. The aim of the paper is to explore computer vision-based solutions for recognizing player actions that can be applied in unconstrained handball scenes with no additional sensors and with modest requirements, allowing broader adoption of computer vision applications in both professional and amateur settings. This paper presents the semi-manual creation of a custom handball action dataset based on automatic player detection and tracking, and models for handball action recognition and localization using Inflated 3D Networks (I3D). For the task of player and ball detection, different configurations of You Only Look Once (YOLO) and Mask Region-Based Convolutional Neural Network (Mask R-CNN) models fine-tuned on custom handball datasets are compared to the original YOLOv7 model to select the best detector for the tracking-by-detection algorithms. For player tracking, the DeepSORT and Bag of Tricks for SORT (BoT SORT) algorithms with Mask R-CNN and YOLO detectors were tested and compared.
For the task of action recognition, an I3D multi-class model and an ensemble of binary I3D models are trained with different input frame lengths and frame selection strategies, and the best solution is proposed for handball action recognition. The obtained action recognition models perform well on the test set with nine handball action classes, with average F1 measures of 0.69 and 0.75 for the ensemble and multi-class classifiers, respectively. They can be used to automatically index handball videos and facilitate retrieval. Finally, some open issues and challenges in applying deep learning methods in such a dynamic sports environment, as well as directions for future development, are discussed.
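The averaged F1 measures reported above combine per-class precision and recall. A minimal sketch of a macro-averaged F1 over action classes (an illustrative computation, not the paper's evaluation code) is:

```python
def macro_f1(per_class_counts):
    """Average F1 over classes from (tp, fp, fn) counts per class:
    precision = tp / (tp + fp), recall = tp / (tp + fn),
    F1 = harmonic mean of precision and recall."""
    f1s = []
    for tp, fp, fn in per_class_counts:
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Macro averaging weights each action class equally, so rare handball actions count as much as frequent ones in the reported score.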