Automatic object detection in maritime surveillance or panoramic camera images opens up possibilities for automatic traffic monitoring, unauthorized movement detection, and hazard or pollution identification. This study investigates the performance of models based on the YOLOv7 architecture for the task of detecting vessels and buoys in images captured by panoramic and surveillance cameras. The models are trained on a dedicated dataset comprising diverse maritime scenes created for this purpose, utilizing transfer learning from models trained on generic images. Additionally, two variants of input handling strategies are examined, and the use of the input image cropping strategy significantly improves detection results, especially for small objects, compared to the baseline model.
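The input cropping strategy can be sketched as follows: the full frame is split into overlapping tiles that are passed to the detector separately, and tile-local detections are shifted back to full-image coordinates. The tile size, overlap ratio, and detection tuple layout below are illustrative assumptions, not the paper's exact configuration.

```python
def make_tiles(width, height, tile=640, overlap=0.25):
    """Overlapping (x0, y0, x1, y1) tiles covering a width x height frame.
    Tile size and overlap are illustrative, not the paper's settings."""
    step = int(tile * (1 - overlap))
    xs = sorted(set(list(range(0, max(width - tile, 1), step)) + [max(width - tile, 0)]))
    ys = sorted(set(list(range(0, max(height - tile, 1), step)) + [max(height - tile, 0)]))
    return [(x, y, x + tile, y + tile) for y in ys for x in xs]

def merge_detections(per_tile_dets, tiles):
    """Shift tile-local (x0, y0, x1, y1, score, cls) boxes back to full-image
    coordinates; duplicates on tile overlaps would then be removed with
    non-maximum suppression."""
    merged = []
    for dets, (ox, oy, _, _) in zip(per_tile_dets, tiles):
        for (x0, y0, x1, y1, score, cls) in dets:
            merged.append((x0 + ox, y0 + oy, x1 + ox, y1 + oy, score, cls))
    return merged
```

With these settings, a 1280x720 frame yields a 3x2 grid of 640-pixel crops, so a distant vessel occupies proportionally more of each network input than it would in a single downscaled frame.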
Global terrorist threats and illegal migration have intensified concerns for the security of citizens, and every effort is made to exploit available technological advances to prevent adverse events and protect people and their property. Because they can be used at night and in weather conditions where RGB cameras do not perform well, thermal cameras have become an important component of sophisticated video surveillance systems. In this paper, we investigate the task of automatic person detection in thermal images using convolutional neural network models originally intended for detection in RGB images. We compare the performance of standard state-of-the-art object detectors such as Faster R-CNN, SSD, Cascade R-CNN, and YOLOv3, retrained on a dataset of thermal images extracted from videos that simulate illegal movements around the border and in protected areas. The videos were recorded at night in clear weather, rain, and fog, at different ranges, and with different movement types. YOLOv3 was significantly faster than the other detectors while achieving performance comparable to the best, so it was used in further experiments. We experimented with different training dataset settings to determine the minimum number of images needed to achieve good detection results on the test datasets. Although a modest set of thermal images was used for training, we achieved excellent detection results with respect to average accuracy for all test scenarios. We also test our trained model on several well-known and widely used thermal imaging datasets. In addition, we present results for the recognition of humans and animals in thermal images, which is particularly important in the case of sneaking around objects and illegal border crossings. Finally, we present our original thermal dataset used for experimentation, which contains surveillance videos recorded under different weather and shooting conditions.
In team sports training scenes, it is common to have many players on the court, each with their own ball, performing different actions. Our goal is to detect all players on the handball court and determine the most active player, the one performing the given handball technique. This is a very challenging task for which, apart from an accurate object detector able to deal with complex cluttered scenes, additional information is needed to determine the active player. We propose an active player detection method that combines the YOLO object detector, activity measures, and tracking methods to detect and track active players over time. Different ways of computing player activity were considered, and three activity measures are proposed, based on optical flow, spatiotemporal interest points, and convolutional neural networks. For tracking, we consider the Hungarian assignment algorithm and the more complex Deep SORT tracker, which uses additional visual appearance features to assist the assignment process. We also propose an evaluation measure to assess the performance of the active player detection method. The method was successfully tested on a custom handball video dataset acquired in the wild and on basketball video sequences. The results are discussed, and some typical cases and issues are shown.
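The Hungarian assignment step mentioned above can be sketched as matching detections to existing tracks by minimizing a 1 − IoU cost matrix; SciPy's `linear_sum_assignment` provides the Hungarian solver, and the IoU gate of 0.3 is an illustrative choice rather than the paper's tuned value.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def match(tracks, detections, iou_min=0.3):
    """Hungarian assignment on a 1 - IoU cost matrix; returns (track, det)
    index pairs, discarding matches below the IoU gate."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1.0 - iou_min]
```

Deep SORT extends this same assignment step with appearance embeddings in the cost matrix, which is what makes it more robust to crossing players.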
This paper focuses on image and video content analysis of handball scenes, applying deep learning methods to detect and track the players and recognize their activities. Handball is a team sport played indoors between two teams, with a ball and well-defined goals and rules. The game is dynamic, with fourteen players moving quickly throughout the field in different directions, changing positions and roles from defensive to offensive, and performing different techniques and actions. Such dynamic team sports present challenging and demanding scenarios for both object detectors and tracking algorithms, as well as for other computer vision tasks such as action recognition and localization, with much room for improvement of existing algorithms. The aim of the paper is to explore computer vision-based solutions for recognizing player actions that can be applied in unconstrained handball scenes with no additional sensors and with modest requirements, allowing broader adoption of computer vision applications in both professional and amateur settings. The paper presents the semi-manual creation of a custom handball action dataset based on automatic player detection and tracking, and models for handball action recognition and localization using Inflated 3D Networks (I3D). For the task of player and ball detection, different configurations of You Only Look Once (YOLO) and Mask Region-Based Convolutional Neural Network (Mask R-CNN) models fine-tuned on custom handball datasets are compared with the original YOLOv7 model to select the best detector for the tracking-by-detection algorithms. For player tracking, the DeepSORT and Bag of Tricks for SORT (BoT SORT) algorithms with Mask R-CNN and YOLO detectors were tested and compared.
For the task of action recognition, an I3D multi-class model and an ensemble of binary I3D models are trained with different input frame lengths and frame selection strategies, and the best solution is proposed for handball action recognition. The obtained action recognition models perform well on the test set with nine handball action classes, with average F1 measures of 0.69 and 0.75 for the ensemble and multi-class classifiers, respectively. They can be used to automatically index handball videos and facilitate retrieval. Finally, some open issues and challenges in applying deep learning methods in such a dynamic sports environment, along with directions for future development, are discussed.
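Frame selection strategies such as those compared above can be illustrated with a small sketch. The two strategies and the 64-frame clip length below are assumptions for illustration, not the exact schemes evaluated in the paper.

```python
def sample_frames(num_frames, clip_len=64, strategy="uniform"):
    """Return frame indices for a fixed-length clip from a video of
    num_frames frames. Two illustrative selection strategies; the paper's
    exact schemes may differ."""
    if strategy == "uniform":
        # spread clip_len indices evenly across the whole video
        return [int(i * (num_frames - 1) / (clip_len - 1)) for i in range(clip_len)]
    if strategy == "center":
        # take clip_len consecutive frames around the video's midpoint
        start = max(0, num_frames // 2 - clip_len // 2)
        return [min(start + i, num_frames - 1) for i in range(clip_len)]
    raise ValueError(f"unknown strategy: {strategy}")
```

Uniform sampling covers the whole action at a coarser temporal resolution, while consecutive center frames preserve fine motion but may miss the start or end of a long action, which is the trade-off the frame-length experiments probe.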
Many maritime surveillance cameras around the world could have their functionality expanded with computer vision-based object detection in order to monitor traffic, provide automated statistics, or increase safety. To this aim, we train two versions of the YOLOv7 object detection model on a suitable custom dataset with four object categories and evaluate their detection performance. To handle small objects such as buoys, and objects that appear visually small in the frame, such as distant boats, we examine two different configurations of input to the model.
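When detections from overlapping input crops (or from the two input configurations) are combined, duplicate boxes over the same object must be suppressed. A minimal greedy non-maximum suppression sketch, with an illustrative IoU threshold:

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1, ...) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def nms(dets, iou_thr=0.5):
    """Greedy NMS over (x0, y0, x1, y1, score) tuples: keep the
    highest-scoring box, drop any remaining box that overlaps a kept
    one above iou_thr."""
    keep = []
    for cand in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(cand, k) < iou_thr for k in keep):
            keep.append(cand)
    return keep
```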
Automatic image annotation involves automatically assigning useful keywords to an unlabelled image. The major goal is to bridge the so-called semantic gap between the available image features and the keywords that people might use to annotate images. Although different people will most likely use different words to annotate the same image, most people can use object or scene labels when searching for images.
We propose a two-tier annotation model where the first tier corresponds to object-level and the second tier to scene-level annotation. In the first tier, images are annotated with labels of the objects present in them, using multi-label classification methods on low-level features extracted from the images. Scene-level annotation is performed in the second tier, using originally developed inference-based algorithms for annotation refinement and scene recognition. These algorithms use a fuzzy knowledge representation scheme based on the Fuzzy Petri Net, KRFPN, defined to enable reasoning with concepts useful for image annotation. To define the elements of the KRFPN scheme, novel data-driven algorithms for the acquisition of fuzzy knowledge are proposed.
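The inference step can be illustrated with the generic fuzzy Petri net firing rule, in which a production's conclusion receives the minimum of its antecedent truth degrees scaled by the rule's certainty factor. This is the textbook FPN rule, not necessarily the exact KRFPN definition, and the "beach" rule below is a made-up example.

```python
def fire_rule(antecedent_truths, certainty):
    """Generic fuzzy Petri net rule firing: the conclusion's truth degree is
    min(antecedents) scaled by the rule's certainty factor. Illustrative only;
    the KRFPN scheme in the paper may define firing differently."""
    return min(antecedent_truths) * certainty

# Hypothetical rule: IF sand AND sea THEN beach, certainty 0.9
beach_truth = fire_rule([0.8, 0.7], 0.9)
```

Chaining such rules from object-level labels (first tier) up to scene concepts is the kind of reasoning the second-tier algorithms perform.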
The proposed image annotation model is evaluated separately on the first and second tiers using a dataset of outdoor images. The results outperform published results obtained on the same image collection, for both object-level and scene-level annotation. Different subsets of features composed of dominant colours, image moments, and GIST descriptors, as well as different classification methods (RAKEL, ML-kNN, and Naïve Bayes), were tested in the first tier. The results of scene-level annotation in the second tier were also compared with a common classification method (Naïve Bayes) and showed superior performance. The proposed model enables expanding image annotation with new concepts regardless of their level of abstraction.
• Multi-label classification and knowledge-based approach to image annotation.
• The definition of the fuzzy knowledge representation scheme based on FPN.
• Novel data-driven algorithms for automatic acquisition of fuzzy knowledge.
• Novel inference-based algorithms for annotation refinement and scene recognition.
• A comparison of inference-based scene classification with an ordinary approach.
The large amount of data that is created every day can be used to develop artificial intelligence algorithms in the domain of computer vision that solve tasks such as image classification, face detection and action recognition. These datasets are most often created from videos and images downloaded from television channels or the YouTube social network and are collected and prepared for the appropriate task. We were interested in the task of detecting swimmers, so that the model could be used to recognize and improve swimming techniques. Although today there are huge open image databases like COCO and ImageNet, prepared for supervised machine learning, and sports scene databases like the Olympic Sports Dataset, UCF Action Sport dataset or Sport-1M that include images of more popular (watched) sports, none of them include images that could be used to build our swimmer detection model. Therefore, this paper describes the process of recording and collecting video material and preparing the UNIRI-SWM image set for swimmer detection. The set includes shots of swimmers in real, situational training and competition conditions, filmed by action cameras from different shooting angles. The paper presents the results of swimmer detection using the deep convolutional neural networks Mask R-CNN and YOLOv3, trained on a set of general images, before and after training on the UNIRI-SWM set. The results show that after adapting the model on an appropriate set of images from the swimming domain, very good swimmer detection results can be achieved.
A well-known problem in unit selection speech synthesis is designing the join and target cost sub-functions and optimizing their corresponding weights so that they reflect human listeners' preferences. To achieve this, we propose a procedure in which an objective criterion for optimal speech unit selection is used. The objective criterion for tuning the cost function weights is based on automatic speech recognition results. To demonstrate the effectiveness of the proposed method, listening tests with 31 naive listeners were performed. The experimental results show that the proposed method improves speech quality and intelligibility. To evaluate the quality of the synthesized speech, the unit selection speech synthesis system is compared with two other Croatian speech synthesis systems with voices built from the same recorded speech corpus. One of these voices was built with the Festival speech synthesis system using the statistical parametric method, and the other is a diphone concatenation based text-to-speech system. The comparison is based on subjective tests using MOS (mean opinion score) evaluation. According to the subjective tests, the system using the proposed cost function weight optimization method performs better than the other compared systems.
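The weight-tuning idea can be sketched as an exhaustive search over candidate weight pairs that minimizes an ASR-based objective such as word error rate on synthesized test sentences. The grid, the two-weight setup, and the demo objective below are illustrative stand-ins for the full synthesis-plus-recognition pipeline, which is not shown.

```python
from itertools import product

def tune_weights(candidates, objective):
    """Exhaustive search over (join_weight, target_weight) pairs, keeping the
    combination that minimizes an ASR-based objective. `objective` stands in
    for synthesizing test sentences and scoring them with a recognizer."""
    return min(product(candidates, repeat=2), key=lambda w: objective(*w))

# Hypothetical objective with a known minimum, for illustration only.
demo = lambda jw, tw: (jw - 0.3) ** 2 + (tw - 0.7) ** 2
```

In practice each objective evaluation is expensive (a full synthesis and recognition pass), so a coarse grid or a hill-climbing search would typically replace the exhaustive product.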
Ball Detection Using Yolo and Mask R-CNN
Buric, Matija; Pobar, Miran; Ivasic-Kos, Marina
2018 International Conference on Computational Science and Computational Intelligence (CSCI)
Conference Proceeding
Many computer vision applications rely on accurate and fast object detection, and in our case, ball detection serves as a prerequisite for action recognition in handball scenes. We compare the ...performance of two of the state-of-the-art convolutional neural network-based object detectors for the task of ball detection in non-staged, real-world conditions. The comparison is performed in terms of speed and accuracy measures on a dataset comprising custom handball footage and a sample of images obtained from the Internet. The performance of the models is compared with and without additional training with examples from our dataset.
To build a successful supervised learning model for action recognition, a large amount of training data needs to be labeled first. Labeling is normally done manually, and it is a tedious and time-consuming task, especially in the case of video footage, where each individual athlete performing a given action should be labeled. To minimize the manual labor, we propose a method based on Mask R-CNN and optical flow to determine the active players who perform a given action among all players present in the scene. Mask R-CNN, a deep learning object detection method, is used for player detection, while optical flow measures player activity. Combining both methods enables tracking and labeling of active players in handball video sequences. The method was successfully tested on a dataset of handball practice videos recorded in the wild.
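The optical-flow activity measure can be sketched as the mean flow magnitude inside each detected player's bounding box, with the most active player taken as the argmax. The dense flow field would come from an optical flow estimator such as OpenCV's Farneback method; this is a simplification for illustration, not the paper's exact measure.

```python
import numpy as np

def most_active(flow, boxes):
    """Score each (x0, y0, x1, y1) player box by the mean magnitude of an
    (H, W, 2) optical-flow displacement field inside it, and return the index
    of the highest-scoring box together with all scores."""
    mag = np.linalg.norm(flow, axis=2)  # per-pixel flow magnitude
    scores = [mag[y0:y1, x0:x1].mean() for (x0, y0, x1, y1) in boxes]
    return int(np.argmax(scores)), scores
```

Because the boxes come from the detector, the same per-box scores can be carried along by the tracker, so the active-player label persists across frames.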