Strojno učenje u uvjetima manje raspoloživosti podataka (Machine learning under conditions of limited data availability)

Peer reviewed Open access

Juričić, Vedran

Politehnika, 12/2023, Volume: 7, Issue: 2

Journal Article, Web Resource

Machine learning is the subject of numerous scientific and professional research projects and an important component of systems used in medicine, banking, computer security, communications and many other domains. It is one of the most active areas of research, with constant progress in the development of new algorithms and approaches and in the improvement of existing methods. The performance of a machine learning model is significantly affected by the dataset used for training, that is, by the quality of the data, how evenly the values are distributed, and the size of the set. This is a potential problem for machine learning methods that require pre-labelled data, because acquiring such data can be extremely complex, expensive and time-consuming. In that case, a classical machine learning model will most likely not perform well. One approach to this problem is transfer learning, in which the model uses data not only from the target domain but also from another, ideally related, domain. In this paper, conditions of limited data availability were simulated, and under them the performance of three neural-network-based models was analyzed, one of which is built on a previously trained model. The process of creating the training sets is described, and the results of analyzing the three models with training sets of different sizes are presented.
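
The abstract does not say how the reduced training sets were built, so the following is only a minimal sketch of one plausible way to simulate limited data availability: drawing class-balanced subsamples of a labelled set at several sizes. The function name `stratified_subset`, the toy labels and the chosen fractions are all assumptions made for illustration, not details taken from the paper.

```python
import random
from collections import defaultdict


def stratified_subset(labels, fraction, seed=0):
    """Return sorted indices of a subsample that keeps `fraction` of each class.

    Hypothetical helper: the paper's actual sampling procedure is not
    described in the abstract.
    """
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Keep at least one example per class, even for tiny fractions.
        keep = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, keep))
    return sorted(chosen)


# Toy labelled set (assumed): 100 examples in each of two classes.
labels = ["cat"] * 100 + ["dog"] * 100

# Training subsets at full, half and one-tenth size.
subsets = {f: stratified_subset(labels, f) for f in (1.0, 0.5, 0.1)}
```

Sampling per class keeps the class proportions the same at every size, so a change in model performance can be attributed to the amount of data and not to a shift in class balance.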
