The necessity of caring for elderly people is increasing. Great efforts are being made to enable the elderly population to remain independent for as long as possible. Technologies are being developed ...to monitor the daily activities of a person to detect their state. Approaches that recognize activities from simple environment sensors have been shown to perform well. It is also important to know the habits of a resident to distinguish between common and uncommon behavior. In this paper, we propose a novel approach to discover a person’s common daily routines. The approach consists of sequence comparison and a clustering method to obtain partitions of daily routines. Such partitions are the basis to detect unusual sequences of activities in a person’s day. Two types of partitions are examined. The first partition type is based on daily activity vectors, and the second type is based on sensor data. We show that daily activity vectors are needed to obtain reasonable results. We also show that partitions obtained with generalized Hamming distance for sequence comparison are better than partitions obtained with the Levenshtein distance. Experiments are performed with two publicly available datasets.
Večina sodobnih sistemov za strojno prevajanje temelji na arhitekturi nevronskih mrež. To velja za spletne ponudnike strojnega prevajanja, za raziskovalne sisteme in za orodja, ki so lahko v pomoč ...poklicnim prevajalcem v njihovi praksi. Čeprav lahko sisteme nevronskih mrež uporabljamo na običajnih centralnih procesnih enotah osebnih računalnikov in strežnikov, je za delovanje s smiselno hitrostjo potrebna uporaba grafičnih procesnih enot. Pri tem smo omejeni z velikostjo slovarja, kar zmanjšuje kakovost prevodov. Velikost slovarja besednih enot je še posebej pereč problem visoko pregibnih jezikov. Rešujemo ga z uporabo podbesednih enot, s katerimi dosežemo večjo pokritost jezika. V članku predstavljamo različne metode razcepljanja besed na podbesedne enote z različno velikimi slovarji in primerjamo njihovo uporabo v strojnem prevajalniku za jezikovni par slovenščina-angleščina. V primerjavo vključujemo še prevajalnik brez razcepljanja besed. Predstavljamo rezultate uspešnosti prevajanja z metriko BLEU, hitrosti učenja modelov in hitrosti prevajanja ter velikosti modelov. Dodajamo pregled praktičnih vidikov uporabe podbesednih enot v strojnem prevajalniku, ki ga uporabljamo skupaj z orodji za računalniško podprto prevajanje.
With the transition to neural architectures, machine translation achieves very good quality for several resource-rich languages. However, the results are still much worse for languages with complex ...morphology, especially if they are low-resource languages. This paper reports the results of a systematic analysis of adding morphological information into neural machine translation system training. Translation systems presented and compared in this research exploit morphological information from corpora in different formats. Some formats join semantic and grammatical information and others separate these two types of information. Semantic information is modeled using lemmas and grammatical information using Morpho-Syntactic Description (MSD) tags. Experiments were performed on corpora of different sizes for the English–Slovene language pair. The conclusions were drawn for a domain-specific translation system and for a translation system for the general domain. With MSD tags, we improved the performance by up to 1.40 and 1.68 BLEU points in the two translation directions. We found that systems with training corpora in different formats improve the performance differently depending on the translation direction and corpora size.
Ambient assisted living in smart home environments is becoming an important goal in an aging society with challenges in elderly care. A key component in such environments is the accurate recognition ...of activities of daily living from various sensor data. Recent research directions explored several classification methods, including hidden Markov models. This research presents a hidden Markov model-based system for activity recognition, and extends it with a second-order Markov chain model of activity sequences to achieve long-term dependency in the model. We also introduce an activity transition cost to counteract the tendency of hidden Markov models to make a large number of transitions. The proposed models are used for activity recognition, with their scores being combined using heuristically determined weights for optimal performance. We also present a modified Viterbi algorithm, which incorporates both models and the activity transition cost. We used a dataset from the CASAS project to test and evaluate the proposed models. A comparison of the results shows the potential of introducing long term dependencies and the managing the number of activity transitions. We show results regarding the modeling ability to predict activity sequences, a comparison of predicted and actual activity transitions, and final recognition accuracy results. The results show an increase of total activity recognition accuracy from 93.9 % to 94.52 % on individual activities, and from 68.89 % to 70.95 % over the combination of all concurrent activities. The results also show a reduction of predicted activity transitions from 741 to 236, whereas the number of actual activity transitions in the evaluation set is 141.
This paper presents a method of binocular visual stimulation for brain-computer interfaces (BCIs) based on steady-state visual evoked potentials (SSVEPs) using phase-coded symbols. The proposed ...method's emphasis is on a binocular phase-coded visual stimulus, which is based on the phase differences between the left- and right-eye stimuli, and a symbol detection and recognition procedure based on SSVEP response of the left and right occipital lobes of the user's scalp, where the SSVEP response is obtained as electroencephalography (EEG) signaling. The symbols are coded as phase differences and maintain the same frequency of the sine wave-modulated light provided to the user's left and right eyes as a binocular visual stimulation. Based on this method, a basic system setup is presented to explore the possibilities of binocular phase-coded visual stimuli for virtual or augmented reality applications, where the binocular visual stimulation was achieved by the specially designed head-mounted displays. Multiple visually coded targets are realized as eight different phase-coded binocular symbols and further evaluated as a random sequence of single targets, thus representing the situations in virtual or augmented reality, where multiple visually coded targets are present but not visualized to the user simultaneously within the same field of view. The offline results obtained from ten healthy subjects revealed that an average symbol recognition accuracy of 90.63% and an information transfer rate (ITR) of 70.55 bits/min were achieved for a symbol stimulation time of 2 s. The results of this paper demonstrate the feasibility of using binocular visual stimuli for SSVEP-based BCIs, where reasonable ITR is achieved using single-frequency binocular phase-coded symbols. The proposed method indicates the possibility of combining it with 3D wearable visualization technologies, such as binocular head-mounted displays (HMDs), in order to improve the intuitiveness of the interaction with more immersive user experience using BCI modalities.
Na področju govornih in jezikovnih tehnologij predstavlja avtomatsko razpoznavanje govora enega izmed ključnih gradnikov. V prispevku bomo predstavili razvoj avtomatskega razpoznavalnika slovenskega ...govora za domeno dnevnoinformativnih oddaj. Arhitektura sistema je zasnovana na globokih nevronskih mrežah. Pri tem smo ob upoštevanju razpoložljivih govornih virov izvedli modeliranje z različnimi aktivacijskimi funkcijami. V postopku razvoja razpoznavalnika govora smo preverili tudi, kakšen je vpliv izgubnih govornih kodekov na rezultate razpoznavanja govora. Za učenje razpoznavalnika govora smo uporabili bazi UMB BNSI Broadcast News in IETK-TV. Skupni obseg govornih posnetkov je znašal 66 ur. Vzporedno z globokimi nevronskimi mrežami smo povečali slovar razpoznavanja govora, ki je tako znašal 250.000 besed. Na ta način smo znižali delež besed izven slovarja na 1,33 %. Z razpoznavanjem govora na testni množici smo dosegli najboljšo stopnjo napačno razpoznanih besed (WER) 15,17 %. Med procesom vrednotenja rezultatov smo izvedli tudi podrobnejšo analizo napak razpoznavanja govora na osnovi lem in F-razredov, ki v določeni meri pokažejo na zahtevnost slovenskega jezika za takšne scenarije uporabe tehnologije.
Context-dependent factored language models Donaj, Gregor; Kačič, Zdravko
EURASIP journal on audio, speech, and music processing,
02/2017, Letnik:
2017, Številka:
1
Journal Article
Recenzirano
Odprti dostop
The incorporation of grammatical information into speech recognition systems is often used to increase performance in morphologically rich languages. However, this introduces demands for sufficiently ...large training corpora and proper methods of using the additional information. In this paper, we present a method for building factored language models that use data obtained by morphosyntactic tagging. The models use only relevant factors that help to increase performance and ignore data from other factors, thus also reducing the need for large morphosyntactically tagged training corpora. Which data is relevant is determined at run-time, based on the current text segment being estimated, i.e., the context. We show that using a context-dependent model in a two-pass recognition algorithm, the overall speech recognition accuracy in a Broadcast News application improved by 1.73% relatively, while simpler models using the same data achieved only 0.07% improvement. We also present a more detailed error analysis based on lexical features, comparing first-pass and second-pass results.
In speech recognition systems language models are used to estimate the probabilities of word sequences. In this paper special emphasis is given to numerals–words that express numbers. One reason for ...this is the fact that in a practical application a falsely recognized numeral can change important content information inside the sentence more than other types of errors. Standard
n
-gram language models can sometimes assign very different probabilities to different numerals, according to their relative frequencies in training corpus. Based on the assumption that some different numbers are more equally likely to occur, than what a standard
n
-gram language model estimates, this paper proposes several methods for sorting numerals into classes in an inflective language and language models based on these sorting techniques. We treat these classes as basic vocabulary units for the language model. We also expose the differences between the proposed language models and well known class-based language models. The presented approach is also transferable to other classes of words with similar properties, e.g. proper nouns. Results of experiments show that significant improvements are obtained on numeral-rich domains. Although numerals represent only a small portion of words in the test set, a relative reduction in word error rate of 1.4 % was achieved. Statistical significance tests were performed, which showed that these improvements are statistically significant. We also show that depending on the amount of numerals in a target domain the improvement in performance can grow up to 16 % relative.
V nalogi smo se posvetili jezikovnemu modeliranju za avtomatsko razpoznavanje govora z velikim slovarjem besed. Pri takšnem razpoznavanju je še vedno velika težava pravilnost razpoznavanja ...izgovorjenih besed. Ta je še posebej izrazita pri morfološko kompleksnejših jezikih, kot je slovenščina. Za delovanje sistema razpoznavanja tekočega govora potrebujemo jezikovne modele. Da lahko zgradimo primeren jezikovni model, potrebujemo ustrezno velike učne množice podatkov, ki morajo pri morfološko kompleksnejših jezikih biti še večje. Sodobni razpoznavalniki govora za slovenščino delajo več napak kot razpoznavalniki za druge jezike. Pogost problem so napačno razpoznane končnice besed. To kaže, da je smiselno razmišljati o vključevanju oblikoskladenjskih informacij v jezikovno modeliranje, če hočemo zmanjšati število napak. V doktorski nalogi predstavljamo zasnovo sistema, ki ob običajnih n-gramskih besednih jezikovnih modelih uporablja tudi modele, ki vključujejo informacije o besedni vrsti in slovničnih kategorijah prepoznanih besed. Imenujemo jih morfološki modeli. Razvili smo algoritem, ki na osnovi rezultatov perpleksnosti na razvojni množici določa najprimernejšo strukturo takšnih modelov glede na besedne vrste konteksta besede, ki jo ocenjujemo. Pravimo, da imajo modeli kontekstno odvisno strukturo. Implementirali smo jih kot faktorizirane jezikovne modele. V teh modelih se soočamo z veliko množico različnih možnih kontekstov besede in za vsak kontekst gradimo strukturo modelov ločeno. Pri tem lahko uporabimo le majhen del učne množice. Zato prihaja tudi tukaj do pomanjkanja učnih podatkov, kljub temu da imamo manjše zahteve po velikosti učne množice. Zato smo razvili pristope združevanja različnih kontekstov. Zaradi velikega števila možnih kontekstov in veliko različnih možnosti struktur modelov smo razvili tudi pristope za omejeno iskanje možnih struktur modelov na podlagi postopne gradnje njihovih struktur in sprotnega ocenjevanja. Sistem razpoznavanja je zasnovan v obliki dvoprehodnega algoritma, kjer v drugem prehodu uporabljamo v okviru doktorske disertacije razvite modele. Razvili smo tudi postopek za hitro optimizacijo uteži modelov in postopek dinamičnega uteževanja glede na kontekst besede. Uspešnost razpoznavanja z razvitimi modeli in brez njih smo testirali na slovenski govorni bazi Broadcast News.