Animal activity acoustic monitoring is becoming one of the necessary tools in agriculture, including beekeeping. It can assist in the control of beehives in remote locations. It is possible to ...classify bee swarm activity from audio signals using such approaches. A deep neural networks IoT-based acoustic swarm classification is proposed in this paper. Audio recordings were obtained from the Open Source Beehive project. Mel-frequency cepstral coefficients features were extracted from the audio signal. The lossless WAV and lossy MP3 audio formats were compared for IoT-based solutions. An analysis was made of the impact of the deep neural network parameters on the classification results. The best overall classification accuracy with uncompressed audio was 94.09%, but MP3 compression degraded the DNN accuracy by over 10%. The evaluation of the proposed deep neural networks IoT-based bee activity acoustic classification showed improved results if compared to the previous hidden Markov models system.
•A platform for intelligent ambience with several HMI services is proposed.•For mobile application within the ambience robotic unit Genesis is presented.•Distributed architecture provides flexibility ...and extended connectivity.•Mobile unit Genesis proves to be highly manoeuvrable and easy to operate.
In the paper, a speech-based platform for intelligent ambience and/or supportive environment applications is presented. The platform has a distributed architecture, which enables extended connectivity and support for multiple intelligent ambience services. The mobile unit Genesis is an integral part of the distributed platform, enabling interaction between several users and the environment. Furthermore, the sophisticated client/server platform's architecture incorporates robust speech recognition and text-to-speech synthesis engines for more natural human-machine interaction between users and the mobile unit Genesis. Both engines are multilingual oriented. Although the whole system is developed for the Slovenian language, it can be quickly adapted for other languages when appropriate language resources are available. With high speaker independent speech recognition accuracy and low command-to-operation delay, Genesis proves to have good manoeuvrability and it is easy to operate even by a non-experienced operator.
Display omitted
Na področju govornih in jezikovnih tehnologij predstavlja avtomatsko razpoznavanje govora enega izmed ključnih gradnikov. V prispevku bomo predstavili razvoj avtomatskega razpoznavalnika slovenskega ...govora za domeno dnevnoinformativnih oddaj. Arhitektura sistema je zasnovana na globokih nevronskih mrežah. Pri tem smo ob upoštevanju razpoložljivih govornih virov izvedli modeliranje z različnimi aktivacijskimi funkcijami. V postopku razvoja razpoznavalnika govora smo preverili tudi, kakšen je vpliv izgubnih govornih kodekov na rezultate razpoznavanja govora. Za učenje razpoznavalnika govora smo uporabili bazi UMB BNSI Broadcast News in IETK-TV. Skupni obseg govornih posnetkov je znašal 66 ur. Vzporedno z globokimi nevronskimi mrežami smo povečali slovar razpoznavanja govora, ki je tako znašal 250.000 besed. Na ta način smo znižali delež besed izven slovarja na 1,33 %. Z razpoznavanjem govora na testni množici smo dosegli najboljšo stopnjo napačno razpoznanih besed (WER) 15,17 %. Med procesom vrednotenja rezultatov smo izvedli tudi podrobnejšo analizo napak razpoznavanja govora na osnovi lem in F-razredov, ki v določeni meri pokažejo na zahtevnost slovenskega jezika za takšne scenarije uporabe tehnologije.
Automatic speech recognition is essential for establishing natural communication with a human–computer interface. Speech recognition accuracy strongly depends on the complexity of language. Highly ...inflected word forms are a type of unit present in some languages. The acoustic background presents an additional important degradation factor influencing speech recognition accuracy. While the acoustic background has been studied extensively, the highly inflected word forms and their combined influence still present a major research challenge. Thus, a novel type of analysis is proposed, where a dedicated speech database comprised solely of highly inflected word forms is constructed and used for tests. Dedicated test sets with various acoustic backgrounds were generated and evaluated with the Slovenian UMB BN speech recognition system. The baseline word accuracy of 93.88% and 98.53% was reduced to as low as 23.58% and 15.14% for the various acoustic backgrounds. The analysis shows that the word accuracy degradation depends on and changes with the acoustic background type and level. The highly inflected word forms’ test sets without background decreased word accuracy from 93.3% to only 63.3% in the worst case. The impact of highly inflected word forms on speech recognition accuracy was reduced with the increased levels of acoustic background and was, in these cases, similar to the non-highly inflected test sets. The results indicate that alternative methods in constructing speech databases, particularly for low-resourced Slovenian language, could be beneficial.
This paper evaluates the impact of combined transcoding and packet loss degradation on speech as input for the interactive voice response service (IVR) and proposes a method for classification of ...user input according to speech quality. Careful optimization of a communication system and all of its segments need to be considered, as the quality of the user’s experience is becoming a more prominent part of the overall acceptance and desirability of modern service. Within our research, emulation environment was developed and the behavior of IVR analyzed under different packet loss and transcoding conditions. A set of frequently-used vocoders was tested on its performance with an automatic speech recognition module under degraded conditions. Further, quality estimation classifier was proposed, based on the Gaussian mixture models to determine best user’s input modality. Various train and test parameters were investigated to provide more detailed insight of input quality estimation for IVR service working under error prone conditions.
Beekeeping is one of the widespread and traditional fields in agriculture, where Internet of Things (IoT)-based solutions and machine learning approaches can ease and improve beehive management ...significantly. A particularly important activity is bee swarming. A beehive monitoring system can be applied for digital farming to alert the user via a service about the beginning of swarming, which requires a response. An IoT-based bee activity acoustic classification system is proposed in this paper. The audio data needed for acoustic training was collected from the Open Source Beehives Project. The input audio signal was converted into feature vectors, using the Mel-Frequency Cepstral Coefficients (with cepstral mean normalization) and Linear Predictive Coding. The influence of the acoustic background noise and denoising procedure was evaluated in an additional step. Different Hidden Markov Models' and Gaussian Mixture Models' topologies were developed for acoustic modeling, with the objective being to determine the most suitable one for the proposed IoT-based solution. The evaluation was carried out with a separate test set, in order to successfully classify sound between the normal and swarming conditions in a beehive. The evaluation results showed that good acoustic classification performance can be achieved with the proposed system.
The advanced smart home environment presents an important trend for the future of human wellbeing. One of the prerequisites for applying its rich functionality is the ability to differentiate between ...various user categories, such as gender, age, speakers, etc. We propose a model for an efficient acoustic gender and age classification system for human–computer interaction in a smart home. The objective was to improve acoustic classification without using high-complexity feature extraction. This was realized with pitch as an additional feature, combined with additional acoustic modeling approaches. In the first step, the classification is based on Gaussian mixture models. In the second step, two new procedures are introduced for gender and age classification. The first is based on the count of the frames with the speaker’s pitch values, and the second is based on the sum of the frames with pitch values belonging to a certain speaker. Since both procedures are based on pitch values, we have proposed a new, effective algorithm for pitch value calculation. In order to improve gender and age classification, we also incorporated speech segmentation with the proposed voice activity detection algorithm. We also propose a procedure that enables the quick adaptation of the classification algorithm to frequent smart home users. The proposed classification model with pitch values has improved the results in comparison with the baseline system.
This paper presents a new framework for integrating untranscribed spoken content into the acoustic training of an automatic speech recognition system. Untranscribed spoken content plays a very ...important role for under-resourced languages because the production of manually transcribed speech databases still represents a very expensive and time-consuming task. We proposed two new methods as part of the training framework. The first method focuses on combining initial acoustic models using a data-driven metric. The second method proposes an improved acoustic training procedure based on unsupervised transcriptions, in which word endings were modified by broad phonetic classes. The training framework was applied to baseline acoustic models using untranscribed spoken content from parliamentary debates. We include three types of acoustic models in the evaluation: baseline, reference content, and framework content models. The best overall result of 18.02% word error rate was achieved with the third type. This result demonstrates statistically significant improvement over the baseline and reference acoustic models.
The relationships between text or talk and the context are among the basic fields of pragmatic research and an insight into their nature may contribute to a better understanding of language use. In ...this article, we use the results of an analysis of discourse marker use in two different conversational genres (telephone conversation and television interviews) in an attempt to examine the impact of context on the use of discourse markers, generalized for each analysed genre. In the first stage of the analysis, we observe important differences between the two genres: discourse markers are far more frequently used in telephone conversations than in television interviews. In the second stage of the analysis, we identify several contextual factors which contribute to the differences in the use of discourse markers. In this way, we obtain insight into this particular aspect of genre context-talk relationships, and identify some of the characteristics of the genres in question.
The aim of this paper was to analyze the influence of narrow-band input channel on a broadcast news speech recognition system. Different acoustic conditions can be found within a typical broadcast ...news domain, where narrow-band channel presents one of those with possible high impact on accuracy. A method for acoustic channel detection, based on HMM models is proposed, in order to distinguish between the input channels. The advantage of this method is its low system complexity. The Slovenian BNSI Broadcast News and SNABI speech databases were used for the experimental setup. The Slovenian UMB Broadcast News automatic speech recognizer was applied as a test-bed, modified appropriately for the task. The evaluation of HMM models for channel detection showed accuracy higher than 90% for both channel types. The channel influence analysis confirmed that narrow-band input channel significantly degrades the speech recognition accuracy, decreasing it by more than 13% absolute.