Acoustic monitoring of animal activity is becoming one of the essential tools in agriculture, including beekeeping, where it can assist in supervising beehives at remote locations. Such approaches make it possible to classify bee swarm activity from audio signals. A deep neural network (DNN), IoT-based acoustic swarm classification system is proposed in this paper. Audio recordings were obtained from the Open Source Beehives Project. Mel-frequency cepstral coefficient (MFCC) features were extracted from the audio signal. The lossless WAV and lossy MP3 audio formats were compared for IoT-based solutions, and the impact of the deep neural network parameters on the classification results was analyzed. The best overall classification accuracy with uncompressed audio was 94.09%, while MP3 compression degraded the DNN accuracy by over 10%. The evaluation of the proposed DNN IoT-based bee activity acoustic classification showed improved results compared to the previous hidden Markov model system.
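As a rough illustration of the pipeline this abstract describes, the following minimal sketch extracts MFCC features and feeds them to a small dense network for two-class (normal vs. swarming) classification. It is not the paper's implementation: the 13-coefficient setting, frame averaging, and layer sizes are illustrative assumptions.

```python
import numpy as np
import librosa
from tensorflow import keras

def extract_mfcc(path, sr=16000, n_mfcc=13):
    """Load an audio clip and return its mean MFCC vector over all frames."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # collapse the time axis to a fixed-size input

# Small dense network mapping a 13-dim MFCC vector to P(swarming).
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(13,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X, y, ...) with X of shape (n_clips, 13) and binary labels y
```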
Automatic speech recognition is essential for establishing natural communication with a human–computer interface. Speech recognition accuracy depends strongly on the complexity of the language. Highly inflected word forms are a type of unit present in some languages. The acoustic background presents an additional important degradation factor influencing speech recognition accuracy. While the acoustic background has been studied extensively, highly inflected word forms and their combined influence still present a major research challenge. Thus, a novel type of analysis is proposed, where a dedicated speech database comprised solely of highly inflected word forms is constructed and used for tests. Dedicated test sets with various acoustic backgrounds were generated and evaluated with the Slovenian UMB BN speech recognition system. The baseline word accuracies of 93.88% and 98.53% were reduced to as low as 23.58% and 15.14%, respectively, for the various acoustic backgrounds. The analysis shows that word accuracy degradation depends on, and changes with, the acoustic background type and level. The test sets of highly inflected word forms without background decreased word accuracy from 93.3% to only 63.3% in the worst case. The impact of highly inflected word forms on speech recognition accuracy was reduced at increased levels of acoustic background and was, in these cases, similar to that on the non-highly-inflected test sets. The results indicate that alternative methods of constructing speech databases, particularly for the low-resourced Slovenian language, could be beneficial.
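Test sets with acoustic backgrounds at various levels are commonly produced by additive mixing of a clean recording with a background signal scaled to a target signal-to-noise ratio. The abstract does not specify the paper's exact mixing procedure; the helper below is a minimal sketch of the common additive approach, assuming equal-length float arrays.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the resulting speech-to-noise ratio equals `snr_db`
    and return the noisy mixture. Both inputs are equal-length float arrays."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise

# e.g. generate test material at decreasing SNRs: 15 dB, 10 dB, 5 dB, 0 dB
```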
Beekeeping is one of the most widespread and traditional fields in agriculture, where Internet of Things (IoT)-based solutions and machine learning approaches can ease and improve beehive management significantly. A particularly important activity is bee swarming. A beehive monitoring system can be applied in digital farming to alert the user via a service about the beginning of swarming, which requires a response. An IoT-based bee activity acoustic classification system is proposed in this paper. The audio data needed for acoustic training was collected from the Open Source Beehives Project. The input audio signal was converted into feature vectors using Mel-Frequency Cepstral Coefficients (with cepstral mean normalization) and Linear Predictive Coding. The influence of acoustic background noise and of a denoising procedure was evaluated in an additional step. Different Hidden Markov Model and Gaussian Mixture Model topologies were developed for acoustic modeling, with the objective of determining the most suitable one for the proposed IoT-based solution. The evaluation was carried out with a separate test set, in order to classify sounds between the normal and swarming conditions in a beehive. The evaluation results showed that good acoustic classification performance can be achieved with the proposed system.
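For orientation, the sketch below shows only the GMM branch of such a classifier: cepstral mean normalization followed by one Gaussian mixture per class, with the decision taken by comparing average log-likelihoods (scikit-learn's GaussianMixture). The number of mixture components is an assumption, and the paper's HMM topologies are not reproduced.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def cmn(features):
    """Cepstral mean normalization: subtract the per-coefficient
    mean over all frames (features: n_frames x n_coefficients)."""
    return features - features.mean(axis=0, keepdims=True)

def train_classifier(normal_frames, swarm_frames, n_components=8):
    """Fit one GMM per acoustic class on CMN-normalized feature frames."""
    gmm_normal = GaussianMixture(n_components=n_components).fit(cmn(normal_frames))
    gmm_swarm = GaussianMixture(n_components=n_components).fit(cmn(swarm_frames))
    return gmm_normal, gmm_swarm

def classify(frames, gmm_normal, gmm_swarm):
    """Pick the class whose GMM assigns the higher average log-likelihood."""
    frames = cmn(frames)
    return "swarming" if gmm_swarm.score(frames) > gmm_normal.score(frames) else "normal"
```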
The advanced smart home environment presents an important trend for the future of human wellbeing. One of the prerequisites for applying its rich functionality is the ability to differentiate between various user categories, such as gender, age, and individual speakers. We propose a model for an efficient acoustic gender and age classification system for human–computer interaction in a smart home. The objective was to improve acoustic classification without using high-complexity feature extraction. This was realized with pitch as an additional feature, combined with additional acoustic modeling approaches. In the first step, the classification is based on Gaussian mixture models. In the second step, two new procedures are introduced for gender and age classification. The first is based on the count of frames with the speaker's pitch values, and the second on the sum of frames with pitch values belonging to a certain speaker. Since both procedures are based on pitch values, we have proposed a new, effective algorithm for pitch value calculation. In order to improve gender and age classification, we also incorporated speech segmentation with the proposed voice activity detection algorithm. We also propose a procedure that enables quick adaptation of the classification algorithm to frequent smart home users. The proposed classification model with pitch values improved the results in comparison with the baseline system.
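A minimal sketch in the spirit of the first frame-counting procedure is given below: count voiced frames on each side of a pitch boundary and take the majority class. The 165 Hz boundary is an illustrative assumption (adult male F0 typically falls below it, adult female F0 above it); the paper's own pitch calculation algorithm is not reproduced.

```python
import numpy as np

# Illustrative boundary only; a deployed system would learn such
# boundaries from data rather than hard-code them.
PITCH_BOUNDARY_HZ = 165.0

def gender_by_pitch_frame_count(pitch_track):
    """Frame-counting sketch: count voiced frames on each side of the
    boundary and take the majority. `pitch_track` holds one F0 estimate
    per frame, with 0 marking unvoiced frames."""
    voiced = pitch_track[pitch_track > 0]
    male = np.sum(voiced < PITCH_BOUNDARY_HZ)
    female = np.sum(voiced >= PITCH_BOUNDARY_HZ)
    return "male" if male > female else "female"
```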
This paper presents a new framework for integrating untranscribed spoken content into the acoustic training of an automatic speech recognition system. Untranscribed spoken content plays a very important role for under-resourced languages because the production of manually transcribed speech databases still represents a very expensive and time-consuming task. We proposed two new methods as part of the training framework. The first method focuses on combining initial acoustic models using a data-driven metric. The second method proposes an improved acoustic training procedure based on unsupervised transcriptions, in which word endings were modified by broad phonetic classes. The training framework was applied to baseline acoustic models using untranscribed spoken content from parliamentary debates. We include three types of acoustic models in the evaluation: baseline, reference content, and framework content models. The best overall result of 18.02% word error rate was achieved with the third type. This result demonstrates statistically significant improvement over the baseline and reference acoustic models.
Andrej Zgank (phone: +386 2 220 7206, email: andrej.zgank@uni-mb.si) is with the Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia.
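The sketch below shows only the generic unsupervised-training loop that such a framework builds on: decode untranscribed audio with a seed recognizer, keep confidently decoded utterances as pseudo-labels, and retrain. The paper's data-driven model-combination metric and broad-phonetic-class word-ending modification are not reproduced; `decode`, `retrain`, and the confidence threshold are illustrative assumptions.

```python
CONF_THRESHOLD = 0.8  # assumed per-utterance confidence cut-off

def unsupervised_training_round(seed_model, untranscribed_audio, decode, retrain):
    """One round of generic unsupervised acoustic training:
    decode, filter by confidence, retrain on the retained pseudo-labels."""
    selected = []
    for utterance in untranscribed_audio:
        hypothesis, confidence = decode(seed_model, utterance)
        if confidence >= CONF_THRESHOLD:
            selected.append((utterance, hypothesis))  # treat as a pseudo-label
    return retrain(seed_model, selected)
```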
The development of big data, machine learning, and the Internet of Things has led to rapid advances in the research field of Active and Assisted Living (AAL). A human is placed at the center of such an environment, interacting with different modalities while using the system. Although video still plays a dominant role in AAL technologies, audio, as the most natural means of interaction, is also used commonly, either as a single source of information or in combination with other modalities. Despite the rapidly increasing research efforts in the last decade, there is a lack of a systematic overview of audio-based technologies and applications in AAL. This review tries to fill this gap, and identifies five major topics where audio is an essential AAL building block: physiological monitoring, emotion recognition in the context of AAL, human activity recognition, fall detection, and food intake monitoring. We address the data workflow and standard sensing technologies for capturing audio in the AAL environment, provide a comprehensive overview of audio-based AAL applications, and identify datasets available to the research community. Finally, we address the main challenges that should be handled in the upcoming years, and try to identify potential future trends in audio-based AAL.
• Comprehensive review of sensing technologies, datasets and applications of audio in AAL.
• Audio can be a standalone source of information, or combined with other modalities.
• Challenges in deploying audio technologies in AAL and future trends are identified.
• Creating benchmark platforms to facilitate model evaluation is desirable.
• A platform for intelligent ambience with several HMI services is proposed.
• For mobile application within the ambience, the robotic unit Genesis is presented.
• Distributed architecture provides flexibility and extended connectivity.
• Mobile unit Genesis proves to be highly manoeuvrable and easy to operate.
In this paper, a speech-based platform for intelligent ambience and/or supportive environment applications is presented. The platform has a distributed architecture, which enables extended connectivity and support for multiple intelligent ambience services. The mobile unit Genesis is an integral part of the distributed platform, enabling interaction between several users and the environment. Furthermore, the platform's sophisticated client/server architecture incorporates robust speech recognition and text-to-speech synthesis engines for more natural human-machine interaction between users and the mobile unit Genesis. Both engines are multilingually oriented: although the whole system was developed for the Slovenian language, it can be adapted quickly to other languages when appropriate language resources are available. With high speaker-independent speech recognition accuracy and a low command-to-operation delay, Genesis proves to have good manoeuvrability and is easy to operate, even by a non-experienced operator.
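To make the client/server idea concrete, the sketch below shows a command round-trip between a client and the mobile unit. This is purely an illustration of the distributed pattern the abstract describes: the host, port, and plain-text protocol are invented for the example and do not come from the paper.

```python
import socket

# Assumed address of the mobile unit's command server (illustrative only).
GENESIS_HOST, GENESIS_PORT = "192.168.1.50", 5000

def send_command(command):
    """Send one recognized voice command as a text line and read back
    the unit's acknowledgement (e.g. 'OK')."""
    with socket.create_connection((GENESIS_HOST, GENESIS_PORT), timeout=2.0) as sock:
        sock.sendall((command + "\n").encode("utf-8"))
        return sock.recv(1024).decode("utf-8").strip()

# e.g. send_command("MOVE FORWARD") once the speech recognizer emits a command
```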
This paper addresses the topic of defining the phonetic broad classes needed during acoustic modeling for speech recognition in the procedure of decision-tree-based clustering. The usual approach is to use phonetic broad classes defined by an expert. This method has some disadvantages, especially in the case of multilingual speech recognition. A new data-driven method is proposed for the generation of phonetic broad classes, based on a phoneme confusion matrix. The similarity measure is defined using the number of confusions between the master phoneme and all other phonemes included in the set. The proposed method is compared to the standard approach based on expert knowledge and to an approach with randomly generated broad classes. The proposed data-driven method is evaluated implicitly within a speech recognition experiment. The intention of the first evaluation stage is to test the generated acoustic models in a monolingual environment (Slovenian), to show that the proposed method does not contain a multilingual influence. In the second evaluation stage, the generated acoustic models are tested in a multilingual environment (Slovenian, German, and Spanish). All experiments were based on SpeechDat(II) speech databases. The proposed data-driven method for the generation of phonetic broad classes, based on a phoneme confusion matrix, improved speech recognition results when compared to the method based on expert knowledge.
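A minimal sketch of grouping phonemes by confusion counts is given below. The greedy master selection, symmetrized counts, and threshold are illustrative assumptions standing in for the paper's exact similarity measure and clustering procedure.

```python
import numpy as np

def broad_classes_from_confusions(confusion, phonemes, threshold):
    """Greedy grouping sketch: take the next unassigned phoneme as the
    'master' and put every phoneme confused with it more than `threshold`
    times into the same broad class. `confusion[i, j]` counts how often
    phoneme i was recognized as phoneme j."""
    sym = confusion + confusion.T          # treat confusions as symmetric
    unassigned = set(range(len(phonemes)))
    classes = []
    while unassigned:
        master = min(unassigned)           # arbitrary pick of the next master
        members = {j for j in unassigned
                   if j == master or sym[master, j] > threshold}
        classes.append([phonemes[j] for j in sorted(members)])
        unassigned -= members
    return classes
```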
Automatic speech recognition represents one of the key building blocks in the field of speech and language technologies. This paper presents the development of an automatic speech recognizer for Slovenian in the broadcast news domain. The system architecture is based on deep neural networks. Taking the available speech resources into account, acoustic modeling was carried out with different activation functions. During the development of the speech recognizer, the influence of lossy speech codecs on speech recognition results was also examined. The UMB BNSI Broadcast News and IETK-TV databases were used for training the speech recognizer, with a total of 66 hours of speech recordings. In parallel with the deep neural networks, the recognition vocabulary was enlarged to 250,000 words, which reduced the out-of-vocabulary rate to 1.33%. The best word error rate (WER) achieved on the test set was 15.17%. During the evaluation, a detailed analysis of speech recognition errors based on lemmas and F-classes was also carried out, which to some extent reveals the complexity of the Slovenian language for such usage scenarios.
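Since the evaluation above is reported in terms of word error rate, a minimal reference implementation of the standard WER computation (Levenshtein distance over word tokens) may be useful. This is the textbook formula, not code from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via Levenshtein distance over word tokens."""
    r, h = reference.split(), hypothesis.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(r)][len(h)] / len(r)

# e.g. word_error_rate("the cat sat", "the cat sang") -> 0.333...
```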