This paper focuses on the hypothesis of optimizing time series predictions using fractal interpolation techniques. In general, the accuracy of machine learning model predictions is closely related to the quality and quantity of the data used, following the principle of garbage-in, garbage-out. To augment datasets both quantitatively and qualitatively, data scientists commonly generate synthetic data, which should follow the actual pattern of the original data as closely as possible.
This study proposes three data augmentation strategies based on fractal interpolation: the Closest Hurst Strategy, the Closest Values Strategy, and the Formula Strategy. To validate the strategies, we used four public datasets from the literature, as well as a private dataset of meteorological records from the city of Braşov, Romania. Predictions obtained with an LSTM model using the presented interpolation strategies showed a significant accuracy improvement over the raw datasets, offering a possible answer to practical problems in remote sensing and sensor sensitivity. Moreover, our methodologies address some open optimization questions for the fractal interpolation step using the Optuna framework.
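The abstract does not detail the strategies' internals, but the underlying machinery is classical Barnsley fractal interpolation: affine maps whose attractor passes through the data points. A minimal sketch follows; the uniform vertical scaling factor `d` is an illustrative assumption (the paper tunes such parameters with Optuna):

```python
import numpy as np

def fractal_interpolate(x, y, d=0.3, n_iter=3):
    """Barnsley fractal interpolation sketch: repeatedly apply the affine
    maps w_i to the point set; the limit curve interpolates (x_i, y_i).
    d is a shared vertical scaling factor with |d| < 1 (an assumption)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    x0, xN, y0, yN = x[0], x[-1], y[0], y[-1]
    pts = np.column_stack([x, y])
    for _ in range(n_iter):
        images = []
        for i in range(1, len(x)):
            # coefficients so that w_i maps the whole span onto [x_{i-1}, x_i]
            a = (x[i] - x[i - 1]) / (xN - x0)
            e = (xN * x[i - 1] - x0 * x[i]) / (xN - x0)
            c = (y[i] - y[i - 1]) / (xN - x0) - d * (yN - y0) / (xN - x0)
            f = (xN * y[i - 1] - x0 * y[i]) / (xN - x0) \
                - d * (xN * y0 - x0 * yN) / (xN - x0)
            px, py = pts[:, 0], pts[:, 1]
            images.append(np.column_stack([a * px + e, c * px + d * py + f]))
        pts = np.vstack(images)
        pts = pts[np.argsort(pts[:, 0])]  # keep points ordered by x
    return pts
```

Each iteration multiplies the point count by the number of intervals, so the returned set densifies toward the fractal interpolant while always containing the original samples.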
Acoustic sensing has been widely exploited for the early detection of harmful situations in urban environments: in particular, several siren identification algorithms based on deep neural networks have been developed and have proven robust to the noisy and non-stationary urban acoustic scene. Although high classification accuracy can be achieved when training and evaluating on the same dataset, the cross-dataset performance of such models remains unexplored. Building robust models that generalize well to unseen data requires large datasets that capture the diversity of the target sounds, whose collection is generally expensive and time-consuming. To overcome this limitation, in this work we investigate synthetic data generation techniques for training siren identification models. To obtain siren source signals, we either collect a small set of stationary, recorded siren sounds from public sources or generate them synthetically. We then simulate source motion, acoustic propagation, and the Doppler effect, and finally combine the resulting signal with background noise. In this way, we build two synthetic datasets used to train three different convolutional neural networks, which are then tested on real-world datasets unseen during training. We show that the proposed training strategy based on recorded source signals and synthetic acoustic propagation performs best. In particular, it leads to models with better generalization ability than models trained and evaluated in a cross-dataset setting. Moreover, the proposed method loosens the data collection requirement and is built entirely from publicly available resources.
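As a worked illustration of the Doppler component of such a simulation pipeline (the geometry, speed, and distances below are illustrative assumptions, not the paper's configuration), the instantaneous frequency heard by a static microphone as a siren drives past can be computed from the radial velocity of the source:

```python
import numpy as np

def doppler_observed_freq(f0, v, t, d=10.0, c=343.0):
    """Observed frequency of a tone of frequency f0 emitted by a source
    moving at speed v [m/s] along a straight line, passing at perpendicular
    distance d [m] from the microphone at time t = 0. c is the speed of
    sound. All parameter values here are illustrative."""
    x = v * t                # source position along its path
    r = np.hypot(x, d)       # source-to-microphone distance
    v_radial = v * x / r     # positive when the source recedes
    return f0 * c / (c + v_radial)
```

Before the pass-by the pitch is shifted up, after it down, which is the characteristic glide a siren classifier must be robust to.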
In recent years, there has been increasing academic and industrial interest in analyzing the electrical consumption of commercial buildings. While similar to Non-Intrusive Load Monitoring (NILM) tasks for residential buildings, the nature of the signals collected from large commercial buildings introduces additional difficulties that cause existing NILM approaches to fail. Moreover, the number of publicly available datasets collected from commercial buildings is very limited, which makes NILM research even more challenging for this type of large building. In this study, we aim to address these issues. We first present an extensive statistical analysis of commercial and residential measurements from public and private datasets and show important differences. Second, we develop an algorithm for generating synthetic current data based on a model of the current flowing through an electrical device. We then demonstrate that our electrical device model fits real measurements well and that our simulations are realistic according to quantitative metrics. Finally, to encourage research on commercial buildings, we release a synthesized dataset called SHED that can be used to evaluate NILM algorithms.
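The abstract does not specify the device model; a hypothetical minimal sketch of synthetic current generation is to model a device's steady-state current as a sum of mains-frequency harmonics plus measurement noise (harmonic content, mains frequency, and noise level below are assumptions):

```python
import numpy as np

def synth_device_current(harmonics, f0=50.0, fs=10_000, duration=0.1,
                         noise_std=0.01, seed=0):
    """Hypothetical device-current model: a sum of harmonics of the mains
    frequency f0 plus Gaussian sensor noise. `harmonics` maps harmonic
    order -> (amplitude, phase). Returns (time axis, current samples)."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * duration)) / fs
    i = sum(a * np.sin(2 * np.pi * k * f0 * t + ph)
            for k, (a, ph) in harmonics.items())
    return t, i + rng.normal(0.0, noise_std, t.shape)
```

Nonlinear loads common in commercial buildings show strong odd harmonics, which is one of the statistical differences from residential signals such a model can reproduce.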
Defect detection is a critical research area in artificial intelligence. Recently, synthetic-data-based self-supervised learning has shown great potential for this task. Although many sophisticated synthesizing strategies exist, little research has investigated the robustness of models faced with different strategies. In this article, we focus on this issue and find that existing methods are highly sensitive to the choice of synthesis strategy. To alleviate this, we present a discrepancy-aware framework (DAF), which performs robustly and consistently with simple, cheap strategies across different anomaly detection benchmarks. We hypothesize that the high sensitivity of existing self-supervised methods to synthetic data arises from their heavy reliance on the visual appearance of synthetic data during decoding. In contrast, our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance. To this end, inspired by existing knowledge distillation methods, we employ a teacher-student network, trained on synthesized outliers, to compute a discrepancy map used as the cue. Extensive experiments on two challenging datasets demonstrate the robustness of our method. Under simple synthesis strategies, it outperforms existing methods by a large margin, and it also achieves state-of-the-art localization performance.
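A discrepancy map of the kind described can be illustrated as a per-pixel distance between teacher and student feature maps; the cosine distance below is a common choice in distillation-based anomaly detection and is an assumption here, not necessarily DAF's exact formulation:

```python
import numpy as np

def discrepancy_map(feat_teacher, feat_student, eps=1e-8):
    """Per-pixel cosine distance between two feature maps of shape
    (C, H, W). Large values flag locations where the student (trained
    only on normal data) diverges from the teacher, i.e. defect cues."""
    num = (feat_teacher * feat_student).sum(axis=0)
    den = (np.linalg.norm(feat_teacher, axis=0)
           * np.linalg.norm(feat_student, axis=0) + eps)
    return 1.0 - num / den
```

Because the cue depends only on feature disagreement, not on how the synthetic defect looks, the decoder is less tied to any particular synthesis strategy.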
Air pollution poses significant risks to human health and the environment, necessitating effective air quality management strategies. This study presents a novel approach to air quality management in Tehran, Iran, by integrating an autoencoder (AE) with a convolutional neural network (CNN). A primary problem in deep learning is model complexity, which is affected by data distribution, data complexity, and information volume. AEs provide a helpful way to denoise input data, making deep learning models much more efficient to build. The proposed methodology enables spatial modeling and risk mapping of six air pollutants: particulate matter 2.5 (PM2.5), particulate matter 10 (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3), and carbon monoxide (CO). For air pollution modelling, data from a spatial database containing the annual averages of the six pollutants from 2012 to 2022 were used. The model considered various parameters influencing air pollution: altitude, humidity, distance to industrial areas, NDVI (normalized difference vegetation index), population density, rainfall, distance to the street, temperature, traffic volume, wind direction, and wind speed. Risk map accuracy was assessed using the area under the receiver operating characteristic (ROC) curve for the six pollutants. In the risk map generated by the CNN-AE model, NO2, PM10, CO, PM2.5, O3, and SO2 achieved accuracies of 0.964, 0.95, 0.896, 0.878, 0.877, and 0.811, respectively, demonstrating the model's high precision in generating the pollution risk map.
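The reported accuracies are areas under the ROC curve. As a reference, the metric can be computed with a simple rank statistic (this sketch assumes binary ground-truth labels and no tied scores):

```python
import numpy as np

def roc_auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen positive
    (high-risk) sample receives a higher score than a randomly chosen
    negative one. No tie handling, for illustration only."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)  # 1-based ranks by score
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

An AUC of 0.964, as reported for NO2, means a randomly chosen high-risk cell outscores a randomly chosen low-risk cell 96.4% of the time.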
Augmented reality applications use object tracking to estimate the pose of a camera and to superimpose virtual content onto the observed object. Today, a number of tracking systems are available and ready to be used in industrial applications. However, such systems are hard for a service maintenance engineer to handle, owing to obscure configuration procedures. In this article, we investigate options for replacing the manual configuration process with a machine learning approach based on automatically synthesized data. We present an automated process for creating object trackers exclusively from synthetic data. The data is heavily augmented to train a convolutional neural network that still delivers reliable and robust results in real-world applications using only simple RGB cameras. A comparison against related work on the LINEMOD dataset showed that we outperform similar approaches. For our intended industrial applications with high accuracy demands, its performance is still lower than that of common object tracking methods with manual configuration; yet, thanks to its higher reliability, it can greatly support them as an add-on during initialization.
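One common way to make a CNN trained on renders transfer to real RGB images is to heavily randomize the synthetic data. A toy sketch of such an augmentation pass; the specific perturbations and their ranges are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

def domain_randomize(render, rng=None):
    """Toy domain-randomization pass over a synthetic render (H, W, 3
    floats in [0, 1]): random brightness, contrast, and sensor noise,
    so the network cannot overfit to the clean simulator appearance."""
    rng = rng or np.random.default_rng(0)
    img = render * rng.uniform(0.7, 1.3)                             # brightness
    img = (img - img.mean()) * rng.uniform(0.8, 1.2) + img.mean()    # contrast
    img = img + rng.normal(0.0, 0.02, img.shape)                     # sensor noise
    return np.clip(img, 0.0, 1.0)
```

Applied with fresh random draws per training sample, this widens the synthetic distribution toward real camera conditions.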
Motivated by recent innovations in biologically inspired neuromorphic hardware, this article presents a novel unsupervised machine learning algorithm named Hyperseed that draws on the principles of vector symbolic architectures (VSAs) for fast learning of a topology-preserving feature map of unlabeled data. It relies on two major VSA operations: binding and bundling. The algorithmic part of Hyperseed is expressed within the Fourier holographic reduced representations (FHRR) model, which is specifically suited for implementation on spiking neuromorphic hardware. The two primary contributions of the Hyperseed algorithm are few-shot learning and a learning rule based on a single vector operation. These properties are empirically evaluated on synthetic datasets and on illustrative benchmark use cases: IRIS classification and a language identification task using n-gram statistics. The results of these experiments confirm the capabilities of Hyperseed and its applicability to neuromorphic hardware.
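In the FHRR model, hypervectors are vectors of unit-magnitude complex phasors: binding is elementwise multiplication (phase addition), unbinding multiplies by the conjugate, and bundling is normalized superposition. A minimal sketch of these operations (dimensionality and similarity measure are standard FHRR conventions, not Hyperseed specifics):

```python
import numpy as np

def fhrr_random(d, rng):
    """Random FHRR hypervector: d unit-magnitude complex phasors."""
    return np.exp(1j * rng.uniform(-np.pi, np.pi, d))

def bind(a, b):
    """Binding: elementwise phasor multiplication (adds phase angles)."""
    return a * b

def unbind(a, b):
    """Inverse of bind: multiply by the complex conjugate."""
    return a * np.conj(b)

def bundle(*vs):
    """Bundling: superposition, renormalized back to unit phasors."""
    s = np.sum(vs, axis=0)
    return s / np.abs(s)

def sim(a, b):
    """Similarity: mean real part of a . conj(b); 1 for identical vectors,
    near 0 for independent random vectors."""
    return float(np.real(np.vdot(b, a))) / len(a)
```

Unbinding exactly recovers a bound operand, while a bundle remains measurably similar to each of its inputs; these two properties underpin VSA-style learning rules.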
Introduction: Careful development and evaluation of data linkage methods is limited by researcher access to personal identifiers. One solution is to generate synthetic identifiers, which do not pose equivalent privacy concerns but can form a 'gold-standard' training dataset for linkage algorithms. Such data could help inform choices about appropriate linkage strategies in different settings. Objectives: We aimed to develop and demonstrate a framework for generating synthetic identifier datasets to support the development and evaluation of data linkage methods. We evaluated whether replicating associations between attributes and identifiers improved the utility of the synthetic data for assessing linkage error. Methods: We determined the steps required to generate synthetic identifiers that replicate the properties of real-world data collection. We then generated synthetic versions of a large UK cohort study (the Avon Longitudinal Study of Parents and Children; ALSPAC), according to the quality and completeness of identifiers recorded over several waves of the cohort. We evaluated the utility of the synthetic identifier data for assessing linkage quality (false matches and missed matches). Results: Comparing data from two collection points in ALSPAC, we found within-person disagreement in identifiers (differences in recording due to both natural change and non-valid entries) in 18% of surnames and 12% of forenames. Rates of disagreement varied by maternal age and ethnic group. Synthetic data provided accurate estimates of linkage quality metrics compared with the original data (within 0.13-0.55% for missed matches and 0.00-0.04% for false matches). Incorporating associations between identifier errors and maternal age/ethnicity improved synthetic data utility. Conclusions: We show that replicating dependencies between attribute values (e.g. ethnicity), values of identifiers (e.g. name), identifier disagreements (e.g. missing values, errors or changes over time), and their patterns and distribution structure enables the generation of realistic synthetic data that can be used for robust evaluation of linkage methods.
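A toy sketch of the kind of identifier-error injection such a framework performs; the rates and the two-error model below are illustrative (the paper's framework additionally conditions error rates on attributes such as maternal age and ethnicity):

```python
import random

def corrupt_name(name, p_missing=0.12, p_typo=0.1, rng=None):
    """Toy identifier-error model: with probability p_missing the name is
    not recorded at all; otherwise, with probability p_typo two adjacent
    characters are transposed (a recording error). Rates are illustrative."""
    rng = rng or random.Random(0)
    if rng.random() < p_missing:
        return None                        # missing value
    if rng.random() < p_typo and len(name) > 1:
        i = rng.randrange(len(name) - 1)   # transpose characters i and i+1
        return name[:i] + name[i + 1] + name[i] + name[i + 2:]
    return name
```

Running a linkage algorithm on pairs of such corrupted records, where the true match status is known by construction, yields exactly the false-match and missed-match rates the study measures.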
Developing reliable ultrasonic guided wave monitoring systems requires a significant amount of inspection data for each application scenario. Experimental investigations are fundamental but lengthy and costly, especially for real-life testing, and experimental data that includes damage is particularly scarce. In some guided wave applications, such as pipelines, it is possible to introduce artificial damage and perform lab experiments on the test structure; in rail track applications, however, laboratory experiments are either not possible or not meaningful. Generating synthetic data with modelling capabilities thus becomes increasingly important. This paper presents a variational autoencoder (VAE)-based deep learning approach for generating synthetic ultrasonic inspection data for welded railway tracks. The primary aim is to use a VAE model to generate synthetic data containing damage signatures at specified positions along the length of a rail track. The VAE is trained to encode a damage-free baseline signal and decode it into an inspection signal with damage, adding a damage signature on either side of the transducer; the distance to the damage signature is specified as an additional variable in the latent space. The training data was produced by a physics-based model that computes virtual experimental response signals using semi-analytical finite element and traditional finite element procedures. The VAE-reconstructed response signals containing damage signatures were almost identical to the original target signals simulated with the physics-based model, and the VAE captured the complex features in the signals that result from the interaction of multiple propagating modes in a multi-discontinuous waveguide. It successfully generated synthetic inspection data by fusing reflections from welds with the reflection from a crack model at specified distances from the transducer on either the right or left side, although in some cases it did not exactly reconstruct the peak amplitude of the reflections. This study demonstrates the potential and benefit of using a VAE to generate synthetic data with damage signatures, as opposed to using superposition to fuse damage-free responses containing weld reflections with a damage signature. The results show that it is possible to generate realistic inspection data for unavailable damage scenarios.
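The superposition baseline that the VAE is contrasted with can be sketched directly: add a tone-burst echo to a damage-free signal at the pulse-echo time of flight implied by the crack distance. The group velocity, sampling rate, and burst shape below are illustrative assumptions:

```python
import numpy as np

def insert_damage_echo(baseline, d, c_g=3000.0, fs=1e6, amp=0.2, f_c=50e3):
    """Naive superposition sketch: add a Gaussian-windowed tone burst to a
    damage-free response at the two-way travel time 2*d/c_g, where d is
    the crack distance [m] and c_g a nominal group velocity [m/s].
    All parameter values are illustrative, not from the paper."""
    t = np.arange(len(baseline)) / fs
    t0 = 2.0 * d / c_g                         # pulse-echo time of flight
    envelope = np.exp(-((t - t0) / 20e-6) ** 2)
    return baseline + amp * envelope * np.sin(2 * np.pi * f_c * (t - t0))
```

Unlike the VAE, this ignores mode conversion and multiple interactions between the crack and the welds, which is exactly the realism gap the learned approach addresses.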
SimSpliceEvol is a tool for simulating the evolution of eukaryotic gene sequences that integrates exon-intron structure evolution as well as the evolution of the sets of transcripts produced from genes. It takes a guide gene tree as input and generates a gene sequence with its transcripts for each node of the tree, from the root to the leaves. However, the sets of transcripts simulated at different nodes of the guide gene tree lack evolutionary connections, so SimSpliceEvol is not suitable for evaluating methods for transcript phylogeny inference, or for gene phylogeny inference that relies on transcript conservation. Here, we introduce SimSpliceEvol2 which, compared with the first version, incorporates an explicit model of transcript evolution for simulating alternative transcripts along the branches of a guide gene tree, together with the resulting transcript phylogenies. We offer comprehensive software with a graphical user interface and an updated version of the web server, ensuring easy and user-friendly access to the tool. SimSpliceEvol2 generates synthetic datasets useful for evaluating methods and tools for spliced RNA sequence analysis, such as spliced alignment methods, methods for identifying conserved transcripts, and transcript phylogeny reconstruction methods. The web server is accessible at https://simspliceevol.cobius.usherbrooke.ca, where the standalone software can also be downloaded; comprehensive documentation is available at the same address. For developers, the source code, which requires installation of all prerequisites to run, is provided at https://github.com/UdeS-CoBIUS/SimSpliceEvol.
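A toy sketch of transcript-set evolution along a guide gene tree, in the spirit of (but far simpler than) SimSpliceEvol2's model: transcripts inherited from a parent node can be lost or gain new isoforms along each branch, so leaf transcript sets stay evolutionarily connected. The loss/gain probabilities and the nested-tuple tree encoding are illustrative assumptions:

```python
import random

def evolve_transcripts(tree, root_transcripts, p_loss=0.2, p_new=0.1, seed=0):
    """Toy transcript-evolution sketch. The guide tree is nested tuples
    (internal nodes) over strings (leaf names). Along each branch, every
    inherited transcript is lost with prob p_loss and a new isoform is
    created with prob p_new. Returns {leaf_name: transcript_id_set}."""
    rng = random.Random(seed)
    leaves = {}

    def walk(node, transcripts):
        # mutate the inherited transcript set along the incoming branch
        kept = {t for t in transcripts if rng.random() >= p_loss}
        if rng.random() < p_new:
            kept = kept | {f"iso_{rng.randrange(10**6)}"}
        if isinstance(node, tuple):
            for child in node:
                walk(child, kept)
        else:
            leaves[node] = kept

    walk(tree, set(root_transcripts))
    return leaves
```

Because every leaf set descends from the same root set through recorded loss/gain events, the true transcript phylogeny is known by construction, which is what makes such simulations usable as ground truth for phylogeny inference benchmarks.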