Effectively utilizing readily available auxiliary data to improve predictive performance on new modeling tasks is a key problem in data mining. In this research, the goal is to transfer knowledge ...between sources of data, particularly when ground-truth information for the new modeling task is scarce or is expensive to collect where leveraging any auxiliary sources of data becomes a necessity. Toward seamless knowledge transfer among tasks, effective representation of the data is a critical but yet not fully explored research area for the data engineer and data miner. Here, we present a technique based on the idea of sparse coding, which essentially attempts to find an embedding for the data by assigning feature values based on subspace cluster membership. We modify the idea of sparse coding by focusing the identification of shared clusters between data when source and target data may have different distributions. In our paper, we point out cases where a direct application of sparse coding will lead to a failure of knowledge transfer. We then present the details of our extension to sparse coding, by incorporating distribution distance estimates for the embedded data, and show that the proposed algorithm can overcome the shortcomings of the sparse coding algorithm on synthetic data and achieve improved predictive performance on a real world chemical toxicity transfer learning task.
A Survey of Classical and Quantum Sequence Models Chen, I-Chi; Singh, Harshdeep; Anukruti, V L ...
2024 16th International Conference on COMmunication Systems & NETworkS (COMSNETS),
2024-Jan.-3
Conference Proceeding
Our primary objective is to conduct a brief survey of various classical and quantum neural net sequence models, which includes self-attention and recurrent neural networks, with a focus on recent ...quantum approaches proposed to work with near-term quantum devices, while exploring some basic enhancements for these quantum models. We re-implement a key representative set of these existing methods, adapting an image classification approach using quantum self-attention to create a quantum hybrid transformer that works for text and image classification, and applying quantum self-attention and quantum recurrent neural networks to natural language processing tasks. We also explore different encoding techniques and introduce positional encoding into quantum self-attention neural networks leading to improved accuracy and faster convergence in text and image classification experiments. This paper also performs a comparative analysis of classical self-attention models and their quantum counterparts, helping shed light on the differences in these models and their performance.
Probabilistic forecasting of high dimensional multivariate time series is a notoriously challenging task, both in terms of computational burden and distribution modeling. Most previous work either ...makes simple distribution assumptions or abandons modeling cross-series correlations. A promising line of work exploits scalable matrix factorization for latent-space forecasting, but is limited to linear embeddings, unable to model distributions, and not trainable end-to-end when using deep learning forecasting. We introduce a novel temporal latent auto-encoder method which enables nonlinear factorization of multivariate time series, learned end-to-end with a temporal deep learning latent space forecast model. By imposing a probabilistic latent space model, complex distributions of the input series are modeled via the decoder.
Extensive experiments demonstrate that our model achieves state-of-the-art performance on many popular multivariate datasets, with gains sometimes as high as 50% for several standard metrics.
Large margin transductive transfer learning Quanz, Brian; Huan, Jun
Proceedings of the 18th ACM conference on Information and knowledge management,
11/2009
Conference Proceeding
Recently there has been increasing interest in the problem of transfer learning, in which the typical assumption that training and testing data are drawn from identical distributions is relaxed. We ...specifically address the problem of transductive transfer learning in which we have access to labeled training data and unlabeled testing data potentially drawn from different, yet related distributions, and the goal is to leverage the labeled training data to learn a classifier to correctly predict data from the testing distribution.
We have derived efficient algorithms for transductive transfer learning based on a novel viewpoint and the Support Vector Machine (SVM) paradigm, of a large margin hyperplane classifier in a feature space. We show that our method can out-perform some recent state-of-the-art approaches for transfer learning on several data sets, with the added benefits of model and data separation and the potential to leverage existing work on support vector machines.
Probabilistic forecasting of high dimensional multivariate time series is a notoriously challenging task, both in terms of computational burden and distribution modeling. Most previous work either ...makes simple distribution assumptions or abandons modeling cross-series correlations. A promising line of work exploits scalable matrix factorization for latent-space forecasting, but is limited to linear embeddings, unable to model distributions, and not trainable end-to-end when using deep learning forecasting. We introduce a novel temporal latent auto-encoder method which enables nonlinear factorization of multivariate time series, learned end-to-end with a temporal deep learning latent space forecast model. By imposing a probabilistic latent space model, complex distributions of the input series are modeled via the decoder. Extensive experiments demonstrate that our model achieves state-of-the-art performance on many popular multivariate datasets, with gains sometimes as high as \(50\%\) for several standard metrics.
Effectively utilizing readily available auxiliary data to improve predictive performance on new modeling tasks is a key problem in data mining. In this research the goal is to transfer knowledge ...between sources of data, particularly when ground truth information for the new modeling task is scarce or is expensive to collect where leveraging any auxiliary sources of data becomes a necessity. Towards seamless knowledge transfer among tasks, effective representation of the data is a critical but yet not fully explored research area for the data engineer and data miner. Here we present a technique based on the idea of sparse coding, which essentially attempts to find an embedding for the data by assigning feature values based on subspace cluster membership. We modify the idea of sparse coding by focusing the identification of shared clusters between data when source and target data may have different distributions. In our paper, we point out cases where a direct application of sparse coding will lead to a failure of knowledge transfer. We then present the details of our extension to sparse coding, by incorporating distribution distance estimates for the embedded data, and show that the proposed algorithm can overcome the shortcomings of the sparse coding algorithm on synthetic data and achieve improved predictive performance on a real world chemical toxicity transfer learning task.
The focus of this thesis is on learning approaches for what we call "low-quality data'' and in particular data in which only small amounts of labeled target data is available. The first part provides ...background discussion on low-quality data issues, followed by preliminary study in this area. The remainder of the thesis focuses on a particular scenario: multi-view semi-supervised learning. Multi-view learning generally refers to the case of learning with data that has multiple natural views, or sets of features, associated with it. Multi-view semi-supervised learning methods try to exploit the combination of multiple views along with large amounts of unlabeled data in order to learn better predictive functions when limited labeled data is available. However, lack of complete view data limits the applicability of multi-view semi-supervised learning to real world data. Commonly, one data view is readily and cheaply available, but additionally views may be costly or only available in some cases. This thesis work aims to make multi-view semi-supervised learning approaches more applicable to real world data specifically by addressing the issue of missing views through both feature generation and active learning, and addressing the issue of model selection for semi-supervised learning with limited labeled data. This thesis introduces a unified approach for handling missing view data in multi-view semi-supervised learning tasks, which applies to both data with completely missing additional views and data only missing views in some instances. The idea is to learn a feature generation function mapping one view to another with the mapping biased to encourage the features generated to be useful for multi-view semi-supervised learning algorithms. The mapping is then used to fill in views as pre-processing. Unlike previously proposed single-view multi-view learning approaches, the proposed approach is able to take advantage of additional view data when available, and for the case of partial view presence is the first feature-generation approach specifically designed to take into account the multi-view semi-supervised learning aspect. The next component of this thesis is the analysis of an active view completion scenario. In some tasks, it is possible to obtain missing view data for a particular instance, but with some associated cost. Recent work has shown an active selection strategy can be more effective than a random one. In this thesis, a better understanding of active approaches is sought, and it is demonstrated that the effectiveness of an active selection strategy over a random one can depend on the relationship between the views. Finally, an important component of making multi-view semi-supervised learning applicable to real world data is the task of model selection, an open problem which is often avoided entirely in previous work. For cases of very limited labeled training data the commonly used cross-validation approach can become ineffective. This thesis introduces a re-training alternative to the method-dependent approaches similar in motivation to cross-validation, that involves generating new training and test data by sampling from the large amount of unlabeled data and estimated conditional probabilities for the labels. The proposed approaches are evaluated on a variety of multi-view semi-supervised learning data sets, and the experimental results demonstrate their efficacy.
Our primary objective is to conduct a brief survey of various classical and quantum neural net sequence models, which includes self-attention and recurrent neural networks, with a focus on recent ...quantum approaches proposed to work with near-term quantum devices, while exploring some basic enhancements for these quantum models. We re-implement a key representative set of these existing methods, adapting an image classification approach using quantum self-attention to create a quantum hybrid transformer that works for text and image classification, and applying quantum self-attention and quantum recurrent neural networks to natural language processing tasks. We also explore different encoding techniques and introduce positional encoding into quantum self-attention neural networks leading to improved accuracy and faster convergence in text and image classification experiments. This paper also performs a comparative analysis of classical self-attention models and their quantum counterparts, helping shed light on the differences in these models and their performance.
We introduce a new class of data-driven and distribution-free optimistic-robust bimodal inventory optimization (BIO) strategy to effectively allocate inventory across a retail chain to meet ...time-varying, uncertain omnichannel demand. While prior Robust optimization (RO) methods emphasize the downside, i.e., worst-case adversarial demand, BIO also considers the upside to remain resilient like RO while also reaping the rewards of improved average-case performance by overcoming the presence of endogenous outliers. This bimodal strategy is particularly valuable for balancing the tradeoff between lost sales at the store and the costs of cross-channel e-commerce fulfillment, which is at the core of our inventory optimization model. These factors are asymmetric due to the heterogenous behavior of the channels, with a bias towards the former in terms of lost-sales cost and a dependence on network effects for the latter. We provide structural insights about the BIO solution and how it can be tuned to achieve a preferred tradeoff between robustness and the average-case. Our experiments show that significant benefits can be achieved by rethinking traditional approaches to inventory management, which are siloed by channel and location. Using a real-world dataset from a large American omnichannel retail chain, a business value assessment during a peak period indicates over a 15% profitability gain for BIO over RO and other baselines while also preserving the (practical) worst case performance.