Membership Leakage in Label-Only Exposures
Li, Zheng; Zhang, Yang
Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security,
November 2021
Conference Proceeding
Open access
Machine learning (ML) has been widely adopted in privacy-critical applications, e.g., face recognition and medical image analysis. However, recent research has shown that ML models are vulnerable to attacks against their training data. Membership inference is one major attack in this domain: given a data sample and a model, an adversary aims to determine whether the sample is part of the model's training set. Existing membership inference attacks use the confidence scores returned by the model as their inputs (score-based attacks). However, these attacks can be easily mitigated if the model exposes only the predicted label, i.e., the final model decision. In this paper, we propose decision-based membership inference attacks and demonstrate that label-only exposures are also vulnerable to membership leakage. In particular, we develop two types of decision-based attacks, namely the transfer attack and the boundary attack. Empirical evaluation shows that our decision-based attacks achieve remarkable performance, and even outperform previous score-based attacks in some cases. We further present new insights into the success of membership inference based on quantitative and qualitative analysis: member samples of a model lie farther from the model's decision boundary than non-member samples. Finally, we evaluate multiple defense mechanisms against our decision-based attacks and show that both types of attacks can bypass most of these defenses.
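The boundary attack above rests on one measurable quantity: how far a sample sits from the decision boundary, estimated with label-only queries. A minimal sketch, assuming a toy linear classifier as a stand-in target and random-direction probing (the paper attacks real models with more refined perturbation schemes, and the threshold below would be tuned on shadow data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy label-only target: a fixed linear classifier (illustrative stand-in).
w, b = np.array([1.0, -0.5]), 0.2
predict_label = lambda x: int(w @ x + b > 0)  # attacker sees only this label

def boundary_distance(x, n_dirs=50, r_max=5.0, tol=1e-3):
    """Estimate distance to the decision boundary with label queries only:
    binary-search, along random directions, the smallest perturbation that
    flips the predicted label, then keep the minimum over directions."""
    y0 = predict_label(x)
    best = r_max
    for _ in range(n_dirs):
        d = rng.normal(size=x.shape)
        d /= np.linalg.norm(d)
        lo, hi = 0.0, r_max
        if predict_label(x + hi * d) == y0:
            continue  # no label flip within r_max along this direction
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if predict_label(x + mid * d) == y0:
                lo = mid
            else:
                hi = mid
        best = min(best, hi)
    return best

# Decision rule: samples farther from the boundary are flagged as members.
score = boundary_distance(np.array([2.0, 0.0]))
is_member = score > 0.8  # hypothetical threshold, tuned on shadow data in practice
```

The sketch only captures the paper's qualitative finding: members tend to sit farther from the boundary, so the estimated distance itself is the membership score.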
Mapping Land Use (LU) is crucial for monitoring and managing the dynamic evolution of human activities in a given area and their consequential environmental impacts. In this study, a multimodal machine learning framework, using the XGBoost classifier applied to attributes constructed from heterogeneous spatial data sources, is defined and used to automatically classify LU in the two French departments of Gers and Rhône. It reaches a mean F1 score of 83% and 86%, respectively. This research work also assesses the robustness and transferability of the machine learning model between these two diverse study areas and highlights the challenges encountered, arising mainly from differences in the distributions of attributes and classes between the study areas. Adding a few samples from the test study area allows the model to learn some of its specificities and thus improves the results. Moreover, the study evaluates the individual contribution of each data source to the accuracy of the LU class predictions, providing insights into the relevance of each data source in enhancing the overall precision of the Land Use classification. The findings contribute to a validated LU classification workflow, identify valuable data sources, and enhance understanding of model transferability challenges.
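The reported 83% and 86% are mean (macro) F1 scores over the LU classes. A short sketch of that metric, with the multimodal fusion step indicated as a comment; the arrays, class count, and fusion layout are illustrative assumptions, not the study's data:

```python
import numpy as np

# Multimodal fusion as used conceptually in the study: attributes from
# heterogeneous sources are concatenated before classification, e.g.
#   X = np.hstack([X_source_a, X_source_b, X_source_c])  # then fed to XGBoost

def macro_f1(y_true, y_pred, n_classes):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores,
    the 'mean F1' the study reports per department."""
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))
```

Macro averaging gives every LU class equal weight, which matters when class frequencies differ between the two departments.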
Purpose
The finite element method (FEM) is the preferred method to simulate phenomena in anatomical structures. However, purely FEM-based mechanical simulations require considerable time, limiting their use in clinical applications that require real-time responses, such as haptics simulators. Machine learning (ML) approaches have been proposed to help reduce the required time. The present paper reviews cases where ML can help generate faster simulations without considerably affecting performance.
Methods
This review details the ML approaches used, considering the anatomical structures involved, the data collection strategies, the selected ML algorithms and their corresponding features, the metrics used for validation, and the resulting time gains.
Results
A total of 41 references were found. ML algorithms are trained mainly with FEM-based simulations (32 publications). The preferred ML approach is neural networks, including deep learning (35 publications). Tissue deformation is simulated in 18 applications, but other features are also considered. The average distance error and the mean squared error are the most frequently used performance metrics (14 and 17 publications, respectively). The time gains were considerable, going from hours or minutes for purely FEM-based simulations down to milliseconds when using ML.
Conclusions
ML algorithms can be used to accelerate FEM-based biomechanical simulations of anatomical structures, possibly reaching real-time responses. Fast and real-time simulations of anatomical structures, generated with ML algorithms, can help reduce the time required by FEM-based simulations and accelerate their adoption in clinical practice.
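The core idea of the review, replacing a repeated expensive FEM solve with a learned input-output map, can be sketched on a linear toy problem; the 1-D bar stiffness matrix and the least-squares surrogate below are illustrative assumptions (the surveyed papers use neural networks on nonlinear anatomical models):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "FEM solver": a 1-D bar with a tridiagonal stiffness matrix K.
n = 20
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def fem_solve(f):
    """The expensive step in a real FEM code: solve K u = f for displacements."""
    return np.linalg.solve(K, f)

# Offline phase: generate FEM training pairs (load -> displacement),
# exactly how most of the reviewed papers build their training sets.
F = rng.normal(size=(200, n))
U = np.array([fem_solve(f) for f in F])

# Train a surrogate. Here the map is linear, so least squares suffices;
# with nonlinear tissue behavior a neural network plays this role.
W, *_ = np.linalg.lstsq(F, U, rcond=None)

# Online phase: near-instant prediction instead of solving the system.
f_new = rng.normal(size=n)
u_pred = f_new @ W
u_ref = fem_solve(f_new)
err = np.max(np.abs(u_pred - u_ref))
```

The offline/online split is what yields the hours-to-milliseconds gains reported: the costly solves happen once, during data generation and training.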
Abstract
The $\mathcal{C}$-bound is a tight bound on the true risk of a majority vote classifier that relies on the individual quality and pairwise disagreement of the voters and provides PAC-Bayesian generalization guarantees. Based on this bound, MinCq is a classification algorithm that returns a dense distribution on a finite set of voters by minimizing it. Introduced later and inspired by boosting, CqBoost uses a column generation approach to build a sparse $\mathcal{C}$-bound-optimal distribution on a possibly infinite set of voters. However, both approaches have a high computational learning time because they minimize the $\mathcal{C}$-bound by solving a quadratic program. Yet, one advantage of CqBoost is its experimental ability to provide sparse solutions. In this work, we address the problem of accelerating the $\mathcal{C}$-bound minimization process while keeping the sparsity of the solution and without losing accuracy. We present CB-Boost, a computationally efficient classification algorithm relying on a greedy, boosting-based $\mathcal{C}$-bound optimization. An in-depth analysis proves the optimality of the greedy minimization process and quantifies the decrease of the $\mathcal{C}$-bound operated by the algorithm. Generalization guarantees are then drawn from existing PAC-Bayesian theorems. In addition, we experimentally evaluate CB-Boost on the three main properties we expect of it: accuracy, sparsity, and computational efficiency, compared to MinCq, CqBoost, AdaBoost and other ensemble methods. As observed in these experiments, CB-Boost not only achieves results comparable to the state of the art, but also provides $\mathcal{C}$-bound sub-optimal weights with very little computational demand while keeping the sparsity property of CqBoost.
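A hedged sketch of the empirical $\mathcal{C}$-bound, $1 - (\mathbb{E}[M])^2 / \mathbb{E}[M^2]$ for the majority-vote margin $M$, together with a greedy, boosting-style voter selection in the spirit of CB-Boost. The voter pool, data, and unit-weight update are illustrative simplifications (the actual algorithm also optimizes each added voter's weight in closed form):

```python
import numpy as np

def c_bound(margins):
    """Empirical C-bound: 1 - (first moment)^2 / (second moment) of the
    majority-vote margin M(x, y) = y * sum_h q_h h(x), with y, h in {-1, +1}."""
    m1, m2 = margins.mean(), (margins ** 2).mean()
    return 1.0 - m1 ** 2 / m2

rng = np.random.default_rng(2)
y = rng.choice([-1, 1], size=100)
# Votes of 10 weak classifiers, each correct on ~70% of the samples.
H = np.where(rng.random((100, 10)) < 0.7, y[:, None], -y[:, None])

# Greedy selection: at each step add the single voter whose inclusion most
# decreases the C-bound (here with uniform weights over chosen voters).
chosen, vote = [], np.zeros(100)
for _ in range(3):
    best_j, best_cb = None, np.inf
    for j in range(10):
        if j in chosen:
            continue
        cb = c_bound(y * (vote + H[:, j]) / (len(chosen) + 1))
        if cb < best_cb:
            best_cb, best_j = cb, j
    chosen.append(best_j)
    vote += H[:, best_j]
final_cb = c_bound(y * vote / len(chosen))
```

Because each step only evaluates candidate voters, no quadratic program is solved, which is the source of the computational advantage over MinCq and CqBoost.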
Seismicity in the Raton Basin over the past two decades suggests reactivation of basement faults due to waste-water injection. In the summer of 2018, 96 short-period three-component nodal instruments were installed in a highly active region of the basin for a month. A machine-learning-based phase picker (PhaseNet) was adopted and identified millions of picks, which were associated into events using an automated algorithm, REAL (Rapid Earthquake Association and Location). After hypocenter relocation with hypoDD, the earthquake catalog contains 9,259 earthquakes (ML −2.2 to 3) focused at depths of 4–6 km. The magnitude of completeness (Mc) varies from −1 at nighttime to −0.5 in daytime, likely reflecting noise variation modulated by wind. The clustered hypocenters with variable depths and focal mechanisms suggest a complex network of basement faults. Frequency-magnitude statistics and the spatiotemporal evolution of seismicity are comparable to tectonic systems.
Plain Language Summary
Earthquakes induced by waste-water injection are widely observed worldwide and have been occurring in the Raton Basin (located at the border of New Mexico and Colorado) for two decades. We deployed 96 short-period seismic stations in the summer of 2018 to investigate the faults in the southern section of the Raton Basin. Earthquake detection was performed with state-of-the-art machine-learning techniques, which led to a catalog of ~10,000 earthquakes with magnitudes ranging from −2.2 to 3. Clusters of earthquakes were investigated in detail. We found that the orientation of the faults varies within the study region and that induced earthquakes may exhibit spatial-temporal-magnitude clustering just like tectonic ones. Successful application of the automated catalog-building workflow also sheds light on the power of "hands-free" processing of large-volume seismic data.
Key Points
A machine‐learning phase picker and dense nodal array enabled location of ~10,000 earthquakes in a month
Hypocenter patterns and moment tensors vary among clusters, unveiling reactivation of complex basement faults
Raton Basin seismicity exhibits frequency‐magnitude distribution and spatiotemporal evolution comparable to tectonic events
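Frequency-magnitude statistics of the kind compared above are usually summarized by the Gutenberg-Richter b-value. A sketch using the Aki (1965) maximum-likelihood estimator on a synthetic catalog; the catalog size, completeness magnitude, and b = 1 are illustrative assumptions, not the study's measured values:

```python
import numpy as np

def b_value(mags, mc, dm=0.1):
    """Aki (1965) maximum-likelihood b-value for magnitudes at or above the
    completeness magnitude mc, with a half-bin correction dm for binned data."""
    m = mags[mags >= mc]
    return np.log10(np.e) / (m.mean() - (mc - dm / 2))

rng = np.random.default_rng(3)
# Synthetic catalog: under Gutenberg-Richter with b = 1, magnitudes above mc
# are exponentially distributed with mean excess log10(e) / b.
mc_true, b_true = -1.0, 1.0
mags = mc_true + rng.exponential(scale=np.log10(np.e) / b_true, size=9259)
b_hat = b_value(mags, mc_true, dm=0.0)  # continuous synthetic data: no bin correction
```

With a day/night-varying Mc like the one reported (−1 to −0.5), the estimator would be applied above the more conservative daytime threshold to avoid bias from missed small events.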
We present a machine learning approach that integrates geometric deep learning and Sobolev training to generate a family of finite strain anisotropic hyperelastic models that predict the homogenized responses of polycrystals previously unseen during training. While hand-crafted hyperelasticity models often incorporate homogenized measures of microstructural attributes, such as the porosity or the averaged orientation of constituents, these measures may not adequately represent the topological structures of the attributes. We fill this knowledge gap by introducing the concept of the weighted graph as a new high-dimensional descriptor that represents topological information, such as the connectivity of anisotropic grains in an assembly. By leveraging a graph convolutional deep neural network in a hybrid machine learning architecture previously used in Frankel et al. (2019), the artificial intelligence extracts low-dimensional features from the weighted graphs and subsequently learns the influence of these low-dimensional features on the resultant stored elastic energy functionals. To ensure smoothness and prevent unintentionally generating a non-convex stored energy functional, we adopt the Sobolev training method for neural networks, such that a stress measure is obtained implicitly by taking directional derivatives of the trained energy functional. Results from numerical experiments suggest that Sobolev training is capable of generating a hyperelastic energy functional that predicts both the elastic energy and stress measures more accurately than classical training that minimizes L2 norms. Verification exercises against unseen benchmark FFT simulations, together with phase-field fracture simulations that employ the geometric-learning-generated elastic energy functional, are conducted to demonstrate the quality of the predictions.
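Sobolev training, as described, fits the energy functional and its derivative (the stress measure) jointly. A minimal sketch with a 1-D polynomial energy, where the model is linear in its parameters so the Sobolev loss reduces to least squares; this is an illustrative simplification of the paper's setup, which trains graph-convolutional networks by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(4)

# Ground-truth 1-D "stored energy" and its derivative (the stress measure).
E = lambda x: x ** 4
S = lambda x: 4 * x ** 3

x = rng.uniform(-1, 1, 60)
# Model: E_theta(x) = a*x^2 + b*x^4, linear in theta = (a, b).
Phi_E = np.column_stack([x ** 2, x ** 4])        # energy features
Phi_S = np.column_stack([2 * x, 4 * x ** 3])     # their exact derivatives

# Sobolev loss stacks both residuals: ||E_theta - E||^2 + ||E_theta' - S||^2,
# so the fit is constrained by values AND derivatives simultaneously.
A = np.vstack([Phi_E, Phi_S])
rhs = np.concatenate([E(x), S(x)])
theta, *_ = np.linalg.lstsq(A, rhs, rcond=None)

# As in the paper, stress is recovered from the derivative of the energy,
# never fitted as an independent output.
stress_pred = Phi_S @ theta
```

Supervising the derivative is what keeps the learned energy smooth and its implied stress accurate, which classical value-only L2 training does not guarantee.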
Recent research has shown the advantages of using autoencoders based on deep neural networks for collaborative filtering. In particular, the recently proposed Mult-VAE model, which uses variational autoencoders with a multinomial likelihood, has shown excellent results for top-N recommendations. In this work, we propose the Recommender VAE (RecVAE) model, which originates from our research on regularization techniques for variational autoencoders. RecVAE introduces several novel ideas to improve Mult-VAE, including a novel composite prior distribution for the latent codes, a new approach to setting the beta hyperparameter in the beta-VAE framework, and a new approach to training based on alternating updates. In experimental evaluation, we show that RecVAE significantly outperforms previously proposed autoencoder-based models, including Mult-VAE and RaCT, across classical collaborative filtering datasets, and present a detailed ablation study to assess our new developments. Code and models are available at https://github.com/ilya-shenbin/RecVAE.
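The multinomial likelihood at the heart of Mult-VAE/RecVAE scores the decoder's normalized distribution over the item catalog against a user's interaction vector. A sketch of that term and of a beta-rescaled ELBO; the constant `gamma` and the exact form of the per-user beta are illustrative assumptions, not RecVAE's published values:

```python
import numpy as np

def log_softmax(v):
    """Numerically stable log-softmax over the item catalog."""
    v = v - v.max()
    return v - np.log(np.exp(v).sum())

def multinomial_ll(x, logits):
    """Multinomial log-likelihood: the user's interaction counts x scored
    against the decoder's normalized item distribution."""
    return float(x @ log_softmax(logits))

def elbo(x, logits, kl, gamma=0.005):
    """beta-VAE objective: likelihood minus beta * KL. The per-user beta
    scaled by interaction count sketches RecVAE's rescaling idea (hedged)."""
    beta = gamma * x.sum()
    return multinomial_ll(x, logits) - beta * kl
```

Normalizing over the whole catalog makes items compete for probability mass, which is why the multinomial likelihood suits top-N ranking better than independent per-item losses.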
Machine learning is a tool for building models that accurately represent input training data. When undesired biases concerning demographic groups are in the training data, well-trained models will reflect those biases. We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary. The input to the network X, here text or census data, produces a prediction Y, such as an analogy completion or income bracket, while the adversary tries to model a protected variable Z, here gender or zip code. The objective is to maximize the predictor's ability to predict Y while minimizing the adversary's ability to predict Z. Applied to analogy completion, this method results in accurate predictions that exhibit less evidence of stereotyping Z. When applied to a classification task using the UCI Adult (Census) dataset, it results in a predictive model that does not lose much accuracy while achieving very close to equality of odds (Hardt et al., 2016). The method is flexible and applicable to multiple definitions of fairness as well as a wide range of gradient-based learning models, including both regression and classification tasks.
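In the authors' full formulation (Zhang et al., 2018), the predictor's gradient is modified at each step: its projection onto the adversary's gradient is removed, and a term pushing against the adversary's objective is subtracted. A sketch of that update rule on plain vectors, treating the two gradients as given:

```python
import numpy as np

def debiased_predictor_grad(g_pred, g_adv, alpha=1.0):
    """Adversarial-debiasing update for the predictor's weights:
    g_pred  - gradient of the predictor's loss (predicting Y),
    g_adv   - gradient of the adversary's loss (predicting Z) w.r.t. the
              same weights.
    Removes the component of g_pred that helps the adversary (projection)
    and additionally pushes against the adversary, scaled by alpha."""
    g_adv_unit = g_adv / (np.linalg.norm(g_adv) + 1e-12)
    proj = (g_pred @ g_adv_unit) * g_adv_unit
    return g_pred - proj - alpha * g_adv
```

The projection term is what prevents the predictor from improving its own loss in directions that would simultaneously make the protected variable Z easier to recover.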
Machine learning methods based on AdaBoost have been widely applied to various classification problems across many mission-critical applications, including healthcare, law and finance. However, there is a growing concern about the unfairness and discrimination of data-driven classification models, which is inevitable for classical algorithms including AdaBoost. To achieve fair classification, a novel fair AdaBoost (FAB) approach is proposed that is an interpretable, fairness-improving variant of AdaBoost. We mainly investigate binary classification problems and focus on the fairness of three indicators (i.e., accuracy, false positive rate and false negative rate). By utilizing a fairness-aware reweighting technique for base classifiers, the proposed FAB approach can achieve fair classification while maintaining the advantage of AdaBoost, with a negligible sacrifice of predictive performance. In addition, a hyperparameter is introduced in FAB to express preferences in the fairness-accuracy trade-off. An upper bound for the target loss function that quantifies error rate and unfairness is theoretically derived for FAB, which provides strict theoretical support for the fairness-improving methods designed for AdaBoost. The effectiveness of the proposed method is demonstrated on three real-world datasets (i.e., Adult, COMPAS and HSLS) with respect to the three fairness indicators. The results are consistent with the theoretical analyses and show that (i) FAB significantly improves classification fairness at a small cost in accuracy compared with AdaBoost; and (ii) FAB can achieve low levels of fairness loss while maintaining high accuracy compared with state-of-the-art fair classification methods, including the equalized odds method, the exponentiated gradient method, the grid search reduction method and the disparate mistreatment method.
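FAB's three fairness indicators (accuracy, false positive rate, false negative rate) are per-group rates whose disparity across groups defines the fairness loss. A sketch of computing them; the tiny arrays in use are illustrative, and FAB's actual base-classifier reweighting is not reproduced here:

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Per-group (accuracy, FPR, FNR), the three indicators FAB targets.
    A fairness loss can then be defined as the disparity of each rate
    across the protected groups."""
    out = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        acc = np.mean(yt == yp)
        fpr = np.mean(yp[yt == 0] == 1) if np.any(yt == 0) else 0.0
        fnr = np.mean(yp[yt == 1] == 0) if np.any(yt == 1) else 0.0
        out[g] = (acc, fpr, fnr)
    return out
```

Keeping the three rates separate matches the paper's framing: a classifier can equalize accuracy across groups while still differing sharply in FPR or FNR, so each indicator needs its own disparity term.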