  • FakeDB: Generating Fake Syn...
    Gao, Chongyang; Jajodia, Sushil; Pugliese, Andrea; Subrahmanian, V.S.

    IEEE transactions on dependable and secure computing, 2024
    Journal Article

    Health care providers may wish to share limited information with researchers. Manufacturing companies may want to share some but not all data with regulators or partners. Since the emergence of generative adversarial networks (GANs), efforts have been made to generate synthetic data that preserves semantic properties on the one hand and distributions on the other hand. However, all past efforts focus on a single table at a time. We propose FakeDB, a general framework to generate synthetic data that preserves a wide variety of semantic integrity constraints as well as a broad set of statistical properties, across an entire relational database. We compare FakeDB with natural extensions of prior work on eight well-known relational databases as well as on a synthetically generated dataset, and show that FakeDB outperforms them. We also show that FakeDB runs in reasonable amounts of time, making it a practical solution to the problem of generating synthetic data.