Manuscript Title:

PRIVACY PRESERVATION FOR BIG DATA PUBLISHING: APPLYING K-ANONYMITY AND DIFFERENTIALLY PRIVATE SYNTHETIC DATA GENERATION WITH DP-CTGAN

Author:

ANANNA HOQUE SHATHI, Dr. BOSHIR AHMED

DOI Number:

DOI:10.5281/zenodo.15803650

Published: 2025-07-10

About the author(s)

1. ANANNA HOQUE SHATHI - Department of Computer Science and Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh.
2. Dr. BOSHIR AHMED - Department of Computer Science and Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh.

Abstract

Anonymization is an effective privacy protection method used in many technology domains, including big data, to shield highly sensitive information from outside parties. Despite major developments that promote the secondary use of data, extracting enough information from anonymized data while preserving privacy remains difficult. Existing systems often transform large datasets heavily, compromising their structure and utility, and excessive modification can degrade the performance and output of downstream mechanisms in real-life settings. To address these problems, we propose and implement a hybrid anonymization method that combines k-anonymity with a Differentially Private Conditional Tabular Generative Adversarial Network (DP-CTGAN) to produce high-quality data that yields insights comparable to those from the actual data while maintaining privacy. We implemented the Mondrian and DP-CTGAN algorithms on the UCI Adult dataset to hide highly private information related to a person's income from unauthorized viewers. The raw data are first processed to remove uniquely identifying individual information from the intermediate data frame. The Mondrian algorithm then generalizes uniquely identifying values into ranges while leaving the remaining attributes unchanged, yielding a dataset that remains useful without exposing any individual's private information. Our proposed approach produces more reliable anonymized data than approaches reported in the existing literature.
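The range generalization attributed to the Mondrian algorithm above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes purely numeric identifying attributes, at least k input rows, and a greedy median split. The function name `mondrian`, the column names, and the toy rows are hypothetical and are not taken from the UCI Adult dataset. The DP-CTGAN stage is not shown.

```python
from collections import Counter
from typing import Any, Dict, List

Record = Dict[str, Any]


def mondrian(records: List[Record], qis: List[str], k: int) -> List[Record]:
    """Greedy median-split partitioning; assumes numeric attributes and k >= 1."""
    lo = {q: min(r[q] for r in records) for q in qis}
    hi = {q: max(r[q] for r in records) for q in qis}
    # Try attributes from the widest value range to the narrowest.
    for q in sorted(qis, key=lambda a: hi[a] - lo[a], reverse=True):
        ordered = sorted(records, key=lambda r: r[q])
        median = ordered[len(ordered) // 2][q]
        left = [r for r in ordered if r[q] < median]
        right = [r for r in ordered if r[q] >= median]
        # Cut only if both halves still contain at least k rows.
        if len(left) >= k and len(right) >= k:
            return mondrian(left, qis, k) + mondrian(right, qis, k)
    # No allowable cut: publish each identifying attribute as a (min, max) range
    # and keep every other attribute unchanged.
    return [{**r, **{q: (lo[q], hi[q]) for q in qis}} for r in records]


# Hypothetical toy rows for illustration only.
rows = [
    {"age": 25, "hours": 40, "income": "<=50K"},
    {"age": 27, "hours": 38, "income": "<=50K"},
    {"age": 45, "hours": 50, "income": ">50K"},
    {"age": 52, "hours": 60, "income": ">50K"},
    {"age": 33, "hours": 40, "income": "<=50K"},
    {"age": 36, "hours": 45, "income": ">50K"},
]
anon = mondrian(rows, ["age", "hours"], k=2)
# Size of each group of rows that share the same generalized ranges.
group_sizes = Counter((r["age"], r["hours"]) for r in anon)
```

In this sketch every combination of generalized `age` and `hours` ranges is shared by at least k rows, which is the k-anonymity condition, while the `income` column is passed through untouched.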


Keywords

Privacy Preservation; K-Anonymity; Differential Privacy; Big Data; Mondrian Algorithm; DP-CTGAN.