In parallel with the growing collection of potentially sensitive private user data by companies and governments, concerns over privacy leakage due to the sale and publication of the collected data have increased as well. Even when the data is anonymized before publication, anonymization alone has been shown to be insufficient to eliminate the risk of privacy leakage, since a person’s private and public lives are often correlated. For example, it has been demonstrated in practice that the identity of an anonymous user in the Netflix Prize dataset can be inferred from publicly available Rotten Tomatoes ratings.
In this project, we build the theoretical foundation of the database matching problem by drawing an analogy between database matching and channel decoding. Borrowing tools from the channel decoding literature, we derive necessary and sufficient conditions on the asymptotic relationship between the number of users and the number of attributes for the existence of a successful de-anonymization algorithm as the number of attributes grows to infinity. In particular, we focus on:
- Databases with labeled attributes, which are prone to obfuscation (noise) [1,4,6,7].
- Databases with unlabeled attributes, which are prone to synchronization errors (column deletions/replications) [2-7].
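To make the two settings and the channel-decoding analogy more concrete, here is a minimal Python sketch on synthetic data. Under the analogy, each of the n rows of the original database acts like a codeword of length m, each anonymized row acts like a channel output, and log2(n)/m plays a role similar to a code rate. Everything in the sketch is an illustrative assumption rather than the group's method: the function names (`make_database`, `add_noise`, `delete_columns`, `match_rows`, `detect_kept_columns`) are hypothetical, the attributes are i.i.d. Bernoulli, the noise is a binary symmetric channel, deletions are independent per column, and the matching rules (row-by-row minimum Hamming distance, greedy alignment of column weights) are simple heuristics, not the algorithms analyzed in the publications below.

```python
import random


def make_database(n, m, p, rng):
    """n users x m binary attributes, drawn i.i.d. Bernoulli(p) (illustrative)."""
    return [[int(rng.random() < p) for _ in range(m)] for _ in range(n)]


def anonymize(db, rng):
    """Shuffle the rows; observed row i belongs to user perm[i]."""
    perm = list(range(len(db)))
    rng.shuffle(perm)
    return [list(db[u]) for u in perm], perm


def add_noise(db, eps, rng):
    """Labeled attributes: flip each entry independently with prob. eps (BSC)."""
    return [[x ^ int(rng.random() < eps) for x in row] for row in db]


def delete_columns(db, delta, rng):
    """Unlabeled attributes: drop each column independently with prob. delta."""
    kept = [j for j in range(len(db[0])) if rng.random() >= delta]
    return [[row[j] for j in kept] for row in db], kept


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def match_rows(reference, observed):
    """For each observed row, guess the reference row at minimum Hamming
    distance (row-by-row decoding; two rows may receive the same guess)."""
    return [min(range(len(reference)), key=lambda u: hamming(reference[u], row))
            for row in observed]


def detect_kept_columns(reference, observed):
    """Align column weights (counts of ones), which row shuffling leaves
    unchanged, by greedy subsequence matching. Heuristic: if two columns
    share a weight, the alignment can pick the wrong one."""
    ref_w = [sum(col) for col in zip(*reference)]
    kept, j = [], 0
    for w in (sum(col) for col in zip(*observed)):
        while j < len(ref_w) and ref_w[j] != w:
            j += 1
        if j == len(ref_w):
            return None  # observed weights are not a subsequence of ref_w
        kept.append(j)
        j += 1
    return kept


if __name__ == "__main__":
    rng = random.Random(0)
    db = make_database(n=20, m=200, p=0.5, rng=rng)
    shuffled, perm = anonymize(db, rng)

    # Labeled attributes under noise: decode each noisy row.
    guess = match_rows(db, add_noise(shuffled, eps=0.05, rng=rng))
    print(sum(g == u for g, u in zip(guess, perm)), "of", len(perm), "re-identified")

    # Unlabeled attributes under column deletions: detect, project, then match.
    deleted, true_kept = delete_columns(shuffled, delta=0.1, rng=rng)
    kept = detect_kept_columns(db, deleted)
    projected = [[row[j] for j in kept] for row in db]
    guess = match_rows(projected, deleted)
    print(sum(g == u for g, u in zip(guess, perm)), "of", len(perm), "re-identified")
```

The column-weight step works because shuffling rows only reorders the entries within each column, so the number of ones in a surviving column is unchanged. Once noise or column repetitions enter the picture, this exact-match alignment breaks down and a more robust detector would be needed.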
Featured Group Publications
- [1] F. Shirani, S. Garg, and E. Erkip, “A Concentration of Measure Approach to Database De-anonymization,” Proc. 2019 IEEE International Symposium on Information Theory (ISIT), 2019, pp. 2748-2752, doi: 10.1109/ISIT.2019.8849392.
- [2] S. Bakirtas and E. Erkip, “Database Matching Under Column Deletions,” Proc. 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, July 2021.
- [3] S. Bakirtas and E. Erkip, “Matching of Markov Databases Under Random Column Repetitions,” Proc. 2022 56th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, November 2022.
- [4] S. Bakirtas and E. Erkip, “Seeded Database Matching Under Noisy Column Repetitions,” Proc. 2022 IEEE Information Theory Workshop (ITW), Mumbai, India, November 2022.
- [5] S. Bakirtas and E. Erkip, “Database Matching Under Adversarial Column Deletions,” Proc. 2023 IEEE Information Theory Workshop (ITW), Saint-Malo, France, April 2023.
- [6] S. Bakirtas and E. Erkip, “Distribution-Agnostic Database De-Anonymization Under Synchronization Errors,” Proc. 2023 IEEE International Workshop on Information Forensics and Security (WIFS), Nuremberg, Germany, December 2023.
- [7] S. Bakirtas and E. Erkip, “Database Matching Under Noisy Synchronization Errors,” under review for journal publication.