Journal article
SYNNER synthetic data generator framework
Digital health, v 12, p20552076251411621
01 Jan 2026
PMID: 41732183
Abstract
Objectives: Sharing medical data is hampered by technical, regulatory, and privacy challenges, including compliance with the Health Insurance Portability and Accountability Act of 1996. However, existing data anonymization methods are error-prone or vulnerable to re-identification, and synthetic data generation approaches are limited. This study introduces SYNNER, a novel synthetic data generation framework that overcomes existing limitations, preserving data utility while ensuring privacy.
Methods: We employ knowledge graph embeddings to encode data into a k-dimensional space, capturing complex relationships. For each entity, its nearest neighbors are identified, and their characteristics are used to generate a synthetic version that maintains statistical consistency. We evaluated SYNNER on seven publicly available datasets, measuring the preservation of original data signals and comparing macro-F1 scores across prediction tasks. A novel evaluation protocol for differential privacy was also introduced, simulating an adversarial attack to infer missing values.
Results: The evaluation shows that SYNNER maintains an average of 83.2% of the signals from the original datasets. In predictive tasks, models trained on SYNNER-generated data achieved a proportional average macro-F1 score of 74.4%, comparable to those trained on the original data. The proposed evaluation protocol for differential privacy assesses whether synthetic datasets meet expected privacy standards and highlights potential risks of individual data point reconstruction.
Conclusion: SYNNER provides a scalable and effective solution for generating synthetic data that maintains statistical fidelity. It overcomes the limitations of existing methods, providing a privacy-preserving solution for synthetic data generation and advancing research in sensitive domains such as healthcare.
Details
- Title
- SYNNER synthetic data generator framework
- Creators
- Hegler Tissot - Drexel University; Justin Moore - Drexel University; Eric Benton - Drexel University; Sarah Alshahrani - Drexel University; Maria Helena Franciscatto - Univ Fed Parana, Informat Dept, Curitiba, Brazil; Marcos D. Del Fabro - Commissariat à l'Énergie Atomique et aux Énergies Alternatives
- Publication Details
- Digital health, v 12, p20552076251411621
- Publisher
- Sage
- Number of pages
- 22
- Grant note
- INSAFEDARE Project
- Resource Type
- Journal article
- Language
- English
- Academic Unit
- Information Science
- Web of Science ID
- WOS:001695058900001
- Scopus ID
- 2-s2.0-105030722611
- Other Identifier
- 991022163919204721