SYNNER synthetic data generator framework
Journal article   Open access   Peer reviewed

Hegler Tissot, Justin Moore, Eric Benton, Sarah Alshahrani, Maria Helena Franciscatto and Marcos D. Del Fabro
Digital health, v 12, p20552076251411621
01 Jan 2026
PMID: 41732183
url
https://doi.org/10.1177/20552076251411621
Published, Version of Record (VoR) Open

Abstract

Subjects: Health Care Sciences & Services; Health Policy & Services; Life Sciences & Biomedicine; Medical Informatics; Public, Environmental & Occupational Health; Science & Technology
Objectives: Sharing medical data is hampered by technical, regulatory, and privacy challenges, including compliance with the Health Insurance Portability and Accountability Act of 1996. However, existing data anonymization methods are error-prone or vulnerable to re-identification, and synthetic data generation approaches are limited. This study introduces SYNNER, a novel synthetic data generation framework that overcomes existing limitations, preserving data utility while ensuring privacy.

Methods: We employ knowledge graph embeddings to encode data into a k-dimensional space, capturing complex relationships. For each entity, its nearest neighbors are identified, and their characteristics are used to generate a synthetic version that maintains statistical consistency. We evaluated SYNNER on seven publicly available datasets, measuring the preservation of original data signals and comparing macro-F1 scores across prediction tasks. A novel evaluation protocol for differential privacy was also introduced, simulating an adversarial attack to infer missing values.

Results: The evaluation shows that SYNNER maintains an average of 83.2% of the signals from the original datasets. In predictive tasks, models trained on SYNNER-generated data achieved a proportional average macro-F1 score of 74.4%, comparable to those trained on the original data. The proposed evaluation protocol for differential privacy assesses whether synthetic datasets meet expected privacy standards and highlights potential risks of individual data point reconstruction.

Conclusion: SYNNER provides a scalable and effective solution for generating synthetic data that maintains statistical fidelity. It overcomes the limitations of existing methods, providing a privacy-preserving solution for synthetic data generation and advancing research in sensitive domains such as healthcare.
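
The generation step described in the Methods portion of the abstract (embed each entity, find its nearest neighbors, build a synthetic version from their characteristics) can be illustrated with a minimal sketch. This is not the SYNNER implementation. The embeddings are assumed to be precomputed by some knowledge graph embedding model, and the toy records, the Euclidean distance, the function names, and the rule of drawing each attribute from a randomly chosen neighbor are all assumptions made for illustration. The sketch carries no privacy guarantee.

```python
import math
import random


def nearest_neighbors(embeddings, index, k):
    """Indices of the k entities closest to embeddings[index] (Euclidean distance)."""
    target = embeddings[index]
    distances = [
        (math.dist(target, vec), i)
        for i, vec in enumerate(embeddings)
        if i != index  # an entity is never its own neighbor
    ]
    distances.sort()
    return [i for _, i in distances[:k]]


def synthesize_record(records, neighbor_ids, rng):
    """Assumed rule: copy each attribute from one randomly chosen neighbor."""
    synthetic = {}
    for field in records[0]:
        donor = rng.choice(neighbor_ids)
        synthetic[field] = records[donor][field]
    return synthetic


def synthesize_dataset(records, embeddings, k=3, seed=0):
    """One synthetic record per original entity, built only from its neighbors."""
    rng = random.Random(seed)
    return [
        synthesize_record(records, nearest_neighbors(embeddings, i, k), rng)
        for i in range(len(records))
    ]


# Hypothetical toy data: five records and made-up 2-dimensional embeddings.
records = [
    {"age_band": "40-49", "diagnosis": "A"},
    {"age_band": "40-49", "diagnosis": "B"},
    {"age_band": "50-59", "diagnosis": "A"},
    {"age_band": "70-79", "diagnosis": "C"},
    {"age_band": "80-89", "diagnosis": "C"},
]
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]

synthetic = synthesize_dataset(records, embeddings, k=2)
for row in synthetic:
    print(row)
```

Because every synthetic value is taken from a neighbor rather than from the entity itself, each generated row stays close to the local distribution without copying its source record attribute for attribute. How the published framework actually combines neighbor characteristics is not specified in the abstract.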
