How shuffling works for effective time series data protection
Journal article   Peer reviewed

Matthew J. Schneider, Jinwook Lee and Lanqing Du
Annals of operations research, v 357(2-3), pp 1165-1190
01 Feb 2026

Abstract

Operations Research & Management Science; Science & Technology; Technology
Many existing data protection methods (DPMs) overlook forecasting use cases, which can result in noisy forecasts and decreased accuracy. To address this issue, we develop a novel data protection framework for data providers who prioritize data privacy with minimal loss of forecasting accuracy. Recent studies show that k-nearest Time Series (k-nTS) swapping can achieve usable forecasting accuracy while maintaining an acceptable level of privacy. However, this method is often inefficient because it requires several tasks for each time series $A_j \in \mathbb{R}^n$, $j = 1, \dots, J$, where $J$ is very large: (i) computing the distances from $A_j$ to all $J-1$ other vectors; (ii) ordering all $J-1$ vectors by their calculated distances from $A_j$; (iii) randomly selecting one of them to swap values with; and (iv) moving to the next time series and repeating the entire process until the last vector $A_J$. This process is time-consuming, inefficient for practical applications, and lacks any performance guarantee. In this research, we propose a new method called K-means Time Series (K-mTS) Shuffling, which can be applied to thousands of time series to enhance efficiency and ensure performance guarantees. Efficiency is improved by (i) clustering, which eliminates the need for iterative individual distance calculations, and (ii) simultaneous data swapping (or shuffling) within the same cluster. Furthermore, for a performance guarantee, instead of using random selection as in the k-nTS approach, we apply a perfect matching scheme to find the optimal vector to swap with for each $A_j$, which allows data providers to systematically manage the trade-offs between data privacy and forecasting accuracy. The presented numerical results indicate that our proposed method achieves guaranteed usable forecasting accuracy, better than that obtained using confidential data protected by traditional methods, while maintaining an acceptable level of privacy.
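
The cluster-then-match idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it (`lloyd_kmeans`, `shuffle_within_clusters`, the toy matrix `A`) is hypothetical. It rests on three assumptions the abstract does not confirm: a plain Lloyd's k-means stands in for the clustering step, a minimum-cost bipartite assignment (SciPy's `linear_sum_assignment`) stands in for the perfect matching scheme, and "shuffling" is taken to mean replacing each whole series with its matched partner from the same cluster.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def lloyd_kmeans(A, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of A."""
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    # Fancy indexing copies, so updating the centers never modifies A.
    centers = A[rng.choice(len(A), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every series to every center.
        d2 = ((A[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(n_clusters):
            members = A[labels == c]
            if len(members):  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


def shuffle_within_clusters(A, labels):
    """Replace each series with a matched partner from its own cluster.

    Assumption: the matching is a minimum-cost assignment in which no
    series may be matched to itself.
    """
    A = np.asarray(A, dtype=float)
    protected = A.copy()
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < 2:
            continue  # a singleton has no partner; left unchanged here
        block = A[idx]
        # Pairwise Euclidean distances inside this cluster only.
        cost = np.linalg.norm(block[:, None, :] - block[None, :, :], axis=2)
        # A penalty larger than any self-free assignment forbids self-matches.
        np.fill_diagonal(cost, cost.sum() + 1.0)
        rows, cols = linear_sum_assignment(cost)
        protected[idx[rows]] = block[cols]
    return protected


# Toy data: J = 12 series of length n = 6 (synthetic, for illustration only).
rng = np.random.default_rng(1)
A = rng.normal(size=(12, 6))
labels = lloyd_kmeans(A, n_clusters=3)
protected = shuffle_within_clusters(A, labels)
```

In this sketch, distances are computed only inside each cluster rather than between every pair of the $J$ series, and all replacements within a cluster are decided in one assignment step rather than one series at a time. The number of clusters is a free parameter here; how the paper chooses it, and how it derives its performance guarantee, is not stated in the abstract.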

Details

InCites Highlights

Data related to this publication, from InCites Benchmarking & Analytics tool:

Collaboration types
Domestic collaboration
Web of Science research areas
Operations Research & Management Science