How shuffling works for effective time series data protection

Matthew J. Schneider; Jinwook Lee; Lanqing Du

doi:10.1007/s10479-025-06949-2

Many existing data protection methods (DPMs) tend to overlook forecasting use cases, which can result in noisy forecasts and decreased accuracy. To address this issue, we develop a novel data protection framework for data providers who prioritize data privacy with minimal loss of forecasting accuracy. Recent studies show that k-nearest Time Series (k-nTS) swapping can be applied to achieve usable forecasting accuracy while maintaining an acceptable level of privacy. However, this method is often inefficient because it requires several tasks for each time series, say, $A_j \in \mathbb{R}^n$ for all $j = 1, \dots, J$, where $J$ is very large: (i) computing the distances of all $J-1$ other vectors from $A_j$; (ii) ordering all $J-1$ vectors based on the calculated distances from $A_j$; (iii) randomly selecting one of them to swap values with; and (iv) finally, moving to the next time series and starting the entire process over, until the last vector $A_J$. This entire process is time-consuming, inefficient for practical applications, and lacks any performance guarantee. In this research, we propose a new method called K-means Time Series (K-mTS) Shuffling, which can be applied to thousands of time series to enhance efficiency and ensure performance guarantees. The efficiency is improved by (i) clustering, which eliminates the need for iterative individual distance calculations, and (ii) simultaneous data swapping (or shuffling) within the same cluster. Furthermore, for the performance guarantee, instead of using random selection as in the k-nTS approach, we apply a perfect matching scheme to find the optimal vector to swap with for each $A_j$, which allows data providers to systematically manage the trade-offs between data privacy and forecasting accuracy. The presented numerical results indicate that our proposed method achieves guaranteed usable forecasting accuracy, better than that obtained using confidential data protected by traditional methods, while maintaining an acceptable level of privacy.
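The cluster-then-shuffle idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`kmeans_labels`, `shuffle_within_clusters`), the toy data, the number of clusters, and the plain Lloyd's-algorithm clustering are all assumptions made for illustration. In particular, the paper's perfect matching step is replaced here by a simple random cyclic shift within each cluster, which only guarantees that every series in a cluster of size two or more receives the values of a different series from the same cluster.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(series, k, iters=20, seed=0):
    """Plain Lloyd's algorithm (an assumed stand-in for the clustering step).

    Returns one cluster label per series.
    """
    rng = random.Random(seed)
    centers = [list(s) for s in rng.sample(series, k)]
    labels = [0] * len(series)
    for _ in range(iters):
        # Assign each series to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(s, centers[c]))
            for s in series
        ]
        # Recompute each non-empty cluster's center as the coordinate-wise mean.
        for c in range(k):
            members = [s for s, lab in zip(series, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def shuffle_within_clusters(series, labels, seed=0):
    """Swap whole series among members of the same cluster.

    Assumption: a random cyclic shift is used instead of the paper's
    perfect matching scheme, so no optimality is claimed here.
    """
    rng = random.Random(seed)
    protected = [list(s) for s in series]
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    for members in clusters.values():
        order = members[:]
        rng.shuffle(order)
        # Each position receives the values of the next series in the random order.
        for i, idx in enumerate(order):
            donor = order[(i + 1) % len(order)]
            protected[idx] = list(series[donor])
    return protected


# Toy data: six short series (hypothetical values).
series = [
    [1.0, 1.1, 0.9], [1.2, 1.0, 1.1], [0.9, 1.0, 1.0],
    [10.0, 10.2, 9.9], [9.8, 10.1, 10.0], [10.1, 9.9, 10.2],
]
labels = kmeans_labels(series, k=2)
protected = shuffle_within_clusters(series, labels)
```

Because all swaps happen inside a cluster, each protected series stays close to the original it replaces, and clustering is done once rather than recomputing distances from every $A_j$ to all $J-1$ other vectors.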
