Data perturbation

Series: TRIPODS Seminar

When: Monday, December 12, 2022, 12:00 PM

Location: ENR2 S225

Presenter: Xiaotong Tom Shen, University of Minnesota

Data perturbation is a technique for generating synthetic data by adding "noise" to the original data. It has a wide range of applications, primarily in data security, yet it has received little attention within data science. In this presentation, I will describe a fundamental principle of data perturbation that preserves distributional information, thereby ensuring the validity of downstream analyses and machine learning tasks while protecting data privacy. Applying this principle, we derive a scheme that allows a user to perturb data nonlinearly while meeting the requirements of both differential privacy and statistical analysis. The scheme yields credible statistical analysis and high predictive accuracy in machine learning tasks. Finally, I will highlight multiple facets of data perturbation through examples. This is joint work with B. Xuan, R. Shen, and Y. Liu.
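For readers unfamiliar with the basic idea, the following is a minimal sketch of the simplest form of noise-based perturbation: clipping each value to a known range and adding Laplace noise calibrated to a privacy budget. This is the textbook additive Laplace mechanism, not the nonlinear, distribution-preserving scheme described in the talk. The function names, the bounds, and the parameter `epsilon` are illustrative assumptions.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(values, lower, upper, epsilon, rng):
    """Clip each value to [lower, upper], then add Laplace noise.

    Assumption (not from the talk): each record is released on its own,
    so the sensitivity is the width of the clipping range.
    """
    scale = (upper - lower) / epsilon
    return [
        min(max(v, lower), upper) + laplace_noise(scale, rng)
        for v in values
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    original = [rng.random() for _ in range(5)]
    print(perturb(original, 0.0, 1.0, epsilon=1.0, rng=rng))
```

Additive noise of this kind keeps the mean roughly intact but inflates the variance and blurs the shape of the distribution, which is one motivation for perturbation schemes that aim to preserve distributional information.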

https://arizona.zoom.us/j/81984870698