Differentially Private Recurrent Variational Autoencoder For Text Privacy Preservation
[Abstract] Deep learning techniques have been widely used in natural language processing (NLP) tasks and have made remarkable progress. However, training a deep learning model relies on large amounts of data, which may involve sensitive information such as electronic medical records. An attacker can infer sensitive information from the trained model, leading to privacy leakage. To solve this problem, we propose a Differentially Private Recurrent Variational AutoEncoder (DP-RVAE) that can generate simulated data in place of the sensitive dataset to preserve privacy. To generate high-utility synthetic text, a part of the sensitive text data is employed as the conditional input of the model, and a dropout and noise-perturbing mechanism is applied to preserve differential privacy. In addition, we extend the proposed DP-RVAE to a federated learning setting and design a novel training paradigm for NLP tasks. Specifically, DP-RVAE is deployed on the client side to train on and generate personalized text. These DP-RVAE models are aggregated and updated through the Federated Optimisation (FedOPT) algorithm so that personal information is well preserved. We evaluate the proposed DP-RVAE through a text classification task on the Tweets depression sentiment and IMDB reviews datasets. DP-RVAE achieves higher average test accuracy, by 5.90% and 3.94%, than the typical centralized training and federated learning approaches, respectively. We also perform a keyword inference attack experiment on a medical description dataset collected from the real world. Compared to a typical differentially private preserving approach, DP-RVAE reduces average attack accuracy by 15.2%. The experimental results demonstrate that DP-RVAE can be applied to NLP models to maintain accuracy while preserving the privacy of sensitive data.
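The abstract's dropout and noise-perturbing mechanism can be illustrated with a minimal sketch: clip a hidden-state vector's norm to bound sensitivity, apply dropout, then add calibrated Gaussian noise. This is a hypothetical illustration, not the paper's implementation; the function name `dp_perturb`, the parameters, and their defaults are all assumptions.

```python
import numpy as np

def dp_perturb(hidden, clip_norm=1.0, dropout_rate=0.2, noise_scale=0.5, rng=None):
    """Clip, apply dropout, and add Gaussian noise to a hidden-state vector.

    Hypothetical sketch of a dropout-and-noise perturbation; parameter
    names and default values are assumptions, not taken from the paper.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    # Clip the L2 norm so the Gaussian mechanism's sensitivity is bounded.
    norm = np.linalg.norm(hidden)
    clipped = hidden * min(1.0, clip_norm / max(norm, 1e-12))
    # Randomly drop units (inverted dropout preserves the expected scale).
    mask = rng.random(clipped.shape) >= dropout_rate
    dropped = clipped * mask / (1.0 - dropout_rate)
    # Add Gaussian noise calibrated to the clipping bound.
    noise = rng.normal(0.0, noise_scale * clip_norm, size=clipped.shape)
    return dropped + noise
```

With `noise_scale=0` and `dropout_rate=0` the function reduces to pure norm clipping, which makes the bounded-sensitivity step easy to check in isolation.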
[Status] Early Access