Citation: TONG Lingling, LI Pengxiao, DUAN Dongsheng, et al. Data masking model for heterogeneous big data environment[J]. Journal of Beijing University of Aeronautics and Astronautics, 2022, 48(2): 249-257. doi: 10.13700/j.bh.1001-5965.2020.0403 (in Chinese)
Because data types and desensitization requirements vary across scenarios, traditional data masking methods cannot meet user privacy protection requirements in big data environments. How to accurately locate and efficiently desensitize heterogeneous big data while preserving data security, trust, and availability has become the key problem in this area. In this paper, we propose a data masking model for heterogeneous big data applications, covering texts, images, voices, and databases, and present the four key modules of the model. First, desensitization data preprocessing automatically identifies and classifies sensitive data in each application scenario. Second, a data pre-masking method evaluates masking along five dimensions (data availability, data relevance, degree of privacy protection, time complexity, and space complexity) in order to construct a customized desensitization strategy. Finally, after task scheduling, the data masking tasks are allocated and executed, and recovery of the masked data is partially supported. Two typical data masking applications are verified and analyzed with the proposed heterogeneous big data masking model, and the results indicate that effective desensitization can be achieved in different application scenarios.
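The five-dimension evaluation described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the dimensions are combined, so the weighted sum below is an assumption made only for illustration. The weights, the candidate method names, and all scores are hypothetical, and the two complexity dimensions are assumed to be pre-normalized so that a higher score means a lower cost.

```python
from dataclasses import dataclass

# Hypothetical weights for the five evaluation dimensions named in the abstract.
# The paper's actual weighting scheme is not given here; these values are assumed.
WEIGHTS = {
    "availability": 0.3,
    "relevance": 0.2,
    "privacy": 0.3,
    "time_complexity": 0.1,
    "space_complexity": 0.1,
}


@dataclass
class Candidate:
    """A candidate masking method with pre-masking scores in [0, 1].

    Higher is better on every dimension; the complexity scores are assumed
    to be inverted already (1.0 = cheapest).
    """
    name: str
    scores: dict


def weighted_score(candidate, weights=WEIGHTS):
    """Combine the five dimension scores into one value by weighted sum."""
    return sum(weights[dim] * candidate.scores[dim] for dim in weights)


def choose_strategy(candidates, weights=WEIGHTS):
    """Return the candidate with the highest weighted score."""
    return max(candidates, key=lambda c: weighted_score(c, weights))


# Hypothetical pre-masking results for two candidate methods.
candidates = [
    Candidate("hash", {
        "availability": 0.4, "relevance": 0.3, "privacy": 0.9,
        "time_complexity": 0.9, "space_complexity": 0.9,
    }),
    Candidate("generalize", {
        "availability": 0.8, "relevance": 0.7, "privacy": 0.6,
        "time_complexity": 0.7, "space_complexity": 0.8,
    }),
]

best = choose_strategy(candidates)
print(best.name, round(weighted_score(best), 2))
```

Changing the weights changes which method is selected, which is one way a strategy could be customized per scenario: a scenario that prioritizes privacy over availability would shift weight toward the privacy dimension.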