Effective detection of security compromises in enterprises using feature engineering
[Abstract] We present a method for detecting malicious activity in enterprise log data. The method relies on feature engineering: generating new features by applying operators to the features of the raw data. We apply the Fourier expansion of Boolean functions to generate parity functions on feature subsets, which we call parity features. We also investigate a heuristic method that applies Boolean operators to raw data features, generating propositional features. We demonstrate on real data sets that the engineered features enhance the performance of classifiers and clustering algorithms. Compared with classification on raw data features, the engineered features achieve up to a 50.6% improvement in malicious recall while sacrificing no more than 0.47% in accuracy. Clustering on the engineered features finds up to 6 "pure" malicious clusters, compared with 0 "pure" clusters on raw data features. In one case, a single engineered feature achieved higher performance than 91 raw data features. In general, a small number (<10) of engineered features achieve higher performance than the raw data features.
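As a rough illustration of the two kinds of engineered features named in the abstract, the sketch below builds parity features (XOR over feature subsets) and propositional features (pairwise AND/OR) from a row of binary raw features. It is not the authors' implementation: the function names, the restriction to subsets of at most `max_order` features, and the assumption that raw features are already encoded as 0/1 are all hypothetical choices made for the example.

```python
from itertools import combinations


def parity_features(row, max_order=2):
    """Return parity features for a sequence of 0/1 raw features.

    For each index subset S with 1 <= |S| <= max_order (an assumed cap),
    the feature is the XOR of the bits in S. This matches the Fourier
    character (-1)^(sum of bits in S) after relabeling 0/1 as +1/-1.
    """
    out = {}
    for k in range(1, max_order + 1):
        for subset in combinations(range(len(row)), k):
            out[subset] = sum(row[i] for i in subset) % 2
    return out


def propositional_features(row):
    """Return pairwise AND / OR features (an assumed, simple operator set)."""
    out = {}
    for i, j in combinations(range(len(row)), 2):
        out[("and", i, j)] = row[i] & row[j]
        out[("or", i, j)] = row[i] | row[j]
    return out


if __name__ == "__main__":
    raw = (1, 0, 1)  # hypothetical binarized log record
    print(parity_features(raw))
    print(propositional_features(raw))
```

The engineered columns would then be fed to a classifier or clustering algorithm in place of, or alongside, the raw features. How the raw log fields are binarized and which subsets are kept are left open here, since the abstract does not specify them.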
[Keywords] Feature engineering