Statistical classification techniques in high energy physics (SDDT algorithm)
[Abstract] We present a supervised binary divergence decision tree with a nested separation method based on generalized linear models. A key insight is that the clustering is driven by only a few selected physical variables: the proper selection consists of the variables that achieve the maximal divergence measure between the two classes. We apply our method to Monte Carlo simulations of physics processes corresponding to a data sample of top quark-antiquark pair candidate events in the lepton+jets decay channel. The data sample was produced in pp̄ collisions at √s = 1.96 TeV and corresponds to an integrated luminosity of 9.7 fb⁻¹ recorded with the D0 detector during Run II of the Fermilab Tevatron Collider. Our algorithm separates signal from background with an area under the ROC curve (AUC) of 90%. We also briefly discuss modifications of statistical tests for weighted data sets, used to test the homogeneity of the Monte Carlo simulations and the measured data. We propose a justification of these modified tests through divergence tests.
[Institution] Department of Mathematics, Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, Trojanova 13, Praha 2, 12000, Czech Republic
[Subject classification] Mathematics
[Keywords] Different class; Divergence measures; Fermilab Tevatron collider; Generalized linear model; Integrated luminosity; Physical variables; Separation methods; Statistical classification techniques
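The two quantitative ideas in the abstract can be illustrated with a minimal sketch: selecting the variables with maximal divergence between signal and background, and scoring separation by a weighted AUC. This sketch assumes the Jensen-Shannon divergence on binned, weighted histograms as the divergence measure, a fixed number of equal-width bins, and non-negative event weights. The paper's own divergence measure and binning may differ, and the decision tree with GLM-based nested separation is not reproduced here. All function names (`weighted_histogram`, `js_divergence`, `rank_variables`, `weighted_auc`) and the toy data are hypothetical.

```python
import bisect
import math
from itertools import groupby


def weighted_histogram(values, weights, edges):
    """Normalized weighted histogram on fixed bin edges; out-of-range values go to the edge bins."""
    counts = [0.0] * (len(edges) - 1)
    for x, w in zip(values, weights):
        i = bisect.bisect_right(edges, x) - 1
        i = min(max(i, 0), len(counts) - 1)  # clip under/overflow into the first/last bin
        counts[i] += w
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint supports)."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i = 0 contribute nothing; b_i = m_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_variables(signal, background, w_sig, w_bkg, n_bins=20):
    """Rank variables by signal/background divergence, largest first.

    signal, background: dict mapping variable name -> list of per-event values.
    w_sig, w_bkg: per-event weights (assumed non-negative), shared by all variables.
    """
    scores = {}
    for name in signal:
        s, b = signal[name], background[name]
        lo, hi = min(min(s), min(b)), max(max(s), max(b))
        if hi == lo:  # constant variable: carries no separating information
            scores[name] = 0.0
            continue
        edges = [lo + (hi - lo) * k / n_bins for k in range(n_bins + 1)]
        p = weighted_histogram(s, w_sig, edges)
        q = weighted_histogram(b, w_bkg, edges)
        scores[name] = js_divergence(p, q)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def weighted_auc(sig_scores, w_sig, bkg_scores, w_bkg):
    """Weighted ROC AUC = P(score_sig > score_bkg) + 0.5 * P(tie), computed with one sort."""
    events = sorted(
        [(x, 1, w) for x, w in zip(sig_scores, w_sig)]
        + [(x, 0, w) for x, w in zip(bkg_scores, w_bkg)]
    )
    bkg_below = 0.0  # total background weight strictly below the current score
    area = 0.0
    for _, group in groupby(events, key=lambda e: e[0]):
        s_g = b_g = 0.0
        for _, is_sig, w in group:
            if is_sig:
                s_g += w
            else:
                b_g += w
        area += s_g * (bkg_below + 0.5 * b_g)  # ties count as half
        bkg_below += b_g
    return area / (sum(w_sig) * sum(w_bkg))


# Toy usage (made-up numbers, not D0 data): "x" separates the classes, "y" does not.
signal = {"x": [2.0, 2.5, 3.0, 3.5], "y": [0.0, 1.0, 2.0, 3.0]}
background = {"x": [0.0, 0.5, 1.0, 1.5], "y": [0.0, 1.0, 2.0, 3.0]}
ones = [1.0] * 4
ranking = rank_variables(signal, background, ones, ones)
selected = [name for name, _ in ranking[:1]]  # keep the top-k variables (k = 1 here)
auc_x = weighted_auc(signal["x"], ones, background["x"], ones)
print(ranking, selected, auc_x)
```

Sorting once makes the AUC computation O(n log n) rather than comparing every signal-background pair. Monte Carlo samples in high energy physics can carry negative event weights; the histogram normalization and divergence used here would then need extra care, which this sketch does not address.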