Efficient Mixed-Order Hidden Markov Model Inference
[摘要] Higher-order Markov models are more powerful than first-order models, butsuffer from an exponential increase in model parameters with order, which leads to data scarcity problems during training. A more efficient approach is to usemixed-order Markov models, which model data sequences with contexts of differentlengths.This study proposes two algorithms for inferring mixed-order Markov chainsand hidden Markov models (HMMs), respectively. The basis of these algorithmsis the prediction suffix tree (PST), an efficient representation of a mixed-orderMarkov chain.The smallest encoded context tree (SECT) algorithm constructs PSTs fromdata, based on the minimum description length principle. It has no user-specifiableparameters to tune, and will expand the depth of the resulting PST as far asthe data set allows it, making it a self-bounded algorithm. It is also faster thanthe original PST inference algorithm.The hidden SECT algorithm replaces the underlying Markov chain of anHMM with a prediction suffix tree, which is inferred using SECT. The algorithmis efficient and integrates well with standard techniques.The properties of the SECT and hidden SECT algorithms are verified on syntheticdata. The hidden SECT algorithm is also compared with a fixed-orderHMM training algorithm on an automatic language recognition task, where theresulting mixed-order HMMs are shown to be smaller and train faster than thefixed-order models, for similar classification accuracies.
[发布日期] [发布机构] Stellenbosch University
[效力级别] [学科分类]
[关键词] [时效性]