A comparison of Gaussian mixture variants with application to automatic phoneme recognition
[Abstract] The diagonal covariance Gaussian Probability Density Function (PDF) has been a very popular choice as the base PDF for Automatic Speech Recognition (ASR) systems. The only choices thus far have been between the spherical, diagonal and full covariance Gaussian PDFs. These classic methods have been used for some time, but no single document could be found that contains a comparative study of these methods as used in Pattern Recognition (PR). There is also a gap between the complexity and speed of the diagonal and full covariance Gaussian implementations: the two methods differ drastically in accuracy, speed and size. There is a need to find one or more models that cover the area between these two classic methods. The objectives of this thesis are to evaluate three new PDF types that fit into the area between the diagonal and full covariance Gaussian implementations to broaden the choices for ASR, to document a comparative study of the three classic methods and the newly implemented methods (from previous work), and to construct a test system to evaluate these methods on phoneme recognition. The three classic density functions are examined and issues regarding the theory, implementation and usefulness of each are discussed. A visual example of each is given to show the impact of the assumptions made by each (if any). The three newly implemented PDFs are the Sparse-, Probabilistic Principal Component Analysis- (PPCA) and Factor Analysis (FA) covariance Gaussian PDFs. The theory, implementation and practical usefulness are shown and discussed. Again, visual examples are provided to show the difference in modelling methodologies. The construction of a test system using two speech corpora is shown and includes issues involving signal processing, PR and evaluation of the results. The NTIMIT and AST speech corpora were used to initialise and train the test system.
The usage of the system to evaluate the PDFs discussed in this work is explained. The testing results of the three new methods confirmed that they indeed fill the gap between the diagonal and full covariance Gaussians. In our tests the newly implemented methods produced a relative improvement in error rate of 0.3–4% over a similarly implemented diagonal covariance Gaussian, but took 35–78% longer to evaluate. Relative to the full covariance Gaussian, the error rates were 18–22% worse, but evaluation was 61–70% faster. When all the methods were scaled to approximately the same accuracy, all the above methods were 29–143% slower than the diagonal covariance Gaussian (excluding the spherical covariance method).
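To make the "gap" between the diagonal and full covariance Gaussians concrete, the sketch below counts the covariance parameters stored per Gaussian for each structure named in the abstract. It is a minimal illustration, not the thesis implementation. It assumes the standard forms C = W W^T + sigma^2 I for PPCA and C = W W^T + Psi (Psi diagonal) for FA, with W a d x q loading matrix. The function name, the feature dimension d = 39 and the latent dimension q = 4 are hypothetical values chosen for illustration. The sparse covariance is omitted because the abstract does not say how its sparsity pattern is defined.

```python
def covariance_param_count(kind, d, q=0):
    """Stored covariance parameters for one d-dimensional Gaussian.

    Counts stored values, not free degrees of freedom (the rotational
    redundancy in W for PPCA/FA is ignored). Hypothetical helper.
    """
    if kind == "spherical":
        return 1                 # sigma^2 * I: one shared variance
    if kind == "diagonal":
        return d                 # one variance per dimension
    if kind == "ppca":
        return d * q + 1         # W (d x q) plus one shared noise variance
    if kind == "fa":
        return d * q + d         # W (d x q) plus one noise variance per dimension
    if kind == "full":
        return d * (d + 1) // 2  # upper triangle of a symmetric d x d matrix
    raise ValueError(f"unknown covariance kind: {kind}")


if __name__ == "__main__":
    d, q = 39, 4  # assumed dimensions, not taken from the thesis
    for kind in ("spherical", "diagonal", "ppca", "fa", "full"):
        print(f"{kind:10s} {covariance_param_count(kind, d, q)}")
```

With these assumed dimensions, the PPCA and FA counts fall between the diagonal and full counts, which is the region the abstract says the new PDFs are meant to cover.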
[Publication date] [Publishing institution] Stellenbosch University