Measuring, refining and calibrating speaker and language information extracted from speech
[Abstract] ENGLISH ABSTRACT: We propose a new methodology, based on proper scoring rules, for the evaluation of the goodness of pattern recognizers with probabilistic outputs. The recognizers of interest take an input, known to belong to one of a discrete set of classes, and output a calibrated likelihood for each class. This is a generalization of the traditional use of proper scoring rules to evaluate the goodness of probability distributions. A recognizer with outputs in well-calibrated probability distribution form can be applied to make cost-effective Bayes decisions over a range of applications, having different cost functions. A recognizer with likelihood output can additionally be employed for a wide range of prior distributions for the to-be-recognized classes.

We use automatic speaker recognition and automatic spoken language recognition as prototypes of this type of pattern recognizer. The traditional evaluation methods in these fields, as represented by the series of NIST Speaker and Language Recognition Evaluations, evaluate hard decisions made by the recognizers. This makes these recognizers cost-and-prior-dependent. The proposed methodology generalizes that of the NIST evaluations, allowing for the evaluation of recognizers which are intended to be usefully applied over a wide range of applications, having variable priors and costs.

The proposal includes a family of evaluation criteria, where each member of the family is formed by a proper scoring rule. We emphasize two members of this family: (i) a non-strict scoring rule, directly representing error rate at a given prior; (ii) the strict logarithmic scoring rule, which represents information content, or which equivalently represents summarized error rate, or expected cost, over a wide range of applications.

We further show how to form a family of secondary evaluation criteria, which, by contrasting with the primary criteria, form an analysis of the goodness of calibration of the recognizers' likelihoods.

Finally, we show how to use the logarithmic scoring rule as an objective function for the discriminative training of fusion and calibration of speaker and language recognizers.
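The two emphasized criteria can be illustrated with a minimal sketch for the two-class (detection) case. It assumes the recognizer outputs a log-likelihood ratio (LLR) per trial; the abstract itself gives no formulas, so the function names `bayes_error_rate` and `log_score_cost`, the `prior` parameter, and the choice to report the logarithmic cost in bits are illustrative assumptions rather than the thesis's own definitions.

```python
import math

def _logit(p):
    # Log-odds of a probability p in (0, 1).
    return math.log(p / (1.0 - p))

def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def bayes_error_rate(target_llrs, nontarget_llrs, prior=0.5):
    """Non-strict criterion (assumed form): prior-weighted error rate of
    Bayes decisions, accepting a trial when its posterior log-odds > 0."""
    lp = _logit(prior)
    p_miss = sum(llr + lp <= 0 for llr in target_llrs) / len(target_llrs)
    p_fa = sum(llr + lp > 0 for llr in nontarget_llrs) / len(nontarget_llrs)
    return prior * p_miss + (1.0 - prior) * p_fa

def log_score_cost(target_llrs, nontarget_llrs, prior=0.5):
    """Strict logarithmic criterion (assumed form): prior-weighted average
    of -log2 of the posterior assigned to the true class, in bits."""
    lp = _logit(prior)
    # -log P(target | x) = softplus(-(llr + lp)) for target trials.
    tar = sum(_softplus(-(llr + lp)) for llr in target_llrs) / len(target_llrs)
    # -log P(non-target | x) = softplus(llr + lp) for non-target trials.
    non = sum(_softplus(llr + lp) for llr in nontarget_llrs) / len(nontarget_llrs)
    return (prior * tar + (1.0 - prior) * non) / math.log(2.0)

if __name__ == "__main__":
    # Hypothetical scores, made up purely for illustration.
    targets = [2.0, -1.0, 3.5]
    nontargets = [-3.0, 0.5, -2.0]
    print(bayes_error_rate(targets, nontargets, prior=0.5))
    print(log_score_cost(targets, nontargets, prior=0.5))
```

Under these assumptions, a recognizer that always outputs an LLR of 0 scores exactly 1 bit at a prior of 0.5, and the error-rate criterion only changes when a score crosses the decision threshold, whereas the logarithmic cost also penalizes over- or under-confident likelihoods.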
[Publication date] [Publishing institution] Stellenbosch University
[Validity level] [Subject classification]
[Keywords] [Timeliness]