Clustering and cluster inference of complex data structures
[Abstract] Finite mixtures provide a flexible and powerful tool for fitting univariate and multivariate distributions that cannot be captured by standard statistical distributions. In particular, multivariate mixtures have been widely used for modelling and cluster analysis of high-dimensional data in a wide range of applications. Modes of mixture densities have been used with great success for organizing mixture components into homogeneous groups, but the results are limited to normal mixtures. Beyond the clustering application, existing research in this area has provided fundamental results on the upper bound of the number of modes, but these too are limited to normal mixtures. This thesis provides new modality theorems and important analytical results on the upper bound of the number of modes for multivariate t-mixtures and compares them with existing results on normal mixtures. Graphical tools for merging t-mixtures and the effect of the degrees of freedom are also thoroughly examined. The most important contribution of this thesis is a set of fundamental results on the modality of skew normal distributions and skew normal mixtures. First, we show that the topography of high-dimensional skew normal mixtures can be analyzed rigorously in lower dimensions by defining the corresponding ridgeline manifold, which contains all critical points as well as the ridges of the density. Unlike for normal or t-mixtures, however, an implicit equation must be solved to obtain this manifold. The plot of the elevations on the ridgeline can still be used to develop tools for exploring the number of modes and for merging mixture components. Although analytical results on the number of modes can no longer be derived, the elevation plots lead to a new conjecture on the upper bound of the number of modes of skew normal mixtures. Unlike the normal and t-distributions, for skew normal distributions even the one-component case has very interesting modal features.
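As background for the ridgeline idea described above, the following is a minimal sketch of the normal-mixture case, where the ridgeline has a closed form. For two normal components it is x*(a) = [(1-a)Σ1^{-1} + aΣ2^{-1}]^{-1} [(1-a)Σ1^{-1}μ1 + aΣ2^{-1}μ2] for a in [0, 1]. The sketch assumes diagonal covariances so that it needs only the standard library, and it counts peaks of the elevation on a grid. All function names are illustrative and are not taken from the thesis or its R packages; the skew normal case, which requires solving an implicit equation, is not covered here.

```python
import math


def ridgeline_point(a, mu1, var1, mu2, var2):
    """Point x*(a) on the ridgeline of a two-component normal mixture.

    Assumption: both covariances are diagonal (var1, var2 hold the diagonal
    entries), so the matrix formula reduces to a per-coordinate
    precision-weighted average of the two means.
    """
    point = []
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        w1 = (1.0 - a) / v1
        w2 = a / v2
        point.append((w1 * m1 + w2 * m2) / (w1 + w2))
    return point


def normal_density(x, mu, var):
    """Density of a normal distribution with diagonal covariance at x."""
    log_density = 0.0
    for xi, m, v in zip(x, mu, var):
        log_density += -0.5 * math.log(2.0 * math.pi * v) - 0.5 * (xi - m) ** 2 / v
    return math.exp(log_density)


def elevation(a, weight1, mu1, var1, mu2, var2):
    """Mixture density evaluated at the ridgeline point x*(a)."""
    x = ridgeline_point(a, mu1, var1, mu2, var2)
    return (weight1 * normal_density(x, mu1, var1)
            + (1.0 - weight1) * normal_density(x, mu2, var2))


def count_ridgeline_peaks(weight1, mu1, var1, mu2, var2, grid_size=2001):
    """Count strict interior local maxima of the elevation on a grid over [0, 1].

    This is a grid approximation: peaks closer to an endpoint than the grid
    spacing, or closer to each other than the spacing, will be missed.
    """
    heights = [
        elevation(i / (grid_size - 1), weight1, mu1, var1, mu2, var2)
        for i in range(grid_size)
    ]
    return sum(
        1
        for i in range(1, grid_size - 1)
        if heights[i - 1] < heights[i] > heights[i + 1]
    )


if __name__ == "__main__":
    unit = [1.0, 1.0]
    for mu2 in ([1.0, 0.0], [3.0, 0.0]):
        peaks = count_ridgeline_peaks(0.5, [0.0, 0.0], unit, mu2, unit)
        print("second mean", mu2, "-> peaks on the ridgeline:", peaks)
```

Because the search runs over the single parameter a rather than over the full data dimension, the cost of the elevation plot does not grow with a fine grid in every coordinate, which is the practical appeal of the ridgeline approach.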
First, as the modes cannot be written in closed form, we design and provide software tools to calculate the modes in any dimension. We also provide a thorough study of the relationship between the means and modes of skew normal distributions, together with fundamental results on the limiting behaviour of the mean and the mode as the skewness parameter increases. A further new result shows that, although the mean can vary widely as the skewness parameter varies, the mode is a much more robust measure of central tendency, since the mode of a skew normal distribution varies only within a smaller range. Two R packages, available on GitHub, are provided as part of this thesis; they contain the numerical tools for calculating the modes of skew normal distributions and functions specific to merging skew normal components. Additionally, an application of the merging tool developed for skew normal mixtures is demonstrated using flow cytometry data.
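The mean versus mode comparison described above can be illustrated in the simplest setting. The sketch below assumes the standard univariate skew normal density f(x) = 2φ(x)Φ(αx), whose mean is sqrt(2/π)·α/sqrt(1+α²), and finds the mode by bisection on the derivative of the log-density. It is an illustration only: the function names are hypothetical, and this is not the multivariate method or the R code provided with the thesis.

```python
import math


def std_normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal distribution function via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def skew_normal_mean(alpha):
    """Mean of the standard skew normal with skewness parameter alpha."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    return math.sqrt(2.0 / math.pi) * delta


def skew_normal_mode(alpha, tol=1e-12):
    """Mode of the standard skew normal, located numerically.

    The derivative of the log-density is
        s(x) = -x + alpha * phi(alpha * x) / Phi(alpha * x).
    For alpha >= 0 it is non-negative at x = 0 and negative at x = 1, and the
    density is log-concave, so bisection on [0, 1] brackets the unique mode.
    Negative alpha is handled by reflection symmetry.
    """
    if alpha < 0:
        return -skew_normal_mode(-alpha, tol)

    def score(x):
        return -x + alpha * std_normal_pdf(alpha * x) / std_normal_cdf(alpha * x)

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for alpha in (0.0, 1.0, 2.0, 5.0, 20.0, 100.0):
        print(alpha, skew_normal_mean(alpha), skew_normal_mode(alpha))
```

In this univariate case the mean increases towards sqrt(2/π) as α grows, while the mode stays inside a narrower interval and moves back towards zero for large α, which matches the qualitative statement above.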
[Publishing institution] University: University of Glasgow; Department: School of Mathematics and Statistics
[Keywords] Modality, number of modes, multivariate t-mixture, skew normal distribution, multivariate skew normal distribution, merging skew normal mixture.