Networks and multivariate statistics as applied to biological datasets and wine-related omics
[Abstract] ENGLISH ABSTRACT: Introduction: Wine production is a complex biotechnological process aiming at productively coordinating the interactions and outputs of several biological systems, including grapevine and many microorganisms such as wine yeast and wine bacteria. High-throughput data generating tools in the fields of genomics, transcriptomics, proteomics, metabolomics and microbiomics are being applied both locally and globally in order to better understand complex biological systems. As such, the datasets available for analysis and mining include de novo datasets created by collaborators as well as publicly available datasets which one can use to get further insight into the systems under study. In order to model the complexity inherent in and across these datasets it is necessary to develop methods and approaches based on network theory and multivariate data analysis as well as to explore the intersections between these two approaches to data modelling, mining and interpretation.
Networks: The traditional reductionist paradigm of analysing single components of a biological system has not provided tools with which to adequately analyse data sets that are attempting to capture systems-level information. Network theory has recently emerged as a new discipline with which to model and analyse complex systems and has arisen from the study of real and often quite large networks derived empirically from the large volumes of data that have been collected from communications, internet, financial and biological systems.
This is in stark contrast to previous theoretical approaches to understanding complex systems such as complexity theory, synergetics, chaos theory, self-organised criticality, and fractals, which were all sweeping theoretical constructs based on small toy models which proved unable to address the complexity of real world systems.
Multivariate Data Analysis: Principal component analysis (PCA) and partial least squares (PLS) regression are commonly used to reduce the dimensionality of a matrix (and amongst matrices in the case of PLS) in which there are a considerable number of potentially related variables. PCA and PLS are variance-focused approaches where components are ranked by the amount of variance they each explain. Components are, by definition, orthogonal to one another and, as such, uncorrelated.
Aims: This thesis explores the development of Computational Biology tools that are essential to fully exploit the large data sets that are being generated by systems-based approaches in order to gain a better understanding of wine-related organisms such as grapevine (and tobacco as a laboratory-based plant model), plant pathogens, microbes and their interactions. The broad aim of this thesis is therefore to develop computational methods that can be used in an integrated systems-based approach to model and describe different aspects of the wine making process from a biological perspective. To achieve this aim, computational methods have been developed and applied in the areas of transcriptomics, phylogenomics, chemiomics and microbiomics.
Summary: The primary approaches taken in this thesis have been the use of networks and multivariate data analysis methods to analyse highly dimensional data sets. Furthermore, several of the approaches have started to explore the intersection between networks and multivariate data analysis.
This would seem to be a logical progression as both networks and multivariate data analysis are focused on matrix-based data modelling and therefore have many of their roots in linear algebra.
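The abstract's statement that principal components are ranked by explained variance and are mutually orthogonal (and hence uncorrelated) can be illustrated with a minimal sketch. This is not the thesis's own implementation: the function name `pca`, the toy data, and the choice of a singular value decomposition via NumPy are all assumptions made purely for illustration.

```python
import numpy as np


def pca(X):
    """PCA via SVD of the column-centred data matrix (illustrative helper)."""
    Xc = X - X.mean(axis=0)                     # centre each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = s**2 / (X.shape[0] - 1)
    scores = Xc @ Vt.T                          # samples projected onto components
    return Vt, scores, explained_variance


# Hypothetical toy data: 50 samples of 4 correlated variables
# generated from 2 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.1 * rng.normal(size=(50, 4))

components, scores, explained_variance = pca(X)

# Fraction of total variance captured by each component, in rank order.
ratio = explained_variance / explained_variance.sum()
print(ratio)

# Rows of `components` are orthonormal, so this is close to the identity.
print(np.round(components @ components.T, 6))
```

The singular values are returned in descending order, so the components come out already ranked by the variance they explain, and the covariance matrix of `scores` is diagonal, which is what "uncorrelated" means here.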
[Publication date] [Publisher] Stellenbosch University
[Validity level] [Subject classification]
[Keywords] [Timeliness]