Exploring optimal log-ratio representations for high-dimensional compositional data with applications to regression analysis in metabolomics
High-throughput technologies are used in biological research to obtain a comprehensive account of the molecules in a sample. Because of chemical and physical noise and technical limitations, the raw data require intensive pre-processing, which often results in a normalised data set carrying only relative information. Compositional data analysis methods, which exploit the structure of relative variation in the data, are therefore meaningful in this context. We review some recent developments for dealing with high-dimensional compositional data in the context of an investigation into the association between rumen metabolite spectral profiles and greenhouse gas emissions in ruminants. In particular, we consider alternative ways of determining optimal log-ratio representations of the metabolomic profiles that facilitate regression analysis and the identification of the most relevant signals.
Keywords: compositional data; high-dimensional data; log-ratio analysis; metabolomics.
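As a rough illustration of what a log-ratio representation of relative data looks like, the sketch below computes the centred log-ratio (clr) and a single pairwise log-ratio for one sample. It is not the procedure used in this work: the function names (`closure`, `clr`, `pairwise_logratio`) and the intensity values are hypothetical, chosen only to show that log-ratios are unchanged when a profile is rescaled.

```python
import math


def closure(parts):
    """Rescale strictly positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centred log-ratio: log of each part minus the mean of the logs,
    i.e. the log of each part over the geometric mean of all parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def pairwise_logratio(parts, i, j):
    """Log-ratio of part i to part j."""
    return math.log(parts[i] / parts[j])


# Hypothetical intensities for three spectral bins (illustrative values only).
raw = [120.0, 30.0, 50.0]
# The same profile on a different overall scale, e.g. after normalisation.
scaled = [10.0 * v for v in raw]

clr_raw = clr(raw)
clr_scaled = clr(scaled)
ratio_01 = pairwise_logratio(raw, 0, 1)
```

Here `clr_raw` and `clr_scaled` agree up to floating-point error, and the clr coordinates sum to zero, which is why only relative information is retained. In a regression setting, coordinates of this kind (or a selected subset of pairwise log-ratios) would serve as the explanatory variables.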