E. del Barrio, H. Inouzhe Valdes, J. Loubes, C. Matrán, A. Mayo-Íscar
We present a strategy for classifying a test sample $X_T$ using a database $X_1,\dots,X_N$ of classified samples, in a setting where the high intrinsic variability of the data makes part of the information in the database unsuitable. We cluster the database into homogeneous groups, extract a representative template from each group, and use it to initialize an unsupervised clustering procedure on $X_T$. The resulting partition of $X_T$ is assigned to the closest template, and the information from that template and/or the corresponding group of the database is used to classify $X_T$. To implement this strategy we use optimal transport techniques and introduce novel ideas, also based on optimal transport, for consensus clustering and for the optimal relabelling of a clustering. As an application of these ideas we develop a tool for automated flow cytometry analysis called floWasserTclust.
Keywords: Optimal transport, consensus clustering, flow cytometry, transfer labelling
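The label-transfer step described in the abstract can be illustrated with a minimal sketch. It is not the floWasserTclust implementation: it assumes that each cluster is summarized by its centroid only, that the cost between two clusters is the squared Euclidean distance between centroids, and that all clusters carry equal weight, in which case the optimal transport plan reduces to an assignment problem. The function names `transfer_labels` and `closest_template` and the labels "type A" and "type B" are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def transfer_labels(template_centers, template_labels, test_centers):
    """Match test clusters to template clusters and copy the labels.

    Assumption: clusters are reduced to centroids with equal weights, so
    optimal transport between them becomes a linear assignment problem.
    """
    template_centers = np.asarray(template_centers, dtype=float)
    test_centers = np.asarray(test_centers, dtype=float)
    # cost[i, j] = squared Euclidean distance between test centroid i
    # and template centroid j
    diff = test_centers[:, None, :] - template_centers[None, :, :]
    cost = np.sum(diff ** 2, axis=2)
    # Minimum-cost one-to-one matching (Hungarian-type algorithm)
    rows, cols = linear_sum_assignment(cost)
    total_cost = float(cost[rows, cols].sum())
    labels = {int(i): template_labels[j] for i, j in zip(rows, cols)}
    return labels, total_cost


def closest_template(templates, test_centers):
    """Return the index of the template with the smallest matching cost.

    `templates` is a list of (centers, labels) pairs, one per database group.
    """
    costs = [transfer_labels(c, l, test_centers)[1] for c, l in templates]
    return int(np.argmin(costs))


if __name__ == "__main__":
    # Two hypothetical templates, each with two labelled clusters in 2D
    templates = [
        ([[0.0, 0.0], [10.0, 10.0]], ["type A", "type B"]),
        ([[5.0, 0.0], [0.0, 5.0]], ["type A", "type B"]),
    ]
    # Centroids obtained by clustering the test sample
    test_centers = [[9.5, 10.2], [0.3, -0.1]]
    best = closest_template(templates, test_centers)
    labels, cost = transfer_labels(*templates[best], test_centers)
    print(best, labels, cost)
```

A fuller treatment would compare clusters as distributions, for example through a Wasserstein distance between them, and would allow unequal cluster weights; the centroid-based cost is used here only to keep the sketch short.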
Scheduled
GT4-1 Multivariate Analysis and Classification
September 3, 2019 3:30 PM
I3L10. Georgina Blanes building