E. del Barrio, H. Inouzhe Valdes, J. Loubes, C. Matrán, A. Mayo-Íscar

We present a strategy for classifying a test sample $X_T$ using a database $X_1,\dots,X_N$ of classified samples, in a setting where the high intrinsic variability of the data makes part of the information in the database unsuitable. We cluster the database into homogeneous groups, extract a representative template from each group, and use it to initialize an unsupervised clustering procedure on $X_T$. The resulting partition of $X_T$ is assigned to the closest template, and the information in that template and/or the corresponding group of the database is used to classify $X_T$. To implement this strategy we use optimal transport techniques and introduce novel ideas for consensus clustering and for the optimal relabelling of clusters based on optimal transport. As an application, we develop a tool for automated flow cytometry analysis called floWasserTclust.
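The relabelling step can be illustrated with a minimal sketch. It is not the floWasserTclust implementation: it assumes each cluster is summarized by its centroid only, that the template and the test partition have the same number of clusters, and that the cost is the squared Euclidean distance. Under those assumptions, optimal transport between two uniform discrete measures reduces to an optimal one-to-one assignment, which the sketch finds by brute force. The function name `relabel_by_transport` and the cell-type labels are hypothetical.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def relabel_by_transport(template_centroids, test_centroids):
    """Transfer template labels to test clusters via an optimal assignment.

    template_centroids: dict mapping a label to a centroid (hypothetical input).
    test_centroids: list of centroids from the unsupervised partition of X_T.
    Returns (labels, cost): one label per test cluster and the total cost.
    Assumption: equal cluster counts and uniform weights, so the optimal
    transport plan is a permutation; brute force is only viable for few clusters.
    """
    labels = list(template_centroids)
    if len(labels) != len(test_centroids):
        raise ValueError("this sketch needs equal numbers of clusters")

    best_cost, best_perm = None, None
    for perm in permutations(range(len(labels))):
        # perm[i] is the index of the template label given to test cluster i
        cost = sum(
            sq_dist(template_centroids[labels[j]], test_centroids[i])
            for i, j in enumerate(perm)
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm

    return [labels[j] for j in best_perm], best_cost


# Hypothetical two-marker example; labels and coordinates are illustrative only.
template = {
    "lymphocytes": (1.0, 1.0),
    "monocytes": (4.0, 2.0),
    "granulocytes": (6.0, 6.0),
}
test_clusters = [(5.8, 6.3), (1.2, 0.9), (3.7, 2.4)]

assigned, cost = relabel_by_transport(template, test_clusters)
print(assigned, round(cost, 2))
```

Comparing whole cluster distributions, for example with a Wasserstein distance, or allowing unequal cluster counts would call for a general transport solver rather than a search over permutations.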

Keywords: Optimal transport, consensus clustering, flow cytometry, transfer labelling

Scheduled

GT4-1 Multivariate Analysis and Classification
September 3, 2019, 3:30 PM
I3L10. Georgina Blanes building


Other papers in the same session

Estimando el centro de un conjunto de Datos Composicionales en sus unidades originales (Estimating the center of a compositional data set in its original units)

J. A. Martín-Fernández, V. Pawlowsky-Glahn, J. J. Egozcue

On aggregation of groups and categories in contingency tables

E. Carrizosa, V. Guerrero, D. Romero Morales

