P. Urruchi Mohino, J. López Fidalgo

We present a framework for dimension reduction applied to text corpora built from tweets, in which we expect to retain predictive power in regression tasks. The main objective is to find optimal subsamples and key dimensions in an unstructured, high-volume data problem.

The sufficient dimension reduction is obtained by projecting x, the sparse matrix formed by the tokens (units) and their frequencies in the text corpora, onto beta, a sparse vector of coefficients from an L1-regularised linear model. These coefficients are estimated by the Maximum A Posteriori (MAP) method rather than by Maximum Likelihood, since we propose prior distributions for them. This optimisation problem, constrained by the Lasso penalty, leads to a simpler model in which we expect beta to be sparse.
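As an illustration only, and not the implementation behind this work, the projection step can be sketched in plain Python. The sketch assumes a Gaussian likelihood with independent Laplace priors on the coefficients, under which the MAP estimate coincides with the Lasso solution; the abstract does not state which priors are actually used. The toy documents, the responses y, the penalty value lam, and the helper names token_counts, lasso_cd and project are all hypothetical, and the model has no intercept.

```python
from collections import Counter


def token_counts(docs):
    """Build a token-frequency matrix (rows: documents, columns: tokens)."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    rows = []
    for d in docs:
        counts = Counter(d.lower().split())
        rows.append([float(counts[t]) for t in vocab])
    return vocab, rows


def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta initialised at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes beta[j].
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def project(X, beta):
    """Reduce each document to the scalar x_i . beta."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


# Toy corpus and responses, invented purely for illustration.
docs = [
    "good great day",
    "bad awful day",
    "great good good",
    "awful bad bad",
    "day day day",
]
y = [1.0, -1.0, 1.5, -1.5, 0.0]

vocab, X = token_counts(docs)
beta = lasso_cd(X, y, lam=0.1)
z = project(X, beta)  # one-dimensional reduction of each document
```

Tokens whose coefficient is shrunk to zero drop out of the projection, so z depends only on the tokens that the L1 penalty retains. On a real corpus a sparse matrix representation and an established Lasso solver would be preferable to this dense, pure-Python loop.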

Regarding data reduction, we expect to formulate an Optimal Experimental Design that selects the most informative data points.

Keywords: Big Data, Optimal Experimental Design, Text Mining, Bayesian MAP

Scheduled

GT7-1 Design of Experiments
September 3, 2019  3:30 PM
I2L5. Georgina Blanes building


Other papers in the same session

Design robustness for accelerated failure time models with Type I censoring

M. J. Rivas López, R. Martín Martín, I. García-Camacha Gutiérrez

Optimal Design of Experiments for the Antoine Equation in distillation experiments

C. de la Calle Arroyo, J. López-Fidalgo, L. Rodríguez-Aragón

Model-Robust Classification in Active Learning

J. López Fidalgo, J. A. Moler Cuiral, D. P. Wiens

