Sufficient Dimensional and Data Reduction applied to Twitter text

P. Urruchi Mohino, J. López Fidalgo

We present a framework for dimension reduction applied to text corpora built from tweets, in which we aim to preserve predictive power in regression tasks. The main objective is to find optimal subsamples and key dimensions in a high-volume, unstructured data problem.

The sufficient dimension reduction is obtained by projecting x, the sparse matrix formed by the tokens (units) and their frequencies in the text corpus, onto beta, a sparse vector of coefficients from an L1-regularised linear model. The coefficients are estimated by Maximum A Posteriori (MAP) rather than Maximum Likelihood, since we propose prior distributions for them. This optimisation problem, constrained by the Lasso penalty, leads to a simpler model in which we expect beta to be sparse.
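
As a rough illustration of the projection described above, the following sketch assumes Gaussian noise and independent Laplace priors on the coefficients, in which case the MAP estimate coincides with the Lasso solution. The abstract does not state which priors are used, so this choice is an assumption. The helper names (build_matrix, lasso_map, project), the whitespace tokeniser, the absence of an intercept, and the toy data are all hypothetical and are not taken from the authors' work.

```python
from collections import Counter


def build_matrix(docs):
    """Return (vocab, X), where X[i][j] is the count of token j in document i."""
    counts = [Counter(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*counts))
    X = [[c[t] for t in vocab] for c in counts]
    return vocab, X


def soft_threshold(z, lam):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_map(X, y, lam, n_iter=200):
    """Coordinate descent for min_b 0.5 * ||y - X b||^2 + lam * ||b||_1.

    Under the assumed Gaussian likelihood and Laplace priors, this is
    the MAP estimate of b (no intercept term is fitted).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def project(X, beta):
    """One-dimensional reduction: the score x_i . beta for every document."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]


if __name__ == "__main__":
    # Toy documents and responses, invented for illustration only.
    docs = ["good great day", "bad awful day", "good day", "awful bad bad"]
    y = [1.0, -1.0, 1.0, -1.0]
    vocab, X = build_matrix(docs)
    beta = lasso_map(X, y, lam=0.5)
    print(dict(zip(vocab, beta)))
    print(project(X, beta))
```

Larger values of lam push more coefficients to exactly zero, so fewer tokens contribute to the projected score. A real corpus would call for a sparse matrix representation rather than nested lists.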

Regarding data reduction, we expect to be able to formulate an Optimal Experimental Design that selects the most informative data points.
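
The abstract does not specify a design criterion, so the following is only one possible reading: a greedy subsampling sketch under an assumed D-optimality criterion for a linear model. It relies on the matrix determinant lemma, det(M + x x^T) = det(M) (1 + x^T M^{-1} x), so that at each step the point with the largest x^T M^{-1} x is added. The function name greedy_d_optimal and the ridge parameter (used to keep the information matrix invertible) are hypothetical.

```python
def greedy_d_optimal(X, k, ridge=1e-3):
    """Greedily pick k row indices of X to maximise det(ridge * I + sum x x^T)."""
    p = len(X[0])
    # Inverse of the regularised information matrix, starting from ridge * I.
    m_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(p)] for i in range(p)]
    chosen = []
    remaining = set(range(len(X)))
    for _ in range(min(k, len(X))):
        best, best_gain, best_v = None, -1.0, None
        for i in sorted(remaining):
            x = X[i]
            # v = M^{-1} x; gain = x^T M^{-1} x is the relative determinant increase.
            v = [sum(m_inv[r][c] * x[c] for c in range(p)) for r in range(p)]
            gain = sum(a * b for a, b in zip(x, v))
            if gain > best_gain:
                best, best_gain, best_v = i, gain, v
        chosen.append(best)
        remaining.remove(best)
        # Sherman-Morrison rank-one update of the inverse after adding the point.
        denom = 1.0 + best_gain
        for r in range(p):
            for c in range(p):
                m_inv[r][c] -= best_v[r] * best_v[c] / denom
    return chosen
```

Each row of X here would be the feature vector of one tweet. A greedy search is not guaranteed to reach the exact D-optimal subset, and other criteria could be substituted by changing the gain computation.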

Keywords: Big Data, Optimal Experimental Design, Text Mining, Bayesian MAP

Scheduled

GT7-1 Design of Experiments
3 September 2019, 15:30
I2L5. Edificio Georgina Blanes

Other talks in the same session

Design robustness for accelerated failure time models with Type I censoring

M. J. Rivas López, R. Martín Martín, I. García-Camacha Gutiérrez

Optimal Experimental Design for the Antoine equation in distillation experiments

C. de la Calle Arroyo, J. López-Fidalgo, L. Rodríguez-Aragón

Model-Robust Classification in Active Learning

J. López Fidalgo, J. A. Moler Cuiral, D. P. Wiens
