Instance-based learning for tweet monitoring and categorization

Gobeill, Julien (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale) ; Gaudinat, Arnaud (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale) ; Ruch, Patrick (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale)

The CLEF RepLab 2014 Track was an occasion to investigate the robustness of instance-based learning in a complete system for tweet monitoring and categorization. The algorithm we implemented was k-Nearest Neighbors. With respect to the domain (automotive or banking) and the language (English or Spanish), the experiments showed that the categorizer was not affected by the choice of representation: even with all learning tweets merged into one single Knowledge Base (KB), the observed performances were close to those obtained with dedicated KBs. Interestingly, adding English training data to the sparse Spanish data was useful for Spanish categorization (+14% in accuracy for automotive, +26% for banking). Yet performance suffered from overprediction of the most prevalent category. The algorithm showed the defects of its virtues: it was very robust, but not easy to improve. BiTeM/SIBtex tools for tweet monitoring are available within the DrugsListener Project page of the BiTeM website (http://bitem.hesge.ch/).
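
To illustrate the idea of a k-Nearest Neighbors categorizer working from a single merged Knowledge Base, here is a minimal sketch in Python. It is not the authors' BiTeM/SIBtex implementation. The bag-of-words representation, the cosine similarity, the similarity-weighted vote, the function names, the toy tweets, and the category labels ("products", "performance") are all assumptions made for illustration only.

```python
from collections import Counter
import math


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; the record does not
    # describe the preprocessing actually used).
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_categorize(tweet, knowledge_base, k=3):
    # Rank every stored (text, label) instance by similarity to the tweet,
    # then let the k most similar instances vote, weighted by similarity.
    query = Counter(tokenize(tweet))
    scored = sorted(
        ((cosine(query, Counter(tokenize(text))), label)
         for text, label in knowledge_base),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical merged KB: English and Spanish tweets from both domains
# stored together, with made-up labels.
toy_kb = [
    ("the new car model has great fuel economy", "products"),
    ("el nuevo coche tiene un motor excelente", "products"),
    ("the bank reported record quarterly profits", "performance"),
    ("el banco anuncia beneficios récord", "performance"),
]

print(knn_categorize("nuevo coche con motor híbrido", toy_kb))
```

Because the categorizer only compares a new tweet against stored instances, merging or splitting the KB by domain or language changes nothing in the code except the list that is passed in.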


Conference type:
full paper
Faculty:
Economie et Services
School:
HEG GE Haute école de gestion de Genève
Institute:
CRAG - Centre de Recherche Appliquée en Gestion
Classification:
Computer science
Information science
Bibliographic address:
Berlin, Springer
Date:
2015
Pagination:
6 p.
Published in:
Experimental IR meets multilinguality, multimodality, and interaction
DOI:
ISSN:
ISBN 978-3-319-24026-8
Record created 2015-11-30, last modified 2018-08-31
