Instance-based learning for tweet monitoring and categorization

Gobeill, Julien (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale) ; Gaudinat, Arnaud (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale) ; Ruch, Patrick (Haute école de gestion de Genève, HES-SO // Haute Ecole Spécialisée de Suisse Occidentale)

The CLEF RepLab 2014 Track was the occasion to investigate the robustness of instance-based learning in a complete system for tweet monitoring and categorization based. The algorithm we implemented was a k-Nearest Neighbors. Dealing with the domain (automotive or banking) and the language (English or Spanish), the experiments showed that the categorizer was not affected by the choice of representation: even with all learning tweets merged into one single Knowledge Base (KB), the observed performances were close to those with dedicated KBs. Interestingly, English training data in addition to the sparse Spanish data were useful for Spanish categorization (+14% for accuracy for automotive, +26% for banking). Yet, performances suffered from an overprediction of the most prevalent category. The algorithm showed the defects of its virtues: it was very robust, but not easy to improve. BiTeM/SIBtex tools for tweet monitoring are available within the DrugsListener Project page of the BiTeM website (http://bitem.hesge.ch/).


Conference Type:
full paper
Faculty:
Economie et Services
School:
HEG - Genève
Institute:
CRAG - Centre de Recherche Appliquée en Gestion
Subject(s):
Informatique
Sciences de l'information
Publisher:
Berlin, Springer
Date:
Berlin
Springer
2015
Pagination:
6 p.
Published in:
Experimental IR meets multilinguality, multimodality, and interaction
Series Statement:
Lecture Notes in Computer Science, 2015, vol. 9283, pp. 235-240
DOI:
ISSN:
ISBN 978-3-319-24026-8
External resources:
Appears in Collection:



 Record created 2015-11-30, last modified 2019-06-11

Fulltext:
Download fulltext
PDF

Rate this document:

Rate this document:
1
2
3
 
(Not yet reviewed)