Filipino text classification by Universal Language Model Fine-tuning (ULMFiT)

Authors

Mary June A. Ricaña; Francis N. C. Paraan

Source

Proceedings of the 41st Samahang Pisika ng Pilipinas Physics Conference, Siargao, SPP-2023-PB-06 (2023).

Abstract

A major obstacle in natural language processing is the scarcity of labeled data for some languages. Transfer learning techniques such as Universal Language Model Fine-tuning (ULMFiT) have emerged as effective ways to address this issue. This paper explores the use of ULMFiT for text classification in the Filipino language. We follow the ULMFiT approach, which involves pretraining a language model, fine-tuning it, and training a text classifier. We independently reproduce previous results for a binary text classification task on a dataset of Filipino text. Additionally, we demonstrate the promising performance of the ULMFiT model on a multi-label classification task, achieving Hamming losses as low as approximately 0.10, which are comparable to previous benchmark results obtained with transformer models.
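For readers unfamiliar with the metric quoted above, the Hamming loss of a multi-label classifier is the fraction of individual label assignments that are predicted incorrectly, averaged over all samples and all labels (lower is better, and 0 means every label is correct). The following is a minimal sketch in plain Python, not the authors' evaluation code; the function name `hamming_loss` and the toy label matrices are illustrative assumptions and do not come from the paper's dataset.

```python
def hamming_loss(y_true, y_pred):
    """Return the fraction of label slots where prediction and truth disagree.

    Both arguments are lists of equal-length rows of 0/1 indicators,
    one row per sample and one column per label.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of samples")

    wrong = 0  # label slots predicted incorrectly
    total = 0  # all label slots seen so far
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each sample must have the same number of labels")
        for t, p in zip(true_row, pred_row):
            wrong += int(t != p)
            total += 1

    if total == 0:
        raise ValueError("no labels to score")
    return wrong / total


# Hypothetical toy data: 5 samples, 4 labels each (20 label slots in total).
y_true = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
y_pred = [
    [1, 0, 0, 1],
    [0, 1, 1, 0],  # one spurious label
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],  # one missed label
]

loss = hamming_loss(y_true, y_pred)
print(f"Hamming loss: {loss:.2f}")
```

In this toy case 2 of the 20 label slots are wrong, so the loss is 0.10, the same order as the value reported in the abstract. Unlike exact-match accuracy, the metric gives partial credit to a sample whose label set is mostly correct.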

Structure and Dynamics Group

National Institute of Physics

University of the Philippines Diliman