Text classification is a supervised learning task that assigns a text document to one or more predefined classes (topics). These topics are defined by a set of training documents. A machine learning algorithm is used to construct a classification model from the training documents, and the trained model is then used to predict the class of each new document. In this paper, we propose a text classification approach based on automatic keyword extraction with different thresholds. We use 3000 Vietnamese text documents, belonging to ten topics and downloaded from two electronic magazines, vnexpress.net and vietnamnet.vn, to create ten sets of keywords, one per topic. These keyword sets are used to predict the topic of a new text document. The experimental results confirm the feasibility of the proposed model.
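The idea of per-topic keyword sets can be illustrated with a minimal sketch. The abstract does not specify how keywords are extracted or scored, so everything below is an assumption made for illustration: keywords are taken to be words whose relative frequency within a topic's training documents reaches a threshold, a new document is assigned to the topic whose keyword set it matches most often, and the function names (`extract_keywords`, `build_keyword_sets`, `predict_topic`) are hypothetical. Splitting on whitespace is also a simplification, since Vietnamese words often span several syllables and would normally require a word segmenter.

```python
from collections import Counter


def extract_keywords(docs, threshold):
    """Return the words whose relative frequency in `docs` is >= threshold.

    Assumption: relative term frequency stands in for the paper's
    (unspecified) keyword extraction criterion.
    """
    counts = Counter(word for doc in docs for word in doc.lower().split())
    total = sum(counts.values())
    if total == 0:
        return set()
    return {word for word, count in counts.items() if count / total >= threshold}


def build_keyword_sets(training, threshold):
    """Build one keyword set per topic from {topic: [documents]}."""
    return {
        topic: extract_keywords(docs, threshold)
        for topic, docs in training.items()
    }


def predict_topic(text, keyword_sets):
    """Pick the topic whose keywords occur most often in `text`.

    Ties are broken by topic order in `keyword_sets`.
    """
    words = text.lower().split()
    scores = {
        topic: sum(1 for word in words if word in keywords)
        for topic, keywords in keyword_sets.items()
    }
    return max(scores, key=scores.get)


# Toy English data for illustration only; not the corpus used in the paper.
training = {
    "sports": ["football match goal", "football team goal"],
    "economy": ["market stock price", "stock market bank"],
}
keyword_sets = build_keyword_sets(training, threshold=0.2)
predicted = predict_topic("the stock market fell", keyword_sets)
```

Raising the threshold keeps fewer, more frequent keywords per topic, and lowering it keeps more. Varying this value is one way to read the "different thresholds" mentioned above.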
Can Tho University Journal of Science (Tạp chí khoa học Trường Đại học Cần Thơ)