Evaluation on Classifier Performance Across Varied Data Types in Public Datasets Analysis

Sang Vu; Nguyen Hoang Thong; Vo Van Tu; Nguyen Duy Tan

Vol. 3 No. 1 (2024): Applied Data Science & AI - Applications

Submitted: November 18, 2023
Published: July 17, 2024

Abstract

The performance of a machine learning classifier often depends on the nature of the data it is applied to. This manuscript evaluates five classifiers, Random Forest (RF), XGBoost (XGB), k-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Artificial Neural Networks (ANN), across three data types: categorical, mixed, and numerical. Using publicly available datasets and default classifier parameters, the study takes the F1-score as its key performance metric. The findings show that the ensemble methods RF and XGB excel on both categorical and mixed data, while XGB stands out on numerical data. KNN and SVM, by contrast, struggle with categorical and mixed data, respectively. Because all classifiers were run with default parameters, future work should focus on hyperparameter tuning to optimize performance across data types. The results offer machine learning practitioners guidance on selecting models based on data characteristics.
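
Since the F1-score is the metric on which every comparison in the study rests, a minimal sketch of how it can be computed may help readers unfamiliar with it. The abstract does not say which averaging scheme the authors used, so the macro average below is an assumption, and the function names and toy labels are hypothetical illustrations rather than the authors' code or data.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} for each label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this label
        else:
            fp[pred] += 1    # pred was predicted but is wrong
            fn[truth] += 1   # the true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels invented for illustration only; not taken from the study.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "c", "c"]

if __name__ == "__main__":
    print(f1_per_class(y_true, y_pred))
    print(round(macro_f1(y_true, y_pred), 4))
```

In a comparison like the one described above, a score of this kind would be computed once per classifier and dataset on held-out predictions, and the resulting values compared within each data type.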