Exploring the Impact of Gen-AI Tools on Text Data Augmentation: A Case Study in Lithuanian Educational Data Classification

The study compares several machine learning classifiers that its literature review identifies as effective for classification tasks: multi-layer perceptron (MLP), random forest (RF), gradient-boosted trees (GBT), k-nearest neighbors (kNN), decision trees (DT), and naive Bayes (NB). It investigates how text augmentation affects classification accuracy, with texts vectorized using either bag of words (BoW) or sBERT embeddings. In total, 15,296 models were evaluated with stratified k-fold cross-validation, and hyperparameter optimization was used to improve accuracy, precision, recall, and F1 score.
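To make the evaluation setup concrete, here is a minimal sketch of BoW vectorization, a kNN classifier, stratified k-fold cross-validation, and a small hyperparameter grid using scikit-learn. The toy English corpus, the two labels, the three folds, and the grid values are all assumptions chosen for illustration; the study's own data, fold count, and search space are not given in this summary.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus; it only stands in for the study's Lithuanian data.
texts = [
    "solve the equation for the unknown", "add the two fractions",
    "multiply the matrix by the vector", "find the derivative of the function",
    "compute the area of the triangle", "factor the polynomial",
    "read the poem aloud", "write an essay about the novel",
    "identify the verb in the sentence", "summarize the short story",
    "spell the word correctly", "analyze the metaphor in the poem",
]
labels = ["math"] * 6 + ["language"] * 6

# Bag-of-words counts feed directly into the kNN classifier.
pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("knn", KNeighborsClassifier()),
])

# Stratification keeps the class ratio the same in every fold.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Assumed search space: 2 x 2 = 4 candidate configurations.
param_grid = {
    "knn__n_neighbors": [1, 3],
    "knn__metric": ["cosine", "euclidean"],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    cv=cv,
    scoring=["accuracy", "precision_macro", "recall_macro", "f1_macro"],
    refit="f1_macro",  # the final model is refit with the best macro-F1 setting
)
search.fit(texts, labels)

print("best parameters:", search.best_params_)
print("best mean macro-F1:", search.best_score_)
print("mean accuracy per candidate:", search.cv_results_["mean_test_accuracy"])
```

Each candidate in the grid is trained and scored once per fold, which is how a modest grid across several algorithms, vectorizers, and datasets can grow into thousands of fitted models.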

Primary experiments showed that dimensionality reduction with Latent Semantic Analysis (LSA) improved accuracy for the MLP and kNN models. The highest accuracy overall (94.16%) came from the kNN algorithm trained on augmented datasets, particularly those produced with Gen-AI tools such as ChatGPT and Copilot. By comparison, the sBERT method yielded lower accuracies. Taken together, the results reaffirm the importance of text augmentation in enhancing model performance.
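As a rough illustration of where LSA sits in such a pipeline, the sketch below applies truncated SVD to the BoW matrix before kNN and compares cross-validated accuracy with and without the reduction step. It uses scikit-learn's `TruncatedSVD`, a common way to compute LSA; the toy corpus, the choice of 5 components, and `n_neighbors=3` are assumptions for illustration, not values reported by the study.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus; it only stands in for the study's Lithuanian data.
texts = [
    "solve the equation for the unknown", "add the two fractions",
    "multiply the matrix by the vector", "find the derivative of the function",
    "compute the area of the triangle", "factor the polynomial",
    "read the poem aloud", "write an essay about the novel",
    "identify the verb in the sentence", "summarize the short story",
    "spell the word correctly", "analyze the metaphor in the poem",
]
labels = ["math"] * 6 + ["language"] * 6

# Baseline: kNN on raw bag-of-words counts.
bow_pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])

# LSA variant: project the sparse counts onto 5 latent dimensions first.
lsa_pipeline = Pipeline([
    ("bow", CountVectorizer()),
    ("lsa", TruncatedSVD(n_components=5, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

bow_scores = cross_val_score(bow_pipeline, texts, labels, cv=cv, scoring="accuracy")
lsa_scores = cross_val_score(lsa_pipeline, texts, labels, cv=cv, scoring="accuracy")

print("BoW + kNN accuracy per fold:      ", bow_scores)
print("BoW + LSA + kNN accuracy per fold:", lsa_scores)

# Shape of the reduced representation when fitted on the full toy corpus.
reduced = lsa_pipeline[:-1].fit_transform(texts)
print("reduced shape:", reduced.shape)
```

On a corpus this small the two variants may score similarly or in either order; the point is only the placement of the reduction step, which is fitted inside each training fold so that no information leaks from the held-out fold.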
