Exploring the Impact of Gen-AI Tools on Text Data Augmentation: A Case Study in Lithuanian Educational Data Classification

July 18, 2025

The literature review highlights the effectiveness of several machine learning models for classification tasks, focusing on established algorithms: multi-layer perceptron (MLP), random forest (RF), gradient-boosted trees (GBT), k-nearest neighbors (kNN), decision trees (DT), and naive Bayes (NB). The study investigates how text augmentation affects classification accuracy, using bag of words (BoW) and sBERT embeddings for vectorization. Using stratified k-fold cross-validation, the authors tested 15,296 models, with hyperparameter optimization applied to improve accuracy, precision, recall, and F1 score.
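To make the evaluation setup concrete, the following is a minimal sketch of stratified k-fold splitting and the four reported metrics, written in plain Python. It is not the authors' code: the function names `stratified_kfold` and `macro_scores` are hypothetical, the split is deterministic (no shuffling), and macro-averaging across classes is an assumption, since the summary does not say how precision, recall, and F1 were averaged.

```python
from collections import defaultdict


def stratified_kfold(labels, k):
    """Yield (train_idx, test_idx) pairs whose folds roughly keep class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    # Deal each class's indices round-robin so every fold gets a share of each class.
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


def macro_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1 (assumed averaging)."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

In a full experiment, each `(train_idx, test_idx)` pair would be used to fit one model configuration and score it, and the per-fold scores would then be averaged. A real pipeline would likely shuffle within each class before dealing indices into folds.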

Primary experiments showed that dimensionality reduction with Latent Semantic Analysis (LSA) improved accuracy for the MLP and kNN models. Overall, the highest accuracy (94.16%) was achieved by the kNN algorithm on augmented datasets, particularly when the augmentation used Gen-AI tools such as ChatGPT and Copilot. The sBERT method yielded lower accuracies by comparison. Taken together, the results reaffirm the importance of text augmentation in improving model performance.
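As an illustration of the best-performing combination named above (bag-of-words vectors fed to kNN), here is a small plain-Python sketch. It is an assumption-laden toy, not the study's pipeline: the helper names `bow_vector`, `cosine`, and `knn_predict` are hypothetical, tokenization is simple whitespace splitting, cosine similarity is an assumed distance measure, the LSA step is omitted, and the English example texts are invented rather than drawn from the Lithuanian dataset.

```python
import math
from collections import Counter


def bow_vector(text):
    """Bag of words: map each lowercase whitespace token to its count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train_texts, train_labels, query, k=3):
    """Label the query by majority vote among its k most similar training texts."""
    q = bow_vector(query)
    scored = sorted(
        ((cosine(q, bow_vector(t)), label) for t, label in zip(train_texts, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Invented toy data for illustration only.
train_texts = [
    "solve the equation for x",
    "add the fractions and simplify",
    "read the poem and name the author",
    "write an essay about the novel",
]
train_labels = ["math", "math", "literature", "literature"]
```

Under this setup, augmentation would simply mean appending paraphrased copies of the training texts (with the same labels) to `train_texts` before prediction, which gives the neighbor search more vocabulary to match against.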
