University of Twente Student Theses
The ChatGPT Effect: An Analysis of Topic Modeling and User Interaction on StackOverflow
Niculae, T.F. (2024) The ChatGPT Effect: An Analysis of Topic Modeling and User Interaction on StackOverflow.
Abstract: This thesis investigates the evolving dynamics of user interactions on Stack Overflow (SO) using advanced topic modeling techniques. By applying Latent Dirichlet Allocation (LDA), BERTopic, BERTopic fine-tuned with KeyBERT and POS, and BERTopic quantized with LLaMA-3-8B, this thesis analyzes shifts in discussion topics across two distinct periods: April 2021-2022 and April 2023-2024. The results show that BERTopic quantized with LLaMA-3-8B performs best, producing substantially more coherent and diverse topics than traditional models such as LDA. The findings reveal that topics shifted from data manipulation and web development in 2021-2022 to cloud services, deployment strategies and modern JavaScript frameworks in 2023-2024. Additionally, the thesis investigates the impact of generative AI, specifically ChatGPT, on user interactions and content quality on SO. The analysis reveals a notable decrease in overall activity on SO in the later period, with fewer questions posted and answered, slower response times and lower average view counts. Despite this decline, posts became more complex and detailed. The study also found a shift in the popularity of certain technologies, with newer tools and frameworks gaining traction over traditional ones; for example, tags related to AI, large language models and ChatGPT increased, reflecting the influence of these technologies on the types of questions asked. Through this empirical analysis, the study answers research questions on how SO discussions have evolved. Beyond comparing the two periods and the different topic extraction models, this thesis also provides a replicable pipeline that covers data gathering, preprocessing, and the application of a novel large language model to improve BERTopic for automatic topic extraction.
This pipeline offers a practical way to enhance SO's tagging system, which currently relies on simple tags such as programming languages or high-level tasks. By improving content discoverability, this approach could help SO regain user engagement.
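The abstract credits the LLaMA-3-8B variant of BERTopic with more diverse topics than LDA. One common way to quantify topic diversity in the topic modeling literature is the fraction of unique words among the top-k words of all topics; whether the thesis uses exactly this definition is an assumption, since the abstract does not say. The minimal sketch below computes that score. The function name `topic_diversity` and the example topic words are hypothetical illustrations, not taken from the thesis or its results.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 25) -> float:
    """Share of unique words among the top_k words of every topic.

    Hypothetical helper (assumed metric definition, not the thesis code).
    Returns 1.0 when no word is repeated across the topics' top words and
    approaches 0 when topics mostly repeat the same words.
    """
    if not topics:
        raise ValueError("need at least one topic")
    # Keep only the top_k highest-ranked words of each topic.
    top_words = [list(topic)[:top_k] for topic in topics]
    # Divide by the number of words actually kept; this equals the usual
    # K * top_k denominator when every topic has at least top_k words.
    total = sum(len(words) for words in top_words)
    if total == 0:
        raise ValueError("topics contain no words")
    unique = {word for words in top_words for word in words}
    return len(unique) / total


if __name__ == "__main__":
    # Illustrative topics only: "dataframe" appears in both, lowering diversity.
    overlapping = [["python", "pandas", "dataframe"], ["javascript", "react", "dataframe"]]
    distinct = [["aws", "docker", "deploy"], ["react", "nextjs", "typescript"]]
    print(topic_diversity(overlapping, top_k=3))  # 5 unique / 6 words
    print(topic_diversity(distinct, top_k=3))     # 6 unique / 6 words
```

A model whose topics reuse the same high-ranked words scores lower, which is one reason diversity is typically reported alongside a coherence measure rather than on its own.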
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Computer Science MSc (60300)
Link to this item: https://purl.utwente.nl/essays/103803