Integrating Social Media and Large Language Models for Real-Time Traffic Incident Detection

Author(s): Ece, Ceylin (2025)

Abstract:
This paper presents a novel methodology for detecting and classifying traffic incidents from posts on X (formerly Twitter) and extracting their geographical information in a geocodable format using the Large Language Model (LLM) Meta-Llama-3-8B-Instruct. The prompt-based methodology consists of three phases. First, the LLM classified the X posts as either ‘Traffic Incident (TI)’ or ‘Not Traffic Incident (NTI)’ and further categorized them as ‘Ongoing (O)’ or ‘Past (P)’; these labels were evaluated against a manually labeled dataset. Second, the LLM was used to geo-parse the TI posts, and the geo-parsed posts were geocoded using the HERE Geocoding & Search API. Third, the geocoded X posts were validated against traffic incident reports from the HERE Traffic Incident API. Because the methodology requires neither preprocessing nor classifier training, it offers a cost-effective, efficient, and scalable approach. The methodology achieved 98.2% accuracy in traffic incident classification and 91.6% accuracy in categorizing temporal status. Additionally, it geo-parsed 100% of the TI X posts, achieving a geocoding accuracy of 98.5%. The methodology identified 61% of the traffic incidents earlier than HERE. The results demonstrate that LLMs are effective tools for detecting traffic incidents from unstructured social media data.
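
As an illustration of the first phase, the following is a minimal Python sketch of prompt-based classification and its evaluation against manual labels. It is not the prompt or code used in the thesis: the prompt wording, the comma-separated reply format, and the names `PROMPT_TEMPLATE`, `Label`, `build_prompt`, `parse_reply`, and `accuracy` are all assumptions made for this example, and the call to the LLM itself is left out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical prompt wording; the thesis's actual prompt is not given in the abstract.
PROMPT_TEMPLATE = (
    "Classify the post as TI (traffic incident) or NTI (not a traffic incident). "
    "If it is TI, also state whether the incident is O (ongoing) or P (past). "
    "Answer in the form 'TI, O', 'TI, P' or 'NTI'.\n"
    "Post: {post}"
)


@dataclass(frozen=True)
class Label:
    incident: str            # "TI" or "NTI"
    status: Optional[str]    # "O", "P", or None when not applicable


def build_prompt(post: str) -> str:
    """Insert the raw, unpreprocessed post text into the prompt."""
    return PROMPT_TEMPLATE.format(post=post)


def parse_reply(reply: str) -> Label:
    """Turn a reply such as 'TI, O' into a Label (assumed reply format)."""
    tokens = reply.upper().replace(",", " ").split()
    if "NTI" in tokens:
        return Label("NTI", None)
    if "TI" not in tokens:
        raise ValueError(f"unrecognised reply: {reply!r}")
    status = "O" if "O" in tokens else "P" if "P" in tokens else None
    return Label("TI", status)


def accuracy(predicted: Sequence[Label], reference: Sequence[Label]) -> float:
    """Fraction of posts whose predicted label equals the manual label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)
```

In such a setup, each reply returned by the model would be passed through `parse_reply` and the resulting labels compared with the manually labeled dataset via `accuracy`.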

Document(s):

Ece_BA_EEMCS.pdf