Integrating Social Media and Large Language Models for Real-Time Traffic Incident Detection

Ece, Ceylin

This paper presents a novel methodology for detecting and classifying traffic incidents from posts on X (formerly Twitter) and for extracting their geographical information in a geocodable format, using the large language model (LLM) Meta-Llama-3-8B-Instruct. The prompt-based methodology consists of three phases. First, the LLM classified the X posts as either ‘Traffic Incident (TI)’ or ‘Not Traffic Incident (NTI)’ and further categorized them as ‘Ongoing (O)’ or ‘Past (P)’; these labels were evaluated against a manually labeled dataset. Second, the LLM geo-parsed the TI posts, and the geo-parsed posts were geocoded using the HERE Geocoding & Search API. Third, the geocoded X posts were validated against traffic incident reports from the HERE Traffic Incident API. Because the methodology requires neither preprocessing nor classifier training, it offers a cost-effective, efficient, and scalable approach. It achieved 98.2% accuracy in traffic incident classification and 91.6% accuracy in categorizing temporal status. It also geo-parsed 100% of the TI posts, with a geocoding accuracy of 98.5%, and identified 61% of the traffic incidents earlier than HERE did. The results demonstrate that LLMs are effective tools for detecting traffic incidents from unstructured social media data.
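
As an illustration of the first, prompt-based phase, the following is a minimal sketch rather than the author's implementation. The prompt wording, the reply format ('TI,O', 'TI,P', 'NTI'), the choice to attach a temporal status only to TI posts, and the names `build_prompt`, `parse_label`, `classify`, and `fake_llm` are all assumptions made for this example; `fake_llm` is a keyword stub standing in for a real call to Meta-Llama-3-8B-Instruct.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical prompt wording; the paper's actual prompts are not reproduced here.
PROMPT_TEMPLATE = (
    "Classify the post as TI (traffic incident) or NTI (not a traffic incident). "
    "If it is TI, also state whether the incident is O (ongoing) or P (past). "
    "Answer in the form 'TI,O', 'TI,P', or 'NTI'.\n"
    "Post: {post}"
)


def build_prompt(post: str) -> str:
    """Fill the (assumed) prompt template with the raw, unpreprocessed post."""
    return PROMPT_TEMPLATE.format(post=post)


def parse_label(response: str) -> Tuple[str, Optional[str]]:
    """Extract (category, status) from a model reply; status is None for NTI."""
    text = response.strip().upper()
    match = re.search(r"\b(NTI|TI)\b(?:\s*,\s*([OP])\b)?", text)
    if match is None:
        raise ValueError(f"unparseable reply: {response!r}")
    category, status = match.group(1), match.group(2)
    # Assumption for this sketch: only TI posts carry a temporal status.
    return (category, status if category == "TI" else None)


def classify(post: str, llm: Callable[[str], str]) -> Tuple[str, Optional[str]]:
    """Send the prompt to any text-in/text-out model and parse its reply."""
    return parse_label(llm(build_prompt(post)))


def fake_llm(prompt: str) -> str:
    """Keyword stub standing in for a real LLM call (illustration only)."""
    post = prompt.rsplit("Post: ", 1)[1].lower()
    return "TI,O" if "crash" in post else "NTI"


if __name__ == "__main__":
    print(classify("Crash on the ring road, two lanes blocked", fake_llm))
    print(classify("Great coffee this morning", fake_llm))
```

Passing the model as a plain callable keeps the sketch independent of any particular inference library; a real deployment would replace `fake_llm` with a function that queries the model and returns its text reply.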

Author(s): Ece, Ceylin (2025)
