From Twitter to traffic predictor: Next-day morning traffic prediction using social media data
Team: Sean Qian (PI, CMU), Weiran Yao (CMU)
Funding source: NSF, US DOT
Start/End time: 2021-2022
The effectiveness of traditional traffic prediction methods, such as autoregressive or spatio- temporal models, is often extremely limited when forecasting traffic dynamics in early morn- ing. The reason is that traffic can break down drastically during the early morning commute, and the time and duration of this break-down vary substantially from day to day. Early morning traffic forecast is crucial to inform morning-commute traffic management, but they are generally challenging to predict in advance, particularly by midnight (called ‘next-day morning traffic prediction’ thereafter). In this paper, we propose to mine Twitter messages as a probing method to understand the impacts of people’s work and rest patterns in the evening/midnight of the previous day to the next-day morning traffic. The model is tested on freeway networks in Pitts- burgh as experiments. The resulting relationship is surprisingly simple and powerful. We find that, in general, the earlier people rest as indicated from Tweets, the more congested roads will be in the next morning. The occurrence of big events in the evening before, represented by higher or lower tweet sentiment than normal, often implies lower travel demand in the next morning than normal days. Besides, people’s tweeting activities in the night before and early morning (by 5am) are statistically associated with congestion in morning peak hours. We make use of such re- lationships to build a predictive framework which forecasts morning commute congestion using people’s tweeting profiles extracted by 5am. In most cases, the tweet information collected by the midnight before is sufficient to make good prediction for next-day morning traffic. The Pittsburgh study supports that this framework can precisely predict morning congestion, particularly for some road segments upstream of roadway bottlenecks with large day-to-day congestion variation, while its prediction performance being no worse than baseline methods on other roads. Through experiments, we demonstrate our approach considerably outperforms those existing methods without Twitter message features, and it can learn meaningful representation of demand from tweeting profiles that offer managerial insights. The proposed social media empowered frame- work can be a promising tool for real-time traffic management and potentially extended for traffic prediction at other times of day.