Twitter-based traffic information system based on vector representations for words

Dabiri, Sina; Heaslip, Kevin

Computer Science > Information Retrieval

arXiv:1812.01199 (cs)

[Submitted on 4 Dec 2018]

Title: Twitter-based traffic information system based on vector representations for words

Authors: Sina Dabiri, Kevin Heaslip

Abstract: Recently, researchers have shown increased interest in harnessing Twitter data for dynamic monitoring of traffic conditions. The bag-of-words representation is a common method in the literature for modeling tweets and retrieving traffic information, yet it suffers from the curse of dimensionality and from sparsity. To address these issues, our specific objective is to propose a simple and robust framework built on top of word embeddings for distinguishing traffic-related tweets from non-traffic-related ones. In our proposed model, a tweet is classified as traffic-related if the semantic similarity between its words and a small set of traffic keywords exceeds a threshold value. Semantic similarity between words is captured by means of word-embedding models, which are unsupervised learning tools. The proposed model is simple, having only one trainable parameter. The model's merits are demonstrated through several evaluation steps. The best test accuracy achieved by our proposed model is 95.9%.
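
The classification rule described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy 3-dimensional vectors, the keyword set, the use of the maximum pairwise cosine similarity as the tweet-level score, and the threshold of 0.8. The abstract does not state how word-level similarities are aggregated, so the hypothetical `tweet_similarity` helper shows only one plausible choice.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real system would load vectors from a trained word-embedding model.
TOY_EMBEDDINGS = {
    "traffic":  [0.9, 0.1, 0.0],
    "accident": [0.8, 0.3, 0.1],
    "crash":    [0.8, 0.2, 0.1],
    "highway":  [0.7, 0.1, 0.3],
    "coffee":   [0.0, 0.9, 0.2],
    "morning":  [0.1, 0.6, 0.6],
    "great":    [0.0, 0.5, 0.8],
}

# Hypothetical keyword set; the paper's actual keywords are not given here.
TRAFFIC_KEYWORDS = ["traffic", "accident"]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def tweet_similarity(tweet, keywords=TRAFFIC_KEYWORDS, embeddings=TOY_EMBEDDINGS):
    """Highest cosine similarity between any known tweet word and any keyword.

    Words without an embedding are skipped; a tweet with no known words
    scores 0.0. Taking the maximum is an assumed aggregation rule.
    """
    words = [w for w in tweet.lower().split() if w in embeddings]
    scores = [
        cosine(embeddings[w], embeddings[k])
        for w in words
        for k in keywords
    ]
    return max(scores, default=0.0)


def is_traffic_related(tweet, threshold=0.8):
    """Label a tweet as traffic-related when its score exceeds the threshold."""
    return tweet_similarity(tweet) > threshold


if __name__ == "__main__":
    for text in ["Crash on the highway", "Great morning coffee"]:
        print(text, "->", is_traffic_related(text))
```

In this sketch the threshold is the only value that would be tuned on labeled data, which is consistent with the abstract's description of a model with a single trainable parameter; whether the paper's parameter is exactly this threshold is an assumption.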

Comments:	17 pages, 4 figures, 7 tables
Subjects:	Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1812.01199 [cs.IR]
	(or arXiv:1812.01199v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.1812.01199

Submission history

From: Sina Dabiri
[v1] Tue, 4 Dec 2018 03:28:28 UTC (788 KB)
