Paper Title

Urban2Vec: Incorporating Street View Imagery and POIs for Multi-Modal Urban Neighborhood Embedding

Paper Authors

Zhecheng Wang, Haoyuan Li, Ram Rajagopal

Paper Abstract

Understanding intrinsic patterns and predicting spatiotemporal characteristics of cities require a comprehensive representation of urban neighborhoods. Existing works rely on either inter- or intra-region connectivities to generate neighborhood representations but fail to fully utilize the informative yet heterogeneous data within neighborhoods. In this work, we propose Urban2Vec, an unsupervised multi-modal framework that incorporates both street view imagery and point-of-interest (POI) data to learn neighborhood embeddings. Specifically, we use a convolutional neural network to extract visual features from street view images while preserving geospatial similarity. Furthermore, we model each POI as a bag of words containing its category, rating, and review information. Analogous to document embedding in natural language processing, we establish semantic similarity between a neighborhood (the "document") and the words from its surrounding POIs in the vector space. By jointly encoding visual, textual, and geospatial information into the neighborhood representation, Urban2Vec achieves performance better than baseline models and comparable to fully supervised methods on downstream prediction tasks. Extensive experiments on three U.S. metropolitan areas also demonstrate the model's interpretability, generalization capability, and value in neighborhood similarity analysis.
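To make the two training signals described in the abstract concrete, below is a minimal PyTorch sketch, not the authors' released code: a skip-gram-style objective with negative sampling that ties a neighborhood ("document") vector to the words of its surrounding POIs, plus a triplet loss that keeps street-view image features of nearby locations close. All names, dimensions, and the toy random batch (NUM_NEIGHBORHOODS, VOCAB_SIZE, EMB_DIM, poi_word_loss, geospatial_triplet_loss) are illustrative assumptions.

```python
# Illustrative sketch only; shapes, vocabulary, and sampling scheme are assumed,
# not taken from the Urban2Vec paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_NEIGHBORHOODS = 1000   # hypothetical number of neighborhood units (e.g., census tracts)
VOCAB_SIZE = 5000          # hypothetical POI word vocabulary: categories, rating bins, review tokens
EMB_DIM = 128              # assumed embedding dimension

# Trainable neighborhood ("document") vectors and POI word vectors.
nbhd_emb = nn.Embedding(NUM_NEIGHBORHOODS, EMB_DIM)
word_emb = nn.Embedding(VOCAB_SIZE, EMB_DIM)


def poi_word_loss(nbhd_ids, pos_word_ids, neg_word_ids):
    """Skip-gram-with-negative-sampling-style loss: pull a neighborhood vector
    toward words from its surrounding POIs, push it away from sampled words.
    Shapes: nbhd_ids (B,), pos_word_ids (B,), neg_word_ids (B, K)."""
    v = nbhd_emb(nbhd_ids)                                            # (B, D)
    u_pos = word_emb(pos_word_ids)                                    # (B, D)
    u_neg = word_emb(neg_word_ids)                                    # (B, K, D)
    pos_term = F.logsigmoid((v * u_pos).sum(-1))                      # (B,)
    neg_term = F.logsigmoid(-(u_neg @ v.unsqueeze(-1)).squeeze(-1))   # (B, K)
    return -(pos_term + neg_term.sum(-1)).mean()


def geospatial_triplet_loss(anchor, nearby, distant, margin=0.5):
    """Triplet loss on street-view image features: features of images sampled
    near the anchor location should be closer than those of distant images."""
    return F.triplet_margin_loss(anchor, nearby, distant, margin=margin)


# Toy usage with random indices and features, just to show the shapes involved.
B, K = 32, 5
loss = poi_word_loss(
    torch.randint(0, NUM_NEIGHBORHOODS, (B,)),
    torch.randint(0, VOCAB_SIZE, (B,)),
    torch.randint(0, VOCAB_SIZE, (B, K)),
) + geospatial_triplet_loss(
    torch.randn(B, EMB_DIM), torch.randn(B, EMB_DIM), torch.randn(B, EMB_DIM)
)
loss.backward()
```

In this toy usage the image features are random tensors standing in for CNN outputs over street view crops; in an actual pipeline they would come from the convolutional backbone being trained, and the negative words would be drawn from a smoothed unigram distribution, as in word2vec-style training.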
