CN 11-5366/S     ISSN 1673-1530
“Landscape Architecture: more than just a journal.”

Measuring the Public Activity Richness of Urban Parks Based on Large Language Models and Social Media Data: A Case Study of Shanghai

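  • Summary:
    Objective Park research based on social media data has become a research hotspot. However, existing studies rely on single-modal data and natural language processing (NLP) techniques, so the precision of their results leaves room for improvement. With the development of large language models (LLM), social media data can be analyzed to measure the public activity richness of urban parks more precisely.
    Methods LLMs are first used to parse multi-modal social media data containing text, images, and videos; clustering algorithms are then applied to explore users’ emotional tendencies and activity richness, generate activity heat maps, and construct a quantitative method for measuring park public activity richness.
    Results Using the traditional questionnaire method as the reference standard, comparative analysis shows that the LLM analysis method based on multi-modal data is far more accurate than single-modal data analysis, confirming the validity of the method. The LLM analysis method is then applied to 20 urban parks within Shanghai’s Outer Ring, establishing a large-scale, high-precision panoramic measurement of park public activity richness.
    Conclusion This research innovatively uses LLM and multi-modal social media data to analyze the public activity richness of urban parks, helping advance the academic development and application of artificial intelligence in urban research.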
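As a rough illustration of the heat-map step mentioned in the summary, the sketch below turns (spot, activity) records, of the kind an LLM might extract from notes, into a per-spot heat value. The record format, the spot names, and the min-max normalization are all assumptions for illustration; the abstract does not state how heat values are computed, and this sketch does not reproduce the clustering step.

```python
from collections import Counter

def activity_heat(records: list[tuple[str, str]]) -> dict[str, float]:
    """Map each spot to a heat value in [0, 1] from (spot, activity) records.

    Heat is the number of activity mentions at a spot, min-max normalized
    across spots (an assumed normalization, chosen here for illustration).
    """
    counts = Counter(spot for spot, _activity in records)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    # If every spot has the same count, give all of them full heat.
    return {spot: (n - lo) / span if span else 1.0 for spot, n in counts.items()}

if __name__ == "__main__":
    # Hypothetical records; not data from the study.
    records = [("Lawn", "picnic"), ("Lawn", "walking"), ("Lake", "boating"),
               ("Grove", "running"), ("Lawn", "camping"), ("Lake", "walking")]
    print(activity_heat(records))  # {'Lawn': 1.0, 'Lake': 0.5, 'Grove': 0.0}
```

The resulting values could then be attached to spot coordinates in a GIS layer to render the heat map.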

     

    Abstract:
    Objective Urban parks are among the most vital carriers of public services, and public perception and use of urban parks can significantly affect their management and planning. In recent years, social media data has emerged as a critical source for understanding public interaction within urban spaces, making park analysis based on social media a research hotspot. However, current research typically focuses on single-modal data (such as text or images) and relies on traditional machine learning and natural language processing (NLP) techniques, which may limit the comprehensiveness and accuracy of the results. Advances in artificial intelligence, particularly large language models (LLM), have brought significant breakthroughs in language understanding, reasoning, and image recognition, providing a technical foundation for using multi-modal social media data, including images and text, to analyze the richness of urban park activities. This research explores methods for the quantitative analysis of multi-modal social media big data in order to build a more accurate measurement system for park public activity richness.
    Methods Taking Shanghai Gongqing National Forest Park, the most popular and most discussed urban park on the social media platform “Xiaohongshu”, as an example, this research combines the classical questionnaire method, LLM analysis, and traditional classical analysis. First, a semantic analysis questionnaire is designed and administered through multiple uniform surveys at the 43 most popular spots in Gongqing National Forest Park to understand public activity preferences and perceptions of different scenes. Descriptive statistics are used to analyze activity intention data: respondents are shown images of various park scenes and their locations and asked to describe the activities they would expect to do there, such as walking, running, or picnicking. The semantic differential (SD) method is used to analyze site perception data: statistical analysis of respondents’ ratings on different perception dimensions yields a comprehensive perception evaluation of each scene, which helps construct quantitative indicators of activity preferences and emotional tendencies. GIS technology is used to visualize public activity richness. Second, for the LLM analysis method, multi-modal data (text, images, videos, etc.) about the 43 most popular spots in Gongqing National Forest Park are mined from the Xiaohongshu platform. For text data, the application programming interface (API) of China’s leading LLM, Wenxin Yiyan, is used to extract activity information and calculate sentiment values, identifying the activities and emotions of Xiaohongshu users in the park. For image data, the API of ChatGPT-4 is used to extract activity information. Because the LLM cannot process videos directly, videos are first converted into frames, which are then analyzed in the same way as images.
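The SD step can be pictured with a minimal sketch: average each bipolar dimension across respondents for one scene, then combine the dimension means. The adjective pairs, the -2..+2 coding, and the unweighted composite are assumptions for illustration; the abstract does not list the paper’s SD dimensions, scale, or weighting.

```python
from statistics import mean

# Hypothetical bipolar adjective pairs; not the paper's actual dimensions.
DIMENSIONS = ("noisy-quiet", "closed-open", "dull-lively")

def sd_profile(ratings: list[dict[str, int]]) -> dict[str, float]:
    """Average each SD dimension over all respondents for one scene.

    Each rating dict maps a dimension to a score on an assumed -2..+2 scale.
    """
    return {dim: float(mean(r[dim] for r in ratings)) for dim in DIMENSIONS}

def sd_composite(profile: dict[str, float]) -> float:
    """Unweighted mean of the dimension means (an illustrative choice)."""
    return float(mean(profile.values()))

if __name__ == "__main__":
    # Two hypothetical respondents rating the same scene.
    ratings = [
        {"noisy-quiet": 2, "closed-open": 1, "dull-lively": 0},
        {"noisy-quiet": 1, "closed-open": 1, "dull-lively": -1},
    ]
    profile = sd_profile(ratings)
    print(profile)  # {'noisy-quiet': 1.5, 'closed-open': 1.0, 'dull-lively': -0.5}
    print(round(sd_composite(profile), 3))  # 0.667
```

Keeping the per-dimension profile alongside the composite preserves which perceptions drive a scene’s overall score.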
The Shannon diversity index is used to calculate activity diversity from the types and quantities of activities extracted from the multi-modal data, and a quantitative portrait of urban park public activity richness is constructed on this basis. Third, for the traditional classical analysis method, the text portions of the multi-modal data (all text in the Xiaohongshu notes) are extracted as the raw data. The latent Dirichlet allocation (LDA) model is used for topic modeling, and NLP techniques are used to calculate a sentiment value for each topic. The diversity of activities and the sentiment values are then combined to construct single-modal data indicators.
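The Shannon diversity index is H = -Σ p_i ln p_i, where p_i = n_i / N is the share of activity type i among all N extracted activity mentions. The sketch below computes it from per-type counts; the natural logarithm and the example labels are assumptions, since the abstract does not specify the log base or the activity taxonomy.

```python
import math
from collections import Counter

def shannon_diversity(activity_counts: dict[str, int]) -> float:
    """Shannon diversity index H = -sum(p_i * ln p_i) over activity types.

    p_i is the share of activity type i among all activity mentions.
    Types with a zero count contribute nothing (0 * ln 0 is taken as 0).
    """
    total = sum(activity_counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for n in activity_counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h

if __name__ == "__main__":
    # Hypothetical labels as an LLM might extract them from notes and images.
    mentions = ["walking", "walking", "picnic", "running"]
    print(round(shannon_diversity(Counter(mentions)), 4))  # 1.0397
```

H is 0 when a spot shows a single activity type and reaches ln k when k types are equally frequent, so higher values indicate a richer, more even mix of activities.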
    Results This research explores several methods for measuring public activity richness. Using the traditional questionnaire perception measurement as the benchmark, correlation analysis is conducted to compare the accuracy of traditional classical analysis and LLM analysis. Statistical results show that LLM analysis significantly outperforms traditional classical analysis in accuracy for both public activity richness and emotional perception data, showing high consistency with the benchmark questionnaire method, which makes it the superior approach for evaluating public activity richness. Based on these findings, LLM technology and multi-modal social media data are used to conduct large-scale data retrieval and analysis of the 20 largest urban parks within Shanghai’s Outer Ring. Public activity richness and its sub-indicators are calculated for these parks, forming an activity portrait for each park that includes activity heat data, activity types, and emotional perception data. Specific suggestions for urban park improvement strategies are also provided, achieving a panoramic, high-precision analysis of park public activity richness.
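One way to picture the accuracy comparison is to correlate each method’s per-spot scores with the questionnaire benchmark and prefer the method with the higher coefficient. The sketch below assumes Pearson’s r; the abstract only says “correlation analysis”, so the specific coefficient (Pearson versus, say, Spearman) is an assumption, and the numbers in the example are made up.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long score lists.

    xs could hold questionnaire (benchmark) scores per spot and ys the
    scores for the same spots produced by one analysis method.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    # Hypothetical per-spot scores; not results from the study.
    benchmark = [1.2, 0.8, 1.5, 0.4]
    method_a = [1.1, 0.9, 1.4, 0.5]
    method_b = [0.6, 1.3, 0.9, 1.0]
    print(pearson_r(benchmark, method_a) > pearson_r(benchmark, method_b))  # True
```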
    Conclusion This research innovatively adopts LLM and multi-modal social media data for urban analysis, supporting comprehensive and rapid monitoring of urban park activities and user perceptions at the city scale and beyond. This approach can not only improve research efficiency and accuracy but also provide scientific evidence for urban park planning and management. The successful application of this method marks an academic shift toward, and a deepening of, the use of artificial intelligence in urban research, which is significant for promoting smart city construction and management.

     
