Abstract
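First, this thesis proposes a topic mining method for metro construction hazard texts that combines Chinese word segmentation, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm, and the LDA (Latent Dirichlet Allocation) topic model, and uses real historical data to identify the hazards that objectively exist during metro construction. Second, taking one city's metro hazard text records from 2016 to 2018 as the data source, the thesis applies the proposed method to identify the hazard categories present in actual construction and the troubleshooting points for each category. Finally, the thesis matches every hazard troubleshooting record in the data source to a hazard category through field extraction and manual review, then applies social network analysis to find the hazards that recur most often across years, the hazards distributed over the widest range of construction parts, and the number of occurrences of each hazard category at each construction part, and presents the results visually.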
As the direct cause of safety accidents, metro construction safety hazards receive close attention from the state and from metro enterprises. As management has become informatized, metro enterprises in different regions have successively set up their own hazard troubleshooting systems. With the continued expansion of metro construction, these systems have accumulated numerous unstructured text records on safety hazards. At present, however, these records are used only for storage and querying. The valuable information underlying them, which could reveal the patterns of metro construction safety hazards, has not yet been extracted.
First, combining Chinese word segmentation, Term Frequency-Inverse Document Frequency (TF-IDF), and Latent Dirichlet Allocation (LDA), this thesis proposes a topic mining method that identifies hazards in the metro construction process on the basis of real historical data. Second, taking one city's metro construction hazard text records from 2016 to 2018 as the data source, the thesis uses the proposed method to identify the hazard categories and the troubleshooting points for each category during construction. Finally, using field extraction and manual review, the thesis matches each hazard troubleshooting record in the data source to an identified hazard category. Social network analysis is then applied to reveal which hazards recur most often across years, which hazards are distributed over the most construction parts, and how many times each hazard category occurs at each construction part; the results are presented visually.
The thesis aims to analyse the massive unstructured text records of metro construction hazards. The proposed method combines text mining with visualization techniques to turn text data into visualized information. It can provide data support for future hazard troubleshooting. Metro enterprises can also use it to compile a hazard troubleshooting almanac and can apply the visualized analysis results to workers' safety training, which gives the method practical application value.
Keywords: Safety management; Metro construction safety hazards; Text mining; Latent Dirichlet Allocation topic model; Data visualization