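Research on Knowledge Modeling and Information Extraction of Quality Hidden Danger Rectification Forms in Building Construction Projects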
Xiang Ran
Abstract
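The Quality Hidden Danger Rectification Form (QHDRF) is a procedural form that the supervision unit issues to the construction unit, requiring it to rectify quality problems found in a building construction project. QHDRFs contain rich quality issue information (QII), and acquiring and using knowledge from this information helps engineers improve the quality control of construction projects. QII is unstructured text scattered across many different QHDRFs, so acquiring and analyzing it is time-consuming and labor-intensive for engineers. As a result, engineers cannot make full use of existing knowledge as a reference for construction quality management, which in turn hinders timely and accurate quality control and decision-making. To make knowledge acquisition and analysis of QII in QHDRFs more efficient, this thesis studies intelligent information management for this kind of engineering text. The specific work is as follows: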
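1. Ontology knowledge modeling of QII in QHDRFs. Building on the semantic patterns defined from the ontology perspective, the knowledge templates and semantic information maps of QII in QHDRFs were specified. The main task of information extraction was determined to be the recognition of lexical information.
2. An information extraction model suited to QII in QHDRFs was proposed, and its extraction results were evaluated and analyzed. Specifically, the data were first preprocessed according to the characteristics of QII in QHDRFs. Word embeddings were then built with the CBOW (Continuous Bag of Words) model, and the data were annotated. To address the shortage of training data, a cross-combination data augmentation method was adopted. After augmentation, the data were unbalanced across knowledge template types, so resampling and undersampling were applied to make the training data relatively balanced. Lexical information was recognized with a Bi-LSTM-CRF (Bidirectional Long Short-Term Memory with Conditional Random Field) model, and the recognition results were evaluated both internally and externally. In the internal evaluation, the overall F1 score for recognizing the four types of lexical information in QII reached 0.884, and the overall F1 score for recognizing QII across the 11 knowledge template types reached 0.780. In the external evaluation, an ablation analysis showed that the input-layer word embedding representation, the Bi-LSTM encoding layer, and the CRF decoding layer each contribute to recognition performance, with the CRF decoding layer having the largest effect. Finally, QII in QHDRFs was visualized and analyzed with a social network graph.
3. Each knowledge template has a corresponding semantic information map and semantic pattern from the ontology perspective. The recognition results for each knowledge template type of QII can therefore be mapped onto these, forming instances of the QHDRF QII ontology.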
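The abstract states that the CRF decoding layer has the largest effect on recognition performance. To show roughly what that layer does at inference time, here is a minimal Python sketch of Viterbi decoding. It works over per-token emission scores (such as a Bi-LSTM might output) and tag-transition scores. The tag set `O`/`B`/`I`, the score values, and the function name `viterbi_decode` are hypothetical and chosen only for illustration. This is not the thesis's implementation.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag-index path and its score for a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (e.g. from a Bi-LSTM encoder).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(transitions)
    # Best score of any path that ends in each tag at the current token.
    scores = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_scores: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            # Choose the previous tag i that maximizes the path score into tag j.
            best_i = max(range(num_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Backtrack from the best final tag to recover the full path.
    best_last = max(range(num_tags), key=lambda j: scores[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[best_last]


# Hypothetical toy example: tags O, B (begin entity), I (inside entity).
TAGS = ["O", "B", "I"]
transitions = [
    [0.0, 0.0, -10000.0],  # from O: O -> I is effectively forbidden
    [0.0, 0.0, 0.0],       # from B
    [0.0, 0.0, 0.0],       # from I
]
emissions = [
    [2.0, 1.0, 0.0],  # token 0: O scores highest
    [0.0, 1.0, 1.5],  # token 1: I scores highest locally
    [0.5, 0.0, 1.0],  # token 2: I scores highest
]

path, score = viterbi_decode(emissions, transitions)
print([TAGS[i] for i in path], score)  # ['O', 'B', 'I'] 4.0
```

In this toy input, picking the best tag token by token would give O, I, I. That sequence is invalid because `I` cannot follow `O`. The large negative transition score steers the decoder to the valid sequence O, B, I instead. Enforcing constraints between adjacent labels in this way is one plausible reason a CRF layer helps compared with classifying each token independently.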
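This research helps engineers obtain the lexical and semantic information in QII of QHDRFs with less time and effort. It thereby supports better quality control and decision-making in building construction projects.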
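Keywords: quality hidden danger rectification form; construction engineering quality management; ontology knowledge modeling; lexical information extraction; named entity recognition; Bi-LSTM-CRF; visualization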
Abstract
The Quality Hidden Danger Rectification Form (QHDRF) is a procedural form issued by the Supervision Unit to the Construction Unit, requiring rectification of quality issues in a construction project. A QHDRF contains a wealth of Quality Issue Information (QII), and acquiring and using knowledge from QII helps engineers improve the quality control of construction projects. QII is unstructured text scattered across different QHDRFs, so acquiring and analyzing it is a time-consuming and labor-intensive process for engineers. As a result, engineers cannot effectively use existing knowledge as a full reference for construction quality management, which in turn hampers timely and accurate quality control and decision-making. To improve the efficiency of knowledge acquisition and analysis for the QII of QHDRFs (QII-QHDRF), this thesis conducts research on intelligent information management of these engineering texts. The specific work is as follows:
1. Ontology knowledge modeling is carried out on the QII-QHDRF. Building on the semantic pattern from the ontology perspective, the knowledge templates and semantic information maps of the QII-QHDRF are specified. The main task of information extraction is identified as the recognition of lexical information.
2. An information extraction model for the QII-QHDRF is proposed, and its extraction results are evaluated and analyzed. Specifically, the data are first preprocessed according to the characteristics of the QII-QHDRF. Word embeddings are then built with the CBOW (Continuous Bag of Words) model, and the data are annotated. To address the scarcity of training data, a cross-combination data augmentation method is adopted. After augmentation, the data are imbalanced across knowledge template types, so resampling and undersampling are used to make the training data relatively balanced. Lexical information is recognized with the Bi-LSTM-CRF model, and the recognition results are evaluated internally and externally. In the internal evaluation, the overall F1 score for recognizing the four types of lexical information in the QII-QHDRF reached 0.884, and the overall F1 score across the 11 knowledge template types reached 0.780. In the external evaluation, an ablation analysis of the Bi-LSTM-CRF model shows that the input-layer word embedding representation, the Bi-LSTM encoding layer, and the CRF decoding layer all contribute to recognition performance, with the CRF decoding layer having the greatest impact. Finally, the QII-QHDRF is visually analyzed using a social network graph.
3. Each knowledge template has a corresponding semantic information map and semantic pattern from the ontology perspective. The recognition results for each knowledge template type of the QII-QHDRF can therefore be mapped onto these, forming instances of the QII-QHDRF ontology.
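The F1 scores reported above summarize how well predicted lexical information matches the annotations. The abstract does not state the tagging scheme or the matching rule. This sketch therefore assumes BIO tags and exact span matching, with counts pooled over all sentences (micro-averaged). The entity types `LOC` and `PROB` and the helper names `bio_to_spans` and `entity_f1` are hypothetical placeholders. They are not the thesis's four lexical information types.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, entity type)


def bio_to_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence.

    An I- tag continues the open span only if its type matches;
    otherwise it starts a new span (a lenient, conlleval-style rule).
    """
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == etype:
            continue  # still inside the current span
        if etype is not None:
            spans.add((start, i, etype))
        if prefix in ("B", "I"):
            start, etype = i, label
        else:
            start, etype = None, None
    return spans


def entity_f1(
    gold_seqs: List[List[str]], pred_seqs: List[List[str]]
) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 under exact span matching."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_to_spans(gold), bio_to_spans(pred)
        tp += len(g & p)   # spans predicted exactly right
        fp += len(p - g)   # predicted spans absent from the gold standard
        fn += len(g - p)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-LOC", "I-LOC", "O", "B-PROB", "I-PROB"]]
pred = [["B-LOC", "I-LOC", "O", "B-PROB", "O"]]
print(entity_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Here the predicted `PROB` span stops one token early. Under exact matching it therefore counts as both a false positive and a false negative, which gives precision, recall, and F1 of 0.5.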
This research helps engineers obtain the lexical and semantic information of the QII-QHDRF with less time and effort. It thereby supports better quality control and decision-making in construction projects.
Key words: QHDRF, Construction Engineering Quality Management, Ontology Knowledge Modeling, Lexical Information Extraction, Named Entity Recognition, Bi-LSTM-CRF, Visualization