Research on a Multimodal Fusion Method for Identifying Construction Workers' Unsafe Behavior on Construction Sites
谢定坤
Abstract
Accidents on construction sites are largely caused by unsafe human behavior, and construction workers' unsafe behavior is a chronic cause of construction safety accidents. Effectively identifying workers' unsafe behavior and reducing its occurrence is of great significance for ensuring on-site construction safety. To this end, this thesis combines computer vision and natural language processing techniques to explore a multimodal fusion method for identifying construction workers' unsafe behavior on site. The main work is as follows:
First, this thesis explains construction workers' unsafe behavior and the related theories, reviews existing methods for identifying unsafe behavior, and analyzes the application effects and shortcomings of each class of method, thereby establishing the necessity of studying a multimodal fusion method for identifying workers' unsafe behavior on construction sites. Current research mainly relies on computer vision and deep learning and suffers from weak generalization ability, a lack of training data, a single data modality, poor versatility, and continued reliance on tedious manual operations.
Second, this thesis proposes a multimodal fusion method for identifying construction workers' unsafe behavior on site, which matches images of workers' unsafe behavior against the unsafe-behavior entries in safety rule text so that multiple types of unsafe behavior can be identified automatically. The method consists of three parts: (1) combining computer vision and deep learning to build a Faster R-CNN-based bottom-up attention object detection model that automatically extracts and represents image features of workers' unsafe behavior; (2) using natural language processing and deep learning to automatically extract and represent features of the safety rule text; (3) applying Stacked Cross Attention (SCA) to fuse the extracted text and image features and compute their similarity.
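To make part (1) concrete, the following is a minimal sketch of bottom-up region feature extraction, assuming a recent PyTorch/torchvision installation and using torchvision's pretrained Faster R-CNN as a stand-in for the detector actually trained in this thesis; the 36-region cap and the 1024-dimensional joint embedding are illustrative assumptions, not the thesis's exact settings.

```python
# Minimal sketch of part (1): bottom-up region features for one site image.
# torchvision's pretrained Faster R-CNN stands in for the detector trained in
# the thesis; the region cap (36) and embedding size (1024) are illustrative.
import torch
import torchvision
from torchvision.ops import roi_align

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
backbone = torchvision.models.resnet50(weights="DEFAULT").eval()
trunk = torch.nn.Sequential(*list(backbone.children())[:-2])  # conv trunk only: stride-32 feature map
proj = torch.nn.Linear(2048, 1024)                            # projection into the joint embedding space

@torch.no_grad()
def region_features(image, top_k=36):
    """Detect salient regions, then RoI-pool one feature vector per region."""
    boxes = detector([image])[0]["boxes"][:top_k]   # (K, 4) detected boxes in pixel coordinates
    fmap = trunk(image.unsqueeze(0))                # (1, 2048, H/32, W/32) backbone feature map
    pooled = roi_align(fmap, [boxes], output_size=(7, 7), spatial_scale=1.0 / 32)
    feats = pooled.mean(dim=(2, 3))                 # (K, 2048): one pooled vector per region
    return proj(feats)                              # (K, 1024): region features v_1, ..., v_K

image = torch.rand(3, 600, 800)                     # placeholder RGB image tensor in [0, 1]
print(region_features(image).shape)                 # torch.Size([K, 1024]), K <= 36
```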
Finally, experiments verify the effectiveness and feasibility of the proposed method. The results show that the proposed multimodal fusion method can automatically extract semantic information from images and match it against the corresponding entries in the safety rule text, can simultaneously identify the multiple unsafe behaviors present in construction site images reasonably well, and can run in a fully automated manner.
This research proposes a multimodal fusion method for identifying construction workers' unsafe behavior on site and achieves automatic identification of multiple categories of unsafe behavior. To some extent, it advances fully automatic, continuous identification of workers' unsafe behavior in generic construction scenarios, and has positive significance for realizing smart construction sites under China's digital construction paradigm.
Keywords: multimodal fusion; behavior identification; unsafe behavior; computer vision; deep learning; attention network
Abstract
Accidents on construction sites are largely caused by unsafe human behavior, and workers' unsafe behavior is a chronic source of construction safety problems. Effectively identifying and reducing unsafe behavior is therefore of great significance for ensuring on-site construction safety. To this end, this thesis combines computer vision and natural language processing to explore a multimodal fusion method for identifying workers' unsafe behavior on construction sites. The main work can be summarized as follows:
Firstly, the unsafe behavior of construction workers and the related theories were elucidated, existing methods for unsafe behavior identification were reviewed, and the application effects and problems of each class of identification method were analyzed, which establishes the necessity of studying a multimodal fusion method for identifying workers' unsafe behavior on construction sites. Existing studies on workers' unsafe behavior identification rely mainly on computer vision and deep learning and exhibit clear limitations, such as weak generalization ability, a lack of training data, a single data modality, poor versatility, and reliance on tedious manual operations.
Secondly, a multimodal fusion method for identifying workers' unsafe behavior on construction sites was proposed, which matches images of workers' unsafe behavior against the unsafe-behavior entries in safety rule text to realize automatic identification of various kinds of unsafe behavior. The method consists of three parts: (1) combining computer vision and deep learning to build an object detection model with a Faster R-CNN-based bottom-up attention mechanism, which automatically extracts and represents image features of workers' unsafe behavior; (2) utilizing natural language processing and deep learning to automatically extract and represent the features of the safety rule text; (3) using Stacked Cross Attention (SCA) to fuse the extracted text and image features and compute their similarity.
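As an illustration of part (3), the sketch below computes a Stacked Cross Attention similarity between a set of region features and the word features of one safety-rule entry. The image-to-text attention direction, the LogSumExp pooling, and the smoothing factor follow the published SCAN formulation and are assumptions here, not necessarily the exact configuration used in this thesis.

```python
# Minimal sketch of part (3): Stacked Cross Attention (SCA) similarity between
# K region features and L word features of one safety-rule entry (image-to-text
# attention with LogSumExp pooling). Dimensions and the smoothing factor are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def sca_similarity(regions, words, smooth=6.0):
    """regions: (K, D) image region features; words: (L, D) word features."""
    v = F.normalize(regions, dim=-1)                      # unit-norm region vectors
    w = F.normalize(words, dim=-1)                        # unit-norm word vectors
    s = F.relu(v @ w.t())                                 # (K, L) cosine similarities, positive evidence only
    alpha = F.softmax(smooth * s, dim=1)                  # each region attends over the words
    attended = alpha @ w                                  # (K, D) attended text vector per region
    r = F.cosine_similarity(regions, attended, dim=-1)    # (K,) region-text relevance scores
    return torch.logsumexp(smooth * r, dim=0) / smooth    # pooled image-entry similarity

# Toy usage: rank two candidate safety-rule entries for one image.
regions = torch.randn(36, 1024)                           # e.g. 36 region features from the detection step
entries = [torch.randn(12, 1024), torch.randn(8, 1024)]   # word features of two rule entries
scores = torch.stack([sca_similarity(regions, e) for e in entries])
best = scores.argmax()                                    # index of the best-matching rule entry
```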
Finally, the effectiveness and feasibility of the multimodal fusion method were verified by experiments. The results show that the proposed method can automatically extract the semantic information of images and match it against the corresponding entries in the safety rule text, can simultaneously identify the multiple unsafe behaviors present in construction site images, and can run in a fully automated manner.
This research puts forward a multimodal fusion method for identifying workers' unsafe behavior on construction sites and realizes automatic identification of various kinds of unsafe behavior. To some extent, it promotes the development of fully automatic, continuous identification of workers' unsafe behavior in generic construction scenarios, and has positive significance for realizing smart construction sites under China's digital construction paradigm.
Keywords: Multimodal fusion; Behavior identification; Unsafe behavior; Computer vision; Deep learning; Attention network