Status and Prospects of Visual Language Models in Welding Defect Detection

温浩钰; 王小鹏; 于兴华

doi:10.12073/j.hjxb.20260402002

Status and Prospects of VLM in Welding Defect Detection

Abstract

This work asks whether visual language models (VLMs) can give welding defect detection a substantive gain over conventional deep-learning-based vision. The relevant literature is analyzed systematically across three categories of inspection target: surface defects, internal defects, and weld formation anomalies. The results indicate that this research direction is still at an early exploratory stage, that VLMs are complementary to existing defect detection algorithms, and that the direction has considerable potential for development. The strengths of current VLMs lie in high-level tasks such as semantic interpretation, few-shot recognition guidance, and structured report generation. Their weaknesses are threefold: insufficient feature extraction on non-natural image data such as X-ray and ultrasonic images; detection accuracy in fine-grained pixel-level localization that lags behind CNN-based object detection algorithms; and high inference latency in real-time edge deployment scenarios. It is therefore argued that the more reasonable engineering path at present is a hybrid, complementary architecture in which conventional vision methods handle precise localization and real-time front-end detection, while the VLM handles upper-layer semantic interpretation and structured report output. The two layers collaborate hierarchically; neither replaces the other.
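The layered division of labor described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `Detection`, `inspect_weld`, `stub_detector`, and `stub_vlm`, as well as the `min_score` threshold, are hypothetical and introduced only for illustration; a real system would replace the stubs with a CNN-based detector and an actual VLM call.

```python
from dataclasses import dataclass, asdict
from typing import Callable, List, Tuple

BBox = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class Detection:
    """One region proposed by the conventional (front-end) detector."""
    label: str
    bbox: BBox
    score: float


def inspect_weld(
    image: object,
    detect: Callable[[object], List[Detection]],
    explain: Callable[[object, Detection], str],
    min_score: float = 0.5,  # assumed threshold, not taken from the source
) -> dict:
    """Run the fast detector first; call the VLM only on confirmed regions."""
    confirmed = [d for d in detect(image) if d.score >= min_score]
    defects = []
    for det in confirmed:
        entry = asdict(det)
        # Upper layer: semantic interpretation of an already localized region.
        entry["explanation"] = explain(image, det)
        defects.append(entry)
    # Structured report: plain dict that can be serialized downstream.
    return {"defect_found": bool(defects), "defects": defects}


def stub_detector(image: object) -> List[Detection]:
    """Hypothetical stand-in for a CNN-based object detector."""
    return [
        Detection("porosity", (40, 60, 12, 12), 0.91),
        Detection("crack", (5, 5, 3, 20), 0.20),
    ]


def stub_vlm(image: object, det: Detection) -> str:
    """Hypothetical stand-in for a VLM call that describes one region."""
    return f"Possible {det.label} near x={det.bbox[0]}, y={det.bbox[1]}."


if __name__ == "__main__":
    print(inspect_weld(None, stub_detector, stub_vlm))
```

In this sketch the VLM is never invoked when the front-end detector finds nothing above the threshold, which is one simple way to keep the slower model off the real-time path.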
