Algorithm design of a multimodal large model for conveyor belt tear detection

王学立; 赵辰燃; 李青; 何显能; 甘梅

doi:10.13347/j.cnki.mkaq.2023.09.027


Abstract

The AI mine large model is an artificial-intelligence-based solution for intelligent mining that uses big data, deep learning, and machine learning to help mining enterprises improve production efficiency and safety. For conveyor belt tear detection, a Transformer-based network structure for processing multimodal data was designed on the basis of the AI mine large model, and the DETR-Audio model was proposed to concatenate and fuse multimodal video and audio data. The DETR model encodes the video, the short-time Fourier transform (STFT) is used for time-frequency analysis of the audio signal, the feature vectors of the two modalities are then concatenated and fused, and the result is passed to the decoder for fusion decoding. After training and testing on 3,000 images of conveyor belts in underground coal mines and the corresponding audio data, the model performed well, achieving higher detection accuracy and robustness than models that use video or audio information alone.
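The audio branch and the concatenation step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`stft_magnitude`, `fuse_tokens`), the frame length, the hop size, the toy 1 kHz tone, and the placeholder visual tokens are all assumptions made for illustration. It uses a naive DFT from the Python standard library instead of an FFT, and plain concatenation along the token axis where a real DETR-style model would apply learned projections before the decoder.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Naive STFT: one magnitude spectrum (frame_len // 2 + 1 bins) per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def fuse_tokens(visual_tokens, audio_tokens, dim):
    """Concatenate both token sequences after zero-padding or truncating to dim."""
    def fit(token):
        return (list(token) + [0.0] * dim)[:dim]
    return [fit(t) for t in visual_tokens] + [fit(t) for t in audio_tokens]


# Toy audio: a 1 kHz tone sampled at 8 kHz (assumed values, not from the paper).
sample_rate = 8000
tone_hz = 1000
audio = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]
audio_tokens = stft_magnitude(audio)

# Hypothetical stand-in for the visual encoder output: 4 tokens of width 33.
visual_tokens = [[0.1 * (i + 1)] * 33 for i in range(4)]

# The fused sequence is what a decoder would attend over in this sketch.
fused = fuse_tokens(visual_tokens, audio_tokens, dim=33)

# Frequency bin with the largest magnitude in the first audio frame.
peak_bin = max(range(len(audio_tokens[0])), key=lambda k: audio_tokens[0][k])
print(len(audio_tokens), len(audio_tokens[0]), len(fused), peak_bin)
```

Each STFT frame acts as one audio token, so the fused sequence length is the number of visual tokens plus the number of audio frames. With a bin spacing of `sample_rate / frame_len` = 125 Hz, the 1 kHz tone falls on a single bin, which makes the time-frequency representation easy to check by eye.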
