Abstract

To address the highly redundant behavior features and inaccurate behavior-boundary localization of R-C3D, an improved behavior detection network (RS-STCBD) based on residual shrinkage and spatio-temporal context is proposed. First, the residual shrinkage structure and a soft-threshold operation are integrated into the residual module of 3D-ResNet to design a 3D residual shrinkage unit with channel-adaptive soft thresholds (3D-RSST). Multiple 3D-RSST units are cascaded to build a feature extraction network that adaptively suppresses redundant information such as noise and background in behavior features. Second, multi-layer convolutions replace the single convolution in the proposal subnet, enlarging the temporal receptive field of the temporal proposal segments. Finally, a non-local attention mechanism is introduced into the behavior classification subnet to obtain the spatio-temporal context of behaviors by capturing long-range dependencies among high-quality behavior proposals. Experimental results on the THUMOS14 and ActivityNet 1.2 datasets show that the improved network reaches mAP@0.5 values of 36.9% and 41.6%, which are 8.0 and 14.8 percentage points higher than those of R-C3D, respectively. By improving the accuracy of both behavior-boundary localization and behavior classification, the detection method based on the improved network helps enhance the quality of human-robot interaction in natural scenes. © 2023 Chinese Academy of Sciences.
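
The channel-adaptive soft thresholding at the core of the 3D-RSST unit can be illustrated with a minimal NumPy sketch. The abstract does not specify how the thresholds are computed, so the sketch assumes a common residual-shrinkage formulation in which each channel's threshold is its mean absolute activation scaled by a sigmoid-bounded factor. The function names, the `scale_logits` parameter, and the `(C, T, H, W)` layout are hypothetical and are not taken from the paper.

```python
import numpy as np


def soft_threshold(x, tau):
    """Shrink x toward zero by tau; entries with |x| <= tau become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def channel_adaptive_shrinkage(features, scale_logits):
    """Apply a separate soft threshold to each channel of a 3D feature map.

    features:     array of shape (C, T, H, W), an assumed layout.
    scale_logits: array of shape (C,), a hypothetical stand-in for the
                  output of a small learned subnetwork (assumption).
    """
    # Per-channel mean of absolute activations, shape (C,).
    abs_mean = np.abs(features).mean(axis=(1, 2, 3))
    # The sigmoid keeps each scaling factor in (0, 1), so a threshold
    # never exceeds the channel's mean absolute activation.
    alpha = 1.0 / (1.0 + np.exp(-scale_logits))
    tau = (alpha * abs_mean).reshape(-1, 1, 1, 1)
    return soft_threshold(features, tau)


def residual_shrinkage_unit(x, residual_branch, scale_logits):
    """Identity shortcut plus the shrunken output of a residual branch.

    residual_branch is any callable that preserves the shape of x; it
    stands in for the 3D convolutions of a ResNet block.
    """
    return x + channel_adaptive_shrinkage(residual_branch(x), scale_logits)
```

Under these assumptions, activations whose magnitude falls below the channel threshold are set to zero, which is one way low-amplitude noise and background responses could be suppressed while larger responses are kept, reduced by the threshold.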

Full text