Abstract
Objective Action recognition analyzes multi-frame image sequences, for example by estimating the pose of the human body from a given sensor input or by recognizing the action being performed from the captured images. It has a wide range of applications in real-world scenarios, such as human interaction, action analysis, and monitoring, for instance the monitoring of illegal behavior in public places such as bus interchanges, railway stations, and airports. At present, most skeleton-based methods rely on spatio-temporal information to obtain good results. Graph convolutional networks (GCNs) can combine spatial and temporal information effectively, but GCN-based methods have high computational complexity, and adding attention modules and multi-stream fusion further lowers training efficiency. Reducing computational cost while preserving accuracy is therefore an essential problem in action recognition. The shift graph convolutional network (Shift-GCN) applies the shift operation to GCNs effectively. It is composed of novel shift graph operations and lightweight point-wise convolutions, where the shift graph operations provide flexible receptive fields for spatio-temporal graphs. Shift-GCN shows its advantage with more than 10× less computational complexity on three datasets for skeleton-based action recognition. However, its features are redundant and the internal structure of the network has not yet been optimized. We therefore optimize the lightweight Shift-GCN and obtain our integer sparse graph convolutional network (IntSparse-GCN).
Method To solve the feature redundancy problem of Shift-GCN, we propose a feature shift operation for each layer of the network in which the odd-numbered columns are moved up, the even-numbered columns are moved down, and the vacated positions are filled with 0. The numbers of input and output channels are set to integer multiples of the number of joints. First, we adopt a basic network structure whose parameter count is similar to that of the previous network. When choosing the numbers of input and output channels, we try to balance the number of zeros in the features of each joint, which yields the optimized network structure. In this network, almost half of the positions in the feature channels are 0, so the feature matrix becomes a sparse matrix with strong regularity that expresses features more accurately. The network thereby improves the robustness of the model and the recognition accuracy. Next, we analyze the mask function in Shift-GCN. The results show that the learned mask is distributed in a range centered on 0 and that the learned weights focus on a few features, so most features do not require mask intervention. Our experiments found that more than 80% of the mask function is ineffective. We conducted extensive experiments and found that setting the mask values in different intervals to 0 affects accuracy irregularly, so we designed an automated traversal method to find the most accurate parameters and obtain the optimal network model. This not only improves the accuracy of the network but also reduces the multiplications between the feature matrix and the mask vector.
Result Our ablation experiments show that each improvement contributes to the performance of the overall algorithm. On the X-sub benchmark of the NTU RGB+D dataset, the Top-1 accuracy of 1-stream (1s) IntSparse-GCN reaches 87.98% and that of 1s IntSparse-GCN + M-Sparse reaches 88.01%; 2-stream (2s) IntSparse-GCN reaches 89.80% and 2s IntSparse-GCN + M-Sparse reaches 89.82%; 4-stream (4s) IntSparse-GCN reaches 90.72% and 4s IntSparse-GCN + M-Sparse reaches 90.72%. On the X-view benchmark of NTU RGB+D, the Top-1 accuracy of 1s IntSparse-GCN + M-Sparse reaches 94.89%, that of 2s IntSparse-GCN + M-Sparse reaches 96.21%, and that of 4s IntSparse-GCN + M-Sparse reaches 96.57%. On the Northwestern-UCLA dataset, the Top-1 accuracy of 1s IntSparse-GCN + M-Sparse reaches 92.89%, that of 2s IntSparse-GCN + M-Sparse reaches 95.26%, and that of 4s IntSparse-GCN + M-Sparse reaches 96.77%, which is 2.17% higher than the original model. Compared with other representative algorithms, accuracy improves on multiple datasets, including in the 4-stream setting.
Conclusion We propose a novel method called IntSparse-GCN. It introduces a spatial shift algorithm based on channel counts that are integer multiples of the number of joints. The resulting feature matrix is sparse with strong regularity, which makes it easier to optimize the model by pruning. To obtain the most accurate optimization parameters, we analyzed the mask function in Shift-GCN and designed an automated traversal method. The sparse feature matrix and the mask parameters have potential for further pruning and quantization.