Abstract

Objective: Video is an ideal cover medium for high-capacity embedding, so video steganography and video steganalysis have been widely studied. Deep learning has recently been introduced to video steganalysis, and several deep neural networks (DNNs) have been published to detect secret data embedded in motion vectors (MVs). However, these networks report only mediocre detection accuracies compared with traditional steganalysis approaches based on handcrafted features. We conjecture that this performance limitation stems from the inadequate information supplied to the network. Guided by the principles of video encoding, we explore how steganographic embedding affects different encoding parameters. Our aim is to extend the detection space by searching for abnormalities in coding parameters caused by steganography, and thereby to construct multiple input channels that improve the detection performance of steganalysis networks.

Method: We first analyze how motion vector differences (MVDs) are influenced by secret embedding in MVs, and show that the MVD histogram can exhibit visible changes in bin height after the MVs are modified for embedding. Because MVDs convey critical information for revealing MV alteration, we propose to use them as an additional sampling space for video steganalysis networks, alongside the existing MV and prediction residual spaces. However, MVDs are irregularly and sparsely distributed within individual frames and are therefore difficult to calibrate across consecutive frames. We therefore design a method for constructing MVD input channels that is compatible with existing network architectures. Specifically, two matrices record the vertical and horizontal components of the MVDs, respectively.
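To illustrate the mechanism, the following toy Python sketch assumes the AMVP convention of H.265/HEVC, in which the MVD coded for a block is its MV minus a motion vector predictor (MVP), and compares the histogram of the horizontal MVD component before and after a hypothetical ±1 change to some MVs. The function names, the data, and the fixed predictors are illustrative assumptions, not the paper's procedure or results; in a real encoder, altering one MV can also change the predictors, and hence the MVDs, of later blocks.

```python
from collections import Counter


def mvd(mv, mvp):
    """Motion vector difference: component-wise MV minus its predictor (AMVP-style)."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


def mvd_histogram(mvs, mvps, axis=0):
    """Histogram (value -> count) of one MVD component (0 = horizontal, 1 = vertical)."""
    return Counter(mvd(mv, p)[axis] for mv, p in zip(mvs, mvps))


# Hypothetical toy data: six blocks whose cover MVs equal their predictors,
# so every cover MVD is (0, 0), i.e. a sharply peaked MVD histogram.
mvps = [(3, 1), (2, 0), (-1, 2), (4, 4), (0, -3), (1, 1)]
cover_mvs = list(mvps)

# Hypothetical +/-1 change to the horizontal MV component of some blocks,
# standing in for a generic MV-domain embedding (not a specific target method).
# Predictors are held fixed here for simplicity.
deltas = [1, 0, -1, 0, 1, 0]
stego_mvs = [(mv[0] + d, mv[1]) for mv, d in zip(cover_mvs, deltas)]

print(sorted(mvd_histogram(cover_mvs, mvps).items()))  # [(0, 6)]
print(sorted(mvd_histogram(stego_mvs, mvps).items()))  # [(-1, 1), (0, 3), (1, 2)]
```

In this toy case the zero bin drops from 6 to 3 while the ±1 bins fill up, mirroring in miniature the bin-height changes that the histogram analysis looks for.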
Since the prediction unit (PU) partition varies from frame to frame, we take the minimum 4 × 4 block as the basic sampling unit: the vertical and horizontal MVD components of each 4 × 4 block are recorded as one element of the vertical and horizontal MVD matrices, respectively. In H.265/HEVC (High Efficiency Video Coding) video, some blocks are not inter-predicted and thus have neither MVs nor MVDs; other blocks are inter-predicted but use the Merge or Skip mode, and therefore have MVs but no MVDs. For both types of blocks, the corresponding elements of the MVD matrices are set to zero. The new MVD channels can be used alone or together with other channels, such as MVs and prediction residuals. By incorporating the MVD channels into existing video steganalysis networks, we obtain improved networks for various tasks: the improved VSRNet (IVSRNet), the selection-channel-aware improved VSRNet (SCA-IVSRNet), and the quantitative improved VSRNet (Q-IVSRNet).

Result: We conduct extensive experiments against five target steganographic methods under varying resolutions, bit rates, and embedding rates. All embedding and detection are performed on H.265/HEVC videos. Two of the targets are classical methods originally designed for H.264 videos and ported to H.265/HEVC; the remaining three are recently published H.265/HEVC-specific steganographic methods. We first evaluate MVD-VSRNet, which uses only the MVD and prediction residual channels, without the MV channels. MVD-VSRNet achieves higher accuracies than the baseline VSRNet, which uses the MV and prediction residual channels, verifying the discriminative power of MVDs for stego videos. IVSRNet, which adopts the MV, prediction residual, and MVD channels, achieves even better results.
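The MVD channel construction described in the Method part above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the `PU` record, its field names, the mode strings, and `build_mvd_matrices` are hypothetical; coordinates and sizes are in luma samples and are multiples of 4; and each inter PU is assumed to carry a single MVD (e.g. uni-prediction).

```python
from typing import List, NamedTuple, Tuple

Matrix = List[List[int]]


class PU(NamedTuple):
    """Hypothetical prediction-unit record; position and size in luma samples."""
    x: int
    y: int
    w: int
    h: int
    mode: str                       # "intra", "merge", "skip", or "amvp" (explicit MVD)
    mvd: Tuple[int, int] = (0, 0)   # (horizontal, vertical); only meaningful for "amvp"


def build_mvd_matrices(pus: List[PU], width: int, height: int) -> Tuple[Matrix, Matrix]:
    """Sample MVDs on a 4x4 grid, one element per 4x4 block in each matrix.

    Blocks without an explicitly coded MVD (intra, Merge, Skip) keep the value 0.
    Returns (horizontal MVD matrix, vertical MVD matrix).
    """
    rows, cols = height // 4, width // 4
    mvd_h = [[0] * cols for _ in range(rows)]
    mvd_v = [[0] * cols for _ in range(rows)]
    for pu in pus:
        if pu.mode != "amvp":
            continue  # no MVD is coded for this PU, so its cells stay zero
        # Copy the PU's MVD into every 4x4 cell that the PU covers.
        for r in range(pu.y // 4, (pu.y + pu.h) // 4):
            for c in range(pu.x // 4, (pu.x + pu.w) // 4):
                mvd_h[r][c] = pu.mvd[0]
                mvd_v[r][c] = pu.mvd[1]
    return mvd_h, mvd_v


# Hypothetical 16x8 frame, i.e. a 2-row by 4-column grid of 4x4 blocks.
pus = [
    PU(0, 0, 8, 8, "amvp", (2, -1)),  # explicit MVD over the left 8x8 area
    PU(8, 0, 8, 4, "merge"),          # Merge: MV but no MVD, so zeros
    PU(8, 4, 8, 4, "amvp", (0, 3)),   # explicit MVD over the bottom-right 8x4 area
]
mvd_h, mvd_v = build_mvd_matrices(pus, width=16, height=8)
print(mvd_h)  # [[2, 2, 0, 0], [2, 2, 0, 0]]
print(mvd_v)  # [[-1, -1, 0, 0], [-1, -1, 3, 3]]
```

The two resulting planes play the role of the MVD channels, which the text describes feeding to the network either in place of the MV channels (MVD-VSRNet) or alongside them (IVSRNet).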
We then evaluate SCA-IVSRNet, which augments IVSRNet with an embedding probability channel; it outperforms both IVSRNet and SCA-VSRNet. We also compare it with several milestone handcrafted feature-based steganalysis approaches for MV-based steganography, namely the adding-or-subtracting-one (AoSO), motion vector reversion-based (MVRB), and near-perfect estimation for local optimality (NPEFLO) algorithms, as well as local optimality in candidate list (LOCL), the latest state-of-the-art (SOTA) method, which employs a feature specific to the H.265/HEVC standard. SCA-IVSRNet surpasses all the other methods against the two ported target methods. Against the H.265/HEVC-specific methods, it trails NPEFLO and LOCL by less than 2% but exceeds the remaining methods by around 10%. The most challenging of the five targets does not directly change MV values; in this case, SCA-IVSRNet reports accuracies of around 67%, only 0.3% behind LOCL, which ranks first. Notably, IVSRNet also reaches 63% in this case, again verifying the important role of the proposed MVD channels. Finally, we assess Q-IVSRNet on the quantitative steganalysis task. The mean absolute errors (MAEs) obtained with Q-IVSRNet are consistently lower than those of Q-VSRNet, which can be attributed to the effectiveness of the MVD channels.

Conclusion: This work aims to improve the detection accuracy of convolutional neural network (CNN)-based steganalyzers for MV-based video steganography. We point out that the current input spaces of MVs and prediction residuals do not convey adequate steganalytic information. To solve this problem, we propose to extend the detection space to MVDs.
The newly introduced MVD channels are fully compatible with current CNN-based video steganalyzers, yielding several improved steganalysis networks. Extensive experiments evaluate the effectiveness of adopting MVD channels. The results show that the improved detection networks not only surpass their predecessors by a large margin but also match or even exceed some popular handcrafted feature-based steganalyzers. This work shows how to extend the detection space and handle highly unstructured data when constructing input matrices for CNN-based video steganalysis, paving the way for more effective deep learning networks for video steganalysis.

© 2023 Journal of Image and Graphics.

Full Text