Abstract

Objective Steganography is a technology that embeds hidden information in digital carriers such as text, image, voice, or video data. Audio steganography exploits the redundancy of human hearing and the statistical redundancy of the audio carrier to embed hidden information without loss of audio quality. The internet low bit rate codec (iLBC), with its speech enhancement and packet-loss concealment techniques, maintains high voice quality over networks even at high packet loss rates, and steganography in iLBC speech has therefore attracted attention in the field of information hiding in recent years. However, hiding information in iLBC speech is challenging because of its high compression. Moreover, the human auditory system, unlike the human visual system, is highly sensitive to minor distortions. Most existing methods embed information during line spectral frequency (LSF) coefficient vector quantization, dynamic codebook search, or gain quantization in iLBC. Although these methods have good imperceptibility, this usually comes at the expense of steganography capacity, and they have difficulty resisting detection by deep learning-based steganalysis. The challenge for iLBC speech steganography is therefore to balance steganography capacity, imperceptibility, and anti-detection: the capacity should be as high as possible, the imperceptibility as good as possible, and the resistance to steganalysis as strong as possible. To address this, we propose a hierarchical high-capacity steganography method for iLBC speech.

Method 1) The structure of the iLBC bitstream is analyzed. 2) The influence on voice quality of steganography in LSF coefficient vector quantization, dynamic codebook search, and gain quantization is evaluated using the perceptual evaluation of speech quality-mean opinion score (PESQ-MOS) and Mel cepstral distortion (MCD). Based on this analysis, a hierarchical embedding-position selection method is proposed: according to the required steganography capacity and the layer priority, it chooses the layers in gain quantization and dynamic codebook search that are least sensitive to embedding, so as to reduce distortion. For a layer that is not completely filled, an embedding-position selection method based on the Logistic chaotic map is used to improve the randomness and security of the steganography. 3) Quantization index modulation (QIM) is used to embed the hidden information, which further improves steganography security.

Result Our hierarchical steganography method doubles the steganography capacity. Comparative experiments are conducted on the Chinese and English speech dataset steganalysis-speech-dataset (SSD), using 30 ms and 20 ms frames and speech samples of 2 s, 5 s, and 10 s. The experimental results on 5 280 speech samples show that our method improves imperceptibility and reduces distortion while embedding more hidden information. To validate anti-detection performance against a deep learning-based steganalyzer, we generate 4 000 original speech samples and 4 000 steganographic speech samples, of which 75% are used as the training set and 25% as the test set. The detection results show that when the steganography capacity is at most 18 bits per 30 ms frame and 12 bits per 20 ms frame, the method resists detection by the deep learning-based audio steganalyzer well.

Conclusion A hierarchical high-capacity steganography method for iLBC speech is proposed. While extending the steganography capacity, it further exploits the steganography potential of iLBC speech and optimizes imperceptibility and anti-detection.
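As a rough illustration of two ideas named in the Method part, the Python sketch below selects embedding positions within a partly filled layer using a Logistic chaotic map and then embeds bits into quantization indices by quantization index modulation. It is not the authors' implementation. The parity-based split of indices into two sub-codebooks, the rule of ranking chaotic values to pick positions, the parameter mu = 3.99, the burn-in length, and every function name are assumptions made for illustration only.

```python
from typing import List


def logistic_sequence(x0: float, mu: float, n: int, burn_in: int = 100) -> List[float]:
    """Iterate x_{k+1} = mu * x_k * (1 - x_k); return n values after a burn-in."""
    x = x0
    for _ in range(burn_in):  # discard the transient (assumed length)
        x = mu * x * (1.0 - x)
    values = []
    for _ in range(n):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values


def select_positions(num_candidates: int, num_needed: int,
                     x0: float, mu: float = 3.99) -> List[int]:
    """Choose num_needed distinct positions by ranking the chaotic values."""
    if not 0 <= num_needed <= num_candidates:
        raise ValueError("num_needed must be between 0 and num_candidates")
    seq = logistic_sequence(x0, mu, num_candidates)
    order = sorted(range(num_candidates), key=lambda i: seq[i])
    return sorted(order[:num_needed])


def qim_embed(index: int, bit: int, max_index: int) -> int:
    """Move index to the nearest neighbour whose parity equals bit (assumed partition)."""
    if index % 2 == bit:
        return index
    return index - 1 if index + 1 > max_index else index + 1


def qim_extract(index: int) -> int:
    """Recover the hidden bit from the parity of the index."""
    return index % 2


def embed(indices: List[int], bits: List[int], key: float, max_index: int) -> List[int]:
    """Embed bits at key-dependent positions; the key is the map's initial value."""
    positions = select_positions(len(indices), len(bits), key)
    stego = list(indices)
    for pos, bit in zip(positions, bits):
        stego[pos] = qim_embed(stego[pos], bit, max_index)
    return stego


def extract(stego: List[int], num_bits: int, key: float) -> List[int]:
    """Regenerate the same positions from the key and read the bits back."""
    positions = select_positions(len(stego), num_bits, key)
    return [qim_extract(stego[p]) for p in positions]


if __name__ == "__main__":
    cover = [5, 12, 127, 0, 64, 33, 90, 7]  # illustrative indices, not real iLBC data
    secret = [1, 0, 1, 1]
    stego = embed(cover, secret, key=0.3141, max_index=127)
    print(extract(stego, len(secret), key=0.3141) == secret)
```

In this sketch the sender and receiver share only the initial value of the map, so the receiver can regenerate the same positions without any side information, and each modified index moves by at most one step.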

Full text