Abstract

A growing number of crimes involve e-mail. An offender's e-mail often contains traces and evidence of the crime, and although such messages are usually very short, the evidence they carry can be clear. How to turn these messages into reliable evidence and identify their authors is therefore an urgent problem. In this paper, under reasonable assumptions, we build a mathematical model for this problem that combines the analytic hierarchy process (AHP), a support vector machine (SVM) classification model, and statistical analysis. Based on the extracted features of the textual language, we use MySQL to select the message set and a number of representative samples. From an analysis of the text, we derive five representative features (word frequency, syntactic structure, sentence length, format, and punctuation), which are used to construct the vector space. We use an improved term frequency-inverse document frequency (TF-IDF) algorithm to calculate the weight of each word and use AHP to re-weight the five features. The vector space model is then used to obtain the feature vector of each message. For classification, the resulting vectors serve as the experimental samples: a multi-class SVM is used as the final classification model, and cross-validation is used to determine the model parameters. The dataset is randomly partitioned, with 80% used as the training set and 20% as the test set. Experimental results show an accuracy of more than 95%.
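As a rough illustration of the two weighting steps named in the abstract, the sketch below computes standard TF-IDF word weights and AHP priority weights for the five features. It rests on several assumptions: it uses the textbook TF-IDF formula rather than the improved variant the paper refers to, it approximates the AHP priority vector with the row geometric mean method instead of the principal eigenvector, and the pairwise comparison matrix `PAIRWISE` is hypothetical (constructed to be perfectly consistent) rather than taken from the paper. The function names are illustrative, and the SVM stage is omitted.

```python
import math
from collections import Counter

FEATURES = ["word frequency", "syntactic structure", "sentence length",
            "format", "punctuation"]

def tf_idf(docs):
    """Standard TF-IDF (not the paper's improved variant).

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    n = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))          # count each term once per document
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def ahp_weights(matrix):
    """Approximate the AHP priority vector by the row geometric mean method."""
    k = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / k) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def consistency_ratio(matrix, weights):
    """Saaty's consistency ratio CR = CI / RI (conventionally acceptable below 0.1)."""
    k = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(k)) for i in range(k)]
    lambda_max = sum(aw[i] / weights[i] for i in range(k)) / k
    ci = (lambda_max - k) / (k - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}[k]   # Saaty's tabulated values
    return ci / random_index

# Hypothetical relative importances, used only to build a consistent
# pairwise comparison matrix; these are NOT the paper's judgments.
_base = [4, 2, 2, 1, 1]
PAIRWISE = [[bi / bj for bj in _base] for bi in _base]

if __name__ == "__main__":
    weights = ahp_weights(PAIRWISE)
    for name, w in zip(FEATURES, weights):
        print(f"{name}: {w:.3f}")
    print("CR:", round(consistency_ratio(PAIRWISE, weights), 6))
    print(tf_idf([["send", "money", "now"], ["send", "report"]]))
```

In a full pipeline along the lines the abstract describes, the AHP weights would scale the five feature groups before the per-message vectors are passed to the classifier; how the paper combines them in detail is not specified here.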

Full text