摘要

Respiratory infectious diseases, such as influenza and tuberculosis, are big threats to human health, making quick and accurate identification of epidemic source of the infectious disease an important issue theoretically and practically in the field of disease spreading and control. Different from the spreading of rumors in social networks or computer viruses in computer networks, epidemic spreading of respiratory infectious diseases relies on human physical interactions, and more complex spreading models. In this review, we first present the formal definition of human contact networks, disease spreading models, and the epidemic source identification problem together with generalization in four dimensions (spreading time, snapshot coverage, quantity of epidemic sources, and candidates of the epidemic source). We also present two evaluation indicators (accuracy and error distance) of source identification algorithms, and algorithm design principles based on Bayesian maximal likelihood evaluation. Consequently we summarize currently existing algorithms, which are based on centrality of epidemic source, belief propagation, Monte-Carlo technique, and minimum description length. The first one of these four categories, whose basis is centrality of epidemic source, employs different kinds of centrality measurements such as rumor centrality, Jordan centrality, dynamical age, and unbiased betweenness centrality. Moreover, algorithms with rumor centrality and Jordan centrality are generalized to more general scenarios, where disease spreads from several sources, or information of the snapshot is incomplete. After the introduction of all these algorithms, we implement and evaluate them on four idealized networks (random tree, scale-free network, small-world network, and random network) as well as two realistic human contact networks (one in a French primary school, and the other in a Chinese university), with various transmission rates. The evaluation results, including accuracy, error distance, and running time, indicate the following four facts: (1) Most source identification algorithms are sensitive to network structures, showing different accuracies, error distances, and running times on different networks; (2) Most algorithms are robust to epidemic parameters; (3) The algorithms based on dynamic message-passing, though time-consuming, locate the epidemic source much more accurately than other algorithms; (4) Among all fast algorithms, the algorithms based on unbiased betweenness centrality have relatively small error distance. Based on experiment results, we recommend different algorithms in different realistic applications: (1) Dynamic message-passing is recommended in the case where running time is not cared; (2) On the contrary, the algorithm based on unbiased betweenness shall be considered if a fast source identification is higly valued, and the Jordan Center Estimation algorithm is much better on random trees; (3) Reverse Greedy and Dynamical Age take both accuracy and running time into consideration on random networks and small-world networks, respectively. Finally, we summarize basic settings of these algorithms, and compare their time and space complexities. The summaries are followed by the discussion of their practical applications, as well as the consequent vaccination strategies. We list research directions of epidemic source identification in the future, including developing more advanced methods of estimating likelihoods to improve the accuracy of source identification, utilizing more information in the spreading process to accelerate existing source identification algorithms, and designing algorithms on dynamic networks to adapt them for realistic scenarios.