Abstract

Redundant re-indexing during Web crawling can be reduced effectively by detecting significant changes to the essential content of Web pages. To this end, a vision-based detection method is proposed that detects changes in the different semantic regions of a page and compresses the page into a low-dimensional vector representation. The method models how the semantic importance of different regions varies from the user's perspective. Unlike existing methods, it does not depend on HTML analysis, and is therefore suitable for new media such as the mobile Internet. Experiments demonstrate the effectiveness of the proposed method.

Full Text