Abstract
Word representation is a fundamental problem in natural language processing and has long attracted wide attention from the research community. Traditional one-hot representations suffer from data sparsity in practice because they fail to capture semantic relations between words. In contrast, distributed word representations encode the semantic meaning of words as dense, real-valued vectors in a low-dimensional space, and can therefore alleviate the data sparsity issue. As the inputs of neural network models, distributed word representations have been widely used in natural language processing alongside deep learning. From latent semantic indexing to neural language models, researchers have developed a variety of methods for learning distributed word representations. In this paper, we survey the development of models for learning distributed word representations. We further observe that all of these models are built on the distributional hypothesis but differ in their choice of context. From this perspective, the models can be grouped into two classes: syntagmatic and paradigmatic. Models such as latent semantic indexing use documents as the contexts of words and thereby capture syntagmatic relations between words, whereas models such as neural language models use the words surrounding a target word as its context and capture paradigmatic relations. We then summarize the key challenges and the latest solutions, including representations for polysemous and rare words, fine-grained semantic modeling, interpretability of distributed word representations, and evaluation of word representations. Finally, we give an outlook on future research and application directions.