Volume 34, Issue 11
Nov. 2008
Yin Jihao, Jiang Zhiguo, Fan Xiaozhong, et al. Statistical language model adaptation based on N-gram distribution[J]. Journal of Beijing University of Aeronautics and Astronautics, 2008, 34(11): 1276-1279. (in Chinese)

Statistical language model adaptation based on N-gram distribution

  • Received Date: 29 Nov 2007
  • Publish Date: 30 Nov 2008
  • The N-gram distribution of a corpus accurately reflects the corpus's characteristics. Building on this observation, an approach to statistical language model adaptation based on N-gram distribution is proposed. Given a large out-of-task training corpus (the training set) and a small task-specific corpus (the seed set), the language model is adapted to the target domain by adjusting the N-gram distribution of the training set to match that of the seed set. Experimental results show substantial improvements over conventional methods: compared with simple interpolation, perplexity and word error rate are reduced by 11.1% and 6.9%, respectively.
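The abstract does not give the adjustment formula, so the following is only a minimal sketch under stated assumptions, not the authors' method. It works with bigram counts and uses an assumed reweighting rule that scales each training-set count by `(p_seed / p_train) ** beta`, pulling the training distribution toward the seed distribution. It also shows the simple linear interpolation baseline that the abstract compares against, together with a perplexity function. The names `adapt_counts`, `beta`, `floor`, the add-k smoothing constant, the interpolation weight `lam = 0.5`, and the toy sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def bigram_counts(sentences):
    """Count bigrams, padding each sentence with <s> and </s>."""
    counts = Counter()
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        counts.update(zip(toks, toks[1:]))
    return counts


def adapt_counts(train, seed, beta=0.5, floor=0.1):
    """Hypothetical adaptation rule (an assumption, not the paper's formula):
    scale each training-set bigram count by (p_seed / p_train) ** beta.
    Bigrams missing from the seed set use the ratio `floor` instead.
    beta=0 leaves the counts unchanged; a larger beta moves the training
    distribution further toward the seed distribution."""
    n_train, n_seed = sum(train.values()), sum(seed.values())
    adapted = {}
    for gram, c in train.items():
        ratio = (seed[gram] / n_seed) / (c / n_train) if gram in seed else floor
        adapted[gram] = c * ratio ** beta
    return adapted


class BigramLM:
    """Add-k smoothed bigram model; accepts fractional (adapted) counts."""

    def __init__(self, counts, vocab, k=0.1):
        self.counts, self.k, self.V = counts, k, len(vocab)
        self.history = defaultdict(float)  # total count of each history word
        for (prev, _), c in counts.items():
            self.history[prev] += c

    def prob(self, prev, word):
        num = self.counts.get((prev, word), 0.0) + self.k
        return num / (self.history[prev] + self.k * self.V)


def interpolate(lm_a, lm_b, lam):
    """Simple linear interpolation: lam * P_a + (1 - lam) * P_b."""
    return lambda prev, word: lam * lm_a.prob(prev, word) + (1 - lam) * lm_b.prob(prev, word)


def perplexity(prob, sentences):
    """exp of the average negative log-probability per predicted token."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(toks, toks[1:]):
            log_sum += math.log(prob(prev, word))
            n += 1
    return math.exp(-log_sum / n)


if __name__ == "__main__":
    # Toy data invented for illustration only; not the paper's corpora.
    train = ["book a table for two", "the stock market fell",
             "the stock market rose", "book a flight"]
    seed = ["book a flight to beijing", "book a flight to shanghai"]
    test = ["book a flight to beijing tomorrow"]
    vocab = {w for s in train + seed + test for w in s.split()} | {"<s>", "</s>"}

    c_train, c_seed = bigram_counts(train), bigram_counts(seed)
    lm_seed = BigramLM(c_seed, vocab)
    baseline = interpolate(lm_seed, BigramLM(c_train, vocab), lam=0.5)
    adapted = interpolate(lm_seed, BigramLM(adapt_counts(c_train, c_seed), vocab), lam=0.5)
    print("interpolation PPL:", perplexity(baseline, test))
    print("adapted + interpolation PPL:", perplexity(adapted, test))
```

Here `beta` controls how strongly the training-set distribution is pulled toward the seed set. The perplexities printed for this toy data say nothing about the 11.1% and 6.9% reductions reported in the paper.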


    Corresponding author: Chen Bin, bchen63@163.com

      School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142
