Statistical language model adaptation based on N-gram distribution

Yin Jihao; Jiang Zhiguo; Fan Xiaozhong

Volume 34 Issue 11

Nov. 2008



Citation: Yin Jihao, Jiang Zhiguo, Fan Xiaozhong, et al. Statistical language model adaptation based on N-gram distribution[J]. Journal of Beijing University of Aeronautics and Astronautics, 2008, 34(11): 1276-1279. (in Chinese)


1. School of Astronautics, Beijing University of Aeronautics and Astronautics, Beijing 100191, China
2. School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China

Received Date: 29 Nov 2007
Publish Date: 30 Nov 2008

Abstract


The N-gram distribution accurately captures the characteristics of a corpus. Building on this, an approach to statistical language model adaptation based on the N-gram distribution is proposed. Given a large set of out-of-task training data (the training set) and a small set of task-specific training data (the seed set), a statistical language model is adapted toward the target domain by adjusting the N-gram distribution of the training set to match that of the seed set. Experimental results show clear improvements over conventional methods: compared with simple interpolation, perplexity and word error rate decrease by 11.1% and 6.9%, respectively.
- N-gram distribution,
- seed set,
- training set,
- adaptation
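
The abstract compares the proposed method against simple interpolation and reports results in terms of perplexity. The abstract does not give the paper's own adjustment rule, so the sketch below illustrates only that baseline and that metric. It assumes unigrams (N = 1), so that the N-gram distribution is itself a usable language model. The function names, the interpolation weight `lam`, and the `floor` probability for unseen words are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distribution(tokens, n=1):
    """Relative-frequency N-gram distribution of a corpus."""
    counts = Counter(ngrams(tokens, n))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def interpolate(p_train, p_seed, lam):
    """Simple linear interpolation baseline (assumed form):
    lam * P_seed + (1 - lam) * P_train."""
    grams = set(p_train) | set(p_seed)
    return {
        g: lam * p_seed.get(g, 0.0) + (1.0 - lam) * p_train.get(g, 0.0)
        for g in grams
    }


def perplexity(model, tokens, n=1, floor=1e-6):
    """Perplexity of a test token list under an N-gram distribution.
    Unseen n-grams receive the small probability `floor` (an assumption
    standing in for proper smoothing such as Katz back-off)."""
    grams = ngrams(tokens, n)
    log_sum = sum(math.log(model.get(g, floor)) for g in grams)
    return math.exp(-log_sum / len(grams))


# Toy data: a larger out-of-task training set and a small task-specific seed set.
train = "the cat sat on the mat the dog sat on the log".split()
seed = "the plane flew over the base".split()
test = "the plane sat on the base".split()

p_train = distribution(train)
p_seed = distribution(seed)
mixed = interpolate(p_train, p_seed, lam=0.5)

print("training-set model perplexity:", perplexity(p_train, test))
print("interpolated model perplexity:", perplexity(mixed, test))
```

On this toy data the interpolated model assigns non-negligible probability to the in-domain words that the training set never saw, so its perplexity on the in-domain test sentence is lower. The method described in the abstract instead adjusts the training-set N-gram distribution toward that of the seed set, and is reported to improve on this kind of baseline.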


