The fast generation method based on lattice segmentation for high-quality confusion network

Cited by: 0
Authors
Wang H. [1, 2]
Han J. [1]
Affiliations
[1] School of Computer Science and Technology, Harbin Institute of Technology
[2] School of Information Science and Technology, Qingdao University of Science and Technology
Keywords
Confusion network; Lattice; Multi-candidates; Speech recognition
DOI
10.3772/j.issn.1002-0470.2010.05.006
Abstract
To address the problem that existing confusion network generation methods cannot balance generation speed against network quality, the paper investigates two major lattice segmentation methods, with the aim of reducing the impact of segmentation on confusion network quality, and on this basis presents a fast method for generating high-quality confusion networks based on lattice segmentation. The method segments the large-scale lattice produced by automatic speech recognition (ASR) into a sequence of smaller sub-lattices and then generates confusion networks from these sub-lattices, which markedly reduces the computation scale and increases the generation speed. The balance between generation speed and network quality is controlled by the number of segments. Experimental results show that the proposed method significantly improves the speed of confusion network generation while maintaining almost the same quality as the traditional word-clustering method without lattice segmentation. At the same speed, the proposed method achieves a lower tonal syllable error rate than the word-clustering method with lattice pruning.
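The segment-then-cluster idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify the two segmentation methods or the clustering procedure, so everything here is an assumption made for illustration. The lattice is represented as a flat list of hypothetical `Edge` records, cuts are placed only at times that no edge spans, and a simple time-overlap grouping stands in for word clustering. The `num_segments` parameter plays the role of the segmentation number.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    """Hypothetical lattice edge: a word hypothesis with its time span."""
    start: float      # start time in seconds
    end: float        # end time in seconds
    word: str
    posterior: float  # posterior probability of this edge


def cut_points(edges, num_segments):
    """Pick up to num_segments - 1 cut times that no edge spans."""
    last = max(e.end for e in edges)
    times = sorted({e.end for e in edges})
    clean = [t for t in times
             if t < last and not any(e.start < t < e.end for e in edges)]
    if num_segments <= 1 or not clean:
        return []
    # Spread the chosen cuts evenly over the available clean cut times.
    k = min(num_segments - 1, len(clean))
    step = len(clean) / k
    return [clean[int(i * step)] for i in range(k)]


def segment(edges, cuts):
    """Split the edge list into sub-lattices delimited by the cut times."""
    bounds = [float("-inf")] + list(cuts) + [float("inf")]
    return [[e for e in edges if lo <= e.start and e.end <= hi]
            for lo, hi in zip(bounds, bounds[1:])]


def cluster(edges):
    """Simplified stand-in for word clustering: merge time-overlapping edges."""
    bins = []
    for e in sorted(edges, key=lambda e: (e.start, e.end)):
        if bins and e.start < bins[-1]["end"]:
            current = bins[-1]
            current["end"] = max(current["end"], e.end)
        else:
            current = {"end": e.end, "words": defaultdict(float)}
            bins.append(current)
        # Edges carrying the same word in one bin pool their posteriors.
        current["words"][e.word] += e.posterior
    return [dict(b["words"]) for b in bins]


def confusion_network(edges, num_segments):
    """Cluster each sub-lattice separately and concatenate the results."""
    network = []
    for sub_lattice in segment(edges, cut_points(edges, num_segments)):
        network.extend(cluster(sub_lattice))
    return network


# Toy lattice with made-up words and posteriors.
lattice = [
    Edge(0.0, 0.5, "a", 0.6), Edge(0.0, 0.5, "the", 0.4),
    Edge(0.5, 1.0, "cat", 0.7), Edge(0.5, 1.0, "cap", 0.3),
    Edge(1.0, 1.5, "sat", 1.0),
]
network = confusion_network(lattice, num_segments=3)
```

Because this toy version cuts only where no edge crosses, segmentation leaves the resulting network unchanged. In a real lattice such clean cut points are scarce, which is presumably where the trade-off between speed and quality described in the abstract arises.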
Pages: 473-480
Number of pages: 7
Related papers
12 in total
  • [1] Mangu L., Brill E., Stolcke A., Finding consensus in speech recognition: word error minimization and other applications of confusion networks, Computer Speech and Language, 14, 4, pp. 373-400, (2000)
  • [2] Goel V., Kumar S., Byrne W., Segmental minimum Bayes-risk decoding for automatic speech recognition, IEEE Transactions on Speech and Audio Processing, 12, 3, pp. 234-249, (2004)
  • [3] Hakkani-Tur D., Bechet F., Riccardi G., et al., Beyond ASR 1-best: using word confusion networks in spoken language understanding, Computer Speech and Language, 20, 4, pp. 495-514, (2006)
  • [4] Bertoldi N., Zens R., Federico M., Speech translation by confusion network decoding, Proceedings of the 2007 International Conference on Acoustics, Speech, and Signal Processing, pp. 1297-1300, (2007)
  • [5] Hillard D., Ostendorf M., Stolcke A., et al., Improving automatic sentence boundary detection with confusion networks, Proceedings of the 2004 Human Language Technology Conference/North American Chapter of the Association for Computational Linguistics Annual Meeting, pp. 69-72, (2004)
  • [6] Shao J., Zhao Q.W., Zhang P.Y., et al., A fast fuzzy keyword spotting algorithm based on syllable confusion network, Proceedings of the 12th Annual Conference of the International Speech Communication Association, pp. 2405-2408, (2007)
  • [7] Hillard D., Ostendorf M., Compensating for word posterior estimation bias in confusion networks, Proceedings of the 2006 International Conference on Acoustics, Speech, and Signal Processing, pp. 1153-1156, (2006)
  • [8] Quiniou S., Anquetil E., Use of a confusion network to detect and correct errors in an on-line handwritten sentence recognition system, Proceedings of the 9th International Conference on Document Analysis and Recognition, pp. 382-386, (2007)
  • [9] Allauzen A., Error detection in confusion network, Proceedings of the 12th Annual Conference of the International Speech Communication Association, pp. 1749-1752, (2007)
  • [10] Hakkani-Tur D., Riccardi G., A general algorithm for word graph matrix decomposition, Proceedings of the 2003 International Conference on Acoustics, Speech, and Signal Processing, pp. 596-599, (2003)