The base unit in mandarin speech recognition is phoneme, semi-syllable or syllable. Semi-syllable system has fewer HMM models and needs less computation, thus it's suitable for real-time systems. But due to poor description for the acoustic properties of the speech signal, it generally shows a low performance compared with syllable system. While the system based on syllable or phoneme (tri-phone or di-phone) has much more HMM models, and needs massive computation in training and recognition, which goes against to real-time implementation. The new scheme is a compromised one. The new system is based on semi-syllable system, but the parameters of the entire syllable are used in training phase, so smoothing between two semi-syllable units is introduced. The transition probability between semi-syllables is calculated, and the two semi-syllable HMMs are connected into a full syllable HMM in recognition phase. This can increase the system performance without increasing HMM models, and it's fit for real-time systems with DSP kernel.