Content-based audio analysis has attracted growing research interest, and audio signal segmentation, reviewed in depth here, is a key step within it. Conventionally, automatic segmentation is implemented by computing audio features such as short-term energy, amplitude, or fundamental frequency in the time or frequency domain and comparing them against constant thresholds established in advance. However, such methods lack reliability in practical applications because of the complexity of real-time audio signals, unpredictable changes in the environment, and the variety of acquisition devices. This paper introduces an adaptive threshold-adjustment method based on background learning. Under real-time conditions, a so-called environment factor is computed iteratively through background learning and then used as a measure to control the fluctuation of the actual thresholds. To balance efficiency and precision, a state table is introduced to help judge the type of each audio clip. The validity of the methods is demonstrated by a group of experiments.
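The adaptive-threshold idea described above can be sketched as follows. Since the abstract does not give the exact update rule, this is only an illustrative assumption: background learning is modeled as an exponential moving average of frame energy, and a scaling constant `k` stands in for the environment factor; the state-table step for clip-type judgment is omitted.

```python
def short_term_energy(frame):
    """Mean squared amplitude of one audio frame."""
    return sum(x * x for x in frame) / len(frame)

def segment_with_adaptive_threshold(frames, alpha=0.05, k=3.0):
    """Label each frame as background (0) or foreground (1).

    The running background energy `bg` is learned iteratively
    (an exponential moving average -- an assumed form of the
    paper's "background learning"), and `k` plays the role of
    the environment factor that scales the adaptive threshold.
    """
    bg = short_term_energy(frames[0])   # initialise from the first frame
    labels = []
    for frame in frames:
        e = short_term_energy(frame)
        threshold = k * bg              # threshold tracks the background level
        if e > threshold:
            labels.append(1)            # foreground / event frame
        else:
            labels.append(0)            # background frame
            bg = (1 - alpha) * bg + alpha * e  # learn only from background
    return labels
```

Because the threshold is a multiple of the learned background energy rather than a fixed constant, the same segmenter can adapt to quiet or noisy recording conditions without manual re-tuning.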