Similarity measurement is one of the keys to content-based audio analysis. Traditionally, statistical methods have been used for this purpose, but their results are hard to represent visually and bear only a weak relationship to the semantic layer. An image-segmentation-based method was introduced to partly solve this problem. Features were extracted from each audio signal separately to form a so-called feature space, and distance correlation images were calculated by comparing the feature vectors. By segmenting such an image, the direction of maximal similarity was estimated and then used to calculate local and global similarity. Experiments were conducted, and the results indicate that the method is suitable for applications such as clip searching in digital broadcast streams.
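The pipeline described above can be illustrated with a minimal Python sketch. It assumes each clip is already a sequence of per-frame feature vectors, uses Euclidean distance to build the distance image, and assumes both clips play at the same speed, so a matching segment appears as a diagonal line in the image. The method's image-segmentation step is not reproduced here. As a plainly simpler stand-in, the sketch thresholds the image at a hypothetical tolerance `tau` and picks the diagonal with the largest share of below-threshold pixels. The mapping 1/(1 + d) from distance to local similarity and the use of the mean as global similarity are also illustrative choices, not the method's own definitions. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Image = List[List[float]]


def distance_image(a: Sequence[Vector], b: Sequence[Vector]) -> Image:
    """Pixel (i, j) holds the Euclidean distance between frame i of `a` and frame j of `b`."""
    return [[math.dist(x, y) for y in b] for x in a]


def diagonal(img: Image, offset: int) -> List[float]:
    """Pixels (i, i + offset): frame i of clip `a` aligned with frame i + offset of clip `b`."""
    rows, cols = len(img), len(img[0])
    return [img[i][i + offset] for i in range(rows) if 0 <= i + offset < cols]


def best_direction(img: Image, tau: float, min_len: int = 2) -> Tuple[int, float]:
    """Stand-in for the segmentation step (assumption, not the original method):
    threshold the image at `tau` and return the diagonal offset with the largest
    fraction of "similar" pixels. Diagonals shorter than `min_len` are skipped so
    that single corner pixels cannot win trivially."""
    rows, cols = len(img), len(img[0])
    best_offset, best_score = 0, -1.0
    for offset in range(-(rows - 1), cols):
        line = diagonal(img, offset)
        if len(line) < min_len:
            continue
        score = sum(d <= tau for d in line) / len(line)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


def similarities(img: Image, offset: int) -> Tuple[List[float], float]:
    """Local similarity per aligned frame pair along the chosen diagonal,
    and global similarity as their mean (both illustrative definitions)."""
    local = [1.0 / (1.0 + d) for d in diagonal(img, offset)]
    return local, sum(local) / len(local)


if __name__ == "__main__":
    # Toy 2-D "features": the query clip occurs in the stream starting at frame 1.
    query = [[0, 0], [1, 1], [2, 2]]
    stream = [[5, 5], [0, 0], [1, 1], [2, 2], [7, 7]]
    img = distance_image(query, stream)
    offset, score = best_direction(img, tau=0.5)
    print(offset, score)  # 1 1.0
    local, glob = similarities(img, offset)
    print(local, glob)  # [1.0, 1.0, 1.0] 1.0
```

In this toy example the winning offset, 1, also indicates where the query clip starts in the stream, which is the kind of position a clip-search application would report.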