Leave-one-out error bounds estimation for error correcting output codes

XUE Aijun; WANG Xiaodan

doi:10.13700/j.bh.1001-5965.2017.0031


Air and Missile Defense College, Air Force Engineering University, Xi'an 710051, China

Funds:

National Natural Science Foundation of China 61273275

National Natural Science Foundation of China 61703426


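Author biographies:
XUE Aijun, male, Ph.D. candidate. Research interest: pattern recognition.

WANG Xiaodan, female, professor, doctoral supervisor. Research interest: machine learning.

Corresponding author:
WANG Xiaodan, E-mail: wang_afeu@126.com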

CLC number: TP391
Metrics
- Article views: 836
- HTML full-text views: 73
- PDF downloads: 361
- Times cited: 3
Publication history
- Received: 2017-01-17
- Accepted: 2017-05-12
- Published online: 2018-01-20


Abstract:
Error correcting output codes (ECOC) is a decomposition framework that transforms a multiclass classification problem into a series of two-class classification problems. It is an effective approach to multiclass classification. To improve the generalization performance of ECOC, we study the design of its base classifiers, which is also known as model selection in ECOC. The key point is how to estimate the generalization error of ECOC. Because the leave-one-out (LOO) error is an almost unbiased estimator of the generalization error, we study how to estimate LOO error bounds for ECOC. First, we define the LOO error for ECOC. Then, based on this definition, we derive upper and lower bounds on the LOO error of ECOC when the base classifiers are support vector machines (SVM) and decoding uses a linear loss function. Experiments on a synthetic dataset and UCI datasets show two things. First, the upper bound on the LOO error of ECOC can guide the parameter selection of the base classifiers. Second, designing the base classifiers can improve the generalization performance of ECOC. Furthermore, the training error of ECOC serves as a lower bound on its LOO error; how to apply this lower bound is left for future research.
- pattern recognition /
- multiclass classification /
- error correcting output codes (ECOC) /
- generalization performance /
- leave-one-out (LOO) error
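The abstract relies on two ECOC ingredients:
- a coding matrix M, whose rows are class codewords and whose columns define the binary problems;
- loss-based decoding, which assigns x to the class whose codeword incurs the smallest total loss on the base-classifier outputs f_j(x).

Below is a minimal sketch of linear-loss decoding. It assumes M has entries in {-1, 0, +1} and takes the linear loss as L(z) = -z, so zero entries contribute nothing. The exact loss normalization used by the authors is not given here, and the function name `linear_loss_decode` is hypothetical.

```python
import numpy as np


def linear_loss_decode(code_matrix: np.ndarray, outputs: np.ndarray) -> np.ndarray:
    """Loss-based ECOC decoding with the linear loss L(z) = -z (assumed form).

    code_matrix: (n_classes, n_dichotomies), entries in {-1, 0, +1};
                 0 means the class is ignored by that dichotomy.
    outputs:     (n_samples, n_dichotomies), real-valued base-classifier
                 outputs f_j(x), e.g. SVM decision values.
    Returns the predicted class index (row of code_matrix) for each sample.
    """
    # Total loss of class r: sum_j L(M[r, j] * f_j(x)) = -sum_j M[r, j] * f_j(x),
    # computed for all samples and classes at once as a matrix product.
    losses = -outputs @ code_matrix.T
    # Pick the codeword with the smallest total loss.
    return losses.argmin(axis=1)


# Usage: one-vs-all code for 3 classes and two samples' SVM outputs.
M = 2 * np.eye(3) - 1
f = np.array([[0.9, -0.8, -0.5],
              [-0.7, -0.2, 0.4]])
print(linear_loss_decode(M, f))  # [0 2]
```

With this loss, decoding reduces to choosing the codeword with the largest correlation sum_j M[r, j] f_j(x). Real-valued SVM margins therefore enter the decision directly instead of being thresholded first.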


Figure 1. Four commonly-used ECOCs
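The caption does not say which four ECOC designs Figure 1 shows. For illustration only, two standard designs can be written as coding matrices:
- one-versus-all, a k × k matrix;
- one-versus-one, a ternary k × k(k−1)/2 matrix in which 0 marks classes excluded from a dichotomy.

This is a minimal sketch, and the function names are hypothetical.

```python
import itertools

import numpy as np


def one_vs_all(n_classes: int) -> np.ndarray:
    # Column j separates class j (+1) from all remaining classes (-1).
    return 2 * np.eye(n_classes, dtype=int) - 1


def one_vs_one(n_classes: int) -> np.ndarray:
    # One column per unordered class pair (a, b): a -> +1, b -> -1,
    # every other class -> 0, i.e. left out of that binary problem.
    pairs = list(itertools.combinations(range(n_classes), 2))
    code = np.zeros((n_classes, len(pairs)), dtype=int)
    for j, (a, b) in enumerate(pairs):
        code[a, j] = 1
        code[b, j] = -1
    return code


print(one_vs_all(3))
print(one_vs_one(3))
```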


Figure 2. Data distribution of synthetic dataset


Figure 3. LOO error and its upper and lower bounds with different kernel parameters and regularization parameters on the synthetic dataset


Figure 4. 20-fold cross-validation results and upper and lower bounds of the LOO error with different kernel parameters on UCI datasets


Figure 5. 20-fold cross-validation results and upper and lower bounds of the LOO error with different regularization parameters on UCI datasets
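Figures 4 and 5 compare 20-fold cross-validation with the LOO error bounds. Both estimates come from the same hold-out loop. k-fold cross-validation retrains k times, each time on roughly (k−1)/k of the data. The LOO error is the special case k = m (m training samples), which needs m retrainings.

The sketch below estimates either one for any learner passed in as a `fit` callable. `cv_error` and `fit_nearest_mean` are hypothetical names, and the nearest-class-mean learner is only a placeholder standing in for an ECOC ensemble of SVMs.

```python
from typing import Callable, Optional

import numpy as np

# A fitter trains on (X, y) and returns a function mapping inputs to labels.
Fitter = Callable[[np.ndarray, np.ndarray], Callable[[np.ndarray], np.ndarray]]


def cv_error(X: np.ndarray, y: np.ndarray, fit: Fitter, n_folds: int,
             rng: Optional[np.random.Generator] = None) -> float:
    """Fraction of samples misclassified when each fold is held out once.

    n_folds == len(y) gives the leave-one-out (LOO) error.
    """
    m = len(y)
    if not 2 <= n_folds <= m:
        raise ValueError("n_folds must be between 2 and the number of samples")
    # Shuffle before splitting for k-fold CV; the order is irrelevant for LOO.
    order = np.arange(m) if rng is None else rng.permutation(m)
    errors = 0
    for test_idx in np.array_split(order, n_folds):
        train_mask = np.ones(m, dtype=bool)
        train_mask[test_idx] = False
        predict = fit(X[train_mask], y[train_mask])  # retrain without the fold
        errors += int(np.sum(predict(X[test_idx]) != y[test_idx]))
    return errors / m


def fit_nearest_mean(X: np.ndarray, y: np.ndarray):
    # Placeholder learner: assign each input to the class with the closest mean.
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])

    def predict(Xq: np.ndarray) -> np.ndarray:
        dists = ((Xq[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        return classes[dists.argmin(axis=1)]

    return predict


X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0, 1, 0])
print(cv_error(X, y, fit_nearest_mean, n_folds=len(y)))  # LOO error: 1.0
```

Passing `n_folds=20` together with an `rng` mimics a 20-fold protocol like the one named in the captions. How the authors actually split their folds is not stated here.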


Table 1. Parameter settings for the synthetic dataset

Class	Prior probability	Mean vector	Covariance matrix
C₁	P(C₁)=	μ₁=(0, 0)^T	Σ₁=
C₂	P(C₂)=	μ₂=(0, 5)^T	Σ₂=
C₃	P(C₃)=	μ₃=(5, 0)^T	Σ₃=
C₄	P(C₄)=	μ₄=(5, 5)^T	Σ₄=
C₅	P(C₅)=	μ₅=(2, 3)^T	Σ₅=

Table 2. UCI datasets used in the experiments

Dataset	Samples	Features	Classes
vowel	990	13	11
balance	625	4	3
glass	214	10	6
vehicle	846	18	4
letter	1214	16	26
segmentation	2310	19	7

