
Incremental computing methods of canonical correlation analysis for compositional data streams

KONG Boao, LU Shan, WANG Huiwen

Citation: KONG B A, LU S, WANG H W. Incremental computing methods of canonical correlation analysis for compositional data streams[J]. Journal of Beijing University of Aeronautics and Astronautics, 2023, 49(10): 2851-2858 (in Chinese). doi: 10.13700/j.bh.1001-5965.2021.0765


doi: 10.13700/j.bh.1001-5965.2021.0765
Funds: National Natural Science Foundation of China (72021001, 72001222)
    Corresponding author:

    E-mail: shan.lu@cufe.edu.cn

  • CLC number: O212.4

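  • Abstract:

    Canonical correlation analysis for compositional data (CCAI) is a method for studying the linear correlations among multiple compositional variables, and it is widely used in economics, management, geology, chemistry, and other fields. In the setting of massive data, modeling canonical correlations on compositional data streams has both theoretical significance and practical value. To this end, an incremental method of canonical correlation analysis for compositional data is proposed. By decomposing the covariance of the incremental compositional data, it computes the canonical correlations of a compositional data stream exactly. Two block incremental algorithms, one sequential and one parallel, are also given; both can handle data-stream modeling with multiple groups of compositional data. The sequential block incremental algorithm processes the blocks in the order in which they arrive in the stream, whereas the parallel block incremental algorithm is designed to improve computational efficiency. Simulation studies on compositional data streams with different probability distributions and sample sizes, together with a case study of fake news on Weibo, show that, compared with the traditional non-incremental algorithm, the proposed algorithms improve computational efficiency while preserving accuracy.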
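    The idea of computing canonical correlations exactly from block-wise covariance updates can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes an isometric log-ratio (ilr) transform with a Helmert-type basis, uses the standard pairwise merge of (count, mean, scatter) statistics, and obtains the correlations from a Cholesky-whitened cross-covariance. All function names (`clr`, `ilr_basis`, `block_stats`, `merge_stats`, `tree_merge`, `canonical_correlations`) and the simulated data are hypothetical choices made for illustration only.

```python
from functools import reduce
import numpy as np

def clr(x):
    """Centered log-ratio transform of rows of strictly positive compositions."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def ilr_basis(d):
    """Orthonormal Helmert-type basis (d x (d-1)) orthogonal to the ones vector."""
    v = np.zeros((d, d - 1))
    for j in range(1, d):
        v[:j, j - 1] = 1.0 / j
        v[j, j - 1] = -1.0
        v[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return v

def block_stats(z):
    """Sufficient statistics of one block: count, mean, centered scatter matrix."""
    mean = z.mean(axis=0)
    centered = z - mean
    return z.shape[0], mean, centered.T @ centered

def merge_stats(a, b):
    """Exact merge of two (count, mean, scatter) triples."""
    na, ma, sa = a
    nb, mb, sb = b
    n = na + nb
    delta = mb - ma
    mean = ma + delta * (nb / n)
    scatter = sa + sb + np.outer(delta, delta) * (na * nb / n)
    return n, mean, scatter

def tree_merge(stats):
    """Merge independently computed block statistics pairwise (parallel-style)."""
    while len(stats) > 1:
        merged = [merge_stats(stats[i], stats[i + 1])
                  for i in range(0, len(stats) - 1, 2)]
        if len(stats) % 2:
            merged.append(stats[-1])
        stats = merged
    return stats[0]

def canonical_correlations(scatter, n, p):
    """Canonical correlations from the joint scatter of [x-part (p cols), y-part]."""
    cov = scatter / (n - 1)
    sxx, sxy, syy = cov[:p, :p], cov[:p, p:], cov[p:, p:]
    lx = np.linalg.cholesky(sxx)
    ly = np.linalg.cholesky(syy)
    m = np.linalg.solve(lx, sxy)        # Lx^{-1} Sxy
    m = np.linalg.solve(ly, m.T).T      # Lx^{-1} Sxy Ly^{-T}
    return np.linalg.svd(m, compute_uv=False)

# Simulated (assumed) data: two compositions driven by shared latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(3000, 2))
x = np.exp(latent @ rng.normal(size=(2, 4)) + 0.5 * rng.normal(size=(3000, 4)))
y = np.exp(latent @ rng.normal(size=(2, 3)) + 0.5 * rng.normal(size=(3000, 3)))
x /= x.sum(axis=1, keepdims=True)       # close each row to unit sum
y /= y.sum(axis=1, keepdims=True)

z = np.hstack([clr(x) @ ilr_basis(4), clr(y) @ ilr_basis(3)])
p = 3                                   # ilr dimension of the x composition
blocks = np.array_split(z, 6)           # the "stream" arrives in six blocks

# Sequential: fold the blocks in arrival order.
n_seq, _, s_seq = reduce(merge_stats, (block_stats(b) for b in blocks))
rho_sequential = canonical_correlations(s_seq, n_seq, p)

# Parallel-style: per-block statistics are independent, then merged as a tree.
n_par, _, s_par = tree_merge([block_stats(b) for b in blocks])
rho_parallel = canonical_correlations(s_par, n_par, p)

# Non-incremental reference on the full data set.
n_all, _, s_all = block_stats(z)
rho_batch = canonical_correlations(s_all, n_all, p)
```

    Because the merge step is algebraically exact, the sequential and tree-merged results should agree with the full-data computation up to floating-point rounding. In a real parallel setting the per-block `block_stats` calls would be dispatched to separate workers; here they run in one process for simplicity.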

     

  • Figure 1.  Comparison of running time of CCA for compositional data with three different incremental methods

    Figure 2.  Average relative error of the cross-covariance matrix estimated by the parallel block incremental method

    Figure 3.  Comparison of running time of four CCA algorithms on the fake news data from Weibo

    Table 1.  Running time of incremental methods (s)

    | $D$ | $n$ | $\theta$ | One-time incremental algorithm | Sequential block incremental algorithm | Parallel block incremental algorithm | Non-incremental algorithm |
    |---|---|---|---|---|---|---|
    | 4 | 10 000 | 0.01 | $0.001\;7\,(2.91\times 10^{-7})$ | $0.002\;4\,(2.71\times 10^{-7})$ | $0.003\;0\,(1.79\times 10^{-7})$ | $0.309\;5\,(1.60\times 10^{-4})$ |
    | 4 | 100 000 | 0.01 | $0.013\;5\,(1.32\times 10^{-6})$ | $0.018\;0\,(3.31\times 10^{-6})$ | $0.010\;1\,(3.37\times 10^{-7})$ | $3.115\;4\,(9.69\times 10^{-3})$ |
    | 4 | 100 000 | 0.1 | $0.130\;7\,(3.16\times 10^{-5})$ | $0.174\;1\,(1.21\times 10^{-4})$ | $0.101\;4\,(2.14\times 10^{-5})$ | $3.307\;5\,(5.98\times 10^{-3})$ |
    | 4 | 200 000 | 0.1 | $0.268\;8\,(2.06\times 10^{-3})$ | $0.349\;6\,(4.08\times 10^{-4})$ | $0.193\;6\,(4.36\times 10^{-5})$ | $6.671\;6\,(1.84\times 10^{-2})$ |
    | 5 | 10 000 | 0.01 | $0.002\;5\,(2.65\times 10^{-7})$ | $0.004\;1\,(2.22\times 10^{-7})$ | $0.003\;7\,(1.34\times 10^{-7})$ | $0.553\;4\,(4.35\times 10^{-4})$ |
    | 5 | 100 000 | 0.01 | $0.023\;9\,(1.59\times 10^{-5})$ | $0.032\;9\,(5.41\times 10^{-4})$ | $0.018\;7\,(5.89\times 10^{-7})$ | $5.399\;8\,(2.71\times 10^{-2})$ |
    | 5 | 100 000 | 0.1 | $0.220\;4\,(1.17\times 10^{-4})$ | $0.297\;0\,(2.88\times 10^{-4})$ | $0.168\;5\,(2.27\times 10^{-5})$ | $5.753\;9\,(1.56\times 10^{-2})$ |
    | 5 | 200 000 | 0.1 | $0.443\;0\,(5.27\times 10^{-4})$ | $0.600\;9\,(1.52\times 10^{-3})$ | $0.336\;3\,(2.31\times 10^{-4})$ | $11.652\;3\,(5.86\times 10^{-1})$ |
    | 6 | 10 000 | 0.01 | $0.003\;7\,(1.05\times 10^{-6})$ | $0.006\;0\,(3.95\times 10^{-7})$ | $0.005\;2\,(1.69\times 10^{-7})$ | $0.839\;9\,(9.31\times 10^{-4})$ |
    | 6 | 100 000 | 0.01 | $0.033\;4\,(5.38\times 10^{-6})$ | $0.045\;8\,(2.08\times 10^{-5})$ | $0.028\;5\,(2.47\times 10^{-6})$ | $8.338\;8\,(6.95\times 10^{-2})$ |
    | 6 | 100 000 | 0.1 | $0.334\;1\,(4.51\times 10^{-4})$ | $0.446\;9\,(5.04\times 10^{-4})$ | $0.258\;8\,(8.46\times 10^{-5})$ | $8.929\;5\,(4.47\times 10^{-2})$ |
    | 6 | 200 000 | 0.1 | $0.666\;4\,(8.13\times 10^{-4})$ | $0.901\;8\,(2.18\times 10^{-3})$ | $0.516\;4\,(3.06\times 10^{-4})$ | $17.848\;1\,(2.16\times 10^{-1})$ |

    Table 2.  Canonical variables and canonical correlations between different emotions and topics in fake news on Weibo

    | $h$ | $\rho_h$ |
    |---|---|
    | 1 | $0.451\;9$ |
    | 2 | $0.245\;9$ |
    | 3 | $0.131\;9$ |
Publication history
  • Received:  2021-12-20
  • Accepted:  2022-05-06
  • Published online:  2022-05-16
  • Issue published:  2023-10-31
