Fault detector of fault-tolerant distributed systems based on self-adaptive heartbeat algorithm
-
Abstract: Failure detection is one of the key techniques in fault-tolerant distributed systems. To improve failure-detection performance, a novel failure detector, the self-adaptive heartbeat detector (SA-HD), is proposed. SA-HD uses a pull-based self-adaptive heartbeat algorithm that considers both the performance of failure detection and the effect on network performance of the network resources consumed by heartbeat detection. SA-HD adjusts the frequency at which it sends heartbeat messages according to the network load, which improves the adaptability of heartbeat detection to the network environment and, in particular, effectively improves heartbeat-detection performance under heavy network load. A model of SA-HD was built and its performance was analyzed by simulation. The simulation analysis and experimental results show that SA-HD outperforms the traditional push-based heartbeat detector.
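The abstract does not give the details of the SA-HD algorithm, so the following is only a minimal sketch of the general idea of a pull-based heartbeat monitor that adapts its polling frequency to network load. Everything in it is an assumption made for illustration: the class name `AdaptivePullMonitor`, the use of a smoothed round-trip time as the load signal, the doubling and halving rule, and all parameter values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptivePullMonitor:
    """Hypothetical pull-mode monitor: it polls one target and adapts its polling interval."""
    min_interval: float = 0.5    # seconds; fastest polling rate (assumed value)
    max_interval: float = 8.0    # seconds; slowest polling rate (assumed value)
    rtt_threshold: float = 0.2   # seconds; smoothed RTT above this counts as "loaded" (assumed)
    alpha: float = 0.25          # weight of a new RTT sample in the moving average (assumed)
    max_missed: int = 3          # consecutive timeouts before the target is suspected (assumed)
    interval: float = 1.0        # current polling interval in seconds
    smoothed_rtt: Optional[float] = None
    missed: int = 0

    def on_reply(self, rtt: float) -> None:
        """Handle a reply to an "are you alive?" request that took `rtt` seconds."""
        self.missed = 0
        if self.smoothed_rtt is None:
            self.smoothed_rtt = rtt
        else:
            # Exponentially weighted moving average of the round-trip time.
            self.smoothed_rtt = (1 - self.alpha) * self.smoothed_rtt + self.alpha * rtt
        if self.smoothed_rtt > self.rtt_threshold:
            # Network looks loaded: poll less often so heartbeats consume less bandwidth.
            self.interval = min(self.interval * 2, self.max_interval)
        else:
            # Network looks idle: poll more often so failures are detected sooner.
            self.interval = max(self.interval / 2, self.min_interval)

    def on_timeout(self) -> None:
        """Handle a request that received no reply before its deadline."""
        self.missed += 1

    @property
    def suspected(self) -> bool:
        """True once enough consecutive requests have gone unanswered."""
        return self.missed >= self.max_missed
```

In this sketch the monitor, not the monitored process, decides when the next heartbeat exchange happens, which is what distinguishes pull mode from push mode. A caller would sleep for `interval` seconds between requests and call `on_reply` or `on_timeout` depending on the outcome.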