High Availability Issues in DSM Systems: Research Opportunities
[Abstract] This report documents a first-cut understanding of high availability (HA) issues in distributed shared memory (DSM) systems. We discuss the general HA strategy and advocate minimizing fault propagation, system reconfiguration time, and performance degradation as the distinct goals of the three stages a system passes through between the occurrence of a fault and full recovery. We show that the impact of a fault can be estimated through hierarchical component dependency analysis. We point out that coherence protocols should be extended, and transactions made closed, in order to detect faults and maintain data integrity. In particular, we propose source-buffering to augment the dirty data transfer protocol in preparation for possible data loss and corruption. An N+1 stand-by system is suggested as the ultimate HA solution. Further research opportunities are discussed. The report skims a broad range of issues without attempting to treat each in depth. 19 pages
[Publication date] [Publisher] HP Development Company
[Validity level] [Subject classification] Computer Science (General)
[Keywords] shared memory multiprocessors; high availability [Timeliness]