ConCORD: Easily Exploiting Memory Content Redundancy through the Content-Aware Service Command

HPDC '14

Abstract

We argue that memory content-tracking across the nodes of a parallel machine should be factored into a distinct platform service on top of which application services can be built. ConCORD is a proof-of-concept system that we have developed and evaluated to test this claim. Our core insight is that many application services can be described as a query over memory content. This insight leads to a core concept in ConCORD, the content-aware service command architecture, in which an application service is implemented as a parametrization of a single general query that ConCORD knows how to execute well. ConCORD dynamically adapts the execution of the query to the amount of redundancy available and other factors. We show that a complex application service (collective checkpointing) can be implemented in only hundreds of lines of code within ConCORD, while performing well.

Kyle C. Hale
Associate Professor of Computer Science

Hale’s research lies at the intersection of operating systems, HPC, parallel computing, and computer architecture.