An oft-repeated adage among telecommunication providers goes, “There are five things that matter: reliability, reliability, reliability, time to market, and cost. If you can’t do all five, at least do the first three.” Yet, designing and operating reliable networks and services is a Herculean task. Building truly reliable components is unacceptably expensive, forcing us to construct reliable systems out of unreliable components. The resulting systems are inherently complex, consisting of many different kinds of components running a variety of different protocols that interact in subtle ways. Inter-networks such as the Internet span multiple regions of administrative control, from campus and corporate networks to Internet Service Providers, making good end-to-end performance a shared responsibility borne by sometimes uncooperative parties. Moreover, these networks consist not only of routers, but also lower-layer devices such as optical switches and higher-layer components such as firewalls and proxies. And, these components are highly configurable, leaving ample room for operator error and buggy software. As if that were not difficult enough, end users understandably care about the performance of their higher-level applications, which has a complicated relationship with the behavior of the underlying network. Despite these challenges, researchers and practitioners alike have made tremendous strides in improving the reliability of modern networks and services.