Computing Community Consortium Blog

The goal of the Computing Community Consortium (CCC) is to catalyze the computing research community to debate longer range, more audacious research challenges; to build consensus around research visions; to evolve the most promising visions toward clearly defined initiatives; and to work with the funding organizations to move challenges and visions toward funding initiatives. The purpose of this blog is to provide a more immediate, online mechanism for dissemination of visioning concepts and community discussion/debate about them.

Understanding the Google computer, and making it better

August 26th, 2015 / in Research News / by Helen Wright

The following is a special contribution to this blog by CCC Executive Council Member Mark D. Hill of the University of Wisconsin-Madison.

Internet-based services that we have all come to love (e.g., search, email, social networks, video/photo sharing) are all powered by large back-end data centers, designed and managed as large warehouse-scale computers. Emerging cloud computing workloads also use such warehouse-scale computers, making it even more important to understand and optimize this class of computer systems.

But until now, such warehouse-scale computers have (true to their name!) been big black boxes, with very little insight into the detailed performance characteristics of deployments at scale: What is the nature of the workloads that run on these large computers? How well are they served by the underlying microarchitecture of current processors? Where are the next opportunities for improving this important class of systems?

Engineers at Google, in collaboration with researchers at Harvard University, have recently presented some answers to these questions. Their paper at this summer’s International Symposium on Computer Architecture, titled “Profiling a warehouse-scale computer,” presents results from a longitudinal study spanning tens of thousands of servers in actual Google data centers. It examines detailed microarchitectural characteristics of thousands of different applications while they serve live traffic across billions of user requests.

So, what did these researchers find? One key nugget is that Google workloads demonstrate significant diversity, and that this diversity has been increasing over the years. Another is that they differ in some significant ways from the traditional SPEC benchmarks that we are used to seeing in common architectural studies. Notably, workloads running on the Google computer get significantly less useful work done per cycle (lower instructions per cycle, or IPC) than the typical SPEC benchmark, and they also suffer from a significantly larger fraction of front-end pipeline pressure from instruction stalls.
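To make the two metrics in the paragraph above concrete, here is a minimal sketch of how instructions per cycle and a front-end stall fraction might be derived from raw hardware counter readings. The counter values, the workload names, and the `CounterSample` type are all hypothetical and chosen only for illustration; they are not figures from the paper.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical hardware counter readings for one workload."""
    name: str
    instructions: int           # retired instructions
    cycles: int                 # total CPU cycles
    frontend_stall_cycles: int  # cycles the front end could not supply instructions


def ipc(sample: CounterSample) -> float:
    """Instructions per cycle: useful work completed per clock cycle."""
    return sample.instructions / sample.cycles


def frontend_stall_fraction(sample: CounterSample) -> float:
    """Fraction of all cycles lost to front-end stalls."""
    return sample.frontend_stall_cycles / sample.cycles


# Made-up numbers, not measurements from any real system.
samples = [
    CounterSample("hypothetical_service", 600_000, 1_000_000, 250_000),
    CounterSample("hypothetical_benchmark", 1_500_000, 1_000_000, 50_000),
]

for s in samples:
    print(f"{s.name}: IPC={ipc(s):.2f}, "
          f"front-end stall fraction={frontend_stall_fraction(s):.0%}")
```

In this toy data the service-style workload has both the lower IPC and the higher front-end stall fraction, which is the shape of the contrast the post describes.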

The paper is also chock-full of other interesting nuggets of information, around other bottlenecks in the CPU pipeline and cache/memory hierarchies. Notably, while there are no significant hotspots at the individual workload level, across all the workloads at the warehouse-scale computer level, a few common low-level functions in the software stack account for nearly one third of the total cycles!
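The observation above, that no single workload has a hotspot yet a few shared functions dominate fleet-wide, can be illustrated with a small aggregation sketch. The per-workload profiles, the function names, and the choice of which functions count as "shared" are invented for illustration and are not data from the paper.

```python
from collections import Counter

# Hypothetical cycle counts per function for three workloads (100 cycles each).
profiles = {
    "workload_a": {"a_f1": 18, "a_f2": 18, "a_f3": 17, "a_f4": 17,
                   "memcpy": 12, "malloc": 10, "compress": 8},
    "workload_b": {"b_f1": 20, "b_f2": 18, "b_f3": 16, "b_f4": 16,
                   "memcpy": 10, "malloc": 12, "compress": 8},
    "workload_c": {"c_f1": 19, "c_f2": 19, "c_f3": 16, "c_f4": 16,
                   "memcpy": 11, "malloc": 9, "compress": 10},
}

# Assumed set of common low-level functions shared across the software stack.
SHARED = {"memcpy", "malloc", "compress"}


def max_function_share(profile: dict[str, int]) -> float:
    """Largest share of cycles taken by any one function in a single workload."""
    return max(profile.values()) / sum(profile.values())


def fleet_cycles(all_profiles: dict[str, dict[str, int]]) -> Counter:
    """Sum cycles per function across every workload."""
    total = Counter()
    for profile in all_profiles.values():
        total.update(profile)
    return total


def shared_fraction(all_profiles: dict[str, dict[str, int]]) -> float:
    """Fraction of fleet-wide cycles spent in the shared functions."""
    total = fleet_cycles(all_profiles)
    return sum(total[f] for f in SHARED) / sum(total.values())


for name, profile in profiles.items():
    print(f"{name}: largest single function = {max_function_share(profile):.0%}")
print(f"fleet-wide share of shared functions = {shared_fraction(profiles):.0%}")
```

Each toy profile is flat, with no function above a fifth of its workload's cycles, but summing across workloads shows the shared functions adding up to a much larger slice than any workload-specific one.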

This paper is a great start, shedding light on some of the mysteries around large warehouse-scale computers in the wild, and the opportunities to optimize the software and hardware stack. But there is more to do. It is a great opportunity for our community to write the sequel.
