Dave Henderson

Passionate about observability, reliable services, cloud infrastructure, containers, monitoring, and writing the code that ties it all together. | @hairyhenderson | hairyhenderson | dhenderson


Grafana Labs

Senior Software Engineer, Grafana Cloud

January 2021 - present

I help build the services that run Grafana on Grafana Cloud.


SRE Technical Lead

January 2019 - January 2021

I led the technical direction of the SRE team and continued to help people design, deploy, scale, and observe distributed services.

Highlights:

- Worked with various development teams to help encourage more focus on reliability while writing new services and delivering new features
- Continued to review and evolve our SLO practices, including helping teams to write useful SLO Documents
- Wrote a cluster-internal caching HTTP proxy for proxying Docker image pulls and Helm chart retrieval to save both cost and network bandwidth usage, as well as increase availability during outages of our upstream artifact repository
- Helped lead direction on, and contributed to, a cross-functionally-shared library to help teams implement metrics, tracing, and simplify writing reliable servers and clients in Go
- Led the org-wide deployment of distributed tracing infrastructure using Jaeger and helped teams onboard through providing training and contributing code to various internal services
- Led the org-wide migration from Jaeger client to OpenTelemetry, including deployment of OpenTelemetry Collector
- Worked with management to identify gaps in SRE coverage and develop plans for team growth
- Worked with our architects to plan replacement of our monolithic Prometheus multi-region/multi-cluster metric federation infrastructure with Cortex
- Helped improve auto-scaling configurations, including enabling teams to use custom metrics for informing HPA
- Continued to participate in a regular rotating on-call schedule

Senior Site Reliability Engineer, SRE

October 2017 - January 2019

I helped people design, deploy, scale, and observe distributed services.

Highlights:

- Deployed and configured a number of Prometheus servers to gather metrics across regions and environments, including global per-environment federation
- Worked with leadership to help the team implement various SRE best practices recommended by Google's SRE book
- Introduced Service Level Indicators and Objectives (SLIs/SLOs) to the development organization, and provided guidance on finding reasonable SLOs
- Worked with management and team to implement (and participate in) a regular rotating on-call schedule
- Helped to plan and deploy a set of Kubernetes clusters to run Qlik's next-generation cloud services
- Developed and open-sourced a library to expose common metrics to Prometheus
- Wrote a Prometheus exporter to expose more metrics from Docker Swarm
- Helped establish a Launch Coordination process, and collaborated continually with service owners and developers
- Established a regular "Metrics Office Hours" forum for cross-team sharing and discussion around our metrics strategy
- Helped establish patterns for auto-scaling of services
- _[hackathon]_ Built a Raspberry Pi-based LED GitHub notification indicator

Senior Software Engineer, Qlik Cloud

June 2015 - October 2017

I helped transform with infrastructure-as-code, immutable containers, and distributed microservices.

Highlights:

- Helped launch the early iterations of
- Brought 19-hour production deploy cycles down to a few minutes, with an immutable Docker container-based approach
- Simplified and solidified our cloud infrastructure by introducing declarative infrastructure-as-code to the site (Terraform, shell scripts)
- Helped to interview and assess dozens of candidates
- Migrated the single-node Elasticsearch deployment to a multi-node AWS-hosted ES cluster, deployed GitHub-SSO-protected Kibana (ELK)
- Created a simple container management system with Packer, Terraform, and shell scripts
- Helped split the monolithic service into separate microservices. Extracted packages and libraries for common concerns. (Node.js)
- Helped to deploy and manage a highly-available Vault cluster for secret management. Integrated Vault support into a number of services. (Vault)
- Deployed Prometheus for monitoring, and started instrumenting services with common and custom metrics and alerting
- Deployed and managed Docker Swarm (with _Docker for AWS_) and started migrating services
- Advocated Go to Qlik R&D as a viable alternative to Node.js for new services and tooling
- _[hackathon]_ Built a Raspberry Pi-based Slack-enabled physical "traffic light" to display the current state of the production site


Software Developer

May 2008 - May 2015

I joined IBM when Cognos was purchased in 2008.

I worked on building the cloud architecture and technical foundation for IBM Watson Analytics.

Highlights:

- Worked on building the cloud architecture and technical foundation for IBM Watson Analytics (WA)
- Introduced log aggregation and analysis technologies, and developed dashboards to help other developers diagnose and correct software issues
- Integrated a 3rd-party monitoring service to help other developers and operators get a handle on performance and availability, and be notified during outages
- Introduced Docker as a development and deployment technology
- Socialized Docker and associated tools to other developers, and encouraged a grassroots project-wide effort to "Dockerize" all components
- Transformed the deployment strategy to enable the team to do zero-downtime continuous deployment using rolling upgrades instead of the previous strategy where upgrades could only be done during pre-planned maintenance windows
- Contributed code and bug fixes across a number of the WA components
- Contributed to various open source projects to provide bug fixes or features useful to WA

I worked on Test as a Service, an internal project.

Highlights:

- Worked on the next generation of test automation software for the cloud: designed and implemented multiple services with RESTful APIs, and implemented fully automated infrastructure using Chef

I worked on internal automated test tools.

Highlights:

- Helped to lead development of an Eclipse RCP-based test automation tool for test authoring, execution, and analysis
- Worked on technology-agnostic keyword-driven test automation engines
- Continued development of software written at Cognos to automate installation, configuration, and testing of Cognos BI products
- Helped to bring in Agile and DevOps principles to our team, creating and maintaining a continuous integration and deployment system based on Jenkins


Quality Control Analyst

Apr 2005 - May 2008

I worked on internal automated test tools.

Highlights:

- Developed and executed automated tests for Cognos 8 BI releases
- Created frameworks to automate installation, configuration, and testing of Cognos 8 BI releases

The IT Department

Senior Internetworking Engineer

1999 – Apr 2005

I helped small and medium-size businesses with their IT needs.

Highlights:

- Built firewall appliances
- Installed and supported networking solutions for small to medium-sized businesses across Canada and the US
- Deployed inter-site VPNs for customers
- Developed software and processes to allow IT Department help desk staff to remotely manage and troubleshoot customer issues
- Managed the help desk

Tech I've Worked With


Some Projects I've Contributed To

Some Projects I've Created

Volunteer Experience

Docker Ottawa Meetup Group

Community Leader/Co-organizer

Apr 2016 – present

I'm one of the organizers of Ottawa's Docker Meetup group. I've helped organize most of the events and often deliver talks on various Docker- and containerization-related topics.

Slides from many of my talks are available here.