How to Frame Metric Collection

- October 28, 2019

Depending on the type of software you’re building, it can be tough to figure out which metrics you need to collect. An iterative process (a fancy name for trial and error) will get you where you need to be eventually, but along the way the mean time to recovery (MTTR) of outages will suffer and you might lose users or revenue.

There’s a simple way to think about metrics that will help you build an intuition for what to monitor and what to measure. It’s this:

Measure the business, measure the software. But don’t conflate the two.

Measuring the business is critical to being able to notify and escalate to the proper personnel when there is an outage. Business metrics include things that impact your bottom line - user signins per minute, items added to the cart per second, items sold per day. Anything that directly and immediately affects customers is a business metric.
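To make a business metric like "user signins per minute" concrete, here is a minimal sketch of a sliding-window counter. The `RateCounter` class and the timestamps are hypothetical and only for illustration. A real system would more likely publish these events to a metrics backend than count them in memory.

```python
from collections import deque


class RateCounter:
    """Counts events inside a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, now):
        # Store the time (in seconds) at which the event happened.
        self._timestamps.append(now)

    def count(self, now):
        # Drop events that have fallen out of the window, then count the rest.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


# Assumed example data: signins at t = 0, 10, 30 and 65 seconds.
signins = RateCounter(window_seconds=60)
for t in (0, 10, 30, 65):
    signins.record(t)

signins_last_minute = signins.count(now=70)
```

A sudden drop in a number like `signins_last_minute` is the kind of signal that could drive notification and escalation, whatever the software metrics say at that moment.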

Software metrics are signals about how your software is running. There are 3 categories of software metrics: OS metrics, generic server metrics and application metrics. OS metrics are things that can be measured at the OS level without knowing anything about the process(es) running on top of the OS - CPU, memory, network connections, etc. These let you tune the software for performance and are necessary for debugging the hardest issues. Generic server metrics are things you’d be able to collect from any web server, application container, DB, message queue, cache server, etc. - things like web requests per second, DB transactions per second or cache hit ratio. And lastly, application metrics are things you collect that are specific to your application - whatever you want to publish to tell you its state and/or current operations. For example, you could measure how long one important function takes to run on each request.
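As a rough illustration of an application metric, and of keeping business and software metrics apart, the sketch below times one function per call and files each measurement under an explicit category. The `METRICS` store, the `record` and `timed` helpers, and the `price_cart` function are all hypothetical names made up for this example. An in-memory dict stands in for whatever metrics backend you actually use.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store: {category: {metric_name: [values]}}.
METRICS = defaultdict(lambda: defaultdict(list))


def record(category, name, value):
    # Keeping the category explicit stops business and software
    # metrics from being mixed together under one name.
    METRICS[category][name].append(value)


def timed(name):
    """Record how long the wrapped function takes as an application metric."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if func raises, so failed calls are timed too.
                record("application", name, time.perf_counter() - start)
        return wrapper
    return decorator


@timed("price_cart_seconds")
def price_cart(items):
    # items is a list of (unit_price, quantity) pairs.
    return sum(price * qty for price, qty in items)


total = price_cart([(2.5, 2), (10.0, 1)])

# A business metric is recorded separately, under its own category.
record("business", "items_sold", 3)
```

The same function call produces both kinds of signal here, but they land in different categories, so an alert on `items_sold` and a performance investigation into `price_cart_seconds` can be handled independently.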

This is a good place to start when creating a holistic set of metrics to monitor your system, and if you start here, you’ll have a good chance at getting all the details right as you go.
