
June Data/Forecasting Newsletter:
Measuring Remote Team Member Performance (yes, I went there)

In this month’s newsletter, I’ll discuss in detail how to measure the performance of “at-home” and “remote” employees. We need to do this well; otherwise, someone else will do it poorly!

Joke: A SQL query walks into a bar, sees two tables, and asks ‘May I join you?’

Article: Remote Team and Remote Team Member Performance Metrics

Discuss/Argue about this in my next online “Ask me Anything” session

COVID forced many organizations to explore how to have team members work remotely. For many organizations, especially those with remote offices or divisions on other continents, remote work was already part of how their teams operated. While some companies have decided that remote work is a benefit and made it the default, others have taken an outright "get back to the office or leave" stance (Elon Musk's companies, for example). This article looks at how we might coach and measure the impact of people working remotely from home, and ways to ensure that employees and teams remain healthy and productive.

I teach that performance is a six-dimension tug-of-war between competing forces (see the image above). Our managers and coaches help teams and individuals balance those competing forces and learn the warning signs of too much and too little performance in each dimension. The job of coaches and managers is to help EVERYONE notice and react when that balance is compromised. I see remote workers no differently: we need to coach and manage people's ability to balance competing forces. Some stressors change. In the office, people get distracted by drive-by questions and "emergencies"; remote workers at home might get distracted by hungry children or re-runs of Days of Our Lives (or is that just me?). We need to trust that our staff can cope with these distractions. The companies most afraid of remote work are those that have no way to measure outcomes and rely instead on visual proximity as their only measurement tool. They are lazy. Their inability to understand results means they assume people will goof off unless actively forced to "do work."

Intro to the Six Dimensions of Performance of Flow-Based Systems (as applied to an individual)

The six dimensions of performance of flow systems (in no particular order) are

  • How much. The amount of raw work product delivered over time

  • How consistently. The consistency of delivering work product over short bursts of time.

  • How valuable. The amount of value or urgency of work product delivered over time.

  • How fast. The amount of time it takes to provide raw work over time, or the response time.

  • How well. The amount of quality work delivered or the inverse, the amount of rework required.

  • How sustainable. The ability to continue adequate levels of the other five metrics over long periods.

Performance in each of these six dimensions must be adequate. They also work against each other, so being exceptional in any one risks being sub-par in another. These six dimensions apply at every level of an observable flow-based delivery system, from the whole organization to teams to individuals. This article explores how we might use them at the level of an individual employee who has recently started working remotely from home.

Home-Based/Remote Employee Performance

The heading you just read, "home-based/remote employee performance," will likely send shivers down your spine. It can indeed lead to misuse and abuse if performed poorly and lazily. Comparing people without context is a sure-fire way to screw up any measurement. I'm going to suggest you avoid that!

Instead, I suggest using our performance measurements to identify areas where more support is necessary to allow individuals to perform sustainably and productively. Any decline over time in these measurement dimensions is more likely a system issue than an individual performance issue. The most urgent capability organizations need is detecting that multiple (not individual) employees are declining in one of these performance dimensions. Managers who obsess over individual performance must realize that a declining trend multiplied across 100-odd developers is far more costly. And if any development system is so brittle that a single person's declining performance is measurable and significant, then that system has an organizational sustainability problem that needs fixing.

How much. The amount of raw work product delivered over time

A naïve measure of how much is the amount of code written. We got past this in the 1990s when we dispensed with lines of code as a productivity measure (didn't we?). We need to detect changes in how much work individual remote workers deliver. NOT to chastise the employee, but to ask WHY. The change won't always be negative. An unexplained increase in delivered work product might mean rushing, working too many hours, or skipping security, performance, or other steps. The change is the critical signal, not whether it is up, down, or sideways.

What we want to avoid when people work from home is -

  1. They get stuck and don't ask for help as readily as they would in a shared office space.

  2. They work too many hours, or feel pressured to.

  3. They deliver work other than what is measured as raw work product here (for example, helping others, training, or production support).

I'm not even going to consider working too few hours. I trust that the measure of value delivered will reflect that loss.

What would I measure here to detect the avoidable issues -

  • Commits per week - are they sustaining the same commit pace?

  • Is the average size of commits per week (for example, the number of files changed per changeset?) similar?

  • Reviews per week - are they reviewing a similar number of other team members' commits per week?

  • Number of Blockers per day or week. Are they blocked more than usual and not getting help?

  • The number of standups attended? Are they available at least during the critical team daily touchpoint?

The raw measures don't matter in any of these cases. The signal is in the change. Any decline or increase is alarming if we don't know why. Why would a seeming improvement be alarming? Because we don't yet know how it impacted the other performance dimensions. Let's check.
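To make the "change, not the raw number" idea concrete, here is a minimal sketch of how commit dates might become weekly counts with a deviation flag. The 30% tolerance and the sample dates are illustrative assumptions, not a recommendation:

```python
from collections import Counter
from datetime import date

def weekly_counts(commit_dates):
    """Count commits per ISO (year, week) from a list of commit dates."""
    return Counter(d.isocalendar()[:2] for d in commit_dates)

def change_signal(this_week, baseline, tolerance=0.3):
    """Flag any week that deviates more than `tolerance` from baseline.
    Direction doesn't matter: an unexplained rise is as notable as a drop."""
    if baseline == 0:
        return this_week > 0
    return abs(this_week - baseline) / baseline > tolerance

commits = [date(2022, 6, 6), date(2022, 6, 7), date(2022, 6, 8),
           date(2022, 6, 13)]
counts = weekly_counts(commits)  # 3 commits in ISO week 23, 1 in week 24
```

The same pattern applies to reviews per week, blockers, or standups attended: compute a per-period count, compare it to a recent baseline, and treat any large move in either direction as a prompt for a conversation, not a judgment.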

How consistently. The consistency of delivering raw work product over short bursts of time.

This dimension diagnoses delivery feasts and famines. There are two leading causes of inconsistency. The first is arrival inconsistency: work arrives at a non-linear pace, so delivery happens at an inconsistent rate. The second is deviating focus. For example, the work item you want to deliver (maximum customer value) gets bumped by production issues. It would be hard to fault any single developer for dealing with production before new feature work; the problem is the system allowing production defects in the first place. Our goal with these six balanced performance dimensions is to detect that inconsistent delivery is an issue, not that the individual is the issue.

What I mostly see when workers are remote is that some people take on too much random work because others tell them it is "urgent." This measure helps ensure that all team members stay aligned and share the workload, not favoring one source. If there is a decline in individual consistency, the right intervention question is "what is causing this?" rather than "why can't you keep up?" Nine times out of ten, it will be someone trying to do too much rather than too little.

What we want to avoid when people work from home is -

  1. They get stuck and don't ask for help as readily as they would in a shared office space.

  2. They don't push back when new work arrives, and they end up falling behind.

What would I measure here to detect the avoidable issues -

  • The raw work item arrival rate per day or week.

  • The raw work item departure rate per day or week.

From these two raw measures, these diagnostic measures can be calculated and observed-

  • The difference between arrival rate and departure rate (called net flow) of work items per day or week - does the individual get lumbered with inconsistent incoming work that they can't deliver (could anyone humanly keep up?)

  • Work in Progress by day - does the individual have a similar amount of in-progress work? Or are they being blocked or taking on too much?
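As a rough sketch of how the two raw measures combine into the diagnostics, assuming per-week arrival and departure counts are already available (the numbers below are illustrative):

```python
def net_flow(arrivals, departures):
    """Net flow per period: arrivals minus departures.
    Persistently positive net flow means work piles up faster than it leaves."""
    return [a - d for a, d in zip(arrivals, departures)]

def wip_series(start_wip, arrivals, departures):
    """Running work-in-progress given a starting WIP and per-period flows."""
    wip, series = start_wip, []
    for a, d in zip(arrivals, departures):
        wip += a - d
        series.append(wip)
    return series

arrivals   = [5, 7, 6, 9]   # items arriving each week (illustrative)
departures = [5, 5, 6, 5]   # items completed each week
# net flow of [0, 2, 0, 4] shows incoming work outpacing delivery
```

A flat net flow near zero and a stable WIP line are the healthy shape; a steadily climbing WIP series is the early warning that someone is being loaded faster than they can humanly deliver.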

How valuable. The amount of value or urgency of work product delivered over time.

This performance dimension might seem beyond an individual's control. At the individual level, it comes down to their ability to spend their time and effort where it's most valuable. In a shared office, drive-by or email requests for assistance are often visible to other team members. When remote, the risk is that people find the team members most likely to say "yes." Everyone is trying to do the "right" thing, and the goal of metrics at this level is to guide them in balancing competing streams of value.

Our goal when observing whether remote workers are doing the "right work" is to categorize the work they are doing in some way and make sure the percentages of those categories match expectations. For example, the work category types might be planned feature work, production support, helping other developers, meetings, and coordination. Every individual will have a different percentage mix; there isn't any right or wrong. The intention is to discuss whether someone is doing (or being compelled to do) too much or too little of any one type.

What would I measure here to detect the avoidable issues -

  • The percentage of work items in progress allocated to 4 or 5 categories of work type

There isn't any automatic failure or success signal in these metrics, just a regular review and a conversation if someone is doing too much or too little of one type of work. It is the change that helps sense degradation: for example, someone doing 60% feature work that drops to less than 10%.
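A minimal sketch of that category-mix review, assuming work items are already tagged with one of the four or five agreed category types (the 20-point shift threshold is my illustrative assumption, not a rule):

```python
from collections import Counter

def category_mix(work_items):
    """Percentage of work items per category, rounded to whole percents."""
    counts = Counter(work_items)
    total = sum(counts.values())
    return {cat: round(100 * n / total) for cat, n in counts.items()}

def big_shift(previous_pct, current_pct, threshold=20):
    """Return the categories whose share moved more than `threshold` points.
    The threshold is illustrative; the right value is a team conversation."""
    cats = set(previous_pct) | set(current_pct)
    return {c for c in cats
            if abs(previous_pct.get(c, 0) - current_pct.get(c, 0)) > threshold}

last_month = {"feature": 60, "support": 20, "helping": 10, "meetings": 10}
this_month = {"feature": 10, "support": 70, "helping": 10, "meetings": 10}
# feature work collapsing from 60% to 10% is the cue for a conversation
```

Nothing here declares success or failure; the output is simply the short list of categories worth asking about in the next 1:1.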

How fast. The amount of time it takes to provide raw work over time, or the response time.

Working faster seems ideal until you measure the cost. One way to reduce the time it takes to do something is to skip steps: testing, security, accessibility, and performance, to name a few. There can be pressure when working remotely to do more, faster. If you move more quickly than completeness and sustainability allow, any improvement in delivery speed is a mirage.

Even given the risks of incentivizing individuals to deliver and react faster, it is vital to see whether the time taken to do similar work is getting longer or shorter. System changes and pressures alter delivery performance, and "how fast" signals that degradation or improvement.

What I would measure here is -

  • Cycle time distribution for the SAME work types used in "how valuable" - how long does work take once committed?

  • Triage time for new work - how long does it take to assess incoming work for urgency and priority?

The goal is to detect changes in delivery or reaction time for similar types of work and then discuss the known causes of that change. If the known reasons aren't identifiable, then suspect a change in some other performance dimension, probably quality!
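One simple way to summarize a cycle time distribution per work type is percentiles. Here is a sketch using a nearest-rank percentile; the sample cycle times in days are illustrative:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value at or below which
    at least p% of the observations fall."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

feature_cycle_times = [2, 3, 3, 4, 5, 5, 6, 8, 13, 21]  # days, illustrative
p50 = percentile(feature_cycle_times, 50)  # typical item
p85 = percentile(feature_cycle_times, 85)  # the long tail most items beat
```

Comparing the 50th and 85th percentiles period over period for the same work type is the "how fast" trend; a widening gap between them often means a few items are getting badly stuck rather than everything slowing evenly.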

How well. The amount of quality work delivered or the inverse, the amount of rework required.

A measure of quality seems prudent. Unless what you deliver works, you can't take credit for any improvement in how much, how fast, or how valuable. But this is just the start of "how well." You can achieve an improvement in all of the other performance dimensions and still not deliver anything useful.

The first quality measure that comes to mind is some count of defects or production rollbacks: quality measured as failure. But it's time to discuss leading and lagging indicators. Measuring product failure once it reaches the customer (production) is as "lagging" as an indicator can get.

Let me rephrase the goal of "how well." It ensures that individuals and teams deliver at a sustainable rate without skipping any steps. It is a measure that detects known causes of increasing errors or rollbacks. It isn't the rollback or mistake that is the signal; the signal is the conditions likely to cause those to grow in the future.

Production failures and mistakes are a part of doing bold things. Controlling risk while doing those bold things is the purpose of the "how well" measures.

What I measure here is -

  • The number of code review comments per file/day/week - are code reviews being performed and causing feedback?

  • The amount of time any continuous integration system spends broken (red) versus clean (green)

  • Are tests growing in areas of code under development or where defects are prevalent?

The key to finding suitable measures for how well is to ask the team after each failure, "what do we regret not doing before this failure?" and measure that activity. Don't take the easy way out and look at after-the-fact failure measures; search for metrics that are leading indicators of future failures.
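For the CI red-versus-green measure, here is a sketch assuming you can export build status changes as (timestamp, status) pairs ordered by time; the timestamps are in hours purely for illustration:

```python
def red_fraction(events, period_end):
    """Fraction of elapsed time the build was red, given a time-ordered list
    of (timestamp, status) status-change events and the end of the period."""
    red_time = 0.0
    total = period_end - events[0][0]
    # Pair each event with the next one (or the period end) to get durations.
    for (t0, status), (t1, _) in zip(events, events[1:] + [(period_end, None)]):
        if status == "red":
            red_time += t1 - t0
    return red_time / total

# green at hour 0, red at hour 10, green again at hour 12, measured to hour 20
events = [(0, "green"), (10, "red"), (12, "green")]
# red for 2 of 20 hours: a rising fraction is the leading signal to watch
```

As with the other dimensions, the absolute fraction matters less than its trend: a build that spends steadily more of its time red is a leading indicator that skipped steps are accumulating.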

How sustainable. The ability to continue adequate levels of the other five metrics over long periods.

Although many measures so far seem important, this is the FIRST dimension I suggest tackling. If you don't have a sustainable product development process, it doesn't matter what you measure, because it will decline in the future.

There is a risk when we encourage people through measurement to improve: they do. Ironically, that improvement can push past the level the system can handle. Small shortcuts compound, and the system collapses in a heap before you know it. "How sustainable" measures help you detect how close to that collapse point you are.

There are a few ways product development systems creep towards collapse -

  • Technical debt. Small sacrifices grow.

  • Product complexity grows beyond capability.

  • People burn out.

All three are significant risks and need to be measured. But they are all subjective without clear lines of success or failure. How much technical debt is too much? Large products are complex; how complex is too complex?

A key question everyone from the individual to teams to organizations needs to answer is -

"Can you/we continue to operate at the pace we are without sacrificing something we will eventually regret?"

The answer will most often be "No." Development systems are constantly under pressure, and this pressure breaks something. These metrics warn you that something is likely to fail sooner rather than later.

The reasons systems are heading towards failure will depend on context. And once you fix one, another will move to the forefront. Success is buying time, not solving every problem.

Rather than a single metric here, I look for agreement on the top 3 reasons. During retrospectives or 1:1 meetings with teams and individuals, I ask them to offer their causes of sustainability risk and vote (at the individual level, we just talk and discuss). Once we have the top 3 list, we estimate how long until system failure and how long until involuntary action is needed.

Conclusion

Performance is a trade-off between six competing dimensions at all levels of a flow-based product delivery system, from the individual to the whole organization. Beyond the raw numbers, it is the trend of each performance dimension that tells the story. The closer you are to the individual, the less the simple quantified number matters. If every individual is trending poorly on "how well," for example, it is a system issue - stop yelling at the people! Use metrics at the individual level to detect and fix these system issues. The development process likely needs to change for individuals working remotely, and this article aims to give you the tools to find those system changes faster.


Agile 2022 - I’ll be there with Swag

I’ll be attending and speaking at Agile 2022, being held in Nashville July 18-22 this year. I have some swag to give away, so if you are attending, make sure you contact me and we can meet up: troy.magennis@focusedobjective.com, or DM me on Twitter @t_magennis.

My workshop: Managing Dependencies