
REA Newsletter

Latest News about Resilience Engineering
Issue # 1
October 2019

Welcome to our Newsletter

By Ivonne Herrera, President, Resilience Engineering Association


In line with the Resilience Engineering Association's aim to connect experiences and capabilities involving practitioners and academia across the world, I am very happy to welcome a new “Resilience Engineering Newsletter Initiative”. This is something of an experiment, and the success of our newsletter depends on contributions from resilience engineering practitioners and academia across diverse disciplines and industries. We are looking for people who are passionate about bringing forward science or practice. If you would like to share knowledge or experience related to Resilience Engineering, your contribution is more than welcome. Please do not hesitate to contact our communications team at rea-communications-team@googlegroups.com to contribute. We hope you enjoy our newsletter.

OOPS! Learning from Surprise

By Lorin Hochstein


Some people think it's odd that software engineers at Netflix take inspiration from the work done by the resilience engineering community. After all, Netflix is an entertainment company; it doesn't deal in safety-critical systems. Unlike domains such as transportation, energy, or healthcare, there's little risk of physical harm involved in doing the work of running a video streaming service.
 
However, like the safety-critical domains, all of the work that Netflix software engineers do happens within the context of a socio-technical system. If you walked the halls, you might even see a scene that wouldn't look out of place in the control room of a power plant: software engineers looking at graphs on an operational dashboard in order to diagnose an operational issue.
 
In particular, when an incident occurs that results in a service interruption, we can apply the same kinds of accident investigation models as those used in the safety-critical domains to help us understand how the incident occurred.
 
We strive to use incidents as an opportunity to learn as much as we can about how work actually happens at the organization. However, the amount that we might learn from an incident isn't proportional to its severity. In fact, we might be able to learn just as much from an operational surprise where there was no customer or business impact at all as we would from an incident with significant customer impact. Any time an engineer is surprised by the operational behavior of the system, there's a mismatch between their mental models and the actual system behavior, and there's an opportunity to learn how that mismatch came to be.
 
At Netflix, we have a project called OOPS: learning from surprise. When an engineering team encounters an operational surprise, we encourage them to file an "OOPS": a writeup of the surprise. This writeup contains a narrative description of the events that led to the surprise, and identifies contributors, mitigators, risks, and challenges in handling.
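
As a rough illustration only, the elements of an OOPS writeup listed above could be captured in a simple record. The sketch below is hypothetical: the class name `OopsWriteup`, its field names, the `customer_impact` flag and the example values are all assumptions made for illustration, and do not describe Netflix's actual template or tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OopsWriteup:
    """Hypothetical record mirroring the elements named in the article."""
    title: str
    narrative: str  # the events that led to the surprise
    contributors: List[str] = field(default_factory=list)
    mitigators: List[str] = field(default_factory=list)
    risks: List[str] = field(default_factory=list)
    challenges_in_handling: List[str] = field(default_factory=list)
    customer_impact: bool = False  # assumed flag: a surprise may have no impact at all

    def summary(self) -> str:
        """Return a one-line count of what the writeup identifies."""
        return (
            f"{self.title}: {len(self.contributors)} contributor(s), "
            f"{len(self.mitigators)} mitigator(s), {len(self.risks)} risk(s), "
            f"{len(self.challenges_in_handling)} challenge(s)"
        )


# Entirely made-up example values, for illustration only.
example = OopsWriteup(
    title="Made-up example: dashboard showed stale data",
    narrative="An engineer expected live metrics but was looking at cached values.",
    contributors=["cache refresh interval longer than assumed"],
    mitigators=["a colleague noticed the timestamp on the graph"],
    risks=["stale data could mask a real outage"],
)
print(example.summary())
```

A flat record like this keeps the narrative as free text, which fits the emphasis on telling the story of the surprise rather than reducing it to categories.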
 
By sharing these writeups out to the organization, we hope to build shared understanding around how the overall system behaves, demonstrate expertise in action, and encourage discussion around signals of risk that these OOPSies reveal.

Evolutions of Resilience thinking within Resilience Engineering

By Ivonne Herrera


Resilience has become hyper-popular and is used in diverse disciplines with different meanings. In safety science, resilience is not new; it was introduced by Wildavsky in 1988 in his book “Searching for Safety”. Yet we may be relatively “young” in our design for Resilience Engineering and in the evolution of our understanding of resilience.

Resilience Engineering meeting 2004 (Nancy Leveson is missing)
From the 2004 meeting, resilience is seen as the “ability of an organization or system to keep or recover to stable state allowing to continue operations during or after major mishap…”. Erik Hollnagel describes his current view of resilience as follows: “A system is resilient if it can adjust its functioning prior to, during, or following events (changes, disturbances, and opportunities), and thereby sustain required operations under both expected and unexpected conditions”. In 2015, David Woods clarified diverse understandings of resilience as 1) rebound – returning to a stable state; 2) robustness – the ability to absorb rather than recover (note that he argues that confounding robustness and resilience can be erroneous); 3) graceful extensibility – extending adaptive capacity when facing surprises, i.e. how systems reorganize and escape constraints to continue operation; and 4) sustained adaptability – managing and regulating adaptation in a layered network. Woods argues that the scope of RE lies in resilience 3 and resilience 4. There are many recent publications on diverse views and understandings of resilience (e.g. Special issue, Safety Science, High reliability organizations and resilience engineering).
 
8th Resilience Engineering Symposium in Kalmar 2019
Since resilience has become so popular, there are also some traps, e.g. those highlighted by Sidney Dekker (2019): 1) a reductionist trap of targeting either operations or wider societal systems; 2) a moral trap, where operators' flexibility and adaptation are promoted by regulations, with the risk of making operators accountable for safety; and 3) a normative trap, where resilience embraces the idea of something positive and safety is achieved by ensuring local adaptation and strategies, while in some industries like fisheries the argument for resilience is not safety but the acceptance of danger.

The purpose of the Resilience Engineering Association lies in being open to and building bridges among different views and perspectives rather than dictating a unique view and fixed definition of what resilience is and can be.

Only by integrating disciplines and sharing different experiences and perspectives that address increased complexity, emergence and non-linearity will we gain momentum, grow our knowledge and increase our comprehension of how resilience engineering can benefit our society in coping with everyday work as well as surprises in an ever more complex reality.

Wreathall, J. Chapter 17: Properties of resilient organizations. In Resilience Engineering: Concepts and Precepts.
From: http://erikhollnagel.com/ideas/resilience-engineering.html
Woods, D. D., 2015. Four concepts for resilience and the implications for the future of resilience engineering. Reliability Engineering & System Safety 141, 5-9.
https://www.sciencedirect.com/science/article/pii/S0925753518304065
Dekker, S., 2019. Foundations of Safety Science: A Century of Understanding Accidents and Disasters.

What’s happening in research
Work at Safety Science Innovation Lab, Griffith University (Brisbane, Australia)

By David Provan


The Safety Science Innovation Lab is led by Sidney Dekker and Drew Rae. Our Lab provides Graduate Certificate and Masters courses in Safety Leadership and hosts, on average, approximately 10 Masters and PhD students conducting industry-based research.
The primary research aims of the Lab are to understand:
  • ‘Safety as done’ in organisations 
  • The current and future role of safety professionals
  • How Safety Differently can be applied in practice
We are currently conducting research across a diverse range of industries including: healthcare, utilities, rail, oil and gas, construction, and education. 
Our current research projects include: the identity and daily work of safety inspectors, the effects of ‘safety clutter’, the phenomenon of ‘fantasy planning’ in safety, how safety management is compromised by subcontracting arrangements, the application of hazard registers during safety-critical system design work, the professional identity and practice of safety professionals, the influence of accident narratives on safety recommendations, personal risk-taking behaviour among safety professionals, the purpose and function of safety signage, and the nature of safety improvement journeys.
For further information please contact David Provan (david.provan@griffithuni.edu.au) or Drew Rae (d.rae@griffith.edu.au).

New Course in Resilient Health Care Management

By Tarcisio A. Saurin


Healthcare systems have been a traditional domain of interest for resilience engineering researchers and practitioners. The growing interest in resilient health care has given rise to a unique diploma course on resilient health care management. The course is provided by the School of Health and Welfare at Jönköping University in Sweden in collaboration with the International Society for Quality in Healthcare (ISQua) and supported by Macquarie University, Sydney, Australia.
The course timeline is from October 2019 to December 2020, including residential and online learning. Registration can be made at ju.se/rhcm.


Bridging the Differences
An interview with John Allspaw


Cofounder of:
https://www.adaptivecapacitylabs.com/
https://www.kitchensoap.com/about-me/



By Beth Lay


Imagine a world where the pilot of a Boeing 737 pulls up to the gate and says to his co-pilot “this gauge is bugging me; I’m going to make it bigger and move it.” This is exactly what happens in the world of Twitter, Facebook, Amazon, and Google. Web designers are both the pilot and the designer. Scary? Yes, but also liberating. Amazon changes code on average every 11 seconds. The frequency and pace of change mean that change management practices that exist in traditional companies won’t work for internet-facing businesses. How are things different?

John Allspaw, David Woods, Richard Cook, Asher Balkin, Laura Maguire, and Marisa Grayson have formed a consortium of industry leaders and researchers united in the common cause of understanding and coping with the immense levels of complexity involved in the operation of critical digital services.  https://www.snafucatchers.com/  
 
“SNAFU is an acronym for "Situation Normal: All F#$@ed Up". The term implies that the world is normally broken, that conditions are ordinarily chaotic, disordered, and dysfunctional. Use of the term generally indicates the speaker is worldly wise or even jaded enough that this state is not surprising.”  
 
In this interview with John Allspaw, former CTO of Etsy, we explore the work Snafu Catchers has been doing with internet-facing businesses and how they are changing things up in this domain.
 
Which paradigm shifts do people experience?
 
Failure isn’t linear and don’t forget the people. Linear lines of causality are an illusion. It’s easy to illustrate how multiple contributing factors are each necessary but only jointly sufficient to trigger an incident. Software engineers are not deeply rooted in causality (typical level of familiarity is 5-whys) so it’s pretty easy to flip this paradigm, but it’s more difficult to shift to what to do differently when considering systemic accident models. Typical post mortems go something like this: at time X, this request went here and the router did this. Written in the passive voice, post mortems miss the stories of the people. John asks “Don’t you think what you did is important? What was hard?” He asks “When dealing with an outage, have you ever found yourself completely confused?” Everyone raises their hand. “You are about to take an action, finger poised above the button, thinking to yourself: I’m going to do this! Knowing there’s an equal chance it will make it worse. Do you remember that palpable thing, when the hairs on your arm stood up? Show me in this post mortem document where this is! I’ve been there, I’ve felt it…” Per John, we want to learn from failure but the trick is to bring attention to how things normally work. Don’t dismiss what kept it from being as bad as it could have been and what happens in normal work that we can learn from.
 
The world isn’t deterministic. Software engineers are paid to write and operate code to construct a deterministic, i.e. predictable, world! They inhabit a world of 1s and 0s. What’s on the screen is very real to them. They build remarkably rich mental models with a paucity of information and have a sophisticated understanding of how the system works. Snafu Catchers brings forth that you can’t see code run; what you really have are little keyhole views of the system, snapshots of representations, and everyone has a different mental model of the system, based on their own experiences.
 
The real work is cognitive. The vast majority of conversations that software engineers have are about what the machine is doing. What is actually happening is that software engineers are building and continually recalibrating their understanding of how the system works; they are anticipating, reframing and prioritizing, using pictures of what’s happened in the past to influence designs for the future. As soon as the idea that the real work is cognitive becomes clear, a light bulb goes off. They realize how little of their normal dialog is about cognitive work and this opens the possibility to ….
 
A vision of the future.
John has a 13-year-old daughter. He envisions that when she enters the workforce, we’ll reflect “Oh, I remember those days! That’s when we still saw people as the weak link,” much as we now think about doctors who appeared in advertisements for cigarettes.

Book Corner: Foundations of Safety Science

By Sidney Dekker


I wrote Foundations of Safety Science, my latest book, foremost for safety practitioners and students. I decided, perhaps foolishly or presumptuously, that all could benefit from a more solid grounding in the foundations of the science of safety (such as it is, I hear Erik Hollnagel justifiably say). 
My main concern was a lack of fluency, of literacy, in the ideas, concepts and theories that make up our field—the genealogy, the interconnections, the historical roots from which even today’s models stem. Much safety education—if it discusses models at all—stops at Swiss cheese (which is thirty years old). Even more safety education is organized around applicable laws, regulations, policies, best practices, methods and techniques, often driven by peer-to-peer influence—inspirations from what others in other organizations have done—and hand-me-down knowledge. 
And actually, not all safety practitioners were educated as safety practitioners. Many have backgrounds in operations, in HR, in engineering or chemistry or a mechanical trade or psychology or something else altogether. To put it crudely, they make up a bunch of happy amateurs, who easily reinvent the wheel, who believe that one model is all there is, who introduce countermeasures, embrace slogans or set targets without understanding the conceptualization of danger and its applicability at all.
I’ve chosen an episodic approach to organizing this book. That is, I have divided it up into time slices. Every chapter is founded on the ideas of a particular era—each roughly a decade from the past century and this one. It then explores how these have influenced our thinking in safety in other decades ever since. Of course, the lines and categories of what belongs to which decade, or what inspired what exactly, can always be debated, as they should. They are not in this book to radiate an impression of linear, historical truth. Rather, they are a way for me to organize the ideas, and for a reader to start thinking with them. 
 
How are today’s ‘hearts and minds’ programs, for example, linked to a late-19th century definition of human factors as people’s moral and mental deficits? What do Heinrich’s ‘unsafe acts’ from the 1930s have in common with the Swiss Cheese Model of the early 1990s? Why was the reinvention of Human Factors in the 1940s such an important event in the development of safety thinking? What makes many of our current systems so complex and impervious to Tayloristic safety interventions? I have tried to review the theoretical origins of major schools of safety thinking, and to trace the heritage of, and interlinkages between, the ideas that make up safety science today. Of course, the book concludes with Resilience Engineering as the go-to school of thought today, tracing its roots in the work of Rasmussen, Woods, Hollnagel, Cook and others.

https://www.crcpress.com/Foundations-of-Safety-Science-A-Century-of-Understanding-Accidents-and/Dekker/p/book/9781138481787
Contents: The 1900s and Onward: Beginnings. The 1910s and Onward: Taylor and Proceduralization. The 1920s and Onward: Accident-Prone. The 1930s and Onward: Heinrich and Behavior-Based Safety. The 1940s and Onward: Human Factors and Cognitive Systems Engineering. The 1950s, 1960s and Onward: System Safety. The 1970s and Onward: Man-Made Disasters. The 1980s and Onward: Normal Accidents and High Reliability Organizations. The 1990s and Onward: Swiss Cheese and Safety Management Systems. The 2000s and Onward: Safety Culture. The 2010s and Onward: Resilience Engineering. Postscript.

Strengths and Weaknesses of Resilience Assessment Grid

By Sheuwen Chuang


Resilience engineering has been advocated over the last decade as an alternative approach to the management of safety in many domains. However, developing metrics for measuring and helping to analyze resilience performance remains a significant challenge. To facilitate an enhanced understanding of what makes resilient performance of a system possible, Prof. Erik Hollnagel developed the Resilience Assessment Grid (RAG). The RAG differs from a retrospective analysis of an organization's resilience after an accident has happened. It is a relatively new, open-structured, questionnaire-based tool that is used to collect data reflecting the four resilience potentials (respond, monitor, anticipate, and learn) in an organization or system.
The RAG has been tailored and used in practice and research in a few organizations and domains, such as an offshore oil and gas company, the air traffic management system, rail traffic management, and hospital emergency departments. These cases revealed the following strengths and weaknesses of the RAG.

Strengths
The RAG is composed of four coherent question sets, each providing proxy measurements of one resilience potential, to support resilience management. Based on the proxy measurements of the four potentials, the applications of the RAG indicated that participants gained a better understanding of how resilient performance presents itself and how the organization supports each of the four potentials in enabling resilient performance. Its visualization charts, i.e. the radar chart for demonstrating each potential and the star chart for presenting the overall system potential, give clear directions for building an organization’s resilience.
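
To make the idea of proxy measurements per potential more concrete, the following is a minimal sketch of how question ratings might be turned into the four values one could plot on such a chart. It rests on assumptions that are not part of the RAG description above: ratings are taken to be integers from 0 to 5, each potential is summarized by a simple mean, and the function name `potential_profile` and the example numbers are invented for illustration.

```python
from statistics import mean
from typing import Dict, List

# The four resilience potentials named in the RAG.
POTENTIALS = ("respond", "monitor", "anticipate", "learn")


def potential_profile(ratings: Dict[str, List[int]], max_rating: int = 5) -> Dict[str, float]:
    """Summarize question ratings as one value per potential.

    Assumption: each question is rated from 0 to max_rating and the
    mean is used as the summary; a tailored RAG may use another scale.
    """
    profile = {}
    for potential in POTENTIALS:
        scores = ratings.get(potential, [])
        if not scores:
            raise ValueError(f"no ratings supplied for '{potential}'")
        if any(score < 0 or score > max_rating for score in scores):
            raise ValueError(f"rating out of range for '{potential}'")
        profile[potential] = mean(scores)
    return profile


# Invented ratings, for illustration only.
example_ratings = {
    "respond": [4, 3, 5],
    "monitor": [2, 3, 3],
    "anticipate": [1, 2, 3],
    "learn": [4, 4, 4],
}
profile = potential_profile(example_ratings)
weakest = min(profile, key=profile.get)  # potential with the lowest mean
print(profile, weakest)
```

A simple mean hides disagreement between respondents, which is one reason the scale-validity concerns raised under Weaknesses matter when reading such a profile.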
 
Weaknesses
The RAG’s open-ended structured questionnaire inherently raises questions about the quality of the survey data, which may be affected by interviewers, interviewees or participants, whether in face-to-face interviews, focus group discussions or surveys. The method may also raise concerns about scale validity, since investigation team members and interviewees may determine scale levels differently. In addition, using the open-ended questionnaire makes data collection time-consuming.
 
Implementation of RAG
For wider application, the analyst first needs to adjust the RAG’s structure to the organization or system being studied. Next, interviewees or participants need to be selected using consistent and reasonable qualification criteria, both within a single investigation and across multiple investigations. Finally, a data pool of the answers given to each question needs to be built in order to improve scale validity.
 
References:
Hollnagel, E., 2015. RAG – Resilience Analysis Grid. Retrieved from http://erikhollnagel.com/onewebmedia/RAG%20Outline%20V2.pdf.
Patriarca et al., 2018. An Analytic Framework to Assess Organizational Resilience. Safety and Health at Work 9, 265-276.
Chuang, S., Ou, J.-C., Ma, H.-P., 2020. Measurement of resilience potentials in emergency departments: Applications of a tailored resilience assessment grid. Safety Science 121, 385-393.

Upcoming events

 

Resilience Engineering webinar series


Sarah Carriere Nov. 26

From cheese to STEW: Incorporating a systems’ approach to critical incident analysis





Systems Thinking for Everyday Work (STEW) is a set of discussion cards that frame team conversations and support systematic group discussion. After a recent medication patient safety incident, the 6 principles of STEW were applied to better understand and analyse the incident, identify opportunities for improvement, and sustain team resilience. Sarah will describe the practitioner and organizational experiences with this framework.

Sarah Carriere is based in Vancouver as a Leader for Health System Improvement for the BC Patient Safety & Quality Council. As a registered nurse, Sarah has worked in surgical and critical care environments, which then morphed into a passion for clinical research, quality improvement, high reliability systems and complexity science. Sarah’s passion lies in supporting people to see and do things differently, to challenge the status quo, and learn from what makes things go right and how many times this can be replicated. Sarah’s main passion lies in how we hold ourselves accountable for ensuring we support teams to practice in a psychologically safe environment that leads to safer and patient-partnered care. She has also supported and led numerous safety-focused quality improvement initiatives, clinical research studies and large-scale provincial quality improvement initiatives.
 

Presented by REA & DARWIN Community of Practitioners

REA & DCoP are a global community of practice creating opportunities to collaborate with others who are leading research and developing practical applications in how to create resilience in complex systems.  Our events allow participants to learn from other domains seeking to engineer resilience into their systems such as healthcare, critical infrastructure, energy, critical digital services, transportation (aviation, marine, rail), automation (self-driving cars, drones), emergency response, finance, and aerospace.

Our webinar series on resilience engineering topics spans theory, practice, cases, research and more!  Watch  the REA website for our calendar of upcoming presentations.


To join: 
https://global.gotomeeting.com/join/976649149 
4-5pm Central European Time
10-11am Eastern Time (US)
7-8am Pacific Time

Next up: Erik Hollnagel in December
REA Communications team
The Resilience Engineering Association newsletter and website blog are brought to you by the Resilience Engineering Association Communications team:
  • Beth Lay, Lewis Tree Services, US
  • Lorin Hochstein, Netflix, US
  • Matthieu Branlat, SINTEF, Norway
  • Tarcisio Abreu, Universidade Federal do Rio Grande do Sul, Brazil
  • Sheuwen Chuang, Taipei Medical University, Taiwan
Contact any of us directly with news or submissions or send to rea-communications-team@googlegroups.com.
Looking forward to hearing from you!
Resilience Engineering Association © 2018
info@resilience-engineering-association.org  |  www.resilience-engineering-association.org



Resilience Engineering Association · MINES ParisTech - CRC · Sophia Antipolis B.P. 207F-06904 · France
