The Multi-team Software Delivery Assessment is a simple, easy-to-execute approach to assessing software delivery across many different teams within an organization. It was devised by Matthew Skelton of Conflux and covers ten dimensions in total:
- Team Health – based on the criteria from the Spotify Squad Health Check with some additions:
- Easy to release – how easy is it to release a change to the software you work on?
- Suitable process – how suitable is the process for developing and delivering software?
- Tech quality – how healthy is the code base?
- Value – do you work on valuable things as a team?
- Speed – how rapidly do you work as a team?
- Mission – how well do you know why you are working on things?
- Fun – how fun is it to work in your team? How much camaraderie and sense of teamwork?
- Learning – how much do you learn as a team?
- Support – how much support do you get as a team?
- Pawns or players – how much control do you have over what you work on and how?
- Psychological Safety – how safe do you feel to raise concerns?
- Teams Around Us – how well do the teams around you work with you and your team?
- Delivery Platform – how effective and easy to use is the delivery platform underpinning your team’s delivery?
- Management Style – how effective and appropriate are the approaches by management and other senior stakeholders?
- Deployment – based on key questions from the book DevOps for the Modern Enterprise by Mirco Hering (a sketch of such checks follows this list):
- Environment Rebuild – what would happen if you blew away the environment and rebuilt it from your stored configuration?
- Fresh Config – what would happen if you deleted the application config and redeployed it?
- Redeploy App – what would happen if you redeployed the application even though nothing has changed?
- Rerun Tests – what would happen if you reran the test suite, and then ran it again?
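These deployment questions probe how automated and repeatable the path to Production is. As a rough illustration only, the Python sketch below runs the deployment twice and reruns the test suite to look for idempotency and flakiness; the script names `./deploy.sh` and `./run-tests.sh` are placeholders for whatever your pipeline actually invokes, not part of the assessment.

```python
# Sketch: probing deployment idempotency and test-suite determinism.
# The commands "./deploy.sh" and "./run-tests.sh" are placeholders.
import subprocess


def run(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run a command, capturing output so the two runs can be compared."""
    return subprocess.run(cmd, capture_output=True, text=True)


def check_idempotent_deploy() -> bool:
    """Deploy twice with no code change; an automated, idempotent
    deployment should succeed both times (comparing the captured output
    would also reveal unexpected changes on the second run)."""
    first = run(["./deploy.sh"])
    second = run(["./deploy.sh"])
    return first.returncode == 0 and second.returncode == 0


def check_test_determinism(runs: int = 2) -> bool:
    """Rerun the test suite back-to-back; differing results suggest
    flaky tests or hidden state in the environment."""
    results = [run(["./run-tests.sh"]).returncode for _ in range(runs)]
    return len(set(results)) == 1 and results[0] == 0


if __name__ == "__main__":
    print("idempotent deploy:", check_idempotent_deploy())
    print("deterministic tests:", check_test_determinism())
```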
- Flow – based on criteria from the book Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim, plus some details from The Principles of Product Development Flow by Don Reinertsen (a sketch of how these metrics might be computed follows this list):
- Cycle Time – how long does it take for a code change to go from version control to running in Production? (Minimum, Typical)
- Deployment Frequency – how often does your team deploy to Production?
- MTTR – how long does it take to restore your application or service after an incident?
- Failed Changes – what proportion of changes to your application or service in Production fail or need remediation? (This is typically measured as the proportion of failed deployments)
- Work in Progress – how many things does your team work on at the same time? (Minimum, Typical)
- Innovation – how well are you able to innovate around delivery approaches?
- Onboarding – how effective is the onboarding process for new teams and new staff?
- Branch Age – how long do your branches live?
- Retrospectives – how effective are your team retrospectives?
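As a minimal sketch (not part of the assessment itself), the first four Flow metrics could be computed from simple delivery records like the ones below; the record shapes are assumptions, and in practice the data would come from version control, the deployment pipeline, and incident tooling.

```python
# Sketch: computing cycle time, deployment frequency, change failure
# rate, and MTTR from assumed delivery records.
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    commit_at: datetime    # change enters version control
    deployed_at: datetime  # change running in Production
    failed: bool           # change failed or needed remediation


@dataclass
class Incident:
    started_at: datetime
    restored_at: datetime


def cycle_time(deploys: list[Deployment]) -> timedelta:
    """Typical (median) time from version control to Production."""
    return median(d.deployed_at - d.commit_at for d in deploys)


def deployment_frequency(deploys: list[Deployment], period_days: int) -> float:
    """Average deployments to Production per day over the period."""
    return len(deploys) / period_days


def change_failure_rate(deploys: list[Deployment]) -> float:
    """Proportion of changes that failed or needed remediation."""
    return sum(d.failed for d in deploys) / len(deploys)


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to restore service after an incident."""
    total = sum((i.restored_at - i.started_at for i in incidents), timedelta())
    return total / len(incidents)
```

The "Minimum, Typical" values asked for above would simply be the minimum and median of the same distributions.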
- Continuous Delivery – based on selected criteria from the book Continuous Delivery by Jez Humble and Dave Farley (a sketch of a stop-the-line pipeline follows this list):
- Release Candidate – does every check-in lead to a potential release?
- Done – does "Done" mean released?
- Automated Config – is configuration performed by automated processes using values taken from your configuration repository?
- Config Options – is it easy for anyone to see what configuration options are available for a particular version of an application across all environments it will be deployed into?
- Broken Builds – do you check in on a broken build?
- Failing Tests – do you comment out failing tests?
- Binaries – do you build your binaries only once?
- Stop The Line – if any part of the pipeline fails, do you stop the line?
- Idempotent Deployment – is the deployment process idempotent?
- Stubs – do you use stubs to simulate external systems?
- API Replay – do you record and replay interactions against a service or public API?
- Blue-Green – do you have a mechanism that allows you to test a new version alongside an existing version and roll back to the older version if necessary?
- Environment History – is it possible to see a history of changes made to every environment, including deployments?
- DB Changes – do you decouple application deployment from database migration?
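Several of these criteria (build binaries once, stop the line, never check in on a broken build) describe pipeline behaviour. The sketch below shows the stop-the-line idea as a minimal stage runner; the stage commands are placeholders, and a real pipeline would normally be defined in your CI/CD tool's own configuration rather than a script like this.

```python
# Sketch: a "stop the line" pipeline skeleton. Binaries are built once,
# every later stage reuses that artifact, and the first failing stage
# halts the whole run. All commands are placeholders.
import subprocess
import sys


def stage(name: str, cmd: list[str]) -> None:
    """Run one pipeline stage, stopping the line on failure."""
    print(f"=== {name} ===")
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # Stop the line: no later stage runs against a broken build.
        sys.exit(f"Stage '{name}' failed; stopping the pipeline.")


if __name__ == "__main__":
    stage("Build binaries once", ["./build.sh", "artifacts/"])
    stage("Unit tests", ["./run-unit-tests.sh", "artifacts/"])
    stage("Deploy to test environment", ["./deploy.sh", "test", "artifacts/"])
    stage("Acceptance tests", ["./run-acceptance-tests.sh", "test"])
    stage("Deploy to production", ["./deploy.sh", "production", "artifacts/"])
```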
- Operability – based on selected criteria from the book Team Guide to Software Operability by Matthew Skelton, Alex Moore, and Rob Thatcher (a sketch of one automated check follows this list):
- Collaboration – how often and in what ways do you collaborate with other teams on operational aspects of the system such as operational features?
- Spend on operability – what proportion of product budget and team effort is spent addressing operational aspects? How do you track this?
- Feature Toggles – how do you know which feature toggles are active for this subsystem?
- Config deployment – how do you deploy a configuration change without redeploying the software?
- System health – how do you know that the system is healthy?
- Service KPIs – how do you track the main service/system Key Performance Indicators (KPIs)? What are the KPIs?
- Logging working – how do you know that logging is working correctly?
- Testability – how do you show that the software system is easy to test? What do you provide and to whom?
- TLS Certs – how do you know when an SSL/TLS certificate is close to expiry?
- Sensitive data – how do you ensure that sensitive data in logs is masked or hidden?
- Performance – how do you know that the system/service performs within acceptable ranges?
- Failure modes – how can you see and share the different known failure modes for the system?
- Call tracing – how do you trace a call/request end-to-end through the system?
- Service status – how do you display the current service/system status to operations-facing teams?
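Most of the Operability questions are answered with evidence and conversation rather than code, but some lend themselves to small automated checks. For example, here is a minimal sketch of answering the TLS Certs question using only the Python standard library; the hostname and the 30-day warning threshold are examples, not recommendations.

```python
# Sketch: how many days remain before a server's TLS certificate expires.
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(hostname: str, port: int = 443) -> int:
    """Connect to the host and return days until its certificate expires."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc
    )
    return (expires - datetime.now(timezone.utc)).days


if __name__ == "__main__":
    remaining = days_until_expiry("example.com")  # example hostname
    if remaining < 30:  # example warning threshold
        print(f"WARNING: certificate expires in {remaining} days")
    else:
        print(f"Certificate OK: {remaining} days remaining")
```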
- Testing and Testability – based on selected criteria from the books Agile Testing by Lisa Crispin and Janet Gregory, Continuous Delivery by Jez Humble and Dave Farley, Growing Object-Oriented Software by Steve Freeman and Nat Pryce, Working Effectively with Legacy Code by Michael Feathers, and Team Guide to Software Testability by Ash Winter and Rob Meaney (a sketch of a consumer-driven contract test follows this list):
- Test-first (classes) – what proportion of the time do you write the test first for methods and classes?
- Test-first (features) – what proportion of the time do you write the test first for features and behavior?
- Unit Test % – at what code coverage level do you deem your Unit Tests to have succeeded?
- Feature Tests % – at what feature coverage level do you deem your Feature Tests (or Behavior Tests) to have succeeded?
- Feature Coverage – what proportion of the features in your code is covered by a Feature Test (or Behavior Test)?
- Test Data – what proportion of your test data is generated from scripts and automatically injected into data stores?
- Deployment – what proportion of your deployment pipeline code has tests covering the behavior of build and deployment?
- Testability – what proportion of your time is spent on making the software testable?
- CDCs/Pact/SemVer – how much do you use inter-team testing approaches such as Consumer-Driven Contracts (CDCs)/Pact/Semantic Versioning?
- Other Code – how confident are you in the code from other teams in the organization that you work with or consume (but not write)?
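As an illustration of the CDCs/Pact/SemVer item, the sketch below expresses a consumer-driven contract as an ordinary test, without using any particular tool: the consumer team publishes the fields it depends on, and the provider's test suite verifies that a real response still satisfies them. The endpoint and field names are invented for the example.

```python
# Sketch: a consumer-driven contract written as a plain test (pytest-style).
# The contract, endpoint, and field names are illustrative only.
CONSUMER_CONTRACT = {
    "endpoint": "/orders/{id}",
    "required_fields": {"id": int, "status": str, "total_pence": int},
}


def provider_response_for(endpoint: str) -> dict:
    """Stand-in for calling the provider service in a test environment."""
    return {"id": 42, "status": "DISPATCHED", "total_pence": 1999, "extra": "ok"}


def test_provider_honours_consumer_contract():
    response = provider_response_for(CONSUMER_CONTRACT["endpoint"])
    for field, expected_type in CONSUMER_CONTRACT["required_fields"].items():
        assert field in response, f"missing field: {field}"
        assert isinstance(response[field], expected_type), f"wrong type for: {field}"
```

Extra fields in the provider's response are fine; the contract only pins what the consumer actually relies on.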
- Reliability and SRE – based on selected criteria from the books Site Reliability Engineering by Betsy Beyer, Chris Jones, Jennifer Petoff, & Niall Murphy, The Site Reliability Workbook edited by Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara, & Stephen Thorne, Seeking SRE edited by David N. Blank-Edelman, and Team Guide to Software Operability by Matthew Skelton, Alex Moore, & Rob Thatcher (a sketch of an error-budget calculation follows this list):
- Service Availability – how available (in “nines”) does your service or application need to be and how do you know or decide?
- User Goals and SLIs – what should your service/application do from the viewpoint of the user?
- Understanding users and behavior – who are the users of the software and how do they interact with the software? How do you know?
- SLIs/SLOs – how do you know when users have experienced an outage or unexpected behavior in the software?
- Service Health – what is the single most important indicator or metric you use to determine the health and availability of your software in production/live?
- SLIs – what combination of three or four indicators or metrics do you use (or could/would you use) to provide a comprehensive picture of the health and availability of your software in production/live?
- Error budget and similar mechanisms – how does the team know when to spend time on operational aspects of the software (logging, metrics, performance, reliability, security, etc.)? Does that time actually get spent?
- Alerting – what proportion of your time and effort as a team do you spend on making alerts and operational messages more reliable and relevant?
- Toil and fixing problems – what proportion (approx) of your time gets taken up with incidents from live systems and how predictable is the time needed to fix problems?
- Time to Diagnose – how long does it typically take to diagnose problems in the live/production environment?
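A minimal sketch of the error-budget idea referenced above: given an availability SLO and the number of failed requests in the period, how much of the budget remains. The SLO target and request counts are illustrative.

```python
# Sketch: remaining error budget for an availability SLO.
def error_budget_remaining(slo_target: float, total_requests: int,
                           failed_requests: int) -> float:
    """Fraction of the period's error budget still unspent.

    slo_target is e.g. 0.999 for "three nines" availability.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0
    return 1.0 - (failed_requests / allowed_failures)


if __name__ == "__main__":
    # Example: 99.9% SLO, 1,000,000 requests this month, 600 failed.
    remaining = error_budget_remaining(0.999, 1_000_000, 600)
    print(f"Error budget remaining: {remaining:.0%}")  # prints 40%
```

A budget near zero (or negative) is a common signal to prioritise reliability work over new features.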
- On-call – based on selected criteria from the books Site Reliability Engineering by Betsy Beyer, Chris Jones, Jennifer Petoff, & Niall Murphy, The Site Reliability Workbook edited by Betsy Beyer, Niall Richard Murphy, David K. Rensin, Kent Kawahara, & Stephen Thorne, and Team Guide to Software Operability by Matthew Skelton, Alex Moore, & Rob Thatcher:
- Purpose of on-call – how would you define “on-call”?
- Benefits of on-call – what are some ways in which the software benefits by having developers on-call?
- Reward – how are you rewarded for being on-call out of working hours?
- On-call UX – what is the User Experience (UX) / Developer Experience (DevEx) of being on-call at the moment?
- Learning from on-call – what happens to knowledge gained during on-call? How is the software improved based on on-call experiences?
- Attitude to on-call – under what circumstances would on-call not be a burden?
- Future on-call – what would be needed for this team/squad to be happy to be on-call?
- Tooling for on-call – what tooling or process is missing, ineffective, or insufficient at the moment in relation to on-call?
- Improving on-call – how much time do you spend as a team improving the on-call experience? How often do you work on improvements to on-call?
- Flexibility of on-call – how flexible is the on-call rota or schedule? In what ways does the schedule meet the different needs of team members?
- Accessibility of on-call – how accessible is on-call? Specifically, what proportion of your team members are actually on-call regularly?
- Security and Securability – based on selected criteria from the books Agile Application Security by Laura Bell, Michael Brunton-Spall, Rich Smith, and Jim Bird; Alice and Bob Learn Application Security by Tanya Janca; Secure by Design by Dan Bergh Johnsson, Daniel Deogun, and Daniel Sawano; Continuous Delivery by Jez Humble and Dave Farley; and Threat Modeling: Designing for Security by Adam Shostack (a sketch of policy-as-code checks follows this list):
- OWASP Top Ten – do you check for the OWASP Top Ten security risks?
- Secure Design Principles – what is the approach to security and compliance?
- Threat Modeling – how do you approach threat modeling?
- Domain-driven Security – in what ways do you work with domain experts to help make code secure?
- Input Testing – what kind of input testing do you perform in the deployment pipeline?
- Least Privilege – what approach do you take to access permissions for the accounts used to run your software?
- Supply-Chain Security – how do you verify the quality and safety of the external software components used in your software?
- HTTPS Everywhere – where is HTTPS (HTTP over TLS/SSL) used within your software?
- Automated Security Testing – what kind of automated security testing is performed on your code and when?
- Responsibility for Security – who is responsible for the security and securability of your software?
- Policy as Code – are security policies defined in code (or configuration) and testable?
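For the Policy as Code item, one lightweight approach is to express policies as ordinary tests over deployment configuration, so that the policies are versioned alongside the code and checked in the pipeline. The configuration shape and policy values below are illustrative, not a recommended policy set.

```python
# Sketch: security policies expressed as plain tests (pytest-style) over
# an assumed deployment configuration.
DEPLOYMENT_CONFIG = {
    "listen_scheme": "https",                   # HTTPS everywhere
    "tls_min_version": "1.2",
    "service_account_roles": ["read-orders"],   # least privilege
    "secrets_in_env": False,
}


def test_https_everywhere():
    assert DEPLOYMENT_CONFIG["listen_scheme"] == "https"


def test_modern_tls_only():
    assert float(DEPLOYMENT_CONFIG["tls_min_version"]) >= 1.2


def test_least_privilege_roles():
    forbidden = {"admin", "owner"}
    assert forbidden.isdisjoint(DEPLOYMENT_CONFIG["service_account_roles"])


def test_no_secrets_in_environment_variables():
    assert DEPLOYMENT_CONFIG["secrets_in_env"] is False
```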
- Team Topologies team interactions – based on selected criteria from the books Team Topologies by Matthew Skelton and Manuel Pais, and Dynamic Reteaming by Heidi Helfand:
- Team Type – is the type of your team clear to you and to other teams around you?
- Long-lived Teams – how durable (long-lived) is your team? When will the team be disbanded?
- Changing Team Members – how are team members added or removed from the team? What informs the process?
- Team API – how do you define the remit and focus of the team?
- Team Interactions – how do you define, plan, and explore the interactions between your team and other teams?
- Inter-team Dependencies – what approach do you take to track dependencies between teams?
- Team Workspace – in what ways does your physical and/or online team space contribute to a sense of team cohesion and empowerment?
- Team Cohesion – how cohesive does your team feel? How much trust is present inside the team?
- Team Cognitive Load – how does team cognitive load affect the team’s work?
The aim of the assessments is to promote and sustain a positive working environment for building and running software systems where:
- Changes to software are built, tested, and deployed to Production rapidly and safely using Continuous Delivery practices
- Processes and practices are optimized for the flow of change toward Production
- Software is designed and built to enable independent, decoupled deployments for separate families of systems
- Software is designed and built in a way that addresses operability, testability, releasability, security, and reliability
- Problems in Production are always detected by teams before customers and users notice
- Responsibility and accountability for software changes lead to empowerment and ownership
- Working with software is rewarding and interesting
- Being on-call and supporting the software is sustainable and valuable
- People feel confident to challenge poor practices and approaches
- Teams have a clear mission and well-defined interaction patterns with other teams
Fundamentally, the assessments should help to unblock and enable teams so they can succeed. The team should feel encouraged and empowered to decide on what actions they want to take to improve their processes and practices based on the discussions.
Many organizations find that running the team assessments every three months provides a good cadence.