dgt_sre08 – Chaos Engineering

Overview:

Chaos Engineering is the discipline of testing the resilience of systems in production environments. This course, designed by our Talent Guardians team, who developed “KubeInvaders,” takes a hands-on approach: participants will learn to proactively identify weaknesses and improve system reliability through controlled experiments.

Course Objectives:

  • Understand the principles and benefits of chaos engineering.
  • Learn methodologies for designing and executing chaos experiments.
  • Explore real-world case studies where chaos engineering has enhanced resilience.
  • Gain hands-on experience with tools like KubeInvaders, designed to simulate chaotic scenarios in Kubernetes environments.
  • Develop strategies for incorporating chaos engineering into your organization's SRE practices.

Key Modules:

  1. Introduction to Chaos Engineering:
     • History and evolution of the discipline.
     • Core principles and objectives.

  2. Designing Chaos Experiments:
     • Identifying key assumptions.
     • Defining success criteria.
     • Planning safe and controlled experiments.

  3. Tools and Techniques:
     • Overview of popular chaos engineering tools.
     • Deep dive into KubeInvaders: features, setup, and usage.

  4. Real-World Applications:
     • Case studies from industry leaders.
     • Lessons learned and best practices.

  5. Implementing Chaos Engineering in SRE:
     • Integrating chaos experiments into your development lifecycle.
     • Building a culture of resilience within teams.

  6. Hands-On Lab: KubeInvaders in Action:
     • Step-by-step guide to setting up and running experiments.
     • Analyzing results and iterating on experiments.
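
As a rough illustration of the "Designing Chaos Experiments" module, the three design steps (identifying an assumption, defining a success criterion, and planning a safe, limited experiment) can be sketched in a few lines of Python. This is a minimal simulation and not course material or KubeInvaders code: the names Experiment, run_experiment, and at_least_two_healthy are hypothetical, and the "replicas" are just a list of booleans standing in for a real cluster.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Experiment:
    hypothesis: str                             # the assumption under test
    steady_state: Callable[[List[bool]], bool]  # success criterion on system state
    max_blast_radius: int                       # safety limit: max replicas to disrupt


def run_experiment(exp: Experiment, replicas: List[bool], targets: List[int]) -> dict:
    """Disrupt the targeted replicas and report whether the hypothesis held."""
    # Safety checks first: refuse to run outside the planned limits.
    if len(targets) > exp.max_blast_radius:
        return {"status": "aborted", "reason": "blast radius exceeded"}
    if not exp.steady_state(replicas):
        return {"status": "aborted", "reason": "system not in steady state"}

    after = list(replicas)  # work on a copy; a real run would act on the cluster
    for i in targets:
        after[i] = False    # simulate the failure of replica i

    return {"status": "completed", "hypothesis_held": exp.steady_state(after)}


def at_least_two_healthy(replicas: List[bool]) -> bool:
    """Hypothetical success criterion: two or more replicas are healthy."""
    return sum(replicas) >= 2


experiment = Experiment(
    hypothesis="The service stays available when one of three replicas fails",
    steady_state=at_least_two_healthy,
    max_blast_radius=1,
)

result = run_experiment(experiment, [True, True, True], targets=[0])
print(result)  # {'status': 'completed', 'hypothesis_held': True}
```

Checking the blast radius and the steady state before injecting any failure reflects the "safe and controlled" part of the design: an experiment that cannot start from a known-good state, or that would disrupt more than planned, is aborted rather than run.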

Target Audience:

  • Site Reliability Engineers (SREs)
  • DevOps professionals
  • System architects
  • Software developers interested in improving system resilience

Prerequisites:

  • Basic understanding of SRE principles and practices.
  • Familiarity with Kubernetes environments is beneficial but not mandatory.

Join us to master the art of chaos engineering, ensuring your systems are robust, resilient, and ready for the unexpected.

Students can push their exercises to the Academy DevOps & SRE Git project. For this module, create a folder named after your username in the following subfolder: https://github.com/Garanti-Del-Talento/gdt_academy/tree/main/dgt_sre08__chaos_engineering