Getting started with Site Reliability Engineering (SRE): A guide to improving systems reliability in production

Oct 30th, 01:00 - 02:00 PM | Location: 202 A/B | 1 interested

It’s 3:32am. Your phone goes off. You take a peek and see a few text messages and a missed call, all within the last 10 minutes. Oh boy, looks like it’s another incident in production…

This is a dreaded scenario that many of us have gone through, whether you are a developer, an incident manager, an executive, or part of an operations team supporting customer-facing production systems.

Site Reliability Engineering (SRE) is a term coined at Google for a discipline focused on improving the reliability of systems. It is a collection of techniques and processes that help increase the effectiveness of production systems, with the aim of minimizing outages and reducing the time to restore a system when an outage does occur.

Many organizations are adopting SRE principles and getting good results. SRE ties in well with any organization going through an Agile adoption, as the same thinking around culture change and guided process enablement can be applied on the operations side.

In this talk, I present key SRE concepts, along with my experience enabling SRE, that can be implemented at a variety of software organizations. We will look at both the technical and organizational changes that should be adopted to increase operational efficiency, ultimately enabling global optimizations such as minimizing downtime and improving systems architecture and infrastructure.

Whether you are a seasoned technical manager, an architect, an operations engineer or even a developer, the tools and techniques presented in this session can help you start reshaping your technology operations to enable better systems reliability… and reduce those dreaded 3am calls!

 
 

Outline/Structure of the Talk

The talk will be presented as a series of SRE concepts that relate to reducing systems downtime and improving systems reliability. Following each concept, I look at how it can be applied to technical operations and share how I have successfully rolled it out on my projects. The goal is to help attendees implement each concept at their own organization.

The 60-minute talk will follow this structure:

  • 5 minutes: introduction/who am I, setting the stage for common pains in supporting a system in production
  • 40 minutes: in-depth discussion of 7 to 8 key concepts relating to enabling SRE (approx. 5 mins per concept & example):
    • Improving incident response
    • Defining error budgets
    • Better monitoring of systems
    • Getting the best out of systems alerting
    • Eliminating manual, repetitive work (toil) through automation
    • Designing better on-call shifts/rotations
    • How to design the role of the Site Reliability Engineer (who effectively works between application development teams and operations support teams)
  • 5 minutes: a showcase of examples of enabling SRE at companies of different sizes
  • 10 minutes: closing remarks & summary of key takeaways, followed by Q&A
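As a rough illustration of the error-budget concept listed above (a sketch under stated assumptions, not material from the talk itself): for a simple time-based availability SLO, the error budget is the downtime the SLO permits over a window, i.e. budget = (1 - SLO) × window. The Python below assumes that model; `error_budget` and `budget_remaining` are hypothetical names chosen for this example.

```python
from datetime import timedelta


def error_budget(slo: float, window: timedelta) -> timedelta:
    """Downtime allowed within `window` by an availability SLO such as 0.999.

    Assumes a simple time-based SLO: budget = (1 - slo) * window.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    return window * (1.0 - slo)


def budget_remaining(slo: float, window: timedelta, downtime: timedelta) -> float:
    """Fraction of the error budget left after `downtime` (negative if overspent)."""
    budget = error_budget(slo, window)
    return 1.0 - downtime / budget


if __name__ == "__main__":
    window = timedelta(days=30)
    print(error_budget(0.999, window))  # 0:43:12
    # Hypothetical incident history: 10 minutes of downtime so far this window.
    print(f"{budget_remaining(0.999, window, timedelta(minutes=10)):.1%}")  # 76.9%
```

With a 99.9% SLO over 30 days, the budget works out to about 43 minutes. Teams often use the remaining fraction to decide whether to keep shipping features or shift effort to reliability work, though the exact policy is an organizational choice.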

Note: I can make this into a 40-minute talk by reshuffling some slides if that better fits the conference schedule.

Learning Outcome

By the end of the presentation, the audience will gain insight into common challenges that most technology operations teams face, and into how the concepts of SRE can help address them.

In particular:

  • easy takeaways for Operations teams that can be implemented without much resistance (e.g., better on-call practices)
  • how Operations teams can help guide the "ship of success" through effective use of better technology
  • how Operations teams can influence the rest of the organization (i.e., help the organization empathize with the tough job Operations teams have)

Target Audience

Anyone interested in learning how to increase system reliability

Prerequisites for Attendees

No prerequisites are required. Some experience with technical operations, such as infrastructure management, deployments, or the pains of being on-call, will be helpful.

Submitted 1 year ago

Public Feedback


  • Liked Sriram Natesan

    Sriram Natesan - How adopting an agile approach helped Finance & Risk group deliver a regulatory initiative

    Sriram Natesan
    Sr. Manager
    Deloitte Consulting
    1 year ago
    40 Mins
    Experience Report
    Intermediate

    CFOs in today's digital economy are looking to invest significant capital in data-driven initiatives that deliver strategic analysis to business partners. However, these initiatives are often reprioritized due to regulatory requirements.

    This session is about how a large European Bank successfully delivered a large regulatory transformation program in 2017 using an agile approach. Driven by Finance & Risk groups and enabled by technology, incremental business value was delivered to Finance and Risk stakeholders.

    The key challenges faced required an approach to handle evolving regulatory requirements, lack of trust and collaboration between Business and Technology, lack of knowledge and experience in the solution domain, integration of new technology assets to automate business requirements and an aggressive timeline enforced by the regulator.

    As an Agile Coach on the project, my role was to help the Finance & Risk groups with value stream mapping, forming cross-functional teams, and developing an agile delivery approach, and to provide training and coaching for the teams and leadership on adopting agile principles and practices.

    The successful delivery was largely due to the business's foresight in maneuvering around typical IT challenges and instead adopting an approach based on agile principles that put delivering business value over fixed scope. Through this approach, the clients were able to deliver a solution that addressed their immediate needs and also positioned them to leverage it for future regulations.

    This talk will elucidate the backdrop, the challenges facing the business, the agile approach, culture and mindset that were adopted, and the resulting outcomes.

    If you have thought of, or are thinking of, adopting an Agile mindset in a non-IT environment, this is the session for you. In this session, we will share some techniques we developed and hiccups we managed along the way.

    By the end of this session, you will likely have gained valuable insights that you can take back to your organization to adopt agile principles and practices in areas outside of IT.

  • Liked Raj Mudhar

    Raj Mudhar - The Five Habits of Highly Effective Agile Organizations

    60 Mins
    Talk
    Intermediate

    It's the classic leader's lament: how to drive organizational performance in a way that delivers on business outcomes while engaging employees.

    Organizations have been deploying Scrum, SAFe, DAD, and a host of other practices in the hope of achieving better business outcomes. We all know that practices alone don't generate the kind of powerful results you need to succeed. The missing ingredient? We hear the word culture a lot, but it is really about operating norms: habits and behaviors. Through dozens of transformations within my company and at clients, I've observed 5 habits that all leading organizations possess. When these 5 habits are ingrained, the practices fall into place and performance starts to take off.

    In this session you'll learn the habits and why they drive performance. You'll also learn the key questions you can start asking to encourage the habits to take hold in your team or, more broadly, in your organization. The path to performance is paved by changes in behaviors that are reinforced daily. Asking the right questions at the right time can be a powerful way to nudge behaviors in the right direction.

    Having said that, it's not enough to create the conditions for new habits to form. Numerous studies, including well-known ones by Wolfram Schultz, a neuroscientist at the University of Cambridge, have shown that a cue and a reward on their own aren't enough to create a lasting habit. Only when your brain starts to anticipate the reward does the habit become automatic.

  • Liked Sriram Natesan

    Sriram Natesan / Kat Lee / Monique Letterio - Business Agility: Lessons from the Trenches

    60 Mins
    Talk
    Advanced

    Agile has been pervasive and proven to be successful for technology product development for more than two decades. Today more organizations are taking agile principles and practices and applying them outside of IT to their business as usual (BAU) activities such as marketing or strategy development. But how easy is this next generational aspect of Business Agility? Can an approach that was rooted in technology product development be successfully applied as an accelerator to achieve overall business efficiency and effectiveness?

    In this session, case studies, including one from a large Canadian insurance provider, will demonstrate lessons learned from organizations that have used agile principles and practices to drive commercial impact, build people and their capabilities, adopt the right mindset and behaviors, and improve performance. Some of the questions that will be addressed:

    • What does business agility mean and why does it matter?
    • How can Corporate Functions such as HR, Finance, Risk and Marketing, which are often entrenched in traditional ways of working, become agile delivery centers?
    • Do agile practitioners need to “stay true” to the principles and practices they originally learned for technology in order to be effective in the business?
    • How should agile business teams be optimally structured to align with an enterprise agile COE?
    • What can leaders learn from others’ journeys so we can determine whether agile can truly thrive outside IT and be scaled across the organization?

    If you are a Business Leader who is considering next steps on enterprise agility, organizational resilience, and a culture of adaptability, attend this session to learn valuable and pragmatic insights as you begin your own agile journey.

  • Liked Tanvir Ahmed

    Tanvir Ahmed - Agile for Data Analytics: Where Business meets Science and Software Development

    40 Mins
    Experience Report
    Beginner

    Every business is betting big on data science and making it an essential part of the enterprise's decision-making, whether in marketing, risk analysis, fraud prevention, cyber threat management, or even investment choices.

    However, we observe a surprising trend: organizations are taking an old-school, sequential, and siloed approach to organizing and delivering data-driven business decisions, despite data mining being highly iterative and non-linear in nature.

    As a result, the delay between deriving an insight and the organization acting on it can create large gaps, breeding a paradigm of striking oil without knowing how to monetize it. The data-driven insights then become useless, either because it is too late to act on them or because the models need to be rebuilt to find relevant insights. However, with agile principles that favor incremental changes driven by small, empowered, cross-functional teams, data science can truly be the driver for building insight-driven organizations.

    In this session, we will discuss why enterprises need to embrace agile principles for their data analytics work, and share the story of how we transformed a data analytics endeavor at a large Canadian organization.

  • Liked Sebastien Gelus

    Sebastien Gelus - The Rise of Agile Marketing. Getting it right with TD Bank.

    40 Mins
    Experience Report
    Beginner

    Co-presented with Devin Sawyer, Vice President of Digital, Social, and Content Marketing at TD.

    Through the story of TD’s Agile Marketing transformation, our talk looks to cover the rise of Agile Marketing, its benefits, and how to think about agile transformation and maturity in a non-agile world.

    Although Agile has existed in the world of software development for over a decade, there has been less uptake of ‘Big A Agile’ outside of IT. Enter marketing, a function currently undergoing immense digital transformation brought about by:

    • a growing number of digital channels on which to find customers,
    • the availability of data in and out of these channels that can be used to make decisions,
    • and the potential to target specific customers with personalized messaging in real time.

    This has significantly increased the organization’s expectation that the marketing department will transition from a cost center to an ROI-based revenue generator. To get there, implementing the latest marketing technology is simply not enough to unify the customer experience and drive revenue competitively. Siloed departments, cumbersome processes, traditional marketing behaviours, and waterfall hand-offs between teams and agencies make it hard to mobilize the marketing function around the customer and quickly go to market with a winning digital customer experience.

    Enter Agile Marketing. Bringing together cross-functional teams of marketing experts to tackle marketing objectives allows organizations to put forward what is best for the customer, based on an increasingly insight-driven understanding of them. TD Bank is doing just that. TD Bank Canada’s Marketing organization has 450 employees who are constantly trying to provide the best experience to prospects and customers. Departments such as strategic planning, marketing analytics, and digital marketing at TD need to work in lockstep to continue providing the legendary customer experience they are known for. To do so, TD is deploying agile marketing and seeing the benefits. Starting with one pilot, then two, and then growing the number of teams practicing agile, they are continuously learning and optimizing the way they enable their employees to drive business value and customer experience using agile marketing.

  • Liked Rachit Shankar

    Rachit Shankar / Kushagra Tripathi - End to End Agility: With(out) Marketing?

    40 Mins
    Experience Report
    Beginner

    The customer of today is interacting with organizations through an ever-evolving ecosystem of channels and expects interactions to be quick and highly personalized. Organizations continue to face the challenge of delivering business and technology products that accurately solve very specific problems for their customers.

    While the majority of organizations focus their efforts on technology-driven agile transformations that try to solve the ‘cost center’ problem, very few manage to close the loop on customer feedback, often relegating these duties to the “business”. But what about marketing? How does an organization integrate marketing functions in a way that not only closes the feedback loop, but also informs the organization of the challenges customers are facing? What if you were to start by transforming the way you interact with the customer, and use that information to prioritize which product lines go down the agile path? In essence, build your feedback loop first and let the market inform your organization’s transformation.

    Through our work with various organizations, we will share our experience (the good AND the bad!) of how marketing can lead the charge for a customer-centric agile transformation and operate as an end-to-end Agile shop. We will provide an overview of how an organization in the midst of a broader Agile transformation strategically placed the marketing component at the forefront in order to accelerate customer acquisition and reduce customer acquisition cost by 300% through the use of marketing experiments for large-scale campaigns.

  • Liked Raj Mudhar

    Raj Mudhar / Anik R Somani - Why your Agile CoE is going to fail.

    40 Mins
    Experience Report
    Beginner

    Many organizations are trying to adopt Agile as a methodology across their enterprises by standing up an Agile Centre of Excellence (CoE). This entity is often revered as the governing body for “all things agile” across the organization, and if you don’t play nice, you’re often greeted with a “sorry, you’re not allowed to be agile”. Sound familiar?

    If not, maybe some of these attributes might ring a bell:

    • Enable change → Force change on the organization, tomorrow
    • Provide training and coaching → Sorry teams, fend for yourselves
    • Enable process creation → Nope, it's our way or the highway
    • Remove impediments → Let’s create some new problems
    • Create awareness → Shhh don’t tell anyone

    So we’re going to fail. Great. Now what do we do?

    Having run CoEs that have made our hair go grey, and having set up several CoEs for Fortune 500 companies in healthcare, banking, insurance and telecommunications, we’ve seen and made a lot of mistakes when it comes to Agile CoEs. In the process, we developed a backlog of war stories for “What to do if you want your Agile CoE to fail”, or more importantly, “Things you might want to consider if you want your CoE to have a shot at success”.

    Come learn more about all the things not to do when setting up an Agile CoE…and hey, we guarantee you’ll find some nuggets to enable your CoE to succeed and deliver value as part of your transformational journey.

  • Liked Raj Mudhar

    Raj Mudhar / Kat Lee - Oh, Behave! The art of nudging your way to a better workplace

    90 Mins
    Workshop
    Advanced

    Imagine a truly agile workplace. What operating norms, habits, or behaviours permeate the workplace? What do people say and do that is different? We hear terms like collaboration and self-organization, but do your colleagues truly understand what that looks like in action? Do they know how to change their stance and behavior to make these skills a part of how they work?

    What if we could enumerate the behaviours we want to see - the organizational habits? And then, what if we could use nudges to make these behaviours come alive in the workplace? And further, what if you could arm every employee with the tools to apply nudges themselves, as part of a new meta-skill that reinforces continuous improvement and evolution of your organizational norms?

    In this workshop, we will walk you through the tools and techniques you can use to start shaping your organization of the future. Whether you are a seasoned executive, or a team member, the tools and techniques are universal and can be applied at all levels.

  • Liked Wesley

    Wesley - The United States Marines - a case study in Enterprise Agility

    40 Mins
    Experience Report
    Beginner

    The talk is on how the Marine Corps builds cross-functional agile teams to accomplish amazing outcomes. In this talk we will discuss five main areas:

    1. A purpose- and value-driven organization

    2. Deep empathy for your colleagues

    3. Disciplined and rigorous approach to quality

    4. A belief in optimizing the whole (system thinking)

    5. Empowerment of people

    The Marine Corps is able to transform individuals from various walks of life into high-performing agile teams through this approach. In this session, I will share examples from my career as a Marine and the lessons we can apply to our own jobs.