DevOps 2.0: Evidence-based evolution of serverless architecture through automatic evaluation of “infrastructure as code” deployments

May 15th, 03:10 – 03:40 PM · Wesley Theatre · 178 Interested

The scientific approach teaches us to formulate hypotheses and test them experimentally in order to advance systematically. DevOps, and software architecture in particular, do not traditionally follow this approach. Here, decisions like “scaling out to more machines or simply employing a batch queue” or “using Apache Spark or sticking to a job scheduler across multiple machines” are worked out theoretically rather than implemented and tested objectively. Furthermore, the paucity of knowledge about unestablished systems like serverless cloud architecture hampers the theoretical approach.

We therefore partnered with James Lewis and Kief Morris to establish a fundamentally different, scientifically grounded approach to serverless architecture design. For this, the serverless architecture stack first needs to be fully defined through code/text, e.g. AWS CloudFormation, so that it can be deployed easily and consistently. This “architecture as text” base can then be modified and re-deployed to systematically test hypotheses, e.g. whether an algorithm is faster or a particular autoscaling group more efficient. The second key element of this novel way of evolving architecture is the automatic evaluation of any newly deployed architecture, without manually recording runtimes or defining interactions between services, e.g. via Epsagon’s monitoring solution.
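As a sketch of what such an “architecture as text” definition can look like, here is a minimal, hypothetical CloudFormation fragment. The resource names, runtime, bucket and memory parameter are illustrative, not GT-Scan’s actual stack; the point is that a hypothesis such as “more memory makes this function faster” becomes a one-line parameter change followed by a redeploy:

```yaml
Parameters:
  WorkerMemory:
    Type: Number
    Default: 256                    # the knob under test
    AllowedValues: [128, 256, 512, 1024]

Resources:
  WorkerFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Runtime: python3.9
      MemorySize: !Ref WorkerMemory
      Role: !GetAtt WorkerRole.Arn  # execution role omitted for brevity
      Code:
        S3Bucket: my-deploy-bucket  # illustrative bucket and key
        S3Key: worker.zip
```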

Here we describe these two key aspects in detail and showcase the benefits by showing how we reduced runtime by 80% for the bioinformatics software framework GT-Scan, which is used by Australia’s premier research organization to conduct medical research.


Outline/Structure of the Talk

Quick intro to serverless architecture and its benefits, with a focus on burstable web services that perform data- and compute-intensive tasks (GT-Scan as the example).

Run the audience through the traditional way of improving an architecture: 1) manually evaluating the runtime of all Lambda functions using AWS X-Ray to identify bottlenecks; 2) devising a better solution by testing runtime offline and then manually deploying a faster Lambda function; 3) manually evaluating the new architecture.
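Step 1 of that manual loop boils down to aggregating per-function timings and picking the slowest. A minimal sketch, assuming simplified per-function timing records as one might export from X-Ray (the function names, field names and numbers are illustrative, not GT-Scan’s real traces):

```python
import json
from collections import defaultdict
from statistics import mean

# Simplified stand-in for per-function timings exported from a tracing tool;
# all names and numbers below are made up for illustration.
TRACES = json.loads("""
[
  {"function": "fetch-genome",   "duration_ms": 180},
  {"function": "score-targets",  "duration_ms": 2400},
  {"function": "score-targets",  "duration_ms": 2150},
  {"function": "render-results", "duration_ms": 95}
]
""")

def slowest_function(traces):
    """Return (name, mean duration in ms) of the slowest function, i.e. the bottleneck."""
    by_fn = defaultdict(list)
    for t in traces:
        by_fn[t["function"]].append(t["duration_ms"])
    return max(((fn, mean(d)) for fn, d in by_fn.items()), key=lambda x: x[1])

print(slowest_function(TRACES))  # the candidate for optimisation
```

The bottleneck function identified this way is the one worth testing a faster variant of offline.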

Outline the limitations of this manual approach (error-prone, labour-intensive), which mean that fewer architectures can be evaluated.

Introduce the hypothesis-driven architecture principle and run the audience through each of its necessary components, namely “architecture as text” and “automatic evaluation of any serverless architecture”.

Provide the specific GT-Scan2 example and walk the audience through its CloudFormation stack. Outline how this allows us to tweak a single component and deploy the entire architecture at the click of a button.
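The “click of a button” deploy can be a single CLI call. A sketch, assuming a template file named `template.yaml`, a stack name, and a memory-size parameter called `WorkerMemory` (all illustrative, not the real GT-Scan2 stack):

```shell
# Redeploy the whole architecture with one component tweaked:
# here, doubling the memory allocation of the Lambda function under test.
aws cloudformation deploy \
  --template-file template.yaml \
  --stack-name gt-scan2-experiment \
  --parameter-overrides WorkerMemory=512
```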

Showcase Epsagon as an example tool for monitoring our deployment. We can compare the original deployment to the new one over time to gather data on how they perform. If the change improves performance, we can promote it to production.
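The comparison itself is simple arithmetic once a monitoring tool such as Epsagon has collected latency samples from both deployments. A minimal sketch with made-up numbers (the samples and the 80% figure they happen to reproduce are illustrative, not the recorded GT-Scan data):

```python
from statistics import mean

# Illustrative end-to-end latency samples (ms) for the original and the
# candidate deployment, as a monitoring tool would collect them over time.
original  = [2400, 2510, 2330, 2450, 2390]
candidate = [470, 510, 455, 490, 480]

def improvement(before, after):
    """Relative runtime reduction of the candidate versus the original."""
    return 1 - mean(after) / mean(before)

gain = improvement(original, candidate)
print(f"runtime reduced by {gain:.0%}")  # prints "runtime reduced by 80%"
```

If the gain holds over time rather than in a single run, the hypothesis is confirmed and the change can go to production.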

Learning Outcome

The take-away is how to find the optimal architectural solution using currently available deployment and monitoring tools.

Target Audience

People working on serverless, software engineers, data scientists.

Submitted 1 year ago

Public Feedback


  • Liked Cameron Joannidis

    Cameron Joannidis - Machine Learning Systems for Engineers

    30 Mins
    Talk
    Intermediate

    Machine Learning is often discussed in the context of data science, but little attention is given to the complexities of engineering production ready ML systems. This talk will explore some of the important challenges and provide advice on solutions to these problems.

  • Liked Dana Bradford

    Dana Bradford - How to Save a Life: Could Real-Time Sensor Data Have Saved Mrs Elle?

    Dana Bradford
    Sr. Research Scientist
    CSIRO
1 year ago
    Sold Out!
    30 Mins
    Case Study
    Intermediate

This is the story of Mrs Elle*, a participant in a smart home pilot study. The pilot study aimed to test the efficacy of sensors to capture in-home activity data including meal preparation, attention to hygiene and movement around the house. The in-home monitoring and response service associated with the sensors had not been implemented, and as such, data was not analyzed in real time. Sadly, Mrs Elle suffered a massive stroke one night, and was found some time after. She later died in hospital without regaining consciousness. This paper looks at the data leading up to Mrs Elle’s stroke, to see if there were any clues that a neurological insult was imminent. We were most interested to know, had we been monitoring in real time, could the sensors have told us how to save a life?

    *pseudonym

  • Liked Elaina Hyde

    Elaina Hyde - What happens when Galactic Evolution and Data Science collide?

    Elaina Hyde
    Consultant
    Servian
1 year ago
    Sold Out!
    30 Mins
    Case Study
    Intermediate

This talk will cover a short trip around our Milky Way Galaxy and a discussion of how data science can be used to detect faint and sparse objects such as the dwarf satellites and streams that helped form the galaxy we live in. The data science applications and algorithms used determine the accuracy with which we can make detections of these mysterious bodies, and with the advent of greater cloud computing capability the sky is no longer the limit when it comes to programming or astronomy.

  • Liked Holden Karau

    Holden Karau - Testing & Validating Big Data Pipelines with examples in Apache Spark & Apache BEAM

    Holden Karau
    Developer Advocate
    Google
1 year ago
    Sold Out!
    30 Mins
    Talk
    Intermediate

As distributed data parallel systems, like Spark, are used for more mission-critical tasks, it is important to have effective tools for testing and validation. This talk explores the general considerations and challenges of testing systems like Spark & BEAM through spark-testing-base and other related libraries. Testing isn't enough, though: the real world will always find a way to throw a new wrench in the pipeline, and in those cases the best we can hope for is figuring out that something has gone terribly wrong and stopping the deployment of a new model before we get woken up with a 2am page asking why we are recommending sensitive products to the wrong audience*.

With over 40% of folks automatically deploying the results of their Spark jobs to production, testing is especially important. Many of the tools for working with big data systems (like notebooks) are great for exploratory work, and can give a false sense of security (as well as additional excuses not to test). This talk explores why testing these systems is hard, special considerations for simulating "bad" partitioning, figuring out when your stream tests are stopped, and solutions to these challenges.

    *Any resemblance to real pager alert the presenter may have received in the past is coincidence.

  • Liked Graham Polley

    Graham Polley - Look Ma, no servers! Building a petabyte scale data pipeline on Google Cloud with less than 100 lines of code.

    30 Mins
    Demonstration
    Intermediate

    In this talk/demo, I'll describe the Google Cloud Platform architecture that we've used on several client projects to help them easily ingest large amounts of their data into BigQuery for analysis.

Its zero-ops and petabyte-scale features unburden the team from managing any infrastructure, and ultimately free them up to focus on more important things - like analysing, understanding, and actually drawing insights from the data.

Forming a conga line of Cloud Storage, Cloud Functions, Cloud Dataflow (templates) and BigQuery in less than 100 lines of code, I'll show how to wire up each component of the data pipeline. Finally, if the Demo Gods are shining down on me that day, I'll even attempt a live demo (I usually regret saying that).