YOW! Data 2021 Day 1

Wed, May 12
Timezone: Australia/Sydney (AEST)
08:45

    Session Overviews and Introductions - 15 mins

09:00

    Hilary Mason - Playing with Words: Building Products with NLP

    09:00 - 09:45 AM, Grand Ball Room

    Imagine machines that interact with us using the same interface we use to interact with each other — spoken language! Recent progress in NLP has opened up new possibilities for language-based systems. In this talk, we'll explore the recent history of language models and highlight novel applications of statistical and deep learning approaches. Then, we'll explore emerging products that automate, generate, and create using these models, and discuss the implications for building them, including safety, ethics, and the invention of new design metaphors. Finally, we'll speculate about where this might take us in the next few years. Can machines ... play?

     
09:45

    Break / Q&A with Hilary Mason - 25 mins

10:10
10:40

    Break / Q&A with Jennifer Marsman - 25 mins

11:05

    Hien Luu - Scaling the Machine Learning Platform at DoorDash

    11:05 - 11:35 AM, Grand Ball Room 1

    DoorDash’s mission is to grow and empower local economies. DoorDash’s business is a 3-sided marketplace composed of Dashers, consumers, and merchants.

    As DoorDash's business grows, it is essential to establish a centralized ML platform to accelerate the ML development process and to power the numerous ML use cases. We are making good progress, but we are still in the early days of building out our ML platform.

    This presentation details the DoorDash ML platform journey: how we established close collaboration with the Data Science community, how we intentionally set guardrails in the early days so that we could make progress, the principled approach we took to building out the platform while meeting the needs of the Data Science community, and finally the technology stack and architecture that powers billions of predictions per day and supports a diverse set of ML use cases. These include search ranking, recommendation, fraud detection, food delivery assignment, food delivery arrival time prediction, and more.

11:35

    Break / Q&A with Hien Luu - 25 mins

12:00

    Julie Amundson - Evolving the ML Platform organisation at Netflix: a case study

    12:00 - 12:30 PM, Grand Ball Room 1

    Do you wish there was a Machine Learning model to tell you how to structure your ML teams? So do I! While we're waiting for that, I'll share the story of how the ML Platform organisation evolved at Netflix. Although this story is specific to our own journey to expand Netflix ML investments, there are a few lessons learned along the way that you'll be able to relate to. There are several factors going into org structure that we'll discuss, including: the specialty and skillsets of ML practitioners, the variety and depth of ML use cases, who's responsible for the data, the ownership model as ML projects go to production, and how the underlying Platforms are situated. I look forward to sharing and hearing your own thoughts afterward!

12:30

    Break / Q&A with Julie Amundson - 25 mins

12:55

    Lunch - 30 mins

13:25

    Savin Goyal - Taming the Long Tail of Industrial ML Applications

    01:25 - 01:55 PM, Grand Ball Room 1

    Data Science usage at Netflix goes well beyond our well-known recommendation systems. It touches almost all aspects of our business, from optimizing content delivery and informing buying decisions to fighting fraud. Our unique culture affords our data scientists extraordinary freedom of choice in ML tools and libraries, which results in an ever-expanding set of interesting problem statements and a diverse set of ML approaches to tackle them. At the same time, our data scientists are expected to build, deploy, and operate complex ML workloads autonomously, without needing significant experience in systems or data engineering. In this talk, I will discuss some of the challenges involved in improving the development and deployment experience for ML workloads. I will focus on Metaflow, our ML framework, which offers useful abstractions for managing a model’s lifecycle end-to-end, and on how a focus on human-centric design positively affects our data scientists' velocity.

13:55

    Break / Q&A with Savin Goyal - 25 mins

14:20

    Will Radford - Assisting design with machine learning in Canva’s editor

    02:20 - 02:50 PM, Grand Ball Room 1

    Our team at Canva focuses on building features that make design simple, enjoyable and collaborative for more than 55 million people across the globe. For many who haven’t used design tools, starting with a blank page can be intimidating, which is where Canva’s library of more than 500,000 templates comes in. Unfortunately, switching between templates once required retyping your content. To fix this, we created a feature for our users to bring their text with them while exploring the library. The initial challenge was that the template metadata the feature relied on was scarce and costly for our in-house designers to annotate.

    We wanted to predict metadata for our designers inside the Canva editor, but had to consider a number of real-world engineering tradeoffs. First, we’ll explain the user problem and provide a glimpse inside some of our templates and the metadata that enables text transfer. Then, we’ll explain what features we extracted for our scikit-learn random forest classifier and how we combined it with a designer-in-the-loop to bootstrap enough batch-predicted metadata to launch an MVP version of the feature. Finally, we’ll explain how we decided to reimplement model storage and inference in our TypeScript frontend stack. Creating this new feature was a joint effort made possible by a multidisciplinary team of designers, engineers and data scientists. We’re looking forward to sharing some of the lessons we learned along the way to shipping this smart feature.

14:50

    Break / Q&A with Will Radford - 25 mins

15:15
15:45

    Break / Q&A with Xuanyi Chew - 25 mins

16:10
16:40

    Break / Q&A with Mikio Braun - 25 mins

17:05
17:50

    Break / Q&A with Kendra Vant - 25 mins

YOW! Data 2021 Day 2

Thu, May 13
08:45

    Session Overviews and Introductions - 15 mins

09:00

    Sid Anand - Building & Operating Autonomous Data Streams

    09:00 - 09:45 AM, Grand Ball Room

    The world we live in today is fed by data. From self-driving cars and route planning to fraud prevention to content and network recommendations to ranking and bidding, it not only consumes low-latency data streams, it adapts to changing conditions modeled by that data.

    While the world of software engineering has settled on best practices for developing and managing both stateless service architectures and database systems, the larger world of data infrastructure still presents a greenfield opportunity. To thrive, this field borrows from several disciplines: distributed systems, database systems, operating systems, control systems, and software engineering, to name a few.

    Of particular interest to me is the subfield of data streams, specifically how to build high-fidelity nearline data streams as a service within a lean team. For such systems, relying on human operations is a non-starter: all aspects of operating streaming data pipelines must be automated. Come to this talk to learn how to build such a system soup-to-nuts.

09:45

    Break / Q&A with Sid Anand - 25 mins

10:10
10:40

    Break / Q&A with Nathan Wallace - 25 mins

11:05

    Zhamak Dehghani - Data Mesh; A principled introduction

    11:05 - 11:35 AM, Grand Ball Room 1

    For over half a century, organizations have assumed that data is an asset to collect more of, and that data must be centralized to be useful. These assumptions have led to centralized and monolithic architectures, such as data warehouses and data lakes, that limit an organization's ability to innovate with data at scale.

    Data Mesh is an alternative architecture and organizational structure for managing analytical data. Its objective is to enable access to high-quality data for analytical and machine learning use cases, at scale.

    It's an approach that shifts the data culture, technology and architecture:
    - from centralized collection and ownership of data to domain-oriented connection and ownership of data
    - from data as an asset to data as a product
    - from proprietary big platforms to an ecosystem of self-serve data infrastructure with open protocols
    - from top-down manual data governance to a federated computational one.

    In this talk, Zhamak will introduce the principles underpinning Data Mesh and its architecture.

11:35

    Break / Q&A with Zhamak Dehghani - 25 mins

12:00
12:30

    Break / Q&A with Matteo Merli - 25 mins

12:55

    Lunch - 30 mins

13:25

    Jesse Anderson - Foundations of Data Teams

    01:25 - 01:55 PM, Grand Ball Room 1

    Successful data projects are built on solid foundations. What happens when we’re misled or unaware of what a solid foundation for data teams means? When a data team is missing or understaffed, the entire project is at risk of failure.

    This talk will cover the importance of a solid foundation and what management should do to fix a broken one. To do this, I’ll share a real-life analogy that shows how we can be misled and what that means for our success rates.

    We will talk about the teams that make up data teams: data science, data engineering, and operations. This will include what each team is, what it does, and the unique skills it needs. We will also cover what happens when a team is missing and the effect on the other teams.

    The analogy comes from my own experience with a house that had major cracks in the foundation. We were going to simply remodel the kitchen. We were never told about the cracks, and the house needed a completely new foundation. In a similar way, most managers think adding advanced analytics such as machine learning is a simple addition (remodeling the kitchen). However, management is never told that you need all three data teams to do it right. Instead, management has to go all the way back to the foundation and fix it. If they don’t, the house (team) will crumble under the strain.

13:55

    Break / Q&A with Jesse Anderson - 25 mins

14:20
14:50

    Break / Q&A with Caito Scherr - 25 mins

15:15

    Rimma Shafikova - Analyzing a Terabyte of Game Data

    03:15 - 03:45 PM, Grand Ball Room 1

    A couple of terabytes of data is not impressive by today's standards. A hard drive of that capacity costs about a hundred dollars. But things quickly get complicated when one needs to draw insights from a corpus of unstructured game scenarios that is growing at a rate of a terabyte a year.

    You will hear some lessons learned by a data scientist wearing the extra hat of data engineer on this fun side project. The talk will cover topics ranging from using the Apache Spark distributed computing framework and optimizing Delta tables to making sense of the resulting mega-dataset with graph theory and an interactive Streamlit application.

     
15:45

    Break / Q&A with Rimma Shafikova - 25 mins

16:10
16:40

    Break / Q&A with Simon Aubury - 25 mins

17:05

    Kalinda Griffiths - Rights, Sovereignty and Governance in Official Reporting: Considerations in the Use of Aboriginal and Torres Strait Islander data

    05:05 - 05:35 PM, Grand Ball Room 1

    It was not until 1967 that Indigenous people in Australia came to be counted in official statistics. The identification of Indigenous people in Australia in national data highlights a range of historical and contemporary issues that require our attention. These include how Indigenous people have been defined and by whom, as well as how identification is operationalised in official data collections. Furthermore, the completeness and accuracy of Indigenous identification in the data, and the impact this has on the measurement of health and wellbeing, must also be taken into account. Official national reporting on Indigenous people is calculated using data from censuses, vital statistics, and existing administrative data collections and/or surveys. In alignment with human rights standards, individuals in Australia can opt to self-identify as ‘Indigenous’ in the data. The colonial context in which Aboriginal and Torres Strait Islander data is derived in Australia raises considerations about the sovereign rights of Indigenous people globally in the use of data and how these rights can be actioned through data governance processes.

17:35

    Break / Q&A with Kalinda Griffiths - 25 mins
