  • Hilary Mason

    Hilary Mason - Playing with Words: Building Products with NLP

    Hilary Mason
    Co-Founder
    Hidden Door
    45 Mins
    Keynote
    Intermediate

    Imagine machines that interact with us using the same interface we use to interact with each other — spoken language! Recent progress in NLP has opened up new possibilities for language-based systems. In this talk, we'll explore the recent history of language models and highlight novel applications of statistical and deep learning approaches. Then, we'll explore emerging products that automate, generate, and create using these models, and discuss the implications for building them, including safety, ethics, and the invention of new design metaphors. Finally, we'll speculate about where this might take us in the next few years. Can machines ... play?

     
  • Kendra Vant

    Kendra Vant - Do you want ML with that? When to say yes and why to say no.

    Kendra Vant
    EGM - Data, ML & AI
    Xero
    45 Mins
    Keynote
    Intermediate

    In this talk I'll speak about why you should only use ML when you really need to, some techniques we've used successfully at Xero to help cut through the noise and analysis paralysis, and why it might help to approach building ML into a system the same way you might decide what car to buy.

  • Sid Anand

    Sid Anand - Building & Operating Autonomous Data Streams

    Sid Anand
    Chief Architect
    Datazoom
    45 Mins
    Keynote
    Advanced

    The world we live in today is fed by data. From self-driving cars and route planning to fraud prevention, content and network recommendations, and ranking and bidding, it not only consumes low-latency data streams, it also adapts to changing conditions modeled by that data.

     

    While the world of software engineering has settled on best practices for developing and managing both stateless service architectures and database systems, the larger world of data infrastructure still presents a greenfield opportunity. To thrive, this field borrows from several disciplines: distributed systems, database systems, operating systems, control systems, and software engineering, to name a few.

     

    Of particular interest to me is the subfield of data streams, specifically how to build high-fidelity nearline data streams as a service within a lean team. For such systems, relying on human operations is a non-starter: all aspects of operating streaming data pipelines must be automated. Come to this talk to learn how to build such a system, soup to nuts.

  • Xuanyi Chew

    Xuanyi Chew - Yepoko Lessons For Machine Learning on Small Data

    Xuanyi Chew
    Chief Data Scientist
    Ordermentum
    30 Mins
    Talk
    Intermediate

    Let's face it, in most companies, the amount of good data available to perform machine learning is very small. Most data are small data. So how can we do good machine learning on small data?

  • Jesse Anderson

    Jesse Anderson - Foundations of Data Teams

    30 Mins
    Talk
    Intermediate

    Successful data projects are built on solid foundations. What happens when we’re misled or unaware of what a solid foundation for data teams means? When a data team is missing or understaffed, the entire project is at risk of failure.

    This talk will cover the importance of a solid foundation and what management should do to fix it. To do this I’ll be sharing a real-life analogy to show how we can be misled and what that means for our success rates.

    We will talk about the teams in data teams: data science, data engineering, and operations. This will include detailing what each is, does, and the unique skills for the team. It will cover what happens when a team is missing and the effect on the other teams.

    The analogy will come from my own experience with a house that had major cracks in the foundation. We were going to simply remodel the kitchen. We were never told about the cracks, and the house needed a completely new foundation. In a similar way, most managers think adding advanced analytics such as machine learning is a simple addition (remodel the kitchen). However, management is never told that you need all three data teams to do it right. Instead, management has to go all the way back to the foundation and fix it. If they don’t, the house (team) will crumble under the strain.

  • Caito Scherr

    Caito Scherr - Sweet Streams are Made of These: Data Driven Development for Stream Processing

    Caito Scherr
    Developer Advocate
    Ververica
    30 Mins
    Talk
    Intermediate

    The strength of a powerful stream processing engine lies in how fast it runs and how much data it can process. This naturally adds complexity to existing integration points and can lead to development overhead. Luckily, there is a set of data-driven development principles built to alleviate precisely these challenges. This talk will go over what these are and how to apply them at various points throughout the development process, using real-world successes (and failures!) as examples. Although the examples are for highly complex systems, this talk will be beginner-friendly and applicable to non-streaming use cases.

  • Matteo Merli

    Matteo Merli - Apache Pulsar and the Streaming Ecosystem

    30 Mins
    Talk
    Intermediate

    Apache Pulsar is an open-source distributed pub-sub messaging system, developed under the stewardship of the Apache Software Foundation.

    This talk will show how its unique architecture enables Pulsar to seamlessly support both streaming and messaging use cases in a single unified platform.

    We will also show where Pulsar fits with the broader ecosystem of data streaming technologies and all the interoperability that is available out of the box, making it a particularly good choice for supporting any kind of data platform, where versatility, interoperability and scalability are the key requirements.

  • Kalinda Griffiths

    Kalinda Griffiths - Rights, Sovereignty and Governance in Official Reporting: Considerations in the Use of Aboriginal and Torres Strait Islander data

    Kalinda Griffiths
    Scientia Lecturer
    UNSW SYDNEY
    30 Mins
    Talk
    Intermediate

    The realisation for Indigenous people in Australia to be counted in official statistics occurred in 1967.
    The identification of Indigenous people in Australia in national data highlights a range of historical
    and contemporary issues that require our attention. This includes how Indigenous people have been
    defined and by whom, as well as how identification is operationalised in official data collections.
    Furthermore, the completeness and accuracy of Indigenous people identified in the data and the
    impact this has on the measurement of health and wellbeing must also be taken into account. Official
    national reporting of Indigenous people is calculated using data from censuses, vital statistics, and
    existing administrative data collections and/or surveys. In alignment with human rights standards,
    individuals in Australia can opt to self-identify as ‘Indigenous’ in the data. Australia’s colonial
    context in which Aboriginal and Torres Strait Islander data is derived results in considerations about
    the sovereign rights of Indigenous people globally in the use of data and how this can be actioned
    through data governance processes.

  • Jennifer Marsman

    Jennifer Marsman - Using AI to Mine Unstructured Research Papers to Fight COVID-19

    30 Mins
    Talk
    Intermediate

    There is an overwhelming amount of information (and misinformation) about COVID-19. How can we use AI to better understand this disease? In this session, we take an open dataset of research papers on COVID-19 and apply several machine learning techniques (named entity recognition of medical terms, finding semantically similar words, contextual summarization, and knowledge graphs) that can help first responders and medical professionals better find and make sense of the research they need. We will dive into the techniques used and share the code repository, so developers will walk away with an understanding of how to build a similar solution using Cognitive Search.

  • Julie Amundson

    Julie Amundson - Evolving the ML Platform organisation at Netflix: a case study

    30 Mins
    Talk
    Intermediate

    Do you wish there was a Machine Learning model to tell you how to structure your ML teams? So do I! While we're waiting for that, I'll share the story of how the ML Platform organisation evolved at Netflix. Although this story is specific to our own journey to expand Netflix ML investments, there are a few lessons learned along the way that you'll be able to relate to. There are several factors going into org structure that we'll discuss, including: the specialty and skillsets of ML practitioners, the variety and depth of ML use cases, who's responsible for the data, the ownership model as ML projects go to production, and how the underlying Platforms are situated. I look forward to sharing and hearing your own thoughts afterward!

  • Nathan Wallace

    Nathan Wallace - Data Rainbows - select * from cloud;

    Nathan Wallace
    Founder
    Turbot
    30 Mins
    Talk
    Intermediate

    Drowning in a lake? Stuck inside a warehouse? See your data in a different light! Postgres Foreign Data Wrappers provide SQL queries to live cloud data - all the structure and much lighter weight. In this session, we'll explore the potential of Data Rainbows for growing cloud environments and outline the challenges of working with data you can see but can't quite touch.

     
  • Will Radford

    Will Radford - Assisting design with machine learning in Canva’s editor

    Will Radford
    Data Scientist
    Canva
    30 Mins
    Talk
    Intermediate

    Our team at Canva focuses on building features that make design simple, enjoyable and collaborative for more than 55 million people across the globe. For many who haven’t used design tools, starting with a blank page can be intimidating, which is where Canva’s library of more than 500,000 templates comes in. Unfortunately, switching between templates once required retyping your content. To fix this, we created a feature for our users to bring their text with them while exploring the library. The initial challenge was that the template metadata the feature relied on was scarce and costly for our in-house designers to annotate.

    We wanted to predict metadata for our designers inside the Canva editor, but had to consider a number of real-world engineering tradeoffs. First, we’ll explain the user problem and provide a glimpse inside some of our templates and the metadata that enables text transfer. Then, we’ll explain what features we extracted for our scikit-learn random forest classifier and how we combined it with a designer-in-the-loop to bootstrap enough batch-predicted metadata to launch an MVP version of the feature. Finally, we’ll explain how we decided to reimplement model storage and inference in our TypeScript frontend stack. Creating this new feature was a joint effort made possible by a multidisciplinary team of designers, engineers and data scientists. We’re looking forward to sharing some of the lessons we learned along the way to shipping this smart feature.

  • Rimma Shafikova

    Rimma Shafikova - Analyzing a Terabyte of Game Data

    Rimma Shafikova
    Data Scientist
    VGW
    30 Mins
    Talk
    Intermediate

    A couple of terabytes of data is not impressive by today's standards. A hard drive of that capacity costs about a hundred dollars. But things quickly get complicated when one needs to draw insights from a corpus of unstructured game scenarios that is growing at a rate of a terabyte a year.

    You will hear some lessons learned by a data scientist wearing the extra hat of data engineer on this fun side project. The talk will cover topics ranging from using the Apache Spark distributed computing framework and optimizing Delta tables to making sense of the resulting mega-dataset with graph theory and an interactive Streamlit application.

     
  • Savin Goyal

    Savin Goyal - Taming the Long Tail of Industrial ML Applications

    30 Mins
    Talk
    Intermediate

    Data Science usage at Netflix goes well beyond our eponymous recommendation systems. It touches almost all aspects of our business, from optimizing content delivery and informing buying decisions to fighting fraud. Our unique culture affords our data scientists extraordinary freedom of choice in ML tools and libraries, all of which results in an ever-expanding set of interesting problem statements and a diverse set of ML approaches to tackle them. At the same time, our data scientists are expected to build, deploy, and operate complex ML workloads autonomously, without needing significant experience in systems or data engineering. In this talk, I will discuss some of the challenges involved in improving the development and deployment experience for ML workloads. I will focus on Metaflow, our ML framework, which offers useful abstractions for managing the model’s lifecycle end-to-end, and on how a focus on human-centric design positively affects our data scientists' velocity.

  • Mikio Braun

    Mikio Braun - Lessons learned from building ML products

    Mikio Braun
    Independent Consultant
    Own
    30 Mins
    Talk
    Intermediate

    Building products based on machine learning requires much more than taking an ML algorithm and deploying it in the cloud. Drawing on my experience as a researcher, in ecommerce, and as an independent consultant, I'll talk about what is needed beyond pure ML algorithms to successfully build products with ML. How do you identify customer problems that can be tackled with ML? What does the technology landscape around ML look like? How do you set up teams and organizations to be "AI ready"? I'll be sharing some of my observations and insights.

  • Hien Luu

    Hien Luu - Scaling the Machine Learning Platform at DoorDash

    30 Mins
    Talk
    Intermediate

    DoorDash’s mission is to grow and empower local economies. DoorDash’s business is a 3-sided marketplace composed of Dashers, consumers, and merchants.

    As DoorDash's business grows, it is essential to establish a centralized ML platform to accelerate the ML development process and to power the numerous ML use cases.  We are making good progress, but we are still in the early days of building out our ML platform.

    This presentation will detail the DoorDash ML platform journey: how we established close collaboration with the Data Science community, how we intentionally set guardrails in the early days to enable us to make progress, our principled approach to building out the ML platform while meeting the needs of the Data Science community, and finally the technology stack and architecture that powers billions of predictions per day and supports a diverse set of ML use cases. These use cases include search ranking, recommendation, fraud detection, food delivery assignment, food delivery arrival time prediction, and more.

  • Simon Aubury

    Simon Aubury - Islands in the Stream - What country music can teach us about event driven systems

    Simon Aubury
    Principal Data Engineer
    ThoughtWorks
    30 Mins
    Talk
    Intermediate

    Event-driven systems are all the rage, and it's with good reason that we're witnessing a transformation as businesses adopt them. Building systems around an event-driven architecture is a powerful pattern for creating awesome data-intensive applications. But before we sail away to another world, let's avoid the common pitfalls of designing and running event-driven systems.

    Islands in the Stream - what Kenny Rogers can teach us about event driven systems from the wisdom of a country music classic

  • Zhamak Dehghani

    Zhamak Dehghani - Data Mesh; A principled introduction

    30 Mins
    Talk
    Intermediate

    For over half a century, organizations have assumed that data is an asset to collect more of, and that data must be centralized to be useful. These assumptions have led to centralized and monolithic architectures, such as data warehouses and data lakes, that limit organizations' ability to innovate with data at scale.

     
    Data Mesh is an alternative architecture and organizational structure for managing analytical data.
    Its objective is enabling access to high quality data for analytical and machine learning use cases - at scale.
     
    It's an approach that shifts the data culture, technology and architecture
    - from centralized collection and ownership of data to domain-oriented connection and ownership of data
    - from data as an asset to data as a product
    - from proprietary big platforms to an ecosystem of self-serve data infrastructure with open protocols
    - from top-down manual data governance to a federated computational one.
     
    In this talk, Zhamak will introduce the principles underpinning Data Mesh and its architecture.