  • Simon Kaplan - Lessons from Building a Data Platform for Smart Cities

    CEO, [ui!] the urban institute
    45 Mins
    Invited Talk
    Intermediate

    We've built a data platform for smart cities that has been deployed in over a dozen cities, and we've learned a lot in the process about:

    • why data ingestion from IoT networks can range from trivial to very painful, and how to cope;
    • how to architect the system to easily handle many different 'data domains';
    • how to get the architecture to work well, including making the addition of new data sources as simple as possible;
    • approaches to analytics and visualisations that have been useful;
    • why end-user analytics and visualisations are critical;
    • how user permissions for smart city applications can differ from those of more 'normal' applications;
    • and lots more.

    In the talk, I'll walk through the lessons learned and show off examples of the system in action.

    The goal is to use the platform as an exemplar of the design principles; this is not a sales pitch for the tool itself.

  • Juliet Hougland - How to Experiment Quickly

    Data Vagabond, Bagged & Boosted
    45 Mins
    Invited Talk
    Intermediate

    The ‘science’ in data science refers to the underlying philosophy that you don’t know what works for your business until you make changes and rigorously measure impact. Rapid experimentation is a fundamental characteristic of high-functioning data science teams. They experiment with models, business processes, user interfaces, marketing strategies, and anything else they can get their hands on. In this talk I will discuss what data platform tooling and organizational designs support rapid experimentation in data science teams.

  • Brendan Hosking - Custom Continuous Deployment to Uncover the Secrets in the Genome

    Solutions Engineer, CSIRO
    30 Mins
    Talk
    Intermediate

    Reading the genome to search for the cause of a disease has improved the lives of many children enrolled in clinical trials. However, converting research into clinical practice requires the ability to query large volumes of data and find the needle in the haystack efficiently. This is hampered by traditional server- and database-based approaches, which are too expensive and unable to scale with accumulating medical information.

    We hence developed a serverless approach to exchange human genomic information between organisations. The framework was architected to provide instantaneous analysis of non-local data on demand, with zero downtime costs and minimal running costs.

    We used Terraform to define the infrastructure as code, enabling rapid iteration and version control at the architecture level. To maintain governance over infrastructure created this way, we developed a custom Continuous Deployment service that built and securely maintained each project, providing visibility and security across the entire organisation’s cloud infrastructure.

  • Mat Kelcey - Practical Learning To Learn

    30 Mins
    Talk
    Advanced

    Gradient descent continues to be our main workhorse for training neural networks. One recurring problem, though, is the large amount of data required. Meta learning frames the problem not as learning from a single large dataset, but as learning how to learn from multiple related smaller datasets. In this talk we'll first discuss some key concepts around gradient descent (fine-tuning, transfer learning, joint training and catastrophic forgetting) and compare them with simple meta learning techniques that can make optimisation feasible for much smaller datasets.
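
    As a rough illustration of the inner/outer-loop idea behind meta learning, here is a sketch of the Reptile algorithm on toy linear-regression tasks (illustrative only, not the method covered in the talk):

        import numpy as np

        rng = np.random.default_rng(0)

        def sample_task():
            """A toy task: y = a*x + b with task-specific a and b."""
            a, b = rng.uniform(-2, 2, size=2)
            x = rng.uniform(-1, 1, size=20)
            return x, a * x + b

        def inner_sgd(theta, x, y, lr=0.1, steps=5):
            """A few gradient steps on one task (mean squared error)."""
            w, b = theta
            for _ in range(steps):
                pred = w * x + b
                grad_w = 2 * np.mean((pred - y) * x)
                grad_b = 2 * np.mean(pred - y)
                w, b = w - lr * grad_w, b - lr * grad_b
            return np.array([w, b])

        theta = np.zeros(2)            # meta-parameters shared across tasks
        for _ in range(1000):          # outer loop over many small, related tasks
            x, y = sample_task()
            adapted = inner_sgd(theta, x, y)
            theta += 0.05 * (adapted - theta)   # Reptile meta-update

        print("learned meta-initialisation:", theta)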

  • Xuanyi Chew - Making The Black Box Transparent: Lessons in Opacity

    Chief Data Scientist, Ordermentum
    30 Mins
    Talk
    Intermediate

    Deep Learning is all the rage now. It is powerful, and it is cheap. To proponents of "explainable" machine learning, however, this is not really good news: deep learning is essentially a black box that one can't look into.

    To be sure, there are efforts to peek inside the black box to see what it's learning: saliency maps and various visualization tools are useful for understanding what is going on in deep neural networks. The question, of course, is whether it's worth it.

    In this talk I shall cover the basics of looking into a deep neural network and share a different approach to looking into neural networks.

  • Paris Buttfield-Addison / Mars Geldard - Game Engines and Machine Learning: Training a Self-Driving Car Without a Car?

    30 Mins
    Talk
    Advanced

    Are you a scientist who wants to test a research problem without building costly and complicated real-world rigs? A self-driving car engineer who wants to test their AI logic in a constrained virtual world? A data scientist who needs to solve a thorny real-world problem without touching a production environment? Have you considered AI problem solving using game engines?

    No? This session will teach you how to solve AI and ML problems using the Unity game engine and Google’s TensorFlow for Python, as well as other popular ML tools.

    In this session, we’ll show you ML and AI problem solving with game engines. Learn how you could use a game engine to train, explore, and manipulate intelligent agents that learn.

    Game engines are a great place to explore ML and AI. They’re wonderfully constrained problem spaces: tiny little ecosystems for you to explore a problem in. You can learn how to use them even if you’re not a game developer; no game development experience required!

    In this session, we’ll look at:

    • how video game engines are a perfect environment to constrain a problem and train an agent
    • how easy it is to get started, using the Unity engine and Google’s TensorFlow for Python
    • how to build up a model, and use it in the engine, to explore a particular idea or problem
    • PPO (proximal policy optimisation) for generic but useful machine learning
    • deep reinforcement learning, and how it lets you explore and study complex behaviours

    Specifically, this session will:

    • teach the very basics of the Unity game engine
    • explore how to set up a scene in Unity for both training and use of an ML model
    • show how to train a model, using TensorFlow (and Docker), using the Unity scene
    • discuss the use of the trained model, and potential applications
    • show you how to train AI agents in complicated scenarios and make the real world better by leveraging the virtual

    We’ll explore fun, engaging scenarios, including virtual self-driving cars, bipedal human-like walking robots, and disembodied hands that can play tennis.

    This session is for non-game developers to learn how they can use game technologies to further their understanding of machine learning fundamentals, and solve problems using a combination of open source tools and (sadly often not open source) game engines. Deep reinforcement learning using virtual environments is the beginning of an exciting new wave of AI.

    It’s a bit technical, a bit creative.

  • Simon Belak - Sketch algorithms

    Mad Scientist, Metabase
    30 Mins
    Talk
    Advanced

    In this talk we will look at how to efficiently (in both space and time) summarize large, potentially unbounded, streams of data by approximating the underlying distribution using so-called sketch algorithms. The main approach we are going to look at is summarization via histograms. Histograms have a number of desirable properties: they work well in an on-line setting, are embarrassingly parallel, and are space-bound. Not to mention they capture the entire (empirical) distribution, which is something that otherwise often gets lost when doing descriptive statistics. Building on that, we will delve into the related problems of sampling in a stream setting and updating in a batch setting, and highlight some cool tricks such as capturing time dynamics via data snapshotting. To finish off we will touch upon algorithms to summarize categorical data, most notably the count-min sketch.
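
    As a taste of the categorical case, a count-min sketch keeps only a small matrix of counters yet answers frequency queries with bounded overestimation; a minimal Python sketch (illustrative, not the talk's implementation):

        import numpy as np

        class CountMinSketch:
            """Approximate frequency counts for a stream, in fixed memory."""

            def __init__(self, width=2048, depth=5, seed=42):
                self.width, self.depth = width, depth
                self.counts = np.zeros((depth, width), dtype=np.int64)
                # One hash "salt" per row; here we simply salt Python's hash().
                self.seeds = np.random.default_rng(seed).integers(0, 2**31, size=depth)

            def _buckets(self, item):
                for row, seed in enumerate(self.seeds):
                    yield row, hash((int(seed), item)) % self.width

            def add(self, item, count=1):
                for row, col in self._buckets(item):
                    self.counts[row, col] += count

            def estimate(self, item):
                # The true count is <= the estimate; collisions only inflate it.
                return min(self.counts[row, col] for row, col in self._buckets(item))

        cms = CountMinSketch()
        for word in ["cat", "dog", "cat", "fish", "cat"]:
            cms.add(word)
        print(cms.estimate("cat"))   # 3 (collisions can only inflate the estimate)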

  • Noon van der Silk - How Much Data do you _really_ need for Deep Learning?

    Director, Braneshop
    30 Mins
    Talk
    Intermediate

    A common assumption is that we need significant amounts of data in order to do deep learning. Many companies wanting to adopt AI find themselves stuck in the “data gathering” phase and, as a result, delay using AI to gain competitive advantage in their business. But how much data is enough? Can we get by with less?

    In this talk we will explore the impact on our results when we use different amounts of data to train a classification model. It is actually possible to get by with much less data than we might expect. We will discuss why this might be so, in which particular areas this applies, and how we can use these ideas to improve how we train, deploy and engage end-users in our models.
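
    One simple way to probe this on your own problem is to train the same model on increasing fractions of the data and watch how the validation score changes; a small sketch using scikit-learn's learning_curve (an illustration of the idea, not the speaker's code):

        import numpy as np
        from sklearn.datasets import load_digits
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import learning_curve

        X, y = load_digits(return_X_y=True)
        sizes, train_scores, val_scores = learning_curve(
            LogisticRegression(max_iter=2000),
            X, y,
            train_sizes=np.linspace(0.05, 1.0, 8),  # 5% .. 100% of the training data
            cv=5,
        )
        for n, score in zip(sizes, val_scores.mean(axis=1)):
            print(f"{n:4d} training examples -> {score:.3f} mean CV accuracy")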

  • Simon T. O'Callaghan / Alistair Reid / Finn Lattimore - Engineering an Ethical AI System

    30 Mins
    Talk
    Intermediate

    To improve people’s well-being, we must improve the decisions made about them. Consequential decisions are increasingly being made by AI, like selecting who to recruit, who receives a home-loan or credit card, and how much someone pays for goods or services. AI systems have the potential to make these decisions more accurately and at a far greater scale than humans. However, if AI decision-making is improperly designed, it runs the risk of doing unintentional harm, especially to already disadvantaged members of society. Only by building AI systems that accurately estimate the real impact of possible outcomes on a variety of ethically relevant measures, rather than just accuracy or profit, can we ensure this powerful technology improves the lives of everyone.

    This talk focuses on the anatomy of these ethically-aware decision-making systems, and some design principles to help the data scientists, engineers and decision-makers collaborating to build them. We motivate the discussion with a high-level simulation of the "selection" problem where individuals are targeted, based on relevant features, for an opportunity or an intervention. We detail the necessary considerations and the potential pitfalls when engineering an ethically-aware automated solution, from initial conception through to causal analysis, deployment and on-going monitoring.

  • Huon Wilson - Entity Resolution at Scale

    Sr. Software Engineer, CSIRO's Data61
    30 Mins
    Talk
    Intermediate

    Real-world data is rarely clean: there are often corrupted and duplicate records, and even corrupted records that are duplicates! One step in data cleaning is entity resolution: connecting all of the duplicate records into the single underlying entity that they represent.

    This talk will describe how we approach entity resolution, and look at some of the challenges, solutions and lessons learnt when doing entity resolution on top of Apache Spark, and scaling it to process billions of records.
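
    As a rough sketch of one common building block, blocking on a cheap key before doing pairwise comparisons, expressed in PySpark (the column names and matching rule here are hypothetical, and this is not the pipeline described in the talk):

        from pyspark.sql import SparkSession, functions as F

        spark = SparkSession.builder.appName("entity-resolution-sketch").getOrCreate()

        records = spark.read.parquet("records.parquet")  # hypothetical input path

        # Blocking: only compare records that share a cheap key (e.g. first three
        # characters of the surname plus postcode) instead of all O(n^2) pairs.
        blocked = records.withColumn(
            "block_key", F.concat(F.substring("surname", 1, 3), F.col("postcode"))
        )

        pairs = (
            blocked.alias("a")
            .join(blocked.alias("b"), on="block_key")
            .where(F.col("a.id") < F.col("b.id"))   # keep each unordered pair once
        )

        # A crude similarity score on the remaining candidate pairs.
        scored = pairs.withColumn(
            "name_dist", F.levenshtein(F.col("a.full_name"), F.col("b.full_name"))
        )
        matches = scored.where(F.col("name_dist") <= 2)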

  • Dana Ma - Building Rome Every Day - Scaling ML Model Building Infrastructure

    Sr. Software Engineer, Zendesk
    30 Mins
    Case Study
    Intermediate

    "I want to reset my password". "I ordered the wrong size". "These are not the droids I was looking for". Every day, a support agent fields thousands of these queries. Multiply that by the thousands of agents a company might have, and the sheer vastness of data being generated becomes hard to imagine. How can we make sense of it all? It seems a formidable task, but we have a formidable weapon in our arsenalwe have machine learning.

    By combining deep learning, natural language processing and clustering techniques, we built a machine learning model that can take 100,000 tickets and efficiently cluster and summarise them into digestible topics. But that's only part of the challenge; we also had to scale it to build for 30,000 customers, in production, every day.

    In this talk I'll share the story of Content Cues - Zendesk's latest Machine Learning product. It's the story of how we leveraged the power of AWS Batch to scale a model building platform. Of how we tackled challenges such as measuring how well an unsupervised model performs when it's not even clear what "well" means. Of how our team combined our pool of skills across data engineering, data science and product management to deliver a pipeline capable of building a thousand models for the price of a cup of coffee.
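
    As a toy illustration of the NLP-plus-clustering step described above, here is a TF-IDF and k-means sketch in scikit-learn (the actual Content Cues pipeline is, of course, far more involved):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.cluster import KMeans

        tickets = [
            "I want to reset my password",
            "password reset link not working",
            "I ordered the wrong size",
            "can I exchange for a larger size",
        ]

        vectorizer = TfidfVectorizer(stop_words="english")
        X = vectorizer.fit_transform(tickets)

        kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

        # Summarise each cluster by its highest-weighted terms.
        terms = vectorizer.get_feature_names_out()
        for i, centre in enumerate(kmeans.cluster_centers_):
            top = [terms[j] for j in centre.argsort()[::-1][:3]]
            print(f"topic {i}: {', '.join(top)}")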

  • Suneeta Mall - The Three-Rs of Data-Science - Repeatability, Reproducibility, and Replicability

    Sr. Data Scientist, Nearmap
    30 Mins
    Talk
    Intermediate

    Adoption of data science in industry has been phenomenal in the last 5 years. The primary focus of these adoptions has been combining the three dimensions of machine learning, i.e. the ‘data’, the ‘model architecture’ and the ‘parameters’, to predict an outcome. A slight change in any of these dimensions can skew the predicted outcomes. So how do we build trust in our models? And how do we manage the variance across multiple models trained on varied sets of data, model architectures and parameters? Why might the three Rs, i.e. “Repeatability, Reproducibility, and Replicability”, be relevant in industry applications of data science?

    This talk has the following goals:

    • Justify (with demonstrations) why “Repeatability, Reproducibility, and Replicability” are important in data science, even if the application is geared towards industry rather than experimental research.
    • Discuss in detail the requirements for ensuring “Repeatability, Reproducibility, and Replicability” in data science.
    • Discuss ways to achieve repeatability, reproducibility, and replicability with provenance and automated model management.
    • Present various approaches and available tooling pertaining to provenance and model management, and compare and contrast them.

  • Lex Toumbourou - Emerging Best Practices for Machine Learning Engineering

    Sr. Consultant, ThoughtWorks
    30 Mins
    Talk
    Intermediate

    In this talk, I'll walk through some of the emerging best practices for Machine Learning engineering and contrast them with those of traditional software development. I will cover topics including product management, research and development, deployment, QA, and lifecycle management of Machine Learning projects.

  • Pantelis Elinas - Practical Geometric Deep Learning in Python

    30 Mins
    Talk
    Intermediate

    Geometric Deep Learning (GDL) is a fast-developing machine learning specialisation that uses the network structure underlying the data to improve learning outcomes. GDL has been successfully applied to problems in various domains with network-structured data, such as social science, medicine, media and finance.

    Inspired by the success of neural networks in domains such as computer vision and natural language processing, the core component driving GDL is the graph convolution operator. This operator is used as the building block for deep learning models applied to networks. This approach takes advantage of many algorithmic and computational developments from modern neural network research and practice – such as composability, optimisation, and end-to-end training – to improve predictive performance.
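
    In its simplest form, a graph convolution layer propagates each node's features along the graph's edges before applying an ordinary dense transformation. A minimal NumPy sketch of one such layer (illustrative only, not the StellarGraph implementation):

        import numpy as np

        # Toy graph: 4 nodes, undirected edges, 3 features per node.
        A = np.array([[0, 1, 1, 0],
                      [1, 0, 0, 1],
                      [1, 0, 0, 1],
                      [0, 1, 1, 0]], dtype=float)        # adjacency matrix
        X = np.random.default_rng(0).normal(size=(4, 3))  # node features
        W = np.random.default_rng(1).normal(size=(3, 8))  # learnable weights

        # Normalised adjacency with self-loops: A_hat = D^-1/2 (A + I) D^-1/2
        A_loop = A + np.eye(4)
        d = A_loop.sum(axis=1)
        A_hat = A_loop / np.sqrt(np.outer(d, d))

        # One graph convolution: mix each node with its neighbours, then project.
        H = np.maximum(A_hat @ X @ W, 0)   # ReLU(A_hat X W)
        print(H.shape)                     # (4, 8): an 8-dim embedding per node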

    However, there is a lack of tools for geometric deep learning targeting data scientists and machine learning practitioners.

    In response, CSIRO’s Data61 has developed StellarGraph, an open source Python library. StellarGraph implements a number of state-of-the-art methods for GDL with a clean and consistent API. Furthermore, StellarGraph is designed to make the application of GDL algorithms to network-structured data easy to integrate with existing machine learning workflows.

    In this talk, we will start with an overview of GDL and its real-world applications. We will then introduce StellarGraph with a focus on its design philosophy, API and analytics workflow, and demonstrate StellarGraph’s flexibility and ease of use for developing solutions targeting important applications such as product recommendation and social network moderation. Finally, we will touch on the challenges of designing and implementing a library for a fast-evolving machine learning field.

  • Roger Qiu - Image Classification in a Noisy Fraudulent World - A Journey of Computational and Statistical Performance

    30 Mins
    Talk
    Intermediate

    Formbay's fraud detection system relies on classification of photographic evidence to verify solar installations. Over the last 10 years, Formbay has amassed over 10 million labelled images of solar installations. Image classification over Formbay's dataset sounds easy: lots of data, apply neural networks and profit from automation! However, with such a large dataset, there is room for lots of noise: mislabelled images, overlapping classes, corrupted image data, imbalanced classes, rotational variance and more.

    This presentation demonstrates how we built our Image Processing pipeline to tackle these noise issues while addressing class/concept drift. First we'll examine Formbay's data situation when we started, and our initial model. Then we'll look at each statistical and computational problem we met and how we decided to address it, slowly evolving our data pipeline over time.

    This presentation focuses on the complexities of engineering production-ready ML systems, which involves balancing statistical performance ("how accurate") against computational performance ("how fast").

  • Kevin Jung - Bitcoin Ransomware Detection with Scalable Graph Machine Learning

    Software Engineer, CSIRO's Data61
    30 Mins
    Talk
    Intermediate

    Ransomware is a type of malware that has become a major threat, rising to 600 million attacks per year, and this cyber-crime is very often facilitated via cryptocurrency. While ransomware relies on pseudonymity to send and receive payments that are difficult to trace, the fact that all transactions on the bitcoin blockchain are recorded publicly presents an opportunity to develop an analytics pipeline to detect such activities.

    Graph Machine Learning is a rapidly developing research area which combines entity attributes and network structure to improve machine learning outcomes. These techniques are becoming increasingly popular, often outperforming traditional approaches when the underlying data can be naturally represented as a graph.

    This talk will highlight two main outcomes: 1) how a graph machine learning pipeline is formulated to detect bitcoin addresses that are suspected to be associated with ransomware, and 2) how this algorithm is scaled out to process over 1 billion transactions using Apache Spark.

  • Hercules Konstantopoulos - Is Agile Data Science a thing now?

    30 Mins
    Talk
    Intermediate

    How come there’s no standard text on how to operate a Data Science team? At its current scale this is a young practice without a widely accepted mode of operation. Because so many practitioners are housed in technology shops, we tend to align our delivery cycles with developers… and hence with the Agile framework.

    I will argue that if a data team fits within Agile it is probably not performing data science but operational analytics—a separate and venerable practice, and a requisite for data science. To ‘do’ science we need a fair bit of leeway, although not a complete lack of boundaries. It’s a tricky balance.

    In this talk I will share my experience as a data scientist in a variety of circumstances: in foundational, service, and advisory roles. I will also bring some parallels from my past life in scientific research to discuss how I think data science should be performed at scale. And I will share my current Agile-ish process at Atlassian.

  • Ananth Gundabattula - Auto feature engineering - Rapid feature harvesting using DFS and data engineering techniques

    30 Mins
    Talk
    Intermediate

    As machine learning adoption permeates many business models, so does the need to deliver models at a much faster rate. Feature engineering is arguably one of the core foundations of the model development cycle. While approaches like deep learning take a different view of feature engineering, it is no exaggeration to say that feature engineering is the core construct that can make or break a classical machine learning model. Automating feature engineering would immensely shorten the time to market for classical machine learning models.

    Deep Feature Synthesis (DFS) is an algorithm implemented in the FeatureTools Python package. DFS helps in the rapid harvesting of new features by taking a stacking approach on top of a relational data model. DFS also has first-class support for time dimensions as a fundamental construct. These factors make FeatureTools a compelling library for data practitioners. However, the base algorithm itself can be enriched in multiple ways to make it truly appealing for many other use cases. This session will present a high-level summary of DFS algorithmic constructs, followed by enhancements that can be made to the FeatureTools library to enable it for many other use cases.
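
    For orientation, a minimal DFS call might look like the following, sketched against the pre-1.0 FeatureTools API and its bundled mock customer dataset (the enhancements discussed in the session are not shown):

        import featuretools as ft

        # A small relational EntitySet (customers, sessions, transactions)
        # that ships with FeatureTools for demonstration purposes.
        es = ft.demo.load_mock_customer(return_entityset=True)

        # Deep Feature Synthesis: stack aggregation and transform primitives
        # across the relational model to harvest candidate features per customer.
        feature_matrix, feature_defs = ft.dfs(
            entityset=es,
            target_entity="customers",
            agg_primitives=["mean", "sum", "count"],
            trans_primitives=["month", "weekday"],
            max_depth=2,
        )

        print(feature_matrix.shape)
        print(feature_defs[:5])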

  • Yanir Seroussi - Bootstrapping the Right Way

    30 Mins
    Talk
    Intermediate

    Bootstrap sampling is being touted as a simple technique that any hacker can easily employ to quantify the uncertainty of statistical estimates. However, despite its apparent simplicity, there are many ways to misuse bootstrapping and thereby draw wrong conclusions about your data and the world. This talk gives a brief overview of bootstrap sampling and discusses ways to avoid common pitfalls when bootstrapping your data.
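
    For reference, the basic recipe is to resample the data with replacement many times and look at the spread of the recomputed statistic; a minimal sketch of a percentile bootstrap confidence interval (one of several variants, and not necessarily the one recommended in the talk):

        import numpy as np

        rng = np.random.default_rng(42)
        data = rng.exponential(scale=10.0, size=200)   # some skewed sample

        n_boot = 10_000
        boot_means = np.array([
            rng.choice(data, size=len(data), replace=True).mean()
            for _ in range(n_boot)
        ])

        # 95% percentile bootstrap confidence interval for the mean.
        lo, hi = np.percentile(boot_means, [2.5, 97.5])
        print(f"mean = {data.mean():.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")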

  • Simon Aubury - Which Plane Woke Snowy the Cat?

    Data Engineer Architect, IAG
    30 Mins
    Case Study
    Intermediate

    Our new cat, Snowy, is waking early. She is startled by the noise of jets flying over our house.

    This talk describes how common radio receivers can be configured to gather aircraft transponder signals. With an open-source data streaming framework (Apache Kafka), we can build a streaming data pipeline to rapidly process aircraft movements in real time.

    With data and spatial visualisations we can really determine which plane woke Snowy the cat.
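
    As a sketch of the consuming end of such a pipeline, using the kafka-python client and a hypothetical topic of decoded transponder (ADS-B) messages (not the exact setup from the talk):

        import json
        from kafka import KafkaConsumer

        # Hypothetical topic carrying decoded aircraft transponder messages.
        consumer = KafkaConsumer(
            "aircraft-positions",
            bootstrap_servers="localhost:9092",
            value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
        )

        HOME = (-33.87, 151.21)   # hypothetical lat/lon of Snowy's house

        for message in consumer:
            plane = message.value  # hypothetical fields: lat, lon, callsign, altitude
            lat, lon = plane["lat"], plane["lon"]
            # Crude proximity check: flag anything within ~0.05 degrees of home.
            if abs(lat - HOME[0]) < 0.05 and abs(lon - HOME[1]) < 0.05:
                print(f'{plane["callsign"]} overhead at {plane["altitude"]} ft')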
