  • Dean Wampler - Stream All the Things!!

    50 Mins
    Keynote
    Intermediate

    Streaming data architectures aren't just "faster" Big Data architectures. They must be reliable and scalable as never before, more like microservice architectures.

    This talk has three goals:

    1. Justify the transition from batch-oriented big data to stream-oriented fast data.
    2. Explain the requirements that streaming architectures must meet and the tools and techniques used to meet them.
    3. Discuss the ways that fast data and microservice architectures are converging.

    Big data started with an emphasis on batch-oriented architectures, where data is captured in large, scalable stores, then processed using batch jobs. To reduce the gap between data arrival and information extraction, these architectures are now evolving to be stream oriented, where data is processed as it arrives. Fast data is the new buzzword.

    These architectures introduce new challenges for developers. Whereas a batch job might run for hours, a stream processing system typically runs for weeks or months, which raises the bar for making these systems reliable and scalable to handle any contingency.

    The microservice world has faced this challenge for a while. Microservices are inherently message driven, responding to requests for service and sending messages to other microservices, in turn. Hence, they are also stream oriented, in the sense that they must respond reliably to never-ending input. So, they offer guidance for how to build reliable streaming data systems. I'll discuss how these architectures are merging in other ways, too.

    We'll also discuss how to pick streaming technologies based on four axes of concern:

    • Low latency: What's my time budget for handling this data?
    • High volume: How much data per unit time must I handle?
    • Data processing: Do I need machine learning, SQL queries, conventional ETL processing, etc.?
    • Integration with other tools: Which ones and how is data exchanged between them?

    We'll consider specific examples of streaming tools and how they fit on these axes, including Spark, Flink, Akka Streams, and Kafka.

  • Andrea Burbank - Building a culture of experimentation at Pinterest

    Data Scientist
    Pinterest
    50 Mins
    Keynote
    Intermediate

    A successful experimentation program consists of much more than mere randomization and measurement. How do you help stakeholders understand the right things to measure, avoid common pitfalls, and learn to rely on A/B tests as the best way to measure a new system or feature? Building a culture of experimentation and the right tools to support it is just as important as the statistics behind the comparisons themselves - and potentially much trickier to get right.
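
    The statistics behind a simple A/B comparison can be sketched in a few lines. Below is a minimal two-proportion z-test in Python on invented conversion counts; it illustrates the kind of comparison the abstract alludes to, not Pinterest's actual tooling.

      from math import sqrt
      from scipy.stats import norm

      # Hypothetical conversion counts for control (A) and treatment (B).
      conv_a, n_a = 1205, 10000
      conv_b, n_b = 1310, 10000

      p_a, p_b = conv_a / n_a, conv_b / n_b
      p_pool = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
      se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
      z = (p_b - p_a) / se
      p_value = 2 * norm.sf(abs(z))                       # two-sided test

      print(f"lift={p_b - p_a:+.4f}, z={z:.2f}, p={p_value:.4f}")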

  • Linda McIver - Kids can change the world with data

    45 Mins
    Invited Talk
    Intermediate

    From researchers perpetrating unspeakable acts with their datasets, to students trying to communicate experimental results with pie charts that don’t sum to 100, we have all seen data horror stories. But in my classes kids have worked with their communities to create real change using data and computation. We can band together to give all kids the opportunity to change the world. This is the story of kids. Of data. Of computation. And of change. It’s the story of the Australian Data Science Education Institute.

  • Noon van der Silk - Deep Learning Workshop

    AI Engineer
    Silverpond
    960 Mins
    Workshop
    Intermediate

    Venture into deep learning with this 2-day workshop that will take you from the mathematical and theoretical foundations to building models and neural networks in TensorFlow. You will apply what you learn, working on exercises throughout the workshop. To enhance learning, the second day is dedicated to applying your new skills in team project work.

    This hands-on workshop is ideal for both data science and programming professionals who are interested in learning the basics of deep learning and embarking on their first project.
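
    For a taste of where the workshop starts, here is a minimal TensorFlow model in the Keras API, trained on the bundled MNIST digits; the workshop's own exercises and datasets will differ.

      import tensorflow as tf

      # Load and normalise the MNIST digits (bundled with TensorFlow).
      (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
      x_train, x_test = x_train / 255.0, x_test / 255.0

      model = tf.keras.Sequential([
          tf.keras.layers.Flatten(input_shape=(28, 28)),
          tf.keras.layers.Dense(128, activation="relu"),
          tf.keras.layers.Dense(10, activation="softmax"),
      ])
      model.compile(optimizer="adam",
                    loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])
      model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))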

  • Dr Eugene Dubossarsky - The Zen of Data Science

    45 Mins
    Invited Talk
    Intermediate

    What makes data science such a different field? Why is it such a challenge to structure, manage and capture the value of data science?

    This presentation will focus on the key issues around the practice of discovery (“science”) and how it differs from the practice of building new things (“engineering”).

    Other questions addressed will include:

    • How do organisations manage and leverage the value of data science?
    • What are the key “unknown unknowns” that managers so often miss, with disastrous results?
    • What are the cultural, procedural and managerial differences between “scientists” and engineers in a modern, data-driven workplace?

  • Dean Wampler - Hands-on Kafka Streaming Microservices with Akka Streams and Kafka Streams

    300 Mins
    Workshop
    Intermediate

    If you're building streaming data apps, your first inclination might be to reach for Spark Streaming, Flink, Apex, or similar tools, which run as services to which you submit jobs for execution. But sometimes, writing conventional microservices, with embedded stream processing, is a better fit for your needs.

    In this hands-on tutorial, we start with the premise that Kafka is the ideal backplane for reliable capture and organization of data streams for downstream consumption. Then, we build several applications using Akka Streams and Kafka Streams on top of Kafka. The goal is to understand the relative strengths and weaknesses of these toolkits for building Kafka-based streaming applications. We'll also compare and contrast them to systems like Spark Streaming and Flink, to understand when those tools are better choices. Briefly, Akka Streams and Kafka Streams are best for data-centric microservices, where maximum flexibility is required for running the applications and interoperating with other systems, while systems like Spark Streaming and Flink are best for richer analytics over large streams where horizontal scalability through "automatic" partitioning of the data is required.

    Each engine has particular strengths that we'll demonstrate:

    • Kafka Streams is purpose-built for reading data from Kafka topics, processing it, and writing the results to new topics. With powerful stream and table abstractions and an "exactly-once" capability, it supports a variety of common scenarios involving transformation, filtering, and aggregation.
    • Akka Streams emerged as a dataflow-centric abstraction over the general-purpose Akka Actors model. It is designed for general-purpose microservices, especially when per-event low latency is important, as in complex event processing, where each event requires individual handling. By contrast, many other systems are efficient at scale, amortising overhead over sets of records processed "in bulk". Because of this general-purpose nature, Akka Streams supports a wider class of application problems and third-party integrations, but it is less focused on Kafka-specific capabilities. (A sketch of the consume-transform-produce pattern both libraries build on follows this list.)
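
    Kafka Streams and Akka Streams are JVM libraries, but the consume-transform-produce pattern they both build on can be sketched with the plain kafka-python client. This is a language-neutral illustration of the pattern only, not the workshop's code; the topic names and the transformation are invented.

      from kafka import KafkaConsumer, KafkaProducer

      # Hypothetical topics: "raw-events" in, "transformed-events" out.
      consumer = KafkaConsumer("raw-events",
                               bootstrap_servers="localhost:9092",
                               group_id="transform-service",
                               value_deserializer=lambda b: b.decode("utf-8"))
      producer = KafkaProducer(bootstrap_servers="localhost:9092",
                               value_serializer=lambda s: s.encode("utf-8"))

      # A never-ending loop, like a long-running streaming microservice.
      for record in consumer:
          transformed = record.value.upper()   # stand-in for real processing
          producer.send("transformed-events", transformed)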

    Kafka Streams and Akka Streams are both libraries that you integrate into your microservices, which means you must manage their lifecycles yourself, but you also get lots of flexibility to do this as you see fit.

    In contrast, Spark Streaming and Flink run their own services. You write "jobs" or use interactive shells that tell these services what computations to do over data sources and where to send results. Spark and Flink then determine what processes to run in your cluster to implement the dataflows. Hence, there is less of a DevOps burden to bear, but also less flexibility when you might need it. Both systems are also more focused on data analytics problems, with various levels of support for SQL over streams, machine learning model training and scoring, etc.

    For the tutorial, you'll be given an execution environment and the code examples in a GitHub repo. We'll experiment with the examples together, interspersed with short presentations, to understand their strengths, weaknesses, performance characteristics, and lifecycle management requirements.

  • Sachin Abeywardana - Trump Tweets (Fun with Deep Learning)

    Data Scientist
    DeepSchool.io
    30 Mins
    Talk
    Intermediate

    In this talk I will introduce LSTMs, the way deep learning deals with time-series data. The dataset we will focus on is Trump's tweets. Using this data we will build a tweet generator, trained to simulate Trump-style tweets one character at a time.

    We will be using Keras to build the deep learning model. Google Colab will be used so that all attendees have access to a GPU.
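
    A minimal sketch of a character-level model of the kind described, using the Keras API; the vocabulary size and context window below are assumptions, not figures from the talk.

      from tensorflow.keras.models import Sequential
      from tensorflow.keras.layers import LSTM, Dense

      vocab_size = 80   # assumed number of distinct characters in the corpus
      seq_len = 40      # assumed context window, in characters

      model = Sequential([
          LSTM(128, input_shape=(seq_len, vocab_size)),
          Dense(vocab_size, activation="softmax"),  # next-character distribution
      ])
      model.compile(loss="categorical_crossentropy", optimizer="adam")

      # Training expects one-hot encoded windows of tweet text:
      #   X: (n_samples, seq_len, vocab_size), y: (n_samples, vocab_size)
      # model.fit(X, y, batch_size=128, epochs=20)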

  • Cameron Joannidis - Machine Learning Systems for Engineers

    30 Mins
    Talk
    Intermediate

    Machine Learning is often discussed in the context of data science, but little attention is given to the complexities of engineering production ready ML systems. This talk will explore some of the important challenges and provide advice on solutions to these problems.

  • Atif Rahman - Privacy Preserved Data Augmentation using Enterprise Data Fabric

    Systems Design Lead
    Zetaris
    30 Mins
    Talk
    Intermediate

    Enterprises hold data that has potential value outside their own firewalls. We have been trying to figure out how to share detailed data with others in a secure, safe, legal and risk-mitigated manner that ensures a high level of privacy while adding tangible economic and social value. Enterprises face numerous roadblocks, failed projects, inadequate business cases, and issues of scale that demand newer techniques, technologies and approaches.

    In this talk, we will set up the groundwork for scalable data augmentation in organisations, sketching technical architectures and solutions around the emerging technologies of data fabrics, edge computing and a second coming of data virtualisation.

    A self-assessment toolkit will be shared for people interested in applying it to their own organisations.

  • Hercules Konstantopoulos - The catastrophic consequences of not being awesome at plots

    30 Mins
    Talk
    Intermediate

    “Hey boss, we made a totally rad deep learning algorithm! It trawls through the internet and literally tells the future!”

    “Yeah, cool. But do you have a plot I can show the board?”

    Data visualisation is the capstone of data science. As businesses collate massive and disparate data streams, and as algorithms become more complex, communicating results has become more important and more challenging than ever before. We need to start placing as much importance on accessible visualisation as we do on database architecture or algorithm design.

    In this talk I will present some core visualisation principles that I have developed over 15 years of experience visualising (literally) astronomical datasets, carbon emissions, behavioural analytics, even the odd basketball game. These can be employed by any data scientist to help their data tell the right story to the right people.
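
    The talk's principles are its own, but one common one, leading with the finding rather than the variable names, is easy to sketch in matplotlib (all data invented):

      import matplotlib.pyplot as plt

      months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
      signups = [120, 135, 150, 310, 335, 360]   # made-up figures

      fig, ax = plt.subplots()
      ax.plot(range(len(months)), signups, marker="o")
      ax.set_xticks(range(len(months)), labels=months)
      ax.set_ylabel("Monthly sign-ups")
      # The title carries the finding, not just "Sign-ups vs month".
      ax.set_title("Sign-ups tripled after the April launch")
      ax.annotate("launch", xy=(3, 310), xytext=(1, 300),
                  arrowprops=dict(arrowstyle="->"))
      plt.show()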

  • Tash Keuneman - The 80/20 of UX research: What qualitative data to use and when

    UX Designer
    Data61
    30 Mins
    Talk
    Intermediate

    We'll talk about the one qualitative process that you should use 80% of the time. Learn how to translate your customer research into meaningful numbers that you can make product decisions on.

    How many customers do you have to talk to in order to find 85% of the usability problems? How do you avoid the most common pitfalls that result in bad data? We'll go through that, as well as the best practices you can use to confidently calculate the extent of the problems you've found.
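
    The 85% figure is usually traced to the Nielsen-Landauer model, where the share of problems found by n test users is 1 - (1 - p)^n for a per-user discovery rate p (roughly 0.31 in the classic studies). A quick sanity check in Python, assuming that model:

      # Share of usability problems found by n users, per Nielsen-Landauer:
      #   found(n) = 1 - (1 - p)**n, with per-user discovery rate p ~= 0.31.
      p = 0.31
      for n in range(1, 8):
          print(f"{n} users -> {1 - (1 - p) ** n:.0%} of problems found")
      # With p = 0.31, about five users already surface ~85% of the problems.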

    You'll walk out of the conference room with UX skills that you can put into practice right away and a gift bag to boot.

  • Fiona Tweedie - On the quest for advanced analytics: governance and the Internet of Things

    30 Mins
    Case Study
    Intermediate

    Data scientists dream of crystal clear data lakes and perfectly ordered warehouses with comprehensive dictionaries, consistent formats and never a null value or encoding error to mar their analysis. The reality, however, is that the bulk of time on most data projects is spent sourcing and munging data before the exploration and analysis can begin. Governance is often presented as the solution to all data woes but all too often generates more meetings than results.

    The University of Melbourne is home to 8000 staff and 48000 students across seven campuses. Both researchers and professional staff recognise that data is going to be key to understanding this complex community and supporting its members. Sensor data collected from around the campuses promises the opportunity to analyse everything from demands on public transport to the impact of weather on coffee consumption. With researchers spread across ten faculties, there is a danger that multiple projects will collect fragmented data and the real power that comes from joining multiple datasets will never be realised. Conversely, overly prescriptive policies will date quickly and hamper innovation. Is it possible to satisfy both the desire to move rapidly to take advantage of new opportunities and the need to maintain data quality?

    This case study will present some of the IoT projects currently being explored at the University and examine the governance efforts that are being trialled to ensure the adoption of standards and future interoperability of devices and data.

  • 30 Mins
    Talk
    Intermediate

    Bees are dying – in recent years an unprecedented decline in honey bee colonies has been seen around the globe. The causes are still largely unknown. At CSIRO, the Global Initiative for Honey bee Health (GIHH) is an international collaboration of researchers, beekeepers, farmers, and industry set up to research the threats to bee health, in order to better understand colony collapse and find solutions that will allow for sustainable crop pollination and food security. Integral to the research effort are RFID tags manually fitted to the bees. The abundance of data collected by the thousands of bee-attached sensors, as well as additional environmental sensors, poses several challenges for the interpretation and comprehension of the data, both computationally and from a user perspective. In this talk, I will discuss visual analytics techniques that we have been investigating to facilitate an effective path from data to insight. I will particularly focus on interactive and immersive user interfaces that allow a range of end users to effectively explore the complex sensor data.

  • Rohan Dhupelia - A Retrospective on Building and Running Atlassian’s Data Lake

    30 Mins
    Case Study
    Intermediate

    Atlassian strives to be a data-driven company that builds collaborative software for teams. Two and a half years ago we launched our analytics platform (i.e. data lake) on AWS, which is now used by over 1500 internal users each month to gain insights and inform decisions.

    In this talk we will present a retrospective on our analytics platform covering our blessings (what went well) and our mistakes (what could have been better) as well as talking about what potential next steps we might take to further improve the platform.

    This talk will cover both the technical aspects (e.g. architectural choices) and the non-technical ones (e.g. team organisation, our principles and mandate).

  • 30 Mins
    Talk
    Intermediate

    You must have heard a few times that AI has beaten humans at image recognition. Is that true? Have you seen it yourself? I am going to demonstrate Cyclops, an image recognition system we built that recognises car models far better than any human.

    From here, this talk will take you through our journey: how it all began, why we built the early version of Cyclops and what the outcome was. We'll also see how we used this technology to dramatically improve the consumer experience and build many consumer-facing products which we previously thought were not possible.

    I will then dive into the technical details, starting from how we built Cyclops 1.0 with TensorFlow and how we overcame the training complexity with transfer learning. Transfer learning, however, comes with a limitation around directional invariance; I will show what it is and how we overcame it with a novel solution of our own.

    Next, I will show that building a car recogniser as complex as Cyclops 2.0 requires a superior model and modifications to our existing transfer learning technique. I will also look at the low-coverage problems we faced as we went deeper, and how we solved them. Finally, I will examine how distributed training can speed up the training process enough to make all this practical.
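
    As a rough illustration of the transfer-learning step mentioned above, the sketch below freezes an ImageNet-pretrained base and trains only a new classification head in Keras. The base network and class count are assumptions, not details of Cyclops.

      from tensorflow.keras.applications import MobileNetV2
      from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
      from tensorflow.keras.models import Model

      num_classes = 400   # assumed count of car models; not a Cyclops figure

      # Reuse ImageNet features; train only the new classification head.
      base = MobileNetV2(weights="imagenet", include_top=False,
                         input_shape=(224, 224, 3))
      base.trainable = False

      x = GlobalAveragePooling2D()(base.output)
      outputs = Dense(num_classes, activation="softmax")(x)
      model = Model(base.input, outputs)
      model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                    metrics=["accuracy"])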

  • Gareth Jones - Using Sentiment Analysis To Fill In The Gaps From User Surveys

    Consultant
    Shine Solutions
    30 Mins
    Case Study
    Intermediate

    We put a year's worth of online help chat logs from a major Australian superannuation website through Google's Natural Language API, to see what insights we could gain from the users. This talk will discuss how the Natural Language API works and the machine learning concepts underlying it, and give you some ideas on how to make use of the information, based on examples from our work. We'll compare the sentiment values with those expressed in exit surveys and find out how useful an indicator they can be.
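
    For reference, a minimal sentiment call with the Python client for the Natural Language API looks roughly like this (client-library versions differ slightly; credentials are assumed to be configured, and the chat line is made up):

      from google.cloud import language_v1

      client = language_v1.LanguageServiceClient()
      document = language_v1.Document(
          content="Thanks, that sorted out my account straight away!",
          type_=language_v1.Document.Type.PLAIN_TEXT,
      )
      result = client.analyze_sentiment(request={"document": document})
      sentiment = result.document_sentiment
      # score in [-1, 1] is polarity; magnitude grows with emotional content.
      print(f"score={sentiment.score:+.2f}, magnitude={sentiment.magnitude:.2f}")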

  • Aidan O'Brien - DevOps 2.0: Evidence-based evolution of serverless architecture through automatic evaluation of “infrastructure as code” deployments

    PhD student
    CSIRO
    30 Mins
    Talk
    Intermediate

    The scientific approach teaches us to formulate hypotheses and test them experimentally in order to advance systematically. DevOps, and software architecture in particular, do not traditionally follow this approach. Here, decisions like “scaling up to more machines or simply employing a batch queue” or “using Apache Spark or sticking to a job scheduler across multiple machines” are worked out theoretically rather than implemented and tested objectively. Furthermore, the paucity of knowledge around unestablished systems like serverless cloud architecture hampers the theoretical approach.

    We therefore partnered with James Lewis and Kief Morris to establish a fundamentally different approach to serverless architecture design, based on scientific principles. For this, the serverless architecture stack first needs to be fully defined through code/text, e.g. AWS CloudFormation, so that it can be deployed easily and consistently. This “architecture as text” base can then be modified and re-deployed to systematically test hypotheses, e.g. whether an algorithm is faster or a particular autoscaling group more efficient. The second key element of this novel way of evolving architecture is the automatic evaluation of any newly deployed architecture, without manually recording runtimes or defining interactions between services, e.g. with Epsagon’s monitoring solution.

    Here we cover the two key aspects in detail and showcase the benefits by describing how we improved runtime by 80% for the bioinformatics software framework GT-Scan, which is used by Australia’s premier research organisation to conduct medical research.
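
    A bare-bones sketch of that deploy-and-measure loop, using boto3 with CloudFormation. The stack details are hypothetical and the timing function is a crude stand-in for the automatic evaluation (e.g. Epsagon) described in the talk.

      import time
      import boto3

      cfn = boto3.client("cloudformation")

      def deploy(stack_name, template_body):
          """Deploy one candidate architecture and block until it is live."""
          cfn.create_stack(StackName=stack_name,
                           TemplateBody=template_body,
                           Capabilities=["CAPABILITY_IAM"])
          cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)

      def measure(invoke, runs=10):
          """Crude mean-runtime measurement over repeated invocations."""
          timings = []
          for _ in range(runs):
              start = time.time()
              invoke()                      # e.g. call the deployed endpoint
              timings.append(time.time() - start)
          return sum(timings) / runs

      # Deploy two variants of the same stack, measure both, keep the faster.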

  • Dana Bradford - How to Save a Life: Could Real-Time Sensor Data Have Saved Mrs Elle?

    Sr. Research Scientist
    CSIRO
    30 Mins
    Case Study
    Intermediate

    This is the story of Mrs Elle*, a participant in a smart home pilot study. The pilot study aimed to test the efficacy of sensors in capturing in-home activity data, including meal preparation, attention to hygiene and movement around the house. The in-home monitoring and response service associated with the sensors had not been implemented, and as such, data was not analysed in real time. Sadly, Mrs Elle suffered a massive stroke one night, and was found some time after. She later died in hospital without regaining consciousness. This talk looks at the data leading up to Mrs Elle’s stroke, to see if there were any clues that a neurological insult was imminent. We were most interested to know: had we been monitoring in real time, could the sensors have told us how to save a life?

    *pseudonym

  • Elaina Hyde - What happens when Galactic Evolution and Data Science collide?

    Consultant
    Servian
    30 Mins
    Case Study
    Intermediate

    This talk will cover a short trip around our Milky Way Galaxy and a discussion of how data science can be used to detect faint and sparse objects such as the dwarf satellites and streams that helped form the galaxy we live in. The data science applications and algorithms used determine the accuracy with which we can detect these mysterious bodies, and with the advent of greater cloud-computing capability, the sky is no longer the limit when it comes to programming or astronomy.

  • Tomasz Bednarz - Visual Analytics on Steroids: High Performance Visualisation, Simulations and AI

    30 Mins
    Talk
    Intermediate

    In the time it takes to read this abstract, someone could solve a detective puzzle, if only they had enough quantitative evidence with which to prove their suspicions. One could also use visualisation and computational tools like a microscope, to seek a new cure for cancer or predict how hospitalisations can be prevented. In this presentation, we will demonstrate new visual analytics techniques that use various mixed reality approaches to link simulations with collaborative, complex and interactive data exploration, placing the human in the loop. Thanks to recent advances in graphics hardware and compute power (especially GPGPU and modern Big Data / HPC infrastructures), the opportunities are immense, especially for improving our understanding of complex models that represent real or hybrid worlds. Use cases presented will be drawn from ongoing research at CSIRO and the Expanded Perception and Interaction Centre (EPICentre), using world-class GPU clusters and visualisation capabilities.
