How to train your dragon - Reinforcement learning from scratch

Reinforcement learning helped Google DeepMind's AlphaGo beat the world's best Go player. Have you wondered whether you, too, can train a program to play a simple game?

Reinforcement learning is a simple yet powerful technique that is driving many applications, from recommender systems to autonomous vehicles. It is best suited to handle situations where the behavior of the system cannot be described in simple rules. For example, a trained reinforcement learning agent can understand the scene on the road and drive the car like a human. In supply chain management, RL agents can make decisions on inventory ordering.

In this talk, I will demonstrate, using plain Python code, how to train an RL agent to a) cross a maze, b) play a game of Tic-Tac-Toe against an intelligent opponent, and c) act as a warehouse manager and learn inventory ordering. As you participate in this talk, you will master the basics of reinforcement learning and acquire the skills to train your own dragon.


Outline/Structure of the Talk

  1. Brief introduction to Reinforcement Learning (RL) (2 mins)
  2. RL algorithms (2 mins)
    • On-policy & off-policy algorithms, Q-learning & the Bellman equation
    • State-Action-Reward-State-Action (SARSA), deep Q-networks & deep deterministic policy gradients
  3. Programming approach (4 mins)
    • Components of an RL program - functions and metrics
    • Training the agent & testing
  4. Three games: (9 mins)
    • RL agent learning to cross a maze
    • Playing a game of Tic-Tac-Toe
    • RL in supply chain - inventory ordering in the beer game
  5. Learnings: (3 mins)
    • How the agent learns in various scenarios
    • Training it for your application
    • Challenges and solutions
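
As a rough illustration of the maze item in the outline, here is a minimal sketch of tabular Q-learning in plain Python. It is not the code from the talk: the 4x4 maze layout, the reward values, the hyperparameters (ALPHA, GAMMA, EPSILON) and the helper names step, train and greedy_path are all assumptions chosen for the example. The update line is the usual Q-learning form of the Bellman equation, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max Q(s', a') - Q(s, a)).

```python
import random

# Hypothetical 4x4 maze (assumption): S = start, G = goal, # = wall, . = free.
MAZE = [
    "S..#",
    ".#..",
    "...#",
    "#..G",
]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # assumed hyperparameters
START, GOAL = (0, 0), (3, 3)


def step(state, action):
    """Apply an action; hitting a wall or the edge leaves the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < 4 and 0 <= c < 4) or MAZE[r][c] == "#":
        r, c = state
    done = (r, c) == GOAL
    reward = 1.0 if done else -0.01  # small step cost, bonus at the goal
    return (r, c), reward, done


def train(episodes=500, seed=0):
    """Learn a Q-table with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}  # (state, action_index) -> value; missing entries count as 0.0
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap the episode length
            if rng.random() < EPSILON:
                a = rng.randrange(4)  # explore
            else:
                a = max(range(4), key=lambda i: q.get((state, i), 0.0))  # exploit
            nxt, reward, done = step(state, ACTIONS[a])
            best_next = 0.0 if done else max(q.get((nxt, i), 0.0) for i in range(4))
            old = q.get((state, a), 0.0)
            # Q-learning update: move the estimate toward the bootstrapped target.
            q[(state, a)] = old + ALPHA * (reward + GAMMA * best_next - old)
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the highest-valued action from the start, without exploration."""
    state, path = START, [START]
    for _ in range(max_steps):
        a = max(range(4), key=lambda i: q.get((state, i), 0.0))
        state, _, done = step(state, ACTIONS[a])
        path.append(state)
        if done:
            break
    return path
```

Calling greedy_path(train()) returns the list of cells the trained agent visits. Because the update bootstraps from the best next action rather than the action actually taken, this is the off-policy variant; SARSA would use the value of the action chosen in the next state instead.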

Learning Outcome

As you participate, you will master the fundamentals of Reinforcement Learning and acquire skills to develop your own RL agent.

Target Audience

Beginners interested in reinforcement learning.

Prerequisites for Attendees

Basic programming skills. Some understanding of reinforcement learning will be helpful.



Submitted 1 year ago

Public Feedback

    • Dr. Sri Vallabha Deevi

      Dr. Sri Vallabha Deevi - Machine health monitoring with AI

      20 Mins

      Predictive maintenance is the most recent technique in maintenance engineering. Machine operational parameters are used to assess the health of equipment and decide on a maintenance schedule. In aviation, aircraft engine manufacturers continuously monitor engine parameters in flight to evaluate performance and deviations from normal.

      Applying AI in this field enables measurement of behavior that is not observable by traditional means. AI-based monitoring provides the edge required to operate in Industry 4.0, where connected machines do away with buffers between processes and any unscheduled downtime of one machine affects the entire production chain.

      This demonstration will walk you through the development of AI models using IoT data for one of the largest metal manufacturing companies in India. It will help you master different types of AI models to answer questions like:

      • When do I plan the maintenance of a given equipment?
      • Will a component last till the next maintenance cycle or do I replace it during the current maintenance?
      • How to identify faulty equipment in the long production line?
    • Gunjan Dewan

      Gunjan Dewan - Developing a match-making algorithm between customers and Go-Jek products!

      Data Scientist
      Submitted 1 year ago
      20 Mins

      20+ products. Millions of active customers. An insane amount of data and a complex domain. Join me in this talk to hear about the journey we at Gojek took to predict which of our products a user is most likely to use next.

      A major problem we faced, as a company, was targeting our customers with promos and vouchers that were relevant to them. We developed a generalized model that takes into account the transaction history of users and gives a ranked list of our services that they are most likely to use next. From here on, we are able to determine the vouchers that we can target these customers with.

      In this talk, I will discuss how we used recommendation engines to solve this problem, the challenges we faced along the way, and the impact on our conversion rates. I will also cover the different iterations we went through and how our problem statement evolved as we worked on it.

    • Ravi Ranjan

      Ravi Ranjan - Deep Reinforcement Learning Based RecSys Using Distributed Q Table

      20 Mins

      Recommendation systems (RecSys) are the core engine for any personalized experience on eCommerce and online media websites. Most companies leverage RecSys to increase user interaction, enrich shopping potential, and generate upsell and cross-sell opportunities. Amazon uses recommendations as a targeted marketing tool throughout its website, which contributes 35% of its total revenue [1]. Netflix users watch ~75% of the recommended content and artwork [2]. Spotify employs a recommendation system to update personal playlists every week so that users won’t miss newly released music by artists they like. This has helped Spotify increase its number of monthly users from 75 million to 100 million [3]. YouTube's personalized recommendations help users find relevant videos quickly and easily, and account for around 60% of video clicks from the homepage [4].

      In general, RecSys generates recommendations based on user browsing history and preferences, past purchases, and item metadata. Most existing recommendation systems are based on three paradigms: collaborative filtering (CF) and its variants, content-based recommendation engines, and hybrid recommendation engines that combine content-based and CF approaches or exploit more information about users in content-based recommendation. However, they suffer from limitations such as rapidly changing user data and preferences, static recommendations, grey sheep, cold start, and malicious users.

      Classical RecSys algorithms such as content-based recommendation perform well on item-to-item similarity, but they only recommend items from categories the user has already viewed and may never surface items from other categories. Collaborative filtering addresses this by exploiting users' behavior and preferences over items when recommending items to new users. However, collaborative filtering suffers from drawbacks such as cold start, popularity bias, and sparsity. Classical recommendation models also treat recommendation as a static process. RL can handle rapidly changing user data: an RL-based RecSys captures the user’s temporal intentions and responds promptly. However, as the matrix of user actions and items grows, it becomes difficult to provide recommendations using RL. Deep RL solutions such as actor-critic methods and deep Q-networks are designed to overcome these drawbacks.

      Present systems suffer from two limitations. First, they treat recommendation as a static procedure and ignore the dynamic, interactive nature of the relationship between users and the recommender system. Second, most work focuses on the immediate feedback on recommended items and neglects long-term rewards, which reinforcement learning can capture. We propose a recommendation system based on Q-learning, a powerful reinforcement learning method, combined with an ε-greedy policy. This combination handles both issues and gives the customer more chances to explore new pages or products that are not yet popular. When RL is applied to real-world problems, both the state space and the action space are usually very large. To address this, we propose multiple/distributed Q-table approaches, which can handle a large state-action space and make the Q-learning algorithm practical for recommendation.
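
      The abstract does not say how the Q table is split up, so the following is only a sketch of one possible reading: Q-values are partitioned across several smaller tables by a hash of the state, and an ε-greedy policy picks the recommendation. The class name ShardedQTable, the string-valued states, the shard count and the hyperparameter defaults are all hypothetical and are not taken from the proposal.

```python
import random
import zlib
from collections import defaultdict


class ShardedQTable:
    """Q-values split across several dictionaries, keyed by a hash of the state.

    This is an assumed design for illustration, not the authors' method.
    """

    def __init__(self, n_shards, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.shards = [defaultdict(float) for _ in range(n_shards)]
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def _shard(self, state):
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        return self.shards[zlib.crc32(state.encode("utf-8")) % len(self.shards)]

    def value(self, state, action):
        # .get() reads without creating an entry, so lookups stay sparse.
        return self._shard(state).get((state, action), 0.0)

    def choose(self, state):
        """Epsilon-greedy: explore with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.value(state, a))

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update, written to the shard that owns `state`."""
        best_next = max(self.value(next_state, a) for a in self.actions)
        shard = self._shard(state)
        old = shard[(state, action)]
        shard[(state, action)] = old + self.alpha * (reward + self.gamma * best_next - old)
```

      In a real deployment each shard could live on a different worker, with the hash deciding which worker owns a state; here they are plain in-process dictionaries so the sketch stays self-contained.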


      1. "":
      2. "":
      3. "":
      4. "":
      5. "Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modelling":
      6. "Deep Reinforcement Learning for Page-wise Recommendations":
      7. "Deep Reinforcement Learning for List-wise Recommendations":
      8. "Deep Reinforcement Learning Based RecSys Using Distributed Q Table":
    • Kriti Doneria

      Kriti Doneria - Trust Building in AI systems: A critical thinking perspective

      Data Science
      Submitted 1 year ago
      20 Mins

      How do I know when to trust AI, and when not to?

      Who goes to jail if a self-driving car kills someone tomorrow?

      Did you know that scientists say people will believe anything if it is repeated often enough?

      Designing AI systems is also an exercise in critical thinking, because an AI is only as good as its creator. This talk is for discussions like these, and more.

      With the exponential increase in available computing power, several AI algorithms that were mere papers written decades ago have become implementable. For a data scientist, it is very tempting to use the most sophisticated algorithm available. But now that AI has moved beyond academia and into the business world, are numbers alone sufficient? Putting context to AI, or XAI (explainable AI), takes the black box out of AI to enhance human-computer interaction. This talk revolves around the interpretability-complexity trade-off; the challenges, drivers, and caveats of the XAI paradigm; and an intuitive demo of translating the inner workings of an ML algorithm into human-understandable formats to achieve more business buy-in.

      Prepare to be amused and enthralled at the same time.

    • Srikanth K S

      Srikanth K S - Actionable Rules from Machine Learning Models

      20 Mins

      Beyond predictions, some ML models provide rules that identify actionable sub-populations in the support-confidence-lift paradigm. Besides making the models interpretable, rules make it easy for stakeholders to decide on a plan of action. We discuss rule-based models in production, rule-based ensembles, and anchors, using the R package tidyrules.
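
      To make the support-confidence-lift paradigm concrete, here is a small Python sketch that scores a single rule on a handful of made-up rows. It follows one common convention (support as the fraction of rows matching both sides of the rule) and does not reproduce the tidyrules API; the function name rule_metrics, the field names and the data are hypothetical.

```python
def rule_metrics(rows, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    rows is a list of dicts; antecedent and consequent map a row to True/False.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    n = len(rows)
    n_a = sum(1 for r in rows if antecedent(r))
    n_c = sum(1 for r in rows if consequent(r))
    n_ac = sum(1 for r in rows if antecedent(r) and consequent(r))
    support = n_ac / n                          # P(antecedent and consequent)
    confidence = n_ac / n_a if n_a else 0.0     # P(consequent | antecedent)
    lift = confidence / (n_c / n) if n_c else 0.0  # confidence / P(consequent)
    return support, confidence, lift


# Made-up customer data (assumption), used only to exercise the function.
customers = [
    {"tenure_months": 2, "churned": True},
    {"tenure_months": 3, "churned": True},
    {"tenure_months": 4, "churned": False},
    {"tenure_months": 5, "churned": True},
    {"tenure_months": 12, "churned": False},
    {"tenure_months": 24, "churned": False},
    {"tenure_months": 8, "churned": True},
    {"tenure_months": 30, "churned": False},
]

support, confidence, lift = rule_metrics(
    customers,
    antecedent=lambda r: r["tenure_months"] < 6,
    consequent=lambda r: r["churned"],
)
```

      A lift above 1 means the sub-population picked out by the rule has a higher rate of the outcome than the data set as a whole, which is what makes such a rule a candidate for action.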