Proximal Policy Optimization: OpenAI's Major Breakthrough

OpenAI made a breakthrough in deep reinforcement learning when it created OpenAI Five, a team of five agents that beat some of the best Dota 2 players in the world.

The underlying algorithm that helped them achieve this is Proximal Policy Optimization (PPO). OpenAI has said that PPO has become its default reinforcement learning algorithm because of its ease of use and good performance. It uses policy gradient techniques that are more efficient and less costly than traditional reinforcement learning methods. The key contribution of PPO is ensuring that each policy update does not move the new policy too far from the previous one. This reduces variance at the cost of some bias, but makes training smoother. It also keeps the agent from going down an unrecoverable path of taking senseless actions.
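One common way to keep the new policy close to the old one is the clipped surrogate objective from the PPO paper. The sketch below is a minimal NumPy illustration of that idea, not the code used for OpenAI Five. The function name, the toy probabilities and advantages, and the clipping value of 0.2 are assumptions chosen for the example.

```python
import numpy as np

def ppo_clip_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective (to be maximized), averaged over samples.

    All names and the default clip_eps are illustrative assumptions.
    """
    # Probability ratio between the new and old policy for each sampled action.
    ratio = np.exp(new_log_probs - old_log_probs)
    # Plain policy-gradient surrogate.
    unclipped = ratio * advantages
    # Same surrogate, with the ratio restricted to [1 - eps, 1 + eps].
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the element-wise minimum removes any incentive to push the
    # ratio outside the clipping range.
    return np.mean(np.minimum(unclipped, clipped))

# Toy data: three sampled actions, each with probability 0.5 under the old policy.
old_lp = np.log(np.array([0.5, 0.5, 0.5]))
new_lp = np.log(np.array([0.5, 0.9, 0.1]))
adv = np.array([1.0, 1.0, -1.0])

objective = ppo_clip_objective(new_lp, old_lp, adv)
print(objective)
```

In the toy data, the second action has a ratio of 1.8 but contributes only 1.2 to the objective, so the update gains nothing from moving that probability further. In a real training loop the negative of this quantity would typically be minimized with a gradient-based optimizer.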

In this talk, I will discuss the architecture of PPO, how it differs from earlier policy gradient methods, and how it can be implemented in code.

 
 

Outline/Structure of the Demonstration

Agenda for the talk:

  • Recap of value-based and policy-based methods (2 mins)
  • Use cases of PPO (2 mins)
  • PPO architecture (10 mins)
  • Code walkthrough: training an agent with PPO (4 mins)
  • Q&A session (2 mins)

Learning Outcome

1. Why PPO is a major breakthrough in reinforcement learning

2. How it addresses earlier challenges in reinforcement learning

3. A good understanding of the PPO architecture

Target Audience

Data Scientists, Deep Learning Engineers, RL Enthusiasts, AI Researchers

Prerequisites for Attendees

Attendees should have a good understanding of basic reinforcement learning concepts.


Submitted 1 year ago

Public Feedback


    • Gouthaman Asokan - Real Time Multi Person Pose Estimation

      Gouthaman Asokan
      AI Researcher
      Cellstrat
      1 year ago
      20 Mins
      Demonstration
      Intermediate

      OpenPose is a library written in C++, with a Python wrapper available, for real-time multi-person keypoint detection and multithreading. The model predicts the location of various human keypoints such as the chest, hips, shoulders, neck, elbows and knees. It uses part affinity fields and greedy inference to connect these localized keypoints.

      In this talk, I'll discuss how OpenPose supports real-time multi-person detection by jointly detecting body, hand, facial and foot keypoints, and how part affinity fields are used.

      I will also discuss the model architecture and compare it with other models such as Mask R-CNN and AlphaPose. Finally, I will show how pose estimation can be done on single-person and multi-person images using pretrained models.

    • Gouthaman Asokan - Controlling the Style of Images Generated using StyleGAN

      Gouthaman Asokan
      AI Researcher
      Cellstrat
      1 year ago
      20 Mins
      Demonstration
      Intermediate

      Generative adversarial networks (GANs) are algorithmic architectures that use two neural networks, pitting one against the other (thus the “adversarial”) in order to generate new, synthetic instances of data that can pass for real data. They are widely used in image generation, video generation and voice generation.

      While GAN images have become more realistic over time, one of the main challenges is controlling the output, i.e. changing specific features such as pose, face shape and hair style. NVIDIA proposed a novel method that generates images starting from a very low resolution and continuing up to high resolution. By modifying the input at each level, it controls the visual features expressed at that level, from coarse features (pose, face shape) to fine details (hair color), without affecting other levels.

      In this talk, I will discuss how the lack of control affected image generation in previous GAN models, and the changes introduced by StyleGAN that help control the style and generate impressive images of synthetic human faces.
