Proximal Policy Optimization: OpenAI's Major Breakthrough

OpenAI made a breakthrough in deep reinforcement learning when it created OpenAI Five, a team of five agents that beat some of the best Dota 2 players in the world.

The underlying algorithm that made this possible is Proximal Policy Optimization (PPO). OpenAI has stated that PPO has become its default reinforcement learning algorithm because of its ease of use and good performance. PPO uses a novel policy gradient technique that is more efficient and less costly than traditional reinforcement learning methods. Its key contribution is ensuring that a policy update does not move the new policy too far from the previous one. This reduces variance at the cost of some bias, but it makes training smoother and keeps the agent from going down an unrecoverable path of senseless actions.
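The constraint on how far an update may move the policy can be illustrated with PPO's clipped surrogate objective. The sketch below is an illustrative snippet, not OpenAI's implementation; the function name and the default epsilon of 0.2 are my own choices (0.2 is the value commonly quoted from the PPO paper).

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective: min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r is the probability ratio pi_new(a|s) / pi_old(a|s) and A is
    the advantage estimate. Clipping removes the incentive to push the
    new policy far from the old one."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the elementwise minimum keeps the objective pessimistic:
    # large policy changes never increase the objective.
    return np.minimum(unclipped, clipped)
```

For example, with a positive advantage a ratio of 2.0 is clipped down to 1.2, so the gradient stops rewarding further movement in that direction; with a negative advantage the minimum keeps the more pessimistic (more negative) value.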

In this talk, I will be discussing the architecture of PPO, its major changes from policy gradient methods, and how it can be implemented in code.

Outline/Structure of the Demonstration

Agenda for the talk:

  • Recap of value-based and policy-based methods (2 mins)
  • Use cases of PPO (2 mins)
  • PPO architecture (10 mins)
  • Code walkthrough: training an agent with PPO (4 mins)
  • Q&A session (2 mins)

Learning Outcome

1. Why PPO is a major breakthrough in reinforcement learning

2. How PPO addresses earlier challenges in reinforcement learning

3. A good understanding of the PPO architecture

Target Audience

Data Scientists, Deep Learning Engineers, RL Enthusiasts, AI Researchers

Prerequisites for Attendees

Attendees are required to have a good understanding of basic reinforcement learning concepts.

Submitted 4 months ago

Public Feedback

Suggest improvements to the Speaker
  • By Dr. Vikas Agrawal  ~  3 months ago

    Dear Gauthaman: You are presenting the work of other researchers in this talk. Have you applied this to a specific problem at CellStart with Vivek Singhal et al, please? At ODSC we love to see how our colleagues are applying other researchers' work to their own problems, and we can learn from those. Warm Regards, Vikas

    • By Gouthaman Asokan  ~  3 months ago

      Hi Vikas,
      I will be discussing how PPO can be extrapolated and used in other games. I will also show, with a specific example I worked on, a Python implementation of Proximal Policy Optimization.

      Thanks,
      Gouthaman Asokan

  • By Natasha Rodrigues  ~  4 months ago

    Hi Gouthaman,

    Thanks for your proposal and for your voice-over videos. However, to help the program committee understand your presentation style, could you provide a link to a past recording, or record a short 1-2 minute trailer of your talk and share the link?

    Thanks,
    Natasha


  • Liked Gouthaman Asokan

    Gouthaman Asokan - Real Time Multi Person Pose Estimation

    Gouthaman Asokan
    AI Researcher
    Cellstrat
    4 months ago
    Sold Out!
    20 Mins
    Demonstration
    Intermediate

    OpenPose is a library written in C++ with a Python wrapper, for real-time multi-person keypoint detection with multithreading. The model predicts the locations of human keypoints such as the chest, hips, shoulders, neck, elbows, and knees, and uses part affinity fields with greedy inference to connect the localized keypoints.

    In this talk, I'll be discussing how OpenPose enables real-time multi-person detection by jointly detecting human body, hand, facial, and foot keypoints, and how part affinity fields work.

    I will also discuss the model architecture, compare it with other models such as Mask R-CNN and AlphaPose, and finally show how pose estimation can be done on single-person as well as multi-person images using pretrained models.
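    The part-affinity-field idea above can be sketched in a few lines: a candidate limb between two detected keypoints is scored by how well the PAF vectors along the connecting segment agree with the limb's direction, and greedy inference then keeps the highest-scoring pairings. This is a simplified illustration under my own assumptions (dense sampling, no integral interpolation), not OpenPose's actual implementation.

    ```python
    import numpy as np

    def paf_limb_score(paf_x, paf_y, p1, p2, n_samples=10):
        """Score a candidate limb from keypoint p1 to p2 (both (x, y))
        by averaging the dot product between the part affinity field
        and the limb's unit direction, sampled along the segment.
        paf_x, paf_y: 2-D arrays (H, W) holding the field's x and y
        components. A score near 1 means the field strongly supports
        this connection; greedy inference keeps the best-scoring pairs."""
        p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
        v = p2 - p1
        norm = np.linalg.norm(v)
        if norm == 0:
            return 0.0
        u = v / norm  # unit vector along the candidate limb
        score = 0.0
        for t in np.linspace(0.0, 1.0, n_samples):
            x, y = (p1 + t * v).astype(int)
            score += paf_x[y, x] * u[0] + paf_y[y, x] * u[1]
        return score / n_samples
    ```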

  • Liked Gouthaman Asokan

    Gouthaman Asokan - Controlling the Style of Images Generated using StyleGAN

    Gouthaman Asokan
    AI Researcher
    Cellstrat
    4 months ago
    Sold Out!
    20 Mins
    Demonstration
    Intermediate

    Generative adversarial networks (GANs) are algorithmic architectures that pit two neural networks against each other (hence "adversarial") in order to generate new, synthetic instances of data that can pass for real data. They are widely used in image, video, and voice generation.

    While GAN images have become more realistic over time, one of the main challenges is controlling the output, i.e. changing specific features such as pose, face shape, and hair style. NVIDIA proposed a novel method that generates images starting from very low resolution and progressively continuing to high resolution. By modifying the input at each level, it controls the visual features expressed at that level, from coarse features (pose, face shape) to fine details (hair color), without affecting the other levels.
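    The per-level control described above comes from style modulation via adaptive instance normalization (AdaIN): each layer's feature statistics are normalized away and replaced by style-specific scale and bias. The sketch below is a minimal illustration with shapes and names of my own choosing, not NVIDIA's StyleGAN code.

    ```python
    import numpy as np

    def adain(content, style_scale, style_bias, eps=1e-5):
        """Adaptive instance normalization.
        content: (C, H, W) feature map from one generator level.
        style_scale, style_bias: (C,) per-channel style parameters
        (in StyleGAN these come from the mapped latent vector).
        Normalizing per channel erases the incoming statistics, so the
        style injected at this level controls only this level's features."""
        mu = content.mean(axis=(1, 2), keepdims=True)
        sigma = content.std(axis=(1, 2), keepdims=True)
        normalized = (content - mu) / (sigma + eps)
        return (style_scale[:, None, None] * normalized
                + style_bias[:, None, None])
    ```

    After this operation each output channel has mean equal to the style bias and standard deviation (approximately) equal to the style scale, regardless of what the incoming features looked like, which is why swapping the style at one level changes only that level's features.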

    In this talk, I will be discussing how the lack of control affected image generation in previous GAN models, and the changes introduced by StyleGAN that help control the style and generate impressive images of synthetic human faces.