Introduction:

"Special children" include children affected by complex neuro-behavioral conditions such as autism, which involves impairments in social interaction, language development and communication skills, combined with rigid, repetitive behaviors. Children with autism often face a particularly difficult childhood because communication is extremely hard for them. They have trouble understanding what other people think and feel. This makes it very hard for them to express themselves, whether with words or through gestures.

Such special children need “special” care for the development of their cognitive abilities. The learning resources required for teaching them are extremely hard to find and inaccessible to many.

So, can artificial intelligence, with the help of modern deep learning algorithms, generate animated videos for developing or improving the cognitive abilities of such a special group?

The idea to combat the problem:

Well, I feel it can be done!

An animated video consists of 3 main components:

1. Graphical video (sequence of images put together to tell a story),

2. A background story and

3. A relevant background audio or music.

Now, if we want to build a system that produces machine-generated animated videos, we have to think about these three components:

  1. Machine generated sequence of images with a spatial coherence
  2. Machine generated text, or the story
  3. Machine generated audio or music, that highlights the mood or the theme of the video

If these three discrete components are put together in a cohesive flow, our purpose can be achieved. The deep learning community has already made significant progress in machine-generated images, audio and text.

Details about the three pillars of this problem:

Machine generated sequence of images with a spatial coherence

Generative Adversarial Networks (GANs) have been quite successful to date at generating images and audio. For our use case, where spatial features must stay coherent from frame to frame, Variational Autoencoders (VAEs) may be an even better fit.
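
To make the adversarial idea concrete, here is a minimal sketch of the two standard GAN losses, computed for a single real sample and a single generated sample. It assumes the discriminator outputs a probability between 0 and 1; the function names are illustrative and not taken from any library, and a real model would compute these over batches with a deep learning framework.

```python
import math

EPS = 1e-12  # small constant to avoid log(0)


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy loss for the discriminator.

    d_real: probability the discriminator assigns to a real sample being real.
    d_fake: probability it assigns to a generated sample being real.
    """
    return -(math.log(d_real + EPS) + math.log(1.0 - d_fake + EPS))


def generator_loss(d_fake: float) -> float:
    """Non-saturating generator loss: low when the discriminator is fooled."""
    return -math.log(d_fake + EPS)
```

The discriminator is rewarded for telling real from generated samples, while the generator is rewarded when its samples are scored as real; training alternates between the two.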

Let’s start with a very popular cartoon series, Tom & Jerry, specially modified for autistic children, and consider a simple scene where Tom is chasing Jerry. At the image level, the posture of Tom and Jerry remains roughly constant for the entire scene; only their location varies from one frame to the next. In other words, only their spatial location relative to the image background changes. VAEs have the potential to implement such a use case because they provide probabilistic descriptions of features or observations in a latent space.
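
The latent-space idea can be sketched in a few lines. The code below is a simplified illustration, not a trained model: it shows the reparameterized sampling step and the KL term of a VAE with a diagonal Gaussian posterior, plus linear interpolation between two latent codes. It assumes, purely for illustration, that one latent code corresponds to the character on the left of the frame and another to the character on the right; the decoder that would turn each code into an image is hypothetical and not shown.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


def interpolate_latents(z_start, z_end, num_frames):
    """Linearly interpolate between two latent codes, one code per frame."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    codes = []
    for i in range(num_frames):
        t = i / (num_frames - 1)
        codes.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return codes


# Hypothetical latent codes for "character at left" and "character at right".
z_left, z_right = [-1.0, 0.0], [1.0, 0.0]
frame_codes = interpolate_latents(z_left, z_right, num_frames=5)
```

Each entry of `frame_codes` would be passed to the decoder to render one frame, so a smooth path in latent space becomes a smooth movement across the scene.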

Machine generated text, or the story

For text or story generation, recurrent neural networks such as Long Short-Term Memory (LSTM) networks have been quite successful. LSTMs have already been used to artificially generate chapters in the style of popular novels and stories such as Harry Potter and Cinderella. So, for a simple animated video story specially structured for autistic children, an LSTM can be effective. Gated Recurrent Units (GRUs) are an alternative, but LSTMs have been more successful so far, so the LSTM will be the first preference.
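
For readers unfamiliar with what an LSTM computes, the sketch below shows a single time step with scalar input and scalar state, which keeps the gate arithmetic readable. This is a deliberate simplification: real networks use vectors and weight matrices and are built with a deep learning framework, and the `weights` and `biases` layout here is an assumption made only for this example.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights, biases):
    """One LSTM time step with scalar input, hidden state and cell state.

    weights maps each gate name ("i", "f", "o", "g") to a pair (w_x, w_h);
    biases maps each gate name to a scalar bias.
    """
    def pre_activation(gate):
        w_x, w_h = weights[gate]
        return w_x * x + w_h * h_prev + biases[gate]

    i = sigmoid(pre_activation("i"))    # input gate: how much new content to write
    f = sigmoid(pre_activation("f"))    # forget gate: how much old memory to keep
    o = sigmoid(pre_activation("o"))    # output gate: how much memory to expose
    g = math.tanh(pre_activation("g"))  # candidate content

    c = f * c_prev + i * g              # updated cell state (long-term memory)
    h = o * math.tanh(c)                # updated hidden state (output)
    return h, c
```

In a text generator, `h` would feed a layer that predicts the next character or word, and the step would be repeated once per token of the story. A GRU follows the same pattern with fewer gates and no separate cell state.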

Machine generated audio or music

For music generation, GANs have proved effective to date. For our use case, Natural Language Processing (NLP) can be used to determine the type of scene from the generated story; e.g., the Tom & Jerry scene is a chase scene. Based on this classification, Deep Convolutional Generative Adversarial Networks (DCGANs) can be used to generate music that is relevant to such a chase scene and, at the same time, soothing and enjoyable for these children!
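
The scene-classification step can be illustrated with a deliberately simple keyword matcher standing in for a real NLP model. Everything in this sketch is an assumption for illustration: the keyword lists, the scene labels, and the tempo and mood values are placeholders, not recommendations for autism-specific content. The output of `music_settings_for` is the kind of conditioning signal a music generator could receive.

```python
import re

# Hypothetical keyword lists; a real system would use a trained text classifier.
SCENE_KEYWORDS = {
    "chase": {"chase", "chases", "chasing", "run", "runs", "running"},
    "calm": {"sleep", "sleeps", "sleeping", "rest", "rests", "resting"},
}

# Placeholder settings that a music generator could be conditioned on.
MUSIC_SETTINGS = {
    "chase": {"tempo_bpm": 120, "mood": "playful"},
    "calm": {"tempo_bpm": 60, "mood": "soothing"},
    "neutral": {"tempo_bpm": 80, "mood": "gentle"},
}


def classify_scene(story_text: str) -> str:
    """Return the scene label whose keywords occur most often in the text."""
    words = re.findall(r"[a-z]+", story_text.lower())
    scores = {scene: sum(word in keywords for word in words)
              for scene, keywords in SCENE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


def music_settings_for(story_text: str) -> dict:
    """Look up the placeholder music settings for the classified scene."""
    return MUSIC_SETTINGS[classify_scene(story_text)]
```

For the example scene, `classify_scene("Tom is chasing Jerry around the kitchen.")` returns `"chase"`.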

Assembling everything together

Now, if we can put all these discrete pieces of the puzzle together, we can come up with a completely machine-generated animated video tailor-made for developing and improving the cognitive abilities of children with autism. This would be a new step forward in the field of Artificial Intelligence!
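
One possible way to wire the three pieces together is sketched below. It is only a skeleton: the three generator functions are stubs standing in for the trained text, image and music models, and the choice to generate the story first and condition both the frames and the audio on it is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AnimatedVideo:
    story: str
    frames: List[str]  # stand-in for image data
    audio: str         # stand-in for an audio track


def assemble_video(generate_story: Callable[[str], str],
                   generate_frames: Callable[[str], List[str]],
                   generate_audio: Callable[[str], str],
                   prompt: str) -> AnimatedVideo:
    """Run the three stages in order; frames and audio both depend on the story."""
    story = generate_story(prompt)
    frames = generate_frames(story)
    audio = generate_audio(story)
    return AnimatedVideo(story=story, frames=frames, audio=audio)


# Stub stages so the skeleton runs end to end without any trained model.
def stub_story(prompt: str) -> str:
    return f"{prompt} Tom is chasing Jerry."


def stub_frames(story: str) -> List[str]:
    return [f"frame {i}: {story}" for i in range(3)]


def stub_audio(story: str) -> str:
    return "chase theme" if "chasing" in story else "calm theme"


video = assemble_video(stub_story, stub_frames, stub_audio, "One sunny day.")
```

Passing the stages in as functions means each pillar can be developed and swapped independently, which matches the plan of building the three components one at a time.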

The neural networks behind these machine-generated videos can be trained in such a way that the videos are a source of fun and enjoyment for this special group and, at the same time, reward their good behavior and educate them in a sensitive way without any human dependency.

Future scope and extension

As future scope, if this approach is successful, the gaming industry can adopt such technology. With the help of reinforcement learning, it can come up with machine-generated video games and educational games specially designed for such children, which could disrupt the entire gaming industry and be a source of happiness for such children!


Outline/Structure of the Talk

Introduction: Discussion on the problem statement

Objectives: Discussion on the target that can be achieved

Technical discussion on the three components of the problem:

  1. Image Generation using GAN and VAE
  2. Text Generation using LSTM and GRU
  3. Music Generation using GAN


Social Impact

Sustainability and Future Scope

AI for ALL

Brief Demonstration

Learning Outcome

The audience is expected to receive concrete knowledge on the following topics:

1. Computer Vision with Deep Learning

2. Image generation with VAEs

3. Use of deep learning in sequential data

4. Music generation with GAN

Also, some applications of this solution can be extended to other use cases, which will be discussed during the talk.

Target Audience

Researchers, Developers, AI Enthusiasts, Government Organizations and NGO members interested in the potential of AI, and anyone who wants to join our journey to improve the childhood experience of autistic children.

Prerequisites for Attendees

1. Basics of Machine Learning

2. Basics of Neural Network

3. Basics of Computer Vision with Deep Learning

4. Basics of Natural Language Processing

5. High level idea about modern deep learning algorithms

6. Passion for solving human life problems

Submitted 8 months ago

Public Feedback

  • Dr. Vikas Agrawal
    By Dr. Vikas Agrawal  ~  7 months ago

    Dear Aditya: Your description sounds like a proposal of what one would like to do, and the proposal seems to be described like a tutorial for different techniques.

    Do you have a working solution for the problem you are presenting? Are you presenting that solution?

    Warm Regards


    • Aditya Bhattacharya
      By Aditya Bhattacharya  ~  7 months ago

      Thanks a lot for reading through my proposal. For this session, I was planning to talk about how my proposed solution can solve the problem that I am targeting. So, to answer the "how" part, I thought of discussing how various deep learning techniques can be used and put together to come up with the proposed solution.

      As of now, I do not have a total integrated working solution ready. My plan is to start with each of the three pillars mentioned and then come up with an integrated solution. But yes, I do have plans of demoing whatever progress I will be able to make and would like to talk about some of the challenges faced, so that the community can guide me towards a better solution.

      Please feel free to let me know your overall feedback about the concept and the idea and if you can think of any other concept that can be considered as another pillar for such a solution, that would be great!

      Best Regards,

  • Usha Rengaraju
    By Usha Rengaraju  ~  7 months ago

    Hi Aditya,

    Thank you for the proposal submission. I would like to know which group of autistic people you are targeting, as autism is a spectrum. High-functioning autistic people have advanced cognitive abilities (e.g., Bill Gates, Steve Jobs, Benjamin Franklin and Abraham Lincoln are a few of the famous high-functioning autistic people). People with low-functioning autism will have trouble understanding machine-generated text.

    Thanks and Regards,

    Usha Rengaraju


    • Aditya Bhattacharya
      By Aditya Bhattacharya  ~  7 months ago

      Hello Usha,

      Thanks for the excellent question and perspective! I was thinking more about low-functioning autism (LFA), particularly children with under-developed cognitive abilities, but high-functioning autism would be a really interesting use case. I will try to research high-functioning autism and how my proposed solution can help such a group. But as of now, my proposal is mainly for LFA.

      Best Regards,


  • Dipanjan Sarkar
    By Dipanjan Sarkar  ~  7 months ago

    Hi Aditya, thanks for this submission. The topic definitely looks innovative and has good scope.

    A few questions here:

    1. Would you be showcasing any examples around how these are done?
    2. What I'm having trouble understanding is in what way the AI-generated content would be specially geared for children with autism vs. normal children, i.e., how would it benefit them?
    3. Would you have enough time to tie together the three major components in your proposal and showcase how they connect?



    - DJ

    • Aditya Bhattacharya
      By Aditya Bhattacharya  ~  7 months ago

      Thanks DJ for reading my proposal. 

      Coming to your queries:

      1. Yes, I have plans to show how the three main components that I have talked about can be done.

      2. Generating content specifically designed for autism requires extensive research, domain knowledge and a lot of training data. I might not be able to come up with the entire solution before ODSC, given the time constraint. But from what I was able to find out, autism requires different types of music, like more extensive use of binaural beats, image patterns and textual structure as compared to that of normal content and hence I thought that it can be an interesting point to start with. I will try to demo my progress up to that point and talk about the challenges I face, so that I can reach out to the community for help.

      3. Time will be a major constraint, but I will definitely try to talk about how these three components can be tied together. Still, I don't want to make the content too lengthy with too many details, as the audience might lose focus!

      I hope I was able to answer most of your queries. If not, let's connect offline. Feel free to reach me through email or LinkedIn :)

      • Dipanjan Sarkar
        By Dipanjan Sarkar  ~  7 months ago

        That is good. It's all right, you don't need to build a full-fledged product or anything, but even if you can showcase some of these aspects it would be great, particularly: "But from what I was able to find out, autism requires different types of music, like more extensive use of binaural beats, image patterns and textual structure as compared to that of normal content and hence I thought that it can be an interesting point to start with"

        I think if you can showcase some of these distinctions and how they would be of use to children with special needs, it would be amazing.

        • Aditya Bhattacharya
          By Aditya Bhattacharya  ~  7 months ago

          Thanks DJ, I will definitely try to work on and showcase some of these distinct features.