Introduction

"Special children" includes children affected by complex neuro-behavioral conditions such as autism, which involves impairments in social interaction, language development and communication skills, combined with rigid, repetitive behaviors. Children with autism in particular face a very difficult childhood, as they have extreme difficulty communicating. They have trouble understanding what other people think and feel, which makes it very hard for them to express themselves either with words or through gestures.

Such special children need “special” care for the development of their cognitive abilities. The learning resources required for teaching such children are extremely hard to find and inaccessible to many.

So, can artificial intelligence, with the help of modern deep learning algorithms, generate animated videos for developing or improving the cognitive abilities of such a special group?

The idea to combat the problem:

Well, I feel it can be done!

An animated video consists of three main components:

1. A graphical video (a sequence of images put together to tell a story)

2. A background story

3. Relevant background audio or music

Now, if we want to build a system that produces machine generated animated videos, we have to think about these three components:

  1. Machine generated sequence of images with a spatial coherence
  2. Machine generated text, or the story
  3. Machine generated audio or music, that highlights the mood or the theme of the video

If these three discrete components are put together in a cohesive flow, our purpose can be achieved. And the deep learning community has already made significant progress in machine generated images, audio and text.

Details about the three pillars of this problem:

Machine generated sequence of images with a spatial coherence

Generative Adversarial Networks (GANs) have been quite successful to date at generating images and audio. For our use case, where coherence of spatial features must be maintained, Variational Autoencoders (VAEs) are an even better fit.

Let us start with a use case based on the very popular cartoon series Tom & Jerry, specially modified for autistic children, and consider a simple scene where Tom is chasing Jerry. At the image level, the postures of Tom and Jerry remain roughly constant throughout the scene; only their locations vary from one frame to the next. In other words, only their spatial position with respect to the image background changes, so VAEs have the potential to implement such a use case, since they provide probabilistic descriptions of features or observations in a latent space.
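To make the idea concrete, here is a minimal VAE sketch in PyTorch. Everything here is an illustrative assumption rather than a production design: the class name `FrameVAE`, the flattened single-vector frame input, and all layer sizes are placeholders; a real frame model would use convolutional encoders and decoders.

```python
import torch
import torch.nn as nn

class FrameVAE(nn.Module):
    """Minimal VAE sketch: encodes a cartoon frame into a latent code
    and decodes it back. Dimensions are illustrative only."""
    def __init__(self, frame_dim=784, hidden_dim=256, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(frame_dim, hidden_dim), nn.ReLU())
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, frame_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term + KL divergence to a standard normal prior.
    recon_loss = nn.functional.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl
```

The point of the latent space for our scene is interpolation: once frames of Tom and Jerry at different positions are encoded, decoding points along a path between two latent codes could yield the intermediate frames of the chase, which is exactly the spatial coherence we need.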

Machine generated text, or the story

Coming to text or story generation, recurrent neural networks such as Long Short-Term Memory (LSTM) networks have been quite successful. LSTMs have already been used to artificially generate chapters in the style of popular novels and stories such as Harry Potter and Cinderella. So, for a simple animated video story specially structured for autistic children, an LSTM can be effective. Gated Recurrent Units (GRUs) are an alternative, but to date LSTMs have been more successful, so the first preference will be LSTM.
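As a rough sketch of how such a story generator is wired up, here is a minimal character-level LSTM in PyTorch. The class name `StoryLSTM`, the vocabulary size, and the greedy sampling loop are all illustrative assumptions; a real system would be trained on a corpus of simple, child-friendly stories and would sample with temperature rather than greedily.

```python
import torch
import torch.nn as nn

class StoryLSTM(nn.Module):
    """Minimal character-level LSTM sketch for story generation.
    vocab_size and layer sizes are illustrative only."""
    def __init__(self, vocab_size=64, embed_dim=32, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)            # (batch, seq, embed_dim)
        out, state = self.lstm(x, state)  # (batch, seq, hidden_dim)
        return self.head(out), state      # logits over the vocabulary

def sample(model, start_token, length=20):
    # Greedy decoding: feed each predicted character back into the model,
    # carrying the LSTM hidden state forward between steps.
    model.eval()
    tokens = torch.tensor([[start_token]])
    state, generated = None, [start_token]
    with torch.no_grad():
        for _ in range(length):
            logits, state = model(tokens, state)
            next_tok = logits[0, -1].argmax().item()
            generated.append(next_tok)
            tokens = torch.tensor([[next_tok]])
    return generated
```

Swapping `nn.LSTM` for `nn.GRU` (and dropping the cell state) is all it would take to try the GRU alternative mentioned above.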

Machine generated audio or music

For music generation, GANs have proved effective to date. For our use case, Natural Language Processing (NLP) can be used to classify the type of scene from the generated story; for the Tom & Jerry example, it would be a chase scene. Based on this classification, a Deep Convolutional Generative Adversarial Network (DCGAN) can be used to generate music that is relevant to such a chase scene and at the same time soothing and enjoyable for these children!
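One common way to apply a DCGAN to audio is to generate spectrogram images and then invert them back to a waveform. The sketch below shows only the generator half of such a DCGAN in PyTorch; the class name `SpectrogramGenerator`, the 64x64 output size, and all channel counts are illustrative assumptions, and the discriminator, training loop, and spectrogram inversion (e.g. via Griffin-Lim) are omitted.

```python
import torch
import torch.nn as nn

class SpectrogramGenerator(nn.Module):
    """Minimal DCGAN-style generator sketch: maps a noise vector to a
    64x64 single-channel spectrogram-like image. Sizes are illustrative."""
    def __init__(self, noise_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            # noise (noise_dim x 1 x 1) -> 4x4 feature map
            nn.ConvTranspose2d(noise_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),
            # 4x4 -> 8x8
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
            # 8x8 -> 16x16
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            # 16x16 -> 32x32
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),
            # 32x32 -> 64x64, single channel, values squashed to [-1, 1]
            nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)
```

The scene label from the NLP classifier could be injected as a conditioning input (a conditional GAN) so that a "chase" label steers the generator toward faster, brighter music.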

Assembling everything together

Now, if we can put all these discrete pieces of the puzzle together, we can produce a completely machine generated animated video tailor-made for developing and improving the cognitive abilities of children with autism. This would be a new advance in the field of Artificial Intelligence!

The underlying neural networks can be trained in such a way that the generated videos are a source of fun and enjoyment for this special group, while at the same time rewarding good behavior and educating the children in a sensitive way, without any human dependency.

Future scope and extension

As a future extension, if this approach is successful, the gaming industry can adopt the technology and, with the help of reinforcement learning, come up with machine generated video games and educational games specially designed for such children. This could disrupt the entire gaming industry and be a source of happiness for these children!


Outline/Structure of the Talk

Introduction: Discussion of the problem statement

Objectives: Discussion of the targets that can be achieved

Technical discussion on the three components of the problem:

  1. Image Generation using GAN and VAE
  2. Text Generation using LSTM and GRU
  3. Music Generation using GAN

Innovativeness

Social Impact

Sustainability and Future Scope

AI for ALL

Brief Demonstration

Learning Outcome

The audience is expected to gain concrete knowledge of the following topics:

1. Computer Vision with Deep Learning

2. Image generation with VAEs

3. Use of deep learning in sequential data

4. Music generation with GAN

Also, some of the applications of this solution can be extended to other use cases, which will be discussed during the talk.

Target Audience

Researchers, Developers, AI Enthusiasts, Government Organizations and NGO members interested in the potential of AI, and anyone who wants to join our journey to improve the childhood experience of autistic children.

Prerequisites for Attendees

1. Basics of Machine Learning

2. Basics of Neural Networks

3. Basics of Computer Vision with Deep Learning

4. Basics of Natural Language Processing

5. High level idea about modern deep learning algorithms

6. Passion for solving human life problems

Submitted 1 year ago