Aditya currently works as a Data and Cloud Platform Engineer at West Pharmaceuticals in Bangalore, India. A former Microsoft employee, Aditya is an enthusiastic learner who loves to explore new technologies and to understand in depth the concepts behind them.
Aditya has roughly 4 years of experience in domains such as Internet of Things (IoT), Machine Learning, Robotics and Cloud Computing. Currently, he is working on a research project comparing the performance of two deep learning algorithms, the Variational Autoencoder (VAE) and the Deep Convolutional Generative Adversarial Network (DCGAN), for generating pencil sketch images.
Apart from Computer Vision, he has experience in handling sequential unstructured data, including text generation using LSTM, Natural Language Processing and Speech Recognition. Beyond his technical expertise, Aditya is enthusiastic about teaching, mentoring and active community participation. Sports and Education are the two areas that excite him most, and he believes that good development of individuals in these two areas can solve some of the major challenges the world is facing.
Aditya recently attended NIT Silchar's ML Hackathon 2019 as a guest speaker on Computer Vision using Deep Learning, where he also mentored teams and served on the judges' panel.
Phone: +91 7980501915
Workshop: Introduction to Image Generation in Computer Vision using Deep Learning
Impact of Data Science and Artificial Intelligence on Societal Growth
Machine generated animations for improving cognitive abilities of "special children"
ODSC India 2019 | Machine Learning & Deep Learning | Talk | 45 Mins | Advanced
Tags: machine-generated-animation, open-data-science, improve-human-cognitive-abilities-with-ai, gan, lstm, vae, computer-vision
"Special children" includes children affected by complex neuro-behavioral conditions such as autism, which involves impairments in social interaction, language development and communication skills, combined with rigid, repetitive behaviors. Children with autism face a particularly difficult childhood because they have extreme difficulty communicating. They have trouble understanding what other people think and feel, which makes it very hard for them to express themselves with either words or gestures.
Such special children need “special” care for the development of their cognitive abilities. The learning resources required for teaching such children are extremely hard to find and are not accessible to many.
So, can artificial intelligence with the help of modern deep learning algorithms generate animated videos for developing or improving cognitive abilities of such a special group?
The idea to combat the problem:
Well, I feel it can be done!
An animated video consists of 3 main components:
1. Graphical video (sequence of images put together to tell a story),
2. A background story and
3. A relevant background audio or music.
Now if we have to come up with a system that produces machine generated animated video, we would have to think about these three components:
- Machine generated sequence of images with a spatial coherence
- Machine generated text, or the story
- Machine generated audio or music, that highlights the mood or the theme of the video
If these three discrete components are put together in a cohesive flow, our purpose can be achieved. The Deep Learning community has already made significant progress in machine generated images, audio and text.
Details about the three pillars of this problem:
Machine generated sequence of images with a spatial coherence
Generative Adversarial Networks (GANs) have so far been quite successful at generating images and audio. For our use case, where spatial features must stay coherent from frame to frame, Variational Autoencoders (VAEs) appear to be a better fit.
Let’s start with a very popular cartoon series, Tom & Jerry, specially adapted for autistic children, and consider a simple scene where Tom is chasing Jerry. At the image level, the posture of Tom and Jerry remains constant for the entire scene; only their location varies from one frame to the next. In other words, only their spatial location relative to the image background changes. VAEs have the potential to implement such a use case because they provide probabilistic descriptions of features or observations in a latent space.
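To make the latent-space idea concrete, here is a minimal sketch in plain Python of the two operations a VAE-based frame generator would rely on: sampling a latent code with the reparameterization trick, and walking between two latent codes to obtain intermediate frames. The encoder outputs below are made-up numbers, and the reading of dimension 0 as "horizontal position" and dimension 1 as "posture" is purely an illustrative assumption; a trained encoder and decoder would be needed to turn these codes into actual images.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def interpolate(z_start, z_end, steps):
    """Linearly interpolate between two latent codes, endpoints included."""
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path

# Hypothetical encoder outputs (mean and log-variance) for the first and last
# frame of the chase scene. Only dimension 0 differs between the two frames.
mu_first, log_var_first = [-1.0, 0.3], [-4.0, -4.0]
mu_last, log_var_last = [1.0, 0.3], [-4.0, -4.0]

rng = random.Random(0)
z_first = sample_latent(mu_first, log_var_first, rng)
z_last = sample_latent(mu_last, log_var_last, rng)

# Five latent codes; a trained decoder would map each one to an image frame.
latent_path = interpolate(z_first, z_last, steps=5)
```

Each entry of `latent_path` would be passed through the decoder to produce one frame, so a smooth path in latent space is what is expected to give a smooth change of position on screen.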
Machine generated text, or the story
For text or story generation, recurrent neural networks such as Long Short-Term Memory (LSTM) networks have been quite successful. LSTMs have already been used to artificially generate chapters in the style of popular novels and stories like Harry Potter and Cinderella. So, for a simple animated video story specially structured for autistic children, an LSTM can be effective. Gated Recurrent Units (GRUs) are an alternative, but to date LSTMs have been more successful, so LSTM is the first preference.
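As a rough illustration of how an LSTM produces text one character at a time, the sketch below implements a single LSTM cell and a sampling loop using only the Python standard library. The weights are random and untrained, so the output is gibberish; the vocabulary, the hidden size and all function names are assumptions made for this example, not part of the proposed system.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_step(x, h, c, params):
    """One LSTM step. params maps a gate name to (W, b); W acts on [x; h]."""
    xh = x + h  # concatenate input and previous hidden state
    pre = {}
    for name in ("i", "f", "o", "g"):
        W, b = params[name]
        pre[name] = [p + q for p, q in zip(matvec(W, xh), b)]
    i = [sigmoid(v) for v in pre["i"]]    # input gate
    f = [sigmoid(v) for v in pre["f"]]    # forget gate
    o = [sigmoid(v) for v in pre["o"]]    # output gate
    g = [math.tanh(v) for v in pre["g"]]  # candidate cell state
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(seed_char, length, vocab, params, W_out, rng, hidden_size):
    """Sample `length` characters after `seed_char`, feeding each one back in."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    idx = vocab.index(seed_char)
    out = [seed_char]
    for _ in range(length):
        x = [1.0 if k == idx else 0.0 for k in range(len(vocab))]  # one-hot
        h, c = lstm_step(x, h, c, params)
        probs = softmax(matvec(W_out, h))
        idx = rng.choices(range(len(vocab)), weights=probs)[0]
        out.append(vocab[idx])
    return "".join(out)

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# Untrained, randomly initialised model: for illustration only.
vocab = list("abcdefghijklmnopqrstuvwxyz ")
hidden_size = 8
rng = random.Random(0)
params = {name: (random_matrix(hidden_size, len(vocab) + hidden_size, rng),
                 [0.0] * hidden_size)
          for name in "ifog"}
W_out = random_matrix(len(vocab), hidden_size, rng)
story = generate("t", 20, vocab, params, W_out, rng, hidden_size)
```

In practice the weights would be learned from a corpus of suitable children's stories, and a deep learning framework would replace the hand-written cell; the generation loop itself would stay essentially the same.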
Machine generated audio or music
For music generation, GANs have proved effective to date. For our use case, Natural Language Processing (NLP) can be used to determine the type of scene from the generated story; for the Tom & Jerry scene, for example, it will be a chase scene. Based on this classification, Deep Convolutional Generative Adversarial Networks (DCGANs) can be used to generate music that is relevant to such a chase scene and at the same time soothing and enjoyable for such children!
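The scene classification step can be sketched very simply. The example below uses keyword matching as a deliberately naive stand-in for a real NLP classifier; the scene labels, the keyword sets and the `classify_scene` function are all hypothetical and only show where the classification result would come from before it is handed to the music generator.

```python
import re
from collections import Counter

# Hypothetical scene labels and trigger words; a real system would use a
# trained text classifier instead of hand-picked keywords.
SCENE_KEYWORDS = {
    "chase": {"chase", "chases", "chasing", "run", "runs", "running", "escape", "escapes"},
    "calm": {"sleep", "sleeps", "sleeping", "rest", "rests", "quiet", "nap"},
    "playful": {"play", "plays", "playing", "laugh", "laughs", "dance", "dances"},
}

def classify_scene(story, default="calm"):
    """Return the scene label whose keywords occur most often in the story."""
    tokens = re.findall(r"[a-z]+", story.lower())
    counts = Counter()
    for label, keywords in SCENE_KEYWORDS.items():
        counts[label] = sum(1 for t in tokens if t in keywords)
    label, hits = counts.most_common(1)[0]
    return label if hits > 0 else default

scene = classify_scene(
    "Tom is chasing Jerry around the kitchen while Jerry runs for the door."
)
```

The resulting label (here a chase scene) would then serve as the conditioning input that tells the music generator which mood to aim for.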
Assembling everything together
Now if we can put all these discrete pieces of the puzzle together, we can come up with a completely machine generated animated video tailor-made for developing and improving the cognitive abilities of children with autism. This would be a new step forward in the field of Artificial Intelligence!
The neural networks that generate these videos can be trained in such a way that the videos are a source of fun and enjoyment for this special group and, at the same time, reward good behavior and educate the children in a sensitive way without any human dependency.
Future scope and extension
Looking ahead, if this approach is successful, the gaming industry could adopt the technology and, with the help of reinforcement learning, come up with machine generated video games and educational games specially designed for such children. This could disrupt the entire gaming industry and be a source of happiness for such children!
Person Identification via Multi-Modal Interface with Combination of Speech and Image Data
Joy Mustafi, Founder and President, MUST Research
Aditya Bhattacharya, AI Researcher, MUST Research
Having multiple modalities in a system gives users more affordances and can contribute to a more robust system. It also allows for greater accessibility for users who work more effectively with certain modalities. Multiple modalities can be used as a backup when certain forms of communication are not possible. This is especially true of redundant modalities, in which two or more modalities are used to communicate the same information. Certain combinations of modalities can add to the expressiveness of a computer-human or human-computer interaction, because each modality may be more effective than the others at expressing one form or aspect of information. For example, MUST researchers are working on a personalized humanoid equipped with various types of input devices and sensors that allow it to receive information from humans. These inputs are interchangeable and provide a standardized method of communication with the computer, affording practical adjustments to the user and a richer interaction depending on the context. The result is a robust system with features such as keyboard, pointing device, touchscreen, computer vision, speech recognition, and motion and orientation sensing.
There are six types of cooperation between modalities, and they help define how a combination or fusion of modalities works together to convey information more effectively.
- Equivalence: information is presented in multiple ways and can be interpreted as the same information
- Specialization: when a specific kind of information is always processed through the same modality
- Redundancy: multiple modalities process the same information
- Complementarity: multiple modalities take separate information and merge it
- Transfer: a modality produces information that another modality consumes
- Concurrency: multiple modalities take in separate information that is not merged
Computer - Human Modalities
Computers utilize a wide range of technologies to communicate and send information to humans:
- Vision - computer graphics typically through a screen
- Audition - various audio outputs
Adaptive: They MUST learn as information changes, and as goals and requirements evolve. They MUST resolve ambiguity and tolerate unpredictability. They MUST be engineered to feed on dynamic data in real time.
Interactive: They MUST interact easily with users so that those users can define their needs comfortably. They MUST interact with other processors, devices, services, as well as with people.
Iterative and Stateful: They MUST aid in defining a problem by asking questions or finding additional source input if a problem statement is ambiguous or incomplete. They MUST remember previous interactions in a process and return information that is suitable for the specific application at that point in time.
Contextual: They MUST understand, identify, and extract contextual elements such as meaning, syntax, time, location, appropriate domain, regulation, user profile, process, task and goal. They may draw on multiple sources of information, including both structured and unstructured digital information, as well as sensory inputs (visual, gestural, auditory, or sensor-provided).
Multi-Modal Interaction: https://www.youtube.com/watch?v=jQ8Gq2HWxiA
Gesture Detection: https://www.youtube.com/watch?v=rDSuCnC8Ei0
Speech Recognition: https://www.youtube.com/watch?v=AewM3TsjoBk
Assignment (Hands-on Challenge for Attendees)
Real-time multi-modal access control system for authorized access to a work environment - All the key concepts and individual steps will be demonstrated and explained in this workshop, and the attendees need to customize the generic code or approach for this assignment or hands-on challenge.
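One possible starting point for the challenge is score-level (late) fusion, where each modality produces its own match confidence and the access decision is taken on the combined score. The sketch below is only an illustration under that assumption: the `ModalityScore` type, the weights and both thresholds are hypothetical, and the face and speaker recognition models that would produce the scores are not shown.

```python
from dataclasses import dataclass

@dataclass
class ModalityScore:
    name: str      # e.g. "face" or "voice"
    score: float   # match confidence in [0, 1] from that modality's model
    weight: float  # relative trust placed in this modality

def fuse_scores(scores):
    """Weighted average of the per-modality match confidences."""
    total_weight = sum(s.weight for s in scores)
    if total_weight <= 0:
        raise ValueError("at least one modality needs a positive weight")
    return sum(s.score * s.weight for s in scores) / total_weight

def grant_access(scores, threshold=0.7, min_each=0.4):
    """Grant access only if every modality is plausible and the fused score is high.

    Both thresholds are illustrative values, not tuned on any data.
    """
    if not scores:
        return False
    if any(s.score < min_each for s in scores):
        return False
    return fuse_scores(scores) >= threshold

# Made-up confidences for two attempts at the door.
genuine = [ModalityScore("face", 0.9, 0.6), ModalityScore("voice", 0.8, 0.4)]
spoofed = [ModalityScore("face", 0.95, 0.6), ModalityScore("voice", 0.3, 0.4)]

genuine_ok = grant_access(genuine)
spoofed_ok = grant_access(spoofed)
```

Requiring a minimum score from every modality, in addition to the fused threshold, means that a very strong match in one channel (for example a photo held up to the camera) cannot compensate for a clear mismatch in the other.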