Text Extraction from Images using deep learning techniques
Extracting text of various sizes, shapes, and orientations from images containing multiple objects is an important problem in many contexts, particularly e-commerce, augmented reality assistance systems in natural scenes, and content moderation on social media platforms. Text in an image can be a richer and more accurate source of data than human input, and it can be used in applications such as attribute extraction and profanity checks.
Typically, text extraction is done in two stages:
- Text detection: this module finds the regions of the input image where text is present.
- Text recognition: given the regions of the image where text is present, this module outputs the raw text they contain.
In this session, I will talk about character-level text detection for detecting normal and arbitrarily shaped text. I will then discuss the CRNN-CTC network and the need for CTC loss to obtain the raw text from images.
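As a rough illustration of why CTC is needed: a CRNN emits one label per horizontal frame of the image, and there are usually more frames than characters, with no frame-to-character alignment in the training data. CTC handles this by adding a blank symbol and mapping each frame-level path to text by merging consecutive repeats and then dropping blanks. The sketch below shows only that collapse rule as used in greedy decoding. It is not the talk's implementation, and the function name and the `-` blank symbol are assumptions made for the example.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol for this sketch; real models use a reserved class index


def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Map a per-frame label sequence to text using the CTC collapse rule.

    Step 1: merge consecutive duplicate labels.
    Step 2: remove blank symbols.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# The blank between the two runs of "l" keeps the double letter.
with_blank = ctc_greedy_collapse("hh-e-ll-lloo")

# Without a blank, the repeated "l" frames merge into a single character.
without_blank = ctc_greedy_collapse("hheelllloo")
```

Here `with_blank` is `"hello"` while `without_blank` is `"helo"`, which is why the blank symbol is required to represent repeated characters. During training, the CTC loss sums the probability of every frame-level path that collapses to the target text, so no per-frame annotation is needed.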
Outline/Structure of the Talk
- Motivation for Text extraction from Images: 2 mins
- Defining a pipeline for Text Extraction: 2 mins
- Deep Learning techniques for Text Detection: 5 mins
- Understanding receptive fields in CNN: 2 mins
- Data preparation for training text recognition model: 1 min
- CRNN-CTC model for Text Recognition: 6 mins
- Use cases of text extraction in different domains: 2 mins
Learning Outcome
- Understanding the need for Text Extraction from Images.
- Deep Learning Techniques for detecting highly oriented text.
- Understanding of receptive fields in CNNs.
- Theoretical understanding of the CRNN-CTC network for text recognition and the need for CTC loss.
- Usage of Text Extraction in various fields/domains.
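For the receptive-field outcome above, a small worked example may help. The receptive field of a unit is the size of the input patch that can influence it, and for a plain stack of convolution and pooling layers it can be computed layer by layer: each layer widens the field by (kernel size - 1) times the current spacing between adjacent units, measured in input pixels. The sketch below assumes no dilation and ignores padding. The function name and the layer format are hypothetical choices for this example.

```python
def receptive_field(layers):
    """Receptive field size (in input pixels) of one unit in the last layer.

    `layers` is a list of (kernel_size, stride) tuples ordered from input
    to output. Dilation is not modelled and padding is ignored.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance in input pixels between adjacent units of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Three stacked 3x3 convolutions with stride 1.
three_convs = receptive_field([(3, 1), (3, 1), (3, 1)])

# 3x3 conv, then 2x2 max-pool with stride 2, then another 3x3 conv.
conv_pool_conv = receptive_field([(3, 1), (2, 2), (3, 1)])
```

Here `three_convs` is 7 and `conv_pool_conv` is 8. This matters for text because a feature-map column can only help recognize a character if its receptive field covers enough of that character in the input image.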
Target Audience
Data Scientists, Machine Learning Engineers, Researchers.
Prerequisites for Attendees
Basic understanding of CNNs and LSTMs.
Links
Presenting at:
Spark AI Summit, San Francisco, California, in June, organized by Databricks.
This work was also presented at Kaggle Days Meetup (Senior Track) #2, Bengaluru.
Link: https://www.linkedin.com/posts/rajeshshreedhar_kaggle-kaggledays-meetup-activity-6589216549201117184-tqfT
Previous talks:
- [Nov. 2019] Speaker at Kaggle Days Meetup (Senior Track) #3, Bengaluru, Topic: Attention Models in NLP & Vision, Link: https://www.linkedin.com/posts/designerhv_kaggledays-kaggledaysbangalore-datascience-activity-6606514045027745792-ufb6
- [Nov. 2019] Speaker at Data Hack Summit, India's Largest Applied Artificial Intelligence & Machine Learning Conference organized by Analytics Vidhya, Topic: Image Captioning with Attention models, Link: https://www.analyticsvidhya.com/datahack-summit-2019/schedule/hack-session-image-captioning-using-attention-models
- [Oct. 2019] Speaker at Kaggle Days Meetup (Senior Track) #2, Bengaluru, Topic: Text extraction from Product Images, Link: https://www.linkedin.com/posts/rajeshshreedhar_kaggle-kaggledays-meetup-activity-6589216549201117184-tqfT
- [Aug. 2019] Panelist at Kaggle Days Meetup (Senior Track) #1, Bengaluru, Topic: Kaggle Journey and Approaching Kaggle Competitions, Link: https://www.linkedin.com/posts/rajeshshreedhar_kaggle-kaggledays-meetup-activity-6571065779024953344-57s0
Upcoming talks:
- [Jun. 2020] Speaking at Spark AI Summit, CA, Topic: Text Extraction from Image, Link: https://databricks.com/session_na20/text-extraction-from-product-images-using-state-of-the-art-deep-learning-techniques
Submitted 2 years ago
People who liked this proposal, also liked:
SOURADIP CHAKRABORTY / Rajesh Shreedhar Bhat - Learning Generative models with Visual attentions in the field of Image Captioning
SOURADIP CHAKRABORTY (Data Scientist, Walmart Labs), Rajesh Shreedhar Bhat (Data Scientist, Walmart Labs), 3 years ago
20 Mins
Talk
Intermediate
Image caption generation is the task of generating a descriptive and appropriate sentence for a given image. For humans, the task looks straightforward: summarise the image in a single sentence that captures the interactions between its various components. But replicating this in an artificial framework is very challenging. Attention addresses this problem by allowing the network to look over the relevant encoder features as input to the decoder at each time step. In this session, we show how the attention mechanism enhances the performance of language translation tasks in an encoder-decoder framework.
Before the attention mechanism was introduced in sequence-to-sequence settings, the entire sequence was encoded into a single thought/context vector, which was used to initialize the decoder that generates the output sequence. The major shortcoming of this approach was that the encoder features were not weighted by their relevance to the sequence being generated, which confounded the network and resulted in an inadequate output sequence.
Inspired by the outstanding results of attention mechanisms in machine translation and other seq2seq tasks, there have been a few advancements in computer vision using attention techniques. In this session, we incorporate visual attention mechanisms to generate relevant captions from images using a deep learning framework.
SOURADIP CHAKRABORTY / Sayak Paul - Implicit Data Modelling using Self-Supervised Transfer Learning
SOURADIP CHAKRABORTY (Data Scientist, Walmart Labs), Sayak Paul (Deep Learning Associate, PyImageSearch), 3 years ago
20 Mins
Talk
Intermediate
Transfer learning is especially helpful when data is scarce, when limited bandwidth does not allow training deep models from scratch, and so on. In computer vision, ImageNet pre-training has been widely successful across a number of different tasks, image classification being the most popular one. That success has been possible mainly because of the ImageNet dataset, a collection of images spanning 1000 labels. This is where a stern limitation comes in: the need for labeled data. In this session, we want to take a deep dive into the world of self-supervised learning, which allows models to exploit the implicit labels of input data. In the first half of the session, we will cover the basics of transfer learning, its successes, and its challenges. We will then formulate the problem that self-supervised learning tries to address. In the second half of the session, we will discuss the ABCs of self-supervised learning along with some examples. We will conclude with a short code walk-through and a discussion of the challenges of self-supervised learning.