Machine Learning Engineer
Member since 1 year
Piyush is a graduate of the Georgia Institute of Technology and currently works as an NLP Engineer at vahan.ai. After graduating from IITR with a Bachelor's in ECE, he started his career as a 4G protocol engineer but was soon drawn to the fast-growing ML/AI domain. Over time he switched to this domain and, after some exploration, found his interest in working with vernacular languages.
When he is not at work, he spends his time focussing on fitness and honing his skills with the guitar.
Normalizing User-Generated Text Data
Piyush Makhija, Machine Learning Engineer, Vahan Inc
4 months ago | Sold Out!
A large fraction of NLP work in academia and research groups deals with clean datasets that are well structured and largely free of noise. However, when building real-world NLP applications, one often has to collect data from sources such as chats, user-discussion forums, and social-media conversations. Almost invariably, NLP applications in industrial settings have to deal with much noisier and more varied data: data with spelling mistakes, typos, acronyms, emojis, embedded metadata, etc.
There is a large disparity between the data that state-of-the-art (SOTA) language models are trained on and the data these models are expected to work on in practice. As a result, most commercial NLP applications working with noisy data cannot take advantage of SOTA advances in the field of language computation.
Handcrafting rules and heuristics to correct this data is unlikely to scale for most industrial applications. Most SOTA models in NLP are not designed with noisy data in mind, and they often perform poorly on it.
In this talk, we share our approach, experience, and learnings from designing a robust system that uses Machine Translation to clean noisy data without handcrafted rules, making downstream NLP tasks easier to perform.
This work is motivated by our business use case, where we are building a conversational system over WhatsApp to screen candidates for blue-collar jobs. Our candidate user base often comes from tier-2 and tier-3 cities of India. Their responses to our conversational bot are mostly a code mix of Hindi and English coupled with non-canonical text (e.g., typos, non-standard syntactic constructions, spelling variations, phonetic substitutions, foreign language words in a non-native script, grammatically incorrect text, colloquialisms, abbreviations, etc.). The raw text our system receives is far from clean, well-formatted text, so text normalization becomes a necessity before it can be processed any further.
This talk is meant for computational language researchers, NLP practitioners, ML engineers, data scientists, senior leaders of AI/ML/DS groups, and linguists working with non-canonical text in resource-rich and resource-constrained (i.e., vernacular and code-mixed) languages.
Going Beyond "from huggingface import bert"
Piyush Makhija, Machine Learning Engineer, Vahan Inc
Ankit Kumar, NLP Researcher, vahan.co
4 months ago | Sold Out!
Google AI stirred up the language processing domain with the introduction of the Transformer architecture and BERT models. Models built on transformer-based architectures have outperformed earlier approaches and set new state-of-the-art (SOTA) standards for NLP tasks like text classification, question answering, and text summarization. BERT is said to improve 10% of Google search results, single-handedly the largest improvement brought in by any approach Google has tried in recent years. In this talk, we aim to demystify BERT and help industry practitioners gain a deeper understanding of it.
The ImageNet moment for NLP arrived with Bidirectional Encoder Representations from Transformers (BERT). The introduction of BERT created a wave in language research, and variations of BERT established new state-of-the-art (SOTA) results on all standard NLP tasks, which had previously been held largely by techniques using pre-trained word vectors. BERT and its variants demonstrate, with a high degree of validity across the research community, that pre-trained models can achieve SOTA results on a range of NLP tasks.
Owing to its success in academia, industry practitioners started using open-source BERT-based models in their own applications for tasks ranging from NER and text classification to search recommendations and opinion mining. It is common to find applied scientists, ML engineers, or even researchers using BERT-based models as a black box for their tasks. In some cases the results are surprisingly better than expected, but in many cases applying the model with only a black-box understanding does not yield encouraging results.
In this talk, we aim to get under the skin of BERT and help the audience build a better understanding of its internal workings.
Virtual Assistant for Hiring Last-Mile Workforce
Piyush Makhija, Machine Learning Engineer, Vahan Inc
1 year ago | Sold Out!
Logistics companies, both old and new, have invested heavily in building an efficient frontline workforce to provide swift and convenient services to their users. Timely delivery is often the critical factor when ever-impatient customers choose service A over service B. Hence, the operations/logistics team is the key enabler here.
The attrition rate in large frontline teams is high, close to 75 percent annually. Yet most companies have aggressive growth targets, which requires constantly recruiting high volumes of workers. High-growth companies in this domain, such as Zomato and Swiggy, grew by more than 50-60 percent by the end of 2018 and recruited tens of thousands of delivery boys every month.
At Vahan, we have developed an AI-driven virtual assistant that helps logistics companies scale and automate their hiring process by leveraging the widespread use of messaging applications like WhatsApp and Facebook Messenger.
In this talk, I will cover in detail how we developed a complete data collection and natural language processing pipeline for Indian languages and built a chatbot over WhatsApp, which is currently connecting companies like Dunzo, Zomato, Swiggy, and Rapido Express with potential frontline workers and fulfilling the hiring requirements of this industry in a scalable and autonomous fashion.