Traversing the graph computing and database ecosystem
Graphs have long held a special place in computer science’s history (and codebases). We are now seeing a new wave of the information age, one characterized by a strong emphasis on linked data. As a result, graph computing and graph databases have risen to prominence rapidly over the last few years. From enterprise knowledge graphs to fraud detection and graph-based social media analytics, there are a great number of potential applications.
To reap the benefits of graph databases and computing, one needs to understand the basics as well as the current technical landscape and offerings. It is equally important to understand whether a graph-based approach suits your problem.
These realizations are a result of my involvement in an effort to build an enterprise knowledge graph platform. I also believe that graph computing is more than a niche technology and has potential for organizations of varying scale.
Now, I want to share what I have learned with you.
This talk will touch upon the above points with the general premise being that data structured as graph(s) can lead to improved data workflows.
During our journey, you will learn the fundamentals of graph technology and witness a live demo using Neo4j, a popular property graph database. We will walk through a day in the life of data workers (engineers, scientists, analysts), the challenges that they face and how graph-based approaches result in elegant solutions.
We'll end our journey with a peek into the current graph ecosystem and high-level concepts that need to be kept in mind while adopting an offering.
Outline/Structure of the Talk
- Graphs: an introduction
- Rich history and real-world relevance across domains
- Recent surge in graph-based approaches
- A day in the life of a data worker ~ challenges of data discovery and integration ~ curse of many-to-many relationships and JOINs
- Graphs to the rescue!
- Potential use-cases in data science workflows (social networks, finance, compliance, recommendation systems)
- Current ecosystem ~ property graphs and triple stores ~ Neo4j, Apache TinkerPop, AWS Neptune, Dgraph, AllegroGraph etc.
- Tale of query languages ~ SQL for graphs ~ Cypher and Gremlin
- Technical challenges: story of trade-offs
- Demo
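The "curse of many-to-many relationships and JOINs" item in the outline can be illustrated with a minimal sketch. It assumes a hypothetical `KNOWS` relationship held as an adjacency list (in a relational schema this would be a join table, and each extra hop would need another self-JOIN). The names and data are made up for illustration and are not taken from the talk's demo.

```python
from collections import deque

# Hypothetical "knows" relationships stored as an adjacency list.
# In a relational schema this would be a many-to-many join table.
KNOWS = {
    "asha": ["bala", "chen"],
    "bala": ["asha", "dev"],
    "chen": ["asha", "dev", "esha"],
    "dev": ["bala", "chen"],
    "esha": ["chen"],
}


def within_hops(graph, start, max_hops):
    """Return {node: hop_count} for nodes reachable from start in <= max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]  # the start node is not its own neighbour
    return seen


# Friends and friends-of-friends: one traversal instead of two self-JOINs.
friends_of_friends = within_hops(KNOWS, "asha", 2)
print(sorted(friends_of_friends.items()))
```

Changing `max_hops` is the only adjustment needed to reach further into the network, which is roughly the convenience that graph query languages such as Cypher and Gremlin offer for variable-length paths.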
Learning Outcome
By the end of this talk, you will:
- know why graphs are awesome!
- understand real-world application scenarios
- have a sense of the current open-source and cloud graph DB landscape
- appreciate challenges to implementation
- know the basics of Neo4j, a popular property graph DB
- most importantly, be able to identify if a graph-based approach will suit your problem and start critiquing it
Target Audience
Data scientists, engineers, DBAs, CIOs
Prerequisites for Attendees
There are no hard pre-requisites.
Soft pre-requisites include:
- High-level familiarity with SQL or NoSQL databases, and associated concepts
- Familiarity with data science workflows.
Links
Some of the topics that I've presented in the past include:
- Kubeflow: portable and scalable ML using Jupyterhub and Kubernetes - https://speakerdeck.com/analyticalmonk/kubeflow-portable-and-scalable-machine-learning-using-jupyterhub-and-kubernetes-pydata-delhi-2018
- Graph databases and analysis using Neo4j - https://speakerdeck.com/analyticalmonk/graph-databases-and-analysis-using-neo4j
- Introduction to Tensorflow's high-level API - https://speakerdeck.com/analyticalmonk/o-extended-2018
Submitted 3 years ago
People who liked this proposal also liked:
Dipanjan Sarkar - Explainable Artificial Intelligence - Demystifying the Hype
45 Mins
Tutorial
Intermediate
The field of Artificial Intelligence, powered by Machine Learning and Deep Learning, has gone through some phenomenal changes over the last decade. Starting off as a purely academic and research-oriented domain, it has seen widespread industry adoption across diverse domains including retail, technology, healthcare, science and many more. More often than not, the standard toolbox of machine learning, statistical or deep learning models remains the same. New models such as Capsule Networks do come into existence, but industry adoption usually takes several years. Hence, in the industry, the main focus of data science or machine learning is more ‘applied’ than theoretical, and effective application of these models on the right data to solve complex real-world problems is of paramount importance.
A machine learning or deep learning model by itself consists of an algorithm which tries to learn latent patterns and relationships from data without hard-coding fixed rules. Hence, explaining how a model works to the business always poses its own set of challenges. There are some domains in the industry, especially in finance (insurance or banking, for example), where data scientists often end up having to use more traditional machine learning models (linear or tree-based). The reason is that model interpretability is very important for the business to explain each and every decision taken by the model. However, this often leads to a sacrifice in performance. Complex models like ensembles and neural networks typically give us better and more accurate performance (since true relationships are rarely linear in nature). We, however, end up being unable to properly interpret model decisions.
To address these gaps, I will take a conceptual yet hands-on approach in which we explore in depth some of the challenges around explainable artificial intelligence (XAI) and human-interpretable machine learning, and showcase some examples using state-of-the-art model interpretation frameworks in Python!
Subhasish Misra - Causal data science: Answering the crucial ‘why’ in your analysis.
45 Mins
Talk
Intermediate
Causal questions are ubiquitous in data science. For example, questions such as whether changing a feature on a website led to more traffic, or whether digital ad exposure led to incremental purchases, are deeply rooted in causality.
Randomized tests are considered to be the gold standard when it comes to getting to causal effects. However, experiments in many cases are infeasible or unethical. In such cases one has to rely on observational (non-experimental) data to derive causal insights. The crucial difference between randomized experiments and observational data is that in the former, test subjects (e.g. customers) are randomly assigned a treatment (e.g. digital advertisement exposure). This helps curb the possibility that user response (e.g. clicking on a link in the ad and purchasing the product) differs across the two groups of treated and non-treated subjects owing to pre-existing differences in user characteristics (e.g. demographics, geo-location etc.). In essence, we can then attribute divergences observed post-treatment in key outcomes (e.g. purchase rate) to the causal impact of the treatment.
This treatment assignment mechanism that makes causal attribution possible via randomization is absent though when using observational data. Thankfully, there are scientific (statistical and beyond) techniques available to ensure that we are able to circumvent this shortcoming and get to causal reads.
The aim of this talk is to offer a practical overview of the above aspects of causal inference, which as a discipline lies at the fascinating confluence of statistics, philosophy, computer science, psychology, economics, and medicine, among others. Topics include:
- The fundamental tenets of causality and measuring causal effects.
- Challenges involved in measuring causal effects in real world situations.
- Distinguishing between randomized and observational approaches to measuring the same.
- An introduction to measuring causal effects from observational data using matching and its extension, propensity-score-based matching, with a focus on a) the intuition and statistics behind it, b) tips from the trenches, based on the speaker's experience with these techniques, and c) practical limitations of such approaches.
- Walk through an example of how matching was applied to get to causal insights regarding effectiveness of a digital product for a major retailer.
- Finally, a conclusion on why having a nuanced understanding of causality is all the more important in the big data era we are in.
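As a rough illustration of the matching idea listed in the topics above, the sketch below pairs each treated unit with the control unit whose propensity score is closest, then averages the outcome differences. The records and scores are hypothetical (in practice the scores would be estimated, for example with a logistic regression), and this is a simplified one-to-one matching with replacement rather than the speaker's actual method.

```python
# Hypothetical records: (treated?, propensity score, outcome).
# Real propensity scores would be estimated from user characteristics.
units = [
    (True, 0.80, 1.0),
    (True, 0.60, 1.0),
    (True, 0.30, 0.0),
    (False, 0.75, 0.0),
    (False, 0.55, 1.0),
    (False, 0.35, 0.0),
    (False, 0.10, 0.0),
]


def matched_effect(records):
    """Average treated-minus-matched-control outcome (matching with replacement)."""
    treated = [r for r in records if r[0]]
    controls = [r for r in records if not r[0]]
    diffs = []
    for _, score, outcome in treated:
        # Nearest control unit by propensity score.
        match = min(controls, key=lambda c: abs(c[1] - score))
        diffs.append(outcome - match[2])
    return sum(diffs) / len(diffs)


att = matched_effect(units)
print(f"Estimated effect on the treated: {att:.3f}")
```

A real analysis would also check covariate balance after matching and discard treated units with no sufficiently close control.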
Johnu George / Ramdoot Kumar P - A Scalable Hyperparameter Optimization framework for ML workloads
Johnu George, Technical Lead, Cisco Systems; Ramdoot Kumar P, Technical Lead, Cisco Systems (3 years ago)
20 Mins
Demonstration
Intermediate
In machine learning, hyperparameters are parameters that govern the training process itself. For example, learning rate, number of hidden layers, and number of nodes per layer are typical hyperparameters for neural networks. Hyperparameter tuning is the process of searching for the best hyperparameters to initialize the learning algorithm, thus improving training performance.
We present Katib, a scalable and general hyperparameter tuning framework based on Kubernetes which is ML framework agnostic (TensorFlow, PyTorch, MXNet, XGBoost etc.). You will learn about Katib in Kubeflow, an open source ML toolkit for Kubernetes, as we demonstrate the advantages of hyperparameter optimization by running a sample classification problem. In addition, as we dive into the implementation details, you will learn how to contribute as we expand this platform to include AutoML tools.
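To make the idea of searching a hyperparameter space concrete, here is a minimal grid-search sketch in plain Python. It does not use Katib's API; `validation_loss` is a made-up stand-in objective, and the search space values are assumptions chosen only for illustration.

```python
from itertools import product


def validation_loss(learning_rate, hidden_layers):
    """Stand-in objective; a real trial would train and evaluate a model."""
    return (learning_rate - 0.01) ** 2 * 1e4 + (hidden_layers - 3) ** 2


# Hypothetical search space over two hyperparameters.
search_space = {
    "learning_rate": [0.001, 0.01, 0.1],
    "hidden_layers": [1, 2, 3, 4],
}


def grid_search(objective, space):
    """Evaluate every combination and keep the one with the lowest loss."""
    names = list(space)
    best_params, best_loss = None, float("inf")
    for values in product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        loss = objective(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = grid_search(validation_loss, search_space)
print(best_params, best_loss)
```

Each combination here is an independent trial, which is why this kind of search parallelizes naturally across a cluster.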
Ishita Mathur - How GO-FOOD built a Query Semantics Engine to help you find food faster
45 Mins
Case Study
Beginner
Context: The Search problem
GOJEK is a SuperApp: 19+ apps within an umbrella app. One of these is GO-FOOD, the first food delivery service in Indonesia and the largest food delivery service in Southeast Asia. There are over 300 thousand restaurants on the platform with a total of over 16 million dishes between them.
Over two-thirds of those who order food online using GO-FOOD do so by utilising text search. Search engines are so essential to our everyday digital experience that we don’t think twice when using them anymore. Search engines involve two primary tasks: retrieval of documents and ranking them in order of relevance. While improving that ranking is an extremely important part of improving the search experience, actually understanding that query helps give the searcher exactly what they’re looking for. This talk will show you what we are doing to make it easy for users to find what they want.
GO-FOOD uses the ElasticSearch stack with restaurant and dish indexes to search for what the user types. However, this results in only exact text matches and at most, fuzzy matches. We wanted to create a holistic search experience that not only personalised search results, but also retrieved restaurants and dishes that were more relevant to what the user was looking for. This is being done by not only taking advantage of ElasticSearch features, but also developing a Query semantics engine.
Query Understanding: What & Why
This is where Query Understanding comes into the picture. It is about using NLP to correctly identify the search intent behind the query and return more relevant search results; it is about the interpretation process before the results are even retrieved and ranked. The semantic neighbours of the query itself become the focus of the search process: after all, if I don’t understand what you’re trying to ask for, how will I give you what you want?
In the duration of this talk, you will learn about how we are taking advantage of word embeddings to build a Query Understanding Engine that is holistically designed to make the customer’s experience as smooth as possible. I will go over the techniques we used to build each component of the engine, the data and algorithmic challenges we faced and how we solved each problem we came across.
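The notion of a query's semantic neighbours can be sketched with cosine similarity over word embeddings. The three-dimensional vectors and the vocabulary below are toy values invented for illustration; real embeddings are learned from data and have many more dimensions, and this is not the engine described in the talk.

```python
import math

# Toy 3-dimensional vectors (hypothetical); real embeddings are learned from data.
EMBEDDINGS = {
    "nasi goreng": [0.9, 0.1, 0.0],
    "fried rice": [0.8, 0.2, 0.1],
    "iced tea": [0.0, 0.1, 0.9],
    "burger": [0.1, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def neighbours(query, k=2):
    """Return the k terms whose vectors are most similar to the query's."""
    q = EMBEDDINGS[query]
    scored = [
        (cosine(q, vec), term)
        for term, vec in EMBEDDINGS.items()
        if term != query
    ]
    scored.sort(reverse=True)
    return [term for _, term in scored[:k]]


print(neighbours("nasi goreng"))
```

An exact-match text index would not connect "nasi goreng" with "fried rice"; the embedding space is what makes the two terms neighbours.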
Sujoy Roychowdhury - Building Multimodal Deep learning recommendation Systems
20 Mins
Talk
Intermediate
Recommendation systems aid consumer decision-making processes such as what to buy, which books to read or which movies to watch. Recommendation systems are especially useful on e-commerce websites, where a user has to navigate through several hundred items to get to what they’re looking for. The data on how users interact with the systems can be used to analyze user behaviour and make recommendations that are in line with users’ preferences for certain item attributes over others. Collaborative filtering has, until recently, been able to achieve personalization through user-based and item-based collaborative filtering techniques. Recent advances in the application of Deep Learning in research as well as industry have led people to apply these techniques in recommendation systems.
Many recommendation systems use product features for recommendations. However, textual features available on products are almost invariably incomplete in real-world datasets due to various process-related issues. Additionally, product features, even when available, cannot completely describe a certain feature. These limit the success of such recommendation techniques. Deep learning systems can process multi-modal data like text, images and audio, and are thus our choice for implementing a multi-modal recommendation system.
In this talk we show a real-world application of a fashion recommendation system. This is based on a multi-modal deep learning system which is able to address the problem of poor annotation in the product data. We evaluate different deep learning architectures to process multi-modal data and compare their effectiveness. We highlight the trade-offs seen in a real-world implementation and how these trade-offs affect the actual choice of the architecture.
Venkata Pingali - Accelerating ML using Production Feature Engineering Platform
45 Mins
Talk
Intermediate
Anecdotally, only 2% of the models developed are productionized, i.e., used day to day to improve business outcomes. Part of the reason is the high cost and complexity of productionizing models, which is estimated to be anywhere from 40 to 80% of the overall work.
In this talk, we will share Scribble Data’s insights into productionization of ML, and how to reduce the cost and complexity in organizations. It is based on the last two years of work at Scribble developing and deploying production ML Feature Engineering Platform, and study of platforms from major organizations such as Uber. This talk expands on a previous talk given in January.
First, we discuss the complexity of production ML systems, and where time and effort go. Second, we give an overview of feature engineering, which is an expensive ML task, and the associated challenges. Third, we suggest an architecture for a Production Feature Engineering platform. Last, we discuss how you could go about building one for your organization.
Ashay Tamhane - Modeling Contextual Changes In User Behaviour In Fashion e-commerce
20 Mins
Talk
Intermediate
Impulse purchases are quite frequent in fashion e-commerce; browse patterns indicate fluid context changes across diverse product types, probably due to the lack of a well-defined need at the consumer’s end. Data from a fashion e-commerce portal indicate that the final product a person ends up purchasing is often very different from the initial product he/she started the session with. We refer to this characteristic as a ‘context change’. This feature of fashion e-commerce makes understanding and predicting user behaviour quite challenging. Our work attempts to model this characteristic so as to both detect and preempt context changes. Our approach employs a deep Gated Recurrent Unit (GRU) over clickstream data. We show that this model captures context changes better than non-sequential baseline models.
Venkatraman J - Entity Co-occurrence and Entity Reputation scoring from Unstructured data using Semantic Knowledge graph
20 Mins
Talk
Intermediate
Knowledge representation has been an area of research in the AI world for many years, and it continues to be one. Once knowledge is represented, reasoning over that knowledge is done using various inferencing techniques. Initial knowledge bases were built using rules from domain experts, and inferencing techniques like fuzzy inference and Bayesian inference were applied to reason over those knowledge bases. Semantic networks are another form of knowledge representation; they can represent structured data, as in WordNet and DBpedia, and solve problems in a specific domain by storing entities and the relations among them using ontologies.
A knowledge graph is another representation technique, deeply researched in academia and used by businesses in production to augment search relevancy in information retrieval (Google Knowledge Graph), improve recommender systems, and power semantic search applications and question answering. In this talk I will illustrate the benefits of a semantic knowledge graph, how it differs from semantic ontologies, the different technologies involved in building a knowledge graph, and how I built one to analyse unstructured data (Twitter data) and discover hidden relationships in the Twitter corpus. I will also show how a knowledge graph belongs in a data scientist's tool kit for discovering hidden relationships and insights from unstructured data quickly.
I will also show the technology and architecture used to determine entity reputation and entity co-occurrence using a knowledge graph. Scoring an entity for reputation is useful in many natural language processing tasks and applications such as recommender systems.
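Entity co-occurrence, as mentioned above, can be approximated by counting how often two entities appear in the same document. The sketch below assumes a hypothetical upstream entity-recognition step has already produced one set of entities per tweet; the data is invented, and this simple counter stands in for whatever the speaker's knowledge graph actually computes.

```python
from collections import Counter
from itertools import combinations

# Hypothetical output of an entity-recognition step: one entity set per tweet.
TWEET_ENTITIES = [
    {"Neo4j", "Cypher"},
    {"Neo4j", "Cypher", "Gremlin"},
    {"Gremlin", "TinkerPop"},
]


def cooccurrence(docs):
    """Count unordered entity pairs that appear together in a document."""
    counts = Counter()
    for entities in docs:
        # Sorting makes each pair a canonical (a, b) tuple with a < b.
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts


pair_counts = cooccurrence(TWEET_ENTITIES)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

The resulting pair counts map directly onto weighted edges between entity nodes in a graph.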
Shrutika Poyrekar / kiran karkera / Usha Rengaraju - Introduction to Bayesian Networks
Shrutika Poyrekar, Data Scientist, Envestnet | Yodlee; kiran karkera, Data Scientist, Dex.sg; Usha Rengaraju, Principal Data Scientist, Mysuru Consulting Group (3 years ago)
90 Mins
Workshop
Advanced
This is a hands-on workshop. The use case is traffic analysis.
Most machine learning models assume independent and identically distributed (i.i.d.) data. Graphical models can capture almost arbitrarily rich dependency structures between variables. They encode conditional independence structure with graphs. A Bayesian network, a type of graphical model, describes a probability distribution among all variables by putting edges between the variable nodes, wherein edges represent the conditional probability factors in the factorized probability distribution. Thus Bayesian networks provide a compact representation for dealing with uncertainty using an underlying graphical structure and probability theory. These models have a variety of applications such as medical diagnosis, biomonitoring, image processing, turbo codes, information retrieval, document classification and gene regulatory networks, among many others. These models are interpretable, as they are able to capture the causal relationships between different features. They can work efficiently with small data and also deal with missing data, which gives them an advantage over conventional machine learning and deep learning models.
In this session, we will discuss the concepts of conditional independence, d-separation, the Hammersley-Clifford theorem, Bayes' theorem, Expectation Maximization and Variable Elimination. There will be a code walkthrough of a simple case study.
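The factorized distribution described above can be shown on a tiny traffic-themed network. The structure (Rain → Jam ← Accident) and every probability below are assumptions made up for this sketch, not the workshop's case study; inference is done by brute-force enumeration rather than Variable Elimination.

```python
from itertools import product

# Hypothetical network: Rain -> Jam <- Accident, with made-up probabilities.
P_RAIN = {True: 0.2, False: 0.8}
P_ACCIDENT = {True: 0.1, False: 0.9}
# P(jam = True | rain, accident)
P_JAM = {
    (True, True): 0.95,
    (True, False): 0.6,
    (False, True): 0.8,
    (False, False): 0.1,
}


def joint(rain, accident, jam):
    """P(rain, accident, jam) = P(rain) * P(accident) * P(jam | rain, accident)."""
    p_jam = P_JAM[(rain, accident)]
    return P_RAIN[rain] * P_ACCIDENT[accident] * (p_jam if jam else 1 - p_jam)


def posterior_accident_given_jam():
    """P(accident = True | jam = True) by enumerating the joint distribution."""
    numerator = sum(joint(rain, True, True) for rain in (True, False))
    evidence = sum(
        joint(rain, accident, True)
        for rain, accident in product((True, False), repeat=2)
    )
    return numerator / evidence


print(round(posterior_accident_given_jam(), 4))
```

Enumeration is exponential in the number of variables, which is the cost that algorithms such as Variable Elimination are designed to reduce.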
AbdulMajedRaja - Become Language Agnostic by Combining the Power of R with Python using Reticulate
45 Mins
Tutorial
Intermediate
Language wars have been around for ages, and with data science booming they have a new candidate: R vs Python. While the fans are fighting over R vs Python, the creators (Hadley Wickham, Chief DS @ RStudio, and Wes McKinney, creator of the pandas project) are working together as the Ursa Labs team to create open source data science tools. A similar effort by RStudio has given birth to Reticulate (an R interface to Python), which helps programmers combine R and Python in the same code, session and project and create a new kind of superhero.
Kshitij Srivastava / Manikant Prasad - Data Science in Containers
Kshitij Srivastava, Data Scientist, Milliman; Manikant Prasad, Data Engineer, Milliman (3 years ago)
45 Mins
Case Study
Beginner
Containers are all the rage in the DevOps arena.
This session is a live demonstration of how the data team at Milliman uses containers at each step in their data science workflow:
1) How do containerized environments speed up data scientists at the data exploration stage?
2) How do containers enable rapid prototyping and validation at the modeling stage?
3) How do we put containerized models in production?
4) How do containers make it easy for data scientists to do DevOps?
5) How do containers make it easy for data scientists to host a data science dashboard with continuous integration and continuous delivery?
Ravi Ranjan - Machine Learning Model Management with MLflow
45 Mins
Talk
Intermediate
Background
Data is the new oil, and its size is growing exponentially day by day. Most companies are leveraging data science capabilities extensively to inform business decisions, perform audits on ML patterns, decode faults in business logic, and more. They run a large number of machine learning models to produce results.
Problem Statement
Managing ML models in production is non-trivial. The training, maintenance, deployment, monitoring, organization and documentation of machine learning (ML) models (in short, model management) is a critical task in virtually all production ML use cases. Wrong model management decisions can lead to poor performance of an ML system and can result in high maintenance costs and less effective utilization. Below are the key concerns for model management:
- Computational challenges: machine learning model definition and validation, decisions on model retraining, adversarial settings.
- Data management challenges: lack of a declarative abstraction for the whole ML pipeline, querying model metadata, model interpretation.
- Engineering challenges: multiple tools and frameworks make integration complex, users have heterogeneous skill levels, trained models need backwards compatibility, and training results are hard to reproduce.
Existing Solution
There are custom ML platforms that address the above concerns, such as FBLearner by Facebook and Michelangelo by Uber, but they have their own limitations:
- They standardize the data preparation, training and deployment loop for a particular platform and its business needs.
- They are limited to a few algorithms and frameworks.
- They are tied to one company's infrastructure and are hard to open source.
Why MLflow?
The Databricks team took the above concerns as their motivation to develop MLflow as an open source, cloud-agnostic machine learning model management platform. Benefits of MLflow for machine learning model management:
- It works with any ML library and language.
- It is platform independent, i.e. ML models run the same way anywhere, for example on a local system or on any cloud platform.
- It is designed to be useful for a 1-person or a 10,000-person organisation.
Dr. Neha Sehgal - Open Data Science for Smart Manufacturing
45 Mins
Talk
Intermediate
Open Data offers a tremendous opportunity in transformation of today’s manufacturing sector to smarter manufacturing. Smart Manufacturing initiatives include digitalising production processes and integrating IoT technologies for connecting machines to collect data for analysis and visualisation.
In this talk, the linkage between various industries within the manufacturing sector will be illustrated through the lens of Open Data Science. The data on manufacturing sector companies, company profiles, officers and financials will be scraped from UK Open Data APIs. The work I plan to showcase at ODSC is part of the UK Made Smarter Project, where it has been useful for major aerospace alliances in identifying the champions and strugglers (SMEs) within the manufacturing sector based on open data gathered from multiple sources. The talk includes discussion of data extraction, data cleaning, data transformation (transforming raw financial information about companies into key metrics of interest) and further data analytics to cluster manufacturing companies into "Champions" and "Strugglers". The talk showcases examples of powerful R Shiny based dashboards of interest to suppliers, manufacturers and other key stakeholders in the supply chain network.
Further analysis includes network analysis for industries, clustering and deploying the model as an API using Google Cloud Platform. The presenter will discuss the necessity of an 'Analytical Thinking' approach as an aid to handling complex big data projects and how to overcome challenges while working on real-life data science projects.
Krishna Sangeeth - The last mile problem in ML
45 Mins
Talk
Intermediate
“We have built a machine learning model, What next?”
There is quite a bit of journey that one needs to cover from building a model in Jupyter notebook to taking it to production.
I would like to call it the “last mile problem in ML”. This last mile could be a simple tread if we embrace some good ideas. This talk covers some of these opinionated ideas on how we can get around some of the pitfalls in deploying ML models in production.
We will go over the questions below in detail and think about solutions for them.
- How to fix the zombie models apocalypse, a state in which nobody knows how the model was trained?
- In science, experiments are considered valid only if they are reproducible. Should this be the case in data science as well?
- Training the model in your local machine and waiting for an eternity to complete is no fun. What are some better ways of doing this ?
- How do you package your machine learning code in a robust manner?
- Does an ML project have the luxury of not following good Software Engineering principles?
Pallavi Mudumby - B2B Recommender System using Semantic knowledge - Ontology
45 Mins
Case Study
Intermediate
In this era of big data, recommender systems are becoming increasingly important for businesses because they can help companies offer personalized product recommendations to customers. There have been many recognized successes of consumer-oriented recommender systems, particularly in e-commerce. However, when it comes to the Business-to-Business (B2B) market space, there has been less research and real-time application of such systems.
In our case study, we present a hybrid approach to building a context-sensitive recommender system that incorporates semantic knowledge in the form of a domain ontology and a custom user-user collaborative filtering model in a B2B space. Using engineering products transaction data from an instrumentation company, we demonstrate that this recommendation algorithm offers improved personalization, diversity and cold-start performance compared to a standard collaborative-filtering-based recommender system.
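For readers unfamiliar with the user-user collaborative filtering component mentioned above, here is a minimal sketch of the standard technique. The customers, products and quantities are hypothetical, and the ontology-based part of the hybrid approach is not modelled here.

```python
import math

# Hypothetical purchase quantities: customer -> {product: quantity}.
PURCHASES = {
    "plant_a": {"valve": 5, "sensor": 3},
    "plant_b": {"valve": 4, "sensor": 2, "gauge": 6},
    "plant_c": {"pump": 7},
}


def cosine(u, v):
    """Cosine similarity between two sparse purchase vectors."""
    shared = set(u) & set(v)
    dot = sum(u[item] * v[item] for item in shared)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def recommend(customer, k=1):
    """Rank unseen products by similarity-weighted purchases of other customers."""
    own = PURCHASES[customer]
    scores = {}
    for other, items in PURCHASES.items():
        if other == customer:
            continue
        similarity = cosine(own, items)
        for item, quantity in items.items():
            if item not in own:
                scores[item] = scores.get(item, 0.0) + similarity * quantity
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


print(recommend("plant_a"))
```

A customer with no purchase overlap (like `plant_c` here) gets zero similarity with everyone, which is the cold-start weakness that a hybrid approach tries to address.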
Pushker Ravindra - Data Science Best Practices for R and Python
20 Mins
Talk
Intermediate
How many times did you feel that you were not able to understand someone else’s code or sometimes not even your own? It’s mostly because of bad/no documentation and not following the best practices. Here I will be demonstrating some of the best practices in Data Science, for R and Python, the two most important programming languages in the world for Data Science, which would help in building sustainable data products.
- Integrated Development Environment (RStudio, PyCharm)
- Coding best practices (Google’s R Style Guide and Hadley’s Style Guide, PEP 8)
- Linter (lintr, Pylint)
- Documentation – Code (Roxygen2, reStructuredText), README/Instruction Manual (RMarkdown, Jupyter Notebook)
- Unit testing (testthat, unittest)
- Packaging
- Version control (Git)
These best practices significantly reduce technical debt in the long term, foster more collaboration and promote the building of more sustainable data products in any organization.
Apoorv - Combining Data, Tech and Social Science to understand the Indian Judiciary
45 Mins
Case Study
Beginner
Context:
The judicial system in India is an interconnected web of court complexes and establishments. It's a multi-tier system of 674 District courts, 25 State High courts, and the Supreme Court, all working together to bring justice to the 1.3 billion citizens of this country. Information about new case registrations and pending and disposed case details creates a massive pool of legal data.
Challenge:
Data management, standardization and accessibility are huge challenges, and people rarely cite these platforms when conducting legal research on important topics such as case pendency and case-law analysis. This, coupled with stories of judicial corruption in the media, has fueled a low level of trust in the Collegium. Current research is highly fragmented and is powered by data provided by some closed-source tools, which makes it extremely difficult to validate and conduct reproducible research.
Solution:
We envisage an ‘Open Judicial Data Platform’ that makes it easy for researchers to get access to a range of information - making it possible to research about the oldest case while still accessing the latest court judgements - and takes the burden of data cleaning off their shoulders, thereby ensuring that they spend their time building the narrative. By building data tools on top of this data, we close the information loop by making it easier to digest these research pieces by other stakeholders, eventually increasing their scope to participate in the legal process.
Showcase:
I would like to share some insights:
- On the process of creating this platform with our partners including legal researchers, lawyers and data scientists
- overcoming the barrier of understanding the legal space
- handling data and tech challenges using open tools and frictionless data packages
- and making the platform available to a diverse set of user classes.
As one of the use cases of the platform, I would also like to demonstrate a case study where we used open source entity recognition tools such as spaCy on the text of legal judgements to understand juvenile justice activity in the country.
Vidhya Veeraraghavan - Story Teller - Analytics in Banking & Financial Sector
Vidhya Veeraraghavan, Associate Vice President - Analytics, Standard Chartered Bank (3 years ago)
45 Mins
Case Study
Beginner
As kids, we always enjoyed stories. Some scary, some holy, some imbibing moral values & some just for fun.
Analytics is fun when you approach it with passion and curiosity. I know this because I have done this. With a few case studies, I wish to illuminate your wits about Analytics and how it is being actively used in the Banking and Financial Sector.
Come join me for a fun ride.
Maulik Soneji / Jewel James - Using ML for Personalizing Food Search
45 Mins
Talk
Beginner
GoFood, the food delivery product of Gojek, is one of the largest of its kind in the world. This talk summarizes the approaches considered and lessons learnt during the design and successful experimentation of a search system that uses ML to personalize the restaurant results based on the user’s food and taste preferences.
We formulated the estimation of relevance as a Learning To Rank ML problem, which makes performing ML inference for a very large number of customer-merchant pairs the next hurdle.
The talk will cover our learnings and findings for the following:
a. Creating a Learning Model for Food Search
b. Targeting experiments to a certain percentage of users
c. Training the model from real time data
d. Enriching Restaurant data with custom tags
Our story should help the audience in making design decisions on the data pipelines and software architecture needed when using ML for relevance ranking in high throughput search systems.
Kumar Nityan Suman - Beating BERT at NER For E-Commerce Products
45 Mins
Tutorial
Intermediate
Natural Language Processing is a messy and complicated affair but modern advanced techniques are offering increasingly impressive results. Embeddings are a modern machine learning technique that has taken the natural language processing world by storm.
This hands-on tutorial will showcase the advantage of learning custom Word and Character Embeddings for natural language problems over pre-trained vectors like ELMo and BERT using a Named Entity Recognition case study over e-commerce data.