How We Scaled the Contact Insights Computation From 0 Orgs to 20k Orgs With Our Spark Data Pipeline
In a world of active conversations between multiple sales reps and customers, a sales rep often needs a quick introduction to kickstart their sales process. With millions of conversations flowing across a large user base, building an activity graph is a time-consuming operation. The computation becomes even harder when it must run consistently for 20k organizations, keeping the closeness computations fresh and accurate as the latest conversations and newer relationships arrive. We will walk through our initial approach to this scaling problem, the approaches we tried and abandoned along the way, and how we ultimately scaled the pipeline for a growing number of orgs.
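At its core, the activity graph described above can be thought of as weighted edges between reps and contacts, where edge weight is derived from interaction counts. A minimal single-machine sketch of that idea (all names here are hypothetical illustrations; the actual pipeline performs equivalent aggregations with Spark at scale):

```python
from collections import Counter

def build_activity_graph(conversations):
    """Aggregate conversation events into weighted activity-graph edges:
    (rep, contact) -> number of interactions observed.

    `conversations` is an iterable of (rep_email, contact_email) pairs.
    """
    edges = Counter()
    for rep, contact in conversations:
        edges[(rep, contact)] += 1
    return edges

# Example: three conversation events between two reps and two contacts.
events = [
    ("alice@acme.com", "bob@corp.com"),
    ("alice@acme.com", "bob@corp.com"),
    ("carol@acme.com", "dan@corp.com"),
]
graph = build_activity_graph(events)
print(graph[("alice@acme.com", "bob@corp.com")])  # → 2
```

The hard part at production scale is not this aggregation itself but running it continuously across 20k organizations while keeping edge weights current, which is where the Spark-specific design choices discussed in this case study come in.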
Outline/Structure of the Case Study
We will demonstrate the various challenges we ran into while building data pipelines, especially with Spark jobs. The audience will take away best practices for designing, building, tuning, scaling, and maintaining data (Spark) compute jobs.
Target Audience
Data engineers intending to use data science models to build prediction insights