How We Effectively Scaled Contact Insights Computation From 0 to 20k Orgs With Our Spark Data Pipeline

In a world of active conversations between multiple sales reps and customers, a rep often needs a warm introduction to kickstart their sales process. With millions of conversations spread across a large user base, building an activity graph is a time-consuming operation. The computation becomes even harder at scale: we need to consistently compute insights for 20k organizations and keep the closest-connection results fresh and accurate as new conversations and relationships arrive. We will walk through our initial approach to this scaling problem, the approaches we tried and discarded along the way, and how we ultimately scaled the pipeline for a growing number of orgs.
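The abstract does not include code, but the core computation it describes can be sketched in plain Python. This is a minimal illustration only: the record shape, function names, and the "most interactions wins" heuristic are assumptions for the sketch, not details from the talk. In the actual pipeline, the same aggregation would presumably run as a Spark groupBy over conversation events partitioned by org.

```python
from collections import Counter, defaultdict

# Hypothetical conversation events: one (org_id, rep, contact) row per touch
# (email, call, meeting). Field names are illustrative, not from the talk.
conversations = [
    ("org1", "alice", "bob@acme.com"),
    ("org1", "alice", "bob@acme.com"),
    ("org1", "carol", "bob@acme.com"),
    ("org2", "dave", "eve@initech.com"),
]

def build_activity_graph(records):
    """Aggregate per-org edge weights: (rep, contact) -> interaction count."""
    graph = defaultdict(Counter)
    for org, rep, contact in records:
        graph[org][(rep, contact)] += 1
    return graph

def closest_rep(graph, org, contact):
    """Return the rep with the most interactions with the given contact,
    i.e. the best person to ask for a warm introduction."""
    candidates = {rep: n for (rep, c), n in graph[org].items() if c == contact}
    return max(candidates, key=candidates.get) if candidates else None

graph = build_activity_graph(conversations)
print(closest_rep(graph, "org1", "bob@acme.com"))  # alice: 2 touches vs carol's 1
```

The scaling challenge the talk addresses is that this aggregation must be recomputed (or incrementally updated) for every one of 20k orgs as new conversations stream in, which is where a distributed engine like Spark comes in.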


Outline/Structure of the Case Study

Learning Outcome

We will demonstrate the various challenges we ran into while building data pipelines, especially with Spark jobs. The audience will take away best practices for designing, building, tuning, scaling, and maintaining data (Spark) compute jobs.

Target Audience

Data engineers who intend to use data science models to build prediction insights

