How We Effectively Scaled the Contact Insights Computation from 0 to 20k Orgs with Our Spark Data Pipeline

In a world of active conversations between multiple sales reps and customers, there is always a case where a sales rep needs a quick introduction to kickstart their sales process. With millions of conversations happening across a large user base, building an activity graph is a time-consuming operation. The computation becomes even harder when we need to consistently run it for 20k organizations and keep the closeness computations fresh with the latest conversations and newer relations. We will walk through our initial approach to this hard scaling problem, the different approaches we tried and abandoned, and how we effectively scaled it up for a growing number of orgs.


Outline/Structure of the Case Study

In order to understand our overall journey of building a new pipeline for modeling, generating, indexing, and surfacing Contact Insights through a graph model, I have broken the talk into 3 primary topics; here are the sub-sections for each:

- Contact Insights Context (7-10 mins)

- How important it is to pair AI with context
- Consumer vs. Enterprise context w.r.t. Contact Insights

- Why use a graph to model context (10 mins)

- Why we chose a graph to encode relationships
- Architecture of our pipeline
- Key decisions in choosing the relevant technology
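To make the graph-modeling idea concrete, here is a minimal, hypothetical sketch (the names and the interaction-count weighting are illustrative, not our actual schema): each conversation adds weight to an undirected edge between its two participants, and the heaviest edge suggests the warmest introduction path.

```python
from collections import defaultdict

# Hypothetical conversation log: (sales rep, contact) pairs.
conversations = [
    ("rep_alice", "contact_bob"),
    ("rep_alice", "contact_bob"),
    ("rep_alice", "contact_carol"),
]

# Build a weighted activity graph: edge weight = number of interactions.
edges = defaultdict(int)
for a, b in conversations:
    edges[frozenset((a, b))] += 1  # frozenset makes the edge undirected

# The strongest edge is the best candidate for a quick introduction.
strongest = max(edges, key=edges.get)
```

In the real pipeline this aggregation runs at scale in Spark, but the graph abstraction is the same.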

- Key problems solved and lessons learned (20-25 mins)

I will dig into various Spark pipeline issues, such as:

- Memory
- Tuning the Buckets/Cores
- Partitioning
- Handling multiple file sizes
- Scaling issues

Learning Outcome

We will demonstrate various challenges that we ran into while building data pipelines, especially with Spark jobs. The audience will take away best practices for designing, building, tuning, scaling, and maintaining data (Spark) compute jobs.

Target Audience

Data engineers intending to use data science models to build prediction insights

Submitted 4 months ago

Public Feedback

  • By Deepti Tomar  ~  3 months ago

    Hello Praveen,

    Thanks for your submission! 

    Request you to define the outline/structure with the topics/subsections along with their time-break up (on the proposal).

    Thanks,

    Deepti

    • By Praveen Innamuri  ~  3 months ago

      Deepti, I've updated the outline/structure and their sub-sections with time breakups. Please let me know if you need any details further. Thanks!