Apache Spark for Machine Learning on Large Data Sets

Apache Spark is a general-purpose framework for distributed data processing. With MLlib, Spark's machine learning library, fitting a model to a huge data set becomes straightforward. Spark's general-purpose primitives likewise make it easy to apply an already-trained model across a large collection of observations. We'll walk through fitting a model to a big data set using MLlib, and through applying a trained scikit-learn model to a large data set with Spark.


Outline/structure of the Session

...

Learning Outcome

...

Target Audience

TBA

Submitted 2 weeks ago

Comments
  • Josh Graham  ~  1 week ago

    G'day Juliet, if we can see some sort of draft outline or even topic synopsis this week, that would be awesome.