Crunch Data and Deploy Serverless Architecture the Smart Way
This workshop shows how to run machine learning analyses in notebooks: participants run their own Jupyter or Databricks notebook to find predictive features in a dataset with many columns. It also shows how to deploy a serverless architecture using an AWS CloudFormation template, and it provides an opportunity to discuss the differences between academic and commercial data science.
Before the workshop:
For the workshop:
Outline/Structure of the Workshop
1) Generating Insights
- What is Data Science, Machine Learning and Analytics?
- Definitions
- Application cases
- Challenges
- Frameworks that help, e.g. Databricks
2) Generating Insights - hands on: running wide random forest on Databricks
VariantSpark on Databricks: https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1020355316241938/1261269838839355/5046708719373721/latest.html
- Where would you apply this?
- Scoping project ideas for your business
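To give a feel for what "finding predictive features in wide data" means before opening the notebook, here is a minimal sketch in plain Python. It is not VariantSpark and does not use its API. It is a toy forest of one-split trees (decision stumps) on synthetic binary data, where the column index 42, the noise level, and all function names are assumptions made up for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_gain(column, labels):
    """Impurity decrease obtained by splitting on one binary (0/1) feature."""
    left = [y for x, y in zip(column, labels) if x == 0]
    right = [y for x, y in zip(column, labels) if x == 1]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
    return gini(labels) - weighted


def stump_forest_importance(X, y, n_trees=300, mtry=None, seed=0):
    """Toy stand-in for a random forest: every tree is a single split.

    Each tree draws a bootstrap sample of rows and a random subset of
    `mtry` features, then credits the best feature with its Gini gain.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))
    importance = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        labels = [y[i] for i in rows]
        best_gain, best_j = 0.0, None
        for j in rng.sample(range(p), mtry):  # random feature subset
            gain = split_gain([X[i][j] for i in rows], labels)
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is not None:
            importance[best_j] += best_gain
    return importance


def make_wide_data(n=200, p=500, signal=42, noise=0.1, seed=1):
    """Synthetic 'wide' data: more columns than rows, one informative column."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(p)] for _ in range(n)]
    # The label copies the signal column, flipped with probability `noise`.
    y = [row[signal] if rng.random() > noise else 1 - row[signal] for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_wide_data()
    importance = stump_forest_importance(X, y)
    ranked = sorted(range(len(importance)), key=importance.__getitem__, reverse=True)
    print("Top 5 feature indices:", ranked[:5])
```

The point of the sketch is the shape of the problem: with 500 columns and 200 rows, ranking features by accumulated impurity decrease is one way to surface the few columns that matter. A real wide random forest grows full trees and distributes the work across a cluster.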
3) Going to production
- Infrastructure as Code
- Hypothesis-driven architecture: https://devops.com/devops-2-0-a-new-evolution-of-serverless-cloud-architecture/
4) Going to production - Demo: GT-Scan deployment
5) Going to production - Hands on
AWS CloudFormation template: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/GettingStarted.Walkthrough.html
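As a rough illustration of Infrastructure as Code, the sketch below writes a very small CloudFormation template from Python. It is not the GT-Scan template used in the demo. The single S3 bucket, the logical ID `WorkshopBucket`, and the file name `template.json` are placeholders chosen for illustration.

```python
import json


def build_template(bucket_logical_id="WorkshopBucket"):
    """Return a minimal CloudFormation template as a Python dict.

    The logical ID is a hypothetical name; the template declares one
    S3 bucket and exposes its generated name as a stack output.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Minimal illustrative stack with a single S3 bucket",
        "Resources": {
            bucket_logical_id: {"Type": "AWS::S3::Bucket"},
        },
        "Outputs": {
            # Ref on an S3 bucket resolves to the bucket name.
            "BucketName": {"Value": {"Ref": bucket_logical_id}},
        },
    }


if __name__ == "__main__":
    # Write the template to disk so the AWS CLI can deploy it.
    with open("template.json", "w") as fh:
        json.dump(build_template(), fh, indent=2)
    print("Wrote template.json")
```

Assuming the AWS CLI is installed and configured, a file like this could be deployed with `aws cloudformation deploy --template-file template.json --stack-name workshop-demo`, where the stack name is again just an example. Because the whole stack lives in one versioned file, each change to the infrastructure can be made, tested, and rolled back as a small increment.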
6) Stacking data science services
- API, Serverless and "aaS" ecosystem
7) Stacking data science services - hands on:
GT-Scan Jupyter notebook: https://github.com/BauerLab/GT-scan2-Notebooks
- https://<cfid>.execute-api.ap-southeast-2.amazonaws.com/prod/submit
- https://<cfid>.execute-api.ap-southeast-2.amazonaws.com/prod/results/gt5285823/
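The two endpoints above follow the usual API Gateway URL pattern, so a notebook cell can build them from the `<cfid>` of your own deployment. The sketch below only constructs the URLs and fetches a result as raw text. The helper names are hypothetical, and the request payload for `/submit` and the response format are not specified here, so neither is assumed.

```python
import urllib.request
from urllib.parse import quote

REGION = "ap-southeast-2"  # region used in the workshop URLs
STAGE = "prod"             # API Gateway stage used in the workshop URLs


def base_url(cfid, region=REGION, stage=STAGE):
    """Base URL of the deployed API; `cfid` is the ID of your own deployment."""
    return f"https://{cfid}.execute-api.{region}.amazonaws.com/{stage}"


def submit_url(cfid):
    """URL of the job submission endpoint."""
    return f"{base_url(cfid)}/submit"


def results_url(cfid, job_id):
    """URL of the results endpoint for one job ID."""
    return f"{base_url(cfid)}/results/{quote(job_id, safe='')}/"


def fetch_results(cfid, job_id, timeout=30):
    """GET the results for a job and return the response body as text.

    The body is returned undecoded beyond UTF-8 because the response
    format is not documented in this outline.
    """
    with urllib.request.urlopen(results_url(cfid, job_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # "<cfid>" must be replaced with the ID of your own deployment.
    print(submit_url("<cfid>"))
    print(results_url("<cfid>", "gt5285823"))
```

The same URLs can be pasted into Postman, which is why it is listed in the prerequisites.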
Learning Outcome
- Data-driven insights are essential for robust business decisions
- VariantSpark is a random forest (RF) implementation for 'wide' data
- Deploying infrastructure through IaC is easy and reliable and allows hypothesis-driven incremental changes to cloud infrastructure
- Data science modules can be "stacked" to easily reuse components for different questions
Target Audience
Data Scientists, Data Engineers, Data Specialists, Machine Learning Engineers, Data Science Enthusiasts
Prerequisites for Attendees
1) Make sure you have a free AWS account: https://portal.aws.amazon.com/billing/signup#/start
2) Make sure you have the AWS CLI installed on your machine: https://aws.amazon.com/cli/
3) Make sure you have a free Databricks account: https://databricks.com/try-databricks
4) Make sure you have Postman installed: https://chrome.google.com/webstore/detail/postman/fhbjgbiflinjbdggehcddcbncdddomop?hl=en
Submitted 2 years ago