A Retrospective on Building and Running Atlassian’s Data Lake
Atlassian strives to be a data-driven company that builds collaborative software for teams. Two and a half years ago we launched our analytics platform (i.e. data lake) on AWS, which is now used by over 1500 internal users each month to gain insights and inform decisions.
In this talk we will present a retrospective on our analytics platform, covering our blessings (what went well) and our mistakes (what could have been better), as well as the next steps we might take to further improve the platform.
This talk will cover both the technical aspects (i.e. architectural choices) and the non-technical aspects (i.e. team organisation, our principles and mandate).
Outline/Structure of the Case Study
- Introduction (~5 minutes)
  - What is Atlassian
  - Atlassian’s Analytics Platform
    - Some statistics (# of users, size of lake)
    - Brief architecture overview
    - Current use-cases and history of the lake
- Retrospective (~20 minutes)
  - What went (or is working) well
  - What could be better and what we are doing to rectify it
- Closing Thoughts and Key Takeaways (~1 minute)
- Q&A (remaining time)
Learning Outcome
- Attendees will have a better sense of what to avoid in terms of architectural and non-technical choices.
- Attendees will have some tips on how to make their analytics platforms more successful in their organisations.
Target Audience
Data Architects / Data Engineering Leads
Prerequisites for Attendees
Basic understanding of cloud data concepts (e.g. AWS S3, Kinesis, EMR).
Links
I talked about Atlassian’s analytics platform at AWS re:Invent 2017 in Las Vegas:
https://youtu.be/0vdW1ORLWyk?t=21m30s
This was also picked up and run as an article by iTnews:
https://www.itnews.com.au/news/atlassian-lifts-lid-on-500tb-data-lake-478894