Using Open Data to Predict Market Movements

Aug 31st 02:55 PM - 03:15 PM | Jupiter | 60 Interested

As companies progress on their digital transformation journeys, technology purchasing becomes a strategic business decision. In this realm, consulting firms such as Gartner exert tremendous influence on those decisions. The ability to predict how these firms will position market players would give vendors a competitive advantage.

We will explore how, with the use of publicly available data sources, IT industry trends can be mimicked and predicted.

Big Data enthusiasts learned quickly that there are caveats to making Big Data useful:

  • Data source availability
  • Producing meaningful insights from publicly available sources

Working with large, frequently changing data sets can become expensive and frustrating. The learning curve is steep and the discovery process is long. Challenges range from selecting efficient tools for parsing unstructured data to developing a vision for interpreting and using the data for competitive advantage.

We will describe how the archive of billions of web pages, captured monthly since 2008 and available for free analysis on AWS, can be used to mimic and predict trends reflected in industry-standard consulting reports.

This process could also benefit from machine learning, which could tune the models so that they learn and optimize automatically. Gartner publishes reports in over 70 topic areas. An automated tool that analyzes all of those topic areas, helping us quickly understand major trends across today’s landscape and plan for those to come, would be invaluable to many organizations.

 
 

Outline/structure of the Session

The session will focus on a case study showing how open data (i.e. publicly available data) can be used to gain insights into market movements, to see which technologies are gaining traction and which are not, and to predict the movement of vendors in Gartner's Magic Quadrant reports. We will also walk through how the project was implemented, from downloading indexes of web crawl data, through text analysis on distributed computing clusters, to visualizations. We will cover 3 use cases and demonstrate our findings.
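As a rough illustration of the text-analysis step, the sketch below counts vendor mentions in page text and turns them into a per-month share of mentions. It is not the project's actual pipeline: the vendor names, the sample pages, and the function names `count_mentions` and `mention_share_by_month` are all hypothetical, and a real run would read extracted crawl text on a cluster rather than an in-memory list.

```python
import re
from collections import Counter, defaultdict

# Hypothetical vendor names; the session does not list the vendors it tracked.
VENDORS = ["AcmeCloud", "DataForge", "NimbusDB"]


def count_mentions(text, vendors=VENDORS):
    """Count case-insensitive whole-word mentions of each vendor in one page."""
    counts = Counter()
    for vendor in vendors:
        pattern = r"\b" + re.escape(vendor) + r"\b"
        counts[vendor] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


def mention_share_by_month(pages, vendors=VENDORS):
    """Aggregate (month, text) pairs into {month: {vendor: share of mentions}}."""
    totals = defaultdict(Counter)
    for month, text in pages:
        totals[month].update(count_mentions(text, vendors))
    shares = {}
    for month, counts in totals.items():
        total = sum(counts.values())
        shares[month] = {
            v: (counts[v] / total if total else 0.0) for v in vendors
        }
    return shares


# Made-up sample pages standing in for text extracted from a web crawl.
SAMPLE_PAGES = [
    ("2017-01", "We migrated from DataForge to AcmeCloud last year."),
    ("2017-01", "AcmeCloud pricing compared with NimbusDB."),
    ("2017-02", "acmecloud outage report; AcmeCloud status page."),
]

shares = mention_share_by_month(SAMPLE_PAGES)
for month in sorted(shares):
    print(month, shares[month])
```

Using a share of mentions rather than a raw count is one simple way to keep months with different crawl sizes comparable; other normalizations are equally plausible.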

Learning Outcome

The audience will learn how simple text analysis, combined with open data, can uncover hidden trends and correlations and provide a competitive advantage. The audience will also learn the tools, technologies and techniques used in this project.
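One simple way to check for the kind of correlation mentioned above is a Pearson coefficient between two series, for example a vendor's monthly share of web mentions and some external score for the same periods. The sketch below is only an assumption about how such a check might look; the function name `pearson` and both input series are made up for illustration and are not results from the project.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative series, not real measurements.
mention_share = [0.10, 0.12, 0.15, 0.19]
external_score = [2.0, 2.3, 2.9, 3.4]

print(round(pearson(mention_share, external_score), 3))
```

A high coefficient on a handful of points is weak evidence on its own, so a check like this is a starting point for investigation rather than a prediction.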

Target Audience

Data Scientists, Data Engineers, Software Engineers, Data Enthusiasts

Prerequisite

General knowledge of web crawlers, indexes, AWS core services like S3 and EMR, text analytics, Python/Scala, and working with Zeppelin notebooks

Submitted 6 months ago

Comments

  • Naresh Jain
    By Naresh Jain  ~  5 months ago

    Asha, thanks for proposing this topic. We understand that you cannot share the paper publicly; however, would you be able to share it privately with the program committee for their review? If yes, please email it to indianteam@odsc.com.

    Also, to give the program committee more confidence in your expertise, can you please share links to past video presentations and/or articles you've written on this topic?

    • Asha Saini
      By Asha Saini  ~  5 months ago

      Thanks for the ask, Naresh. Emailed indiateam@odsc.com. Please let me know if you need anything else.

       

       


  • Dr. Tom Starke - Intelligent Autonomous Trading Systems - Are We There Yet?

    Dr. Tom Starke
    CEO
    AAAQuants
    4 months ago
    Sold Out!
    45 Mins
    Talk
    Intermediate

    Over the last two decades, trading has seen a remarkable evolution from open outcry in the Wall Street pits, to screen trading, all the way to current automation and high-frequency trading (HFT). The success of machine learning and artificial intelligence (AI) seems like a natural progression in the evolution of trading. However, unlike other fields of AI, trading has some domain-specific problems that push the dream of set-it-and-forget-it money-making machines still some way into the future. This talk will describe the current challenges for intelligent autonomous trading systems and provide some practical examples where machine learning is already being used in financial applications.

  • Vincenzo Tursi - Puzzling Together a Teacher-Bot: Machine Learning, NLP, Active Learning, and Microservices

    Vincenzo Tursi
    Data Scientist
    KNIME
    4 months ago
    Sold Out!
    45 Mins
    Demonstration
    Beginner

    Hi! My name is Emil and I am a Teacher Bot. I was built to answer your initial questions about using KNIME Analytics Platform. Well, actually, I was built to point you to the right training materials to answer your questions about KNIME.

    Puzzling together all the pieces to implement me wasn't that difficult. All you need are:

    • A user interface - web or speech based - for you to ask questions
    • A text parser for me to understand
    • A brain to find the right training materials to answer your question
    • A user interface to send you the answer
    • A feedback option - nice to have but not a must - on whether my answer was of any help

    The most complex part was, of course, my brain. Building my brain required: a clear definition of the problem, a labeled data set, a class ontology, and finally the training of a machine learning model. The labeled data set in particular was lacking. So, we relied on active learning to incrementally make my brain smarter over time. Some parts of the final architecture, such as understanding and resource searching, were deployed as microservices.

  • Atin Ghosh - AR-MDN - Associative and Recurrent Mixture Density Network for e-Retail Demand Forecasting

    45 Mins
    Case Study
    Intermediate

    Accurate demand forecasts can help online retail organizations better plan their supply-chain processes. The challenge, however, is the large number of associative factors that result in large, non-stationary shifts in demand, which traditional time series and regression approaches fail to model. In this paper, we propose a neural network architecture called AR-MDN that simultaneously models associative factors, time-series trends and the variance in the demand. We first identify several causal features and use a combination of feature embeddings, MLP and LSTM to represent them. We then model the output density as a learned mixture of Gaussian distributions. The AR-MDN can be trained end-to-end without the need for additional supervision. We experiment on a dataset of a year’s worth of data over tens of thousands of products from Flipkart. The proposed architecture yields a significant improvement in forecasting accuracy when compared with existing alternatives.

  • Anand Chitipothu - DevOps for Data Science: Experiences from building a cloud-based data science platform

    Anand Chitipothu
    Co-Founder
    Rorodata
    4 months ago
    Sold Out!
    45 Mins
    Case Study
    Beginner

    Productionizing data science applications is non-trivial. Non-optimal practices, the people-heavy nature of traditional approaches, and developers' love of complex solutions for the sake of using cool technologies make the situation even worse.

    There are two key ingredients required to streamline this: “the cloud” and “the right level of devops abstractions”.

    In this talk, I’ll share the experiences of building a cloud-based platform for streamlining data science and how such solutions can greatly simplify building and deploying data science and machine learning applications.

  • Dr. Ravi Vijayaraghavan / Dr. Sidharth Kumar - Analytics and Science for Customer Experience and Growth in E-commerce

    20 Mins
    Experience Report
    Advanced

    In our talk, we will cover the broad areas where Flipkart leverages Analytics and Sciences to drive both human and machine-driven decisions. We will go deeper into one use case related to pricing in e-commerce.

  • Janakiram MSV - Accelerate Machine Learning Adoption with AutoML

    20 Mins
    Demonstration
    Beginner

    One emerging trend that's going to fundamentally change the face of ML is AutoML. It enables business analysts and developers to evolve machine learning models that can address complex scenarios. From platform companies such as Google and Microsoft to early-stage startups, AutoML is fast gaining traction. This session demonstrates how AutoML accelerates building machine learning models.