Approximating time series data
IoT is ubiquitous nowadays. With time series data pouring in from all kinds of sources, there is a growing need to process it in near real time. However, the transmission, storage, and compute costs associated with this data keep increasing.
Approximation of Time Series
In this talk, we will look at discrete transforms implemented in Haskell and combined with streaming libraries such as conduit and streamly. We will see how the discrete wavelet transform can be applied to a stream whose size is not known beforehand.
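As a rough illustration of why a wavelet transform suits a stream of unknown length, here is a minimal list-based sketch of one level of the unnormalised Haar transform, the simplest discrete wavelet. The choice of Haar and the names `haarStep`, `haarUnstep` and `compress` are assumptions made for this example, not the implementation presented in the talk. Each output pair depends on only two input samples, so the function is lazy and never needs the total length.

```haskell
-- One level of the unnormalised Haar transform.
-- Each pair of samples becomes (average, half-difference).
-- A trailing unpaired sample is dropped in this sketch.
haarStep :: [Double] -> [(Double, Double)]
haarStep (x:y:rest) = ((x + y) / 2, (x - y) / 2) : haarStep rest
haarStep _          = []

-- Inverse of haarStep: rebuild the two samples from each pair.
haarUnstep :: [(Double, Double)] -> [Double]
haarUnstep = concatMap (\(a, d) -> [a + d, a - d])

-- Lossy approximation: zero out detail coefficients smaller than eps.
compress :: Double -> [(Double, Double)] -> [(Double, Double)]
compress eps = map (\(a, d) -> (a, if abs d < eps then 0 else d))
```

Because `haarStep` is lazy, `take 2 (haarStep [1 ..])` terminates even though the input is infinite, and `haarUnstep . haarStep` returns the original samples for an even-length input.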
Then we will look at how this data can be used to create and update sketches that support aggregation queries on the time series data. We conclude that approximate time series are very beneficial in applications such as IoT analytics and finance. We will also see how a simple Haskell list-based application can be scaled using conduit, not only for ingesting data but also for querying it dynamically.
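To give a feel for the move from lists to conduit, the following sketch expresses a pairwise Haar step as a conduit stage. It assumes the `conduit` package is available; the name `haarC` is hypothetical and the code is only one possible shape for such a stage, not the code from the talk.

```haskell
import Data.Conduit (ConduitT, await, yield, runConduitPure, (.|))
import qualified Data.Conduit.List as CL

-- Consume samples two at a time and emit (average, half-difference).
-- The stage never asks for the stream length; it stops when upstream
-- is exhausted and drops a trailing unpaired sample.
haarC :: Monad m => ConduitT Double (Double, Double) m ()
haarC = loop
  where
    loop = do
      mx <- await
      case mx of
        Nothing -> pure ()
        Just x  -> do
          my <- await
          case my of
            Nothing -> pure ()
            Just y  -> yield ((x + y) / 2, (x - y) / 2) >> loop

-- Two-level approximation: keep only the averages after each level.
twoLevelApprox :: [Double] -> [Double]
twoLevelApprox xs = runConduitPure $
  CL.sourceList xs .| haarC .| CL.map fst .| haarC .| CL.map fst .| CL.consume
```

Here the source is a plain list for brevity; in a live setting the same `haarC` stage could sit behind any source that yields `Double` samples.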
In particular, we will look at the CPU utilization of a few machines and present a comparative study of queries over raw CPU utilization data and the same queries over the approximate time series.
This functional approach is currently being studied at my organization, where it has been found useful for representing data for quick analytics.
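The kind of comparison meant here can be sketched as follows. The sample values in `cpuRaw` are made up for illustration and are not measurements from the study; `pairAverages`, `approximate` and `mean` are hypothetical helper names. The approximation keeps only the pairwise averages of a Haar-style decomposition, so after `k` levels it holds roughly `1 / 2^k` of the original points.

```haskell
-- Halve the series by replacing each pair of samples with its average.
pairAverages :: [Double] -> [Double]
pairAverages (x:y:rest) = (x + y) / 2 : pairAverages rest
pairAverages _          = []

-- Apply k levels of pairwise averaging.
approximate :: Int -> [Double] -> [Double]
approximate k xs = iterate pairAverages xs !! k

-- Mean query; Nothing for an empty series.
mean :: [Double] -> Maybe Double
mean [] = Nothing
mean xs = Just (sum xs / fromIntegral (length xs))

-- Hypothetical CPU utilization samples (percent), for illustration only.
cpuRaw :: [Double]
cpuRaw = [20, 30, 50, 40, 90, 70, 10, 10]
```

With these values, `approximate 2 cpuRaw` is `[35, 45]`. The mean query returns `Just 40` on both the raw and the approximate series, because pairwise averaging preserves the mean when the length is a multiple of `2^k`. A maximum query, by contrast, returns 90 on the raw data and 45 on the approximation, which shows the kind of accuracy trade-off a comparison has to account for.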
Outline/Structure of the Case Study
- Necessity of approximating time series
- Representing the discrete wavelet transform for multivariate signals in a functional language (Haskell)
- Applying wavelet transforms to Haskell lists and lifting them to conduit/streamly
- Creating functional sketches
- Using the State monad and STM to update sketches and wavelet coefficients
- Running queries on a live stream
Learning Outcome
- Stream / reactive programming
- Working with conduit/streamly
- Signal processing in a functional language
- Reducing signal size
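
The outline items on functional sketches, the State monad and STM could fit together roughly as shown below. This is a minimal sketch under stated assumptions: the `Sketch` type here is just a running count/sum/min/max summary chosen for simplicity (the talk may use a different sketch structure), and all names are hypothetical. It relies on the `mtl` and `stm` packages that ship with GHC.

```haskell
import Control.Concurrent.STM (TVar, atomically, modifyTVar', newTVarIO, readTVarIO)
import Control.Monad.State.Strict (State, execState, modify')
import Data.Foldable (traverse_)

-- A very simple summary that can answer count, sum, mean, min and max.
data Sketch = Sketch
  { skCount :: !Int
  , skSum   :: !Double
  , skMin   :: !Double
  , skMax   :: !Double
  } deriving (Eq, Show)

emptySketch :: Sketch
emptySketch = Sketch 0 0 (1 / 0) (-1 / 0)   -- min starts at +Infinity, max at -Infinity

-- Fold one observation into the sketch.
observe :: Double -> Sketch -> Sketch
observe x (Sketch n s lo hi) = Sketch (n + 1) (s + x) (min lo x) (max hi x)

-- Combine two sketches built from disjoint batches.
merge :: Sketch -> Sketch -> Sketch
merge (Sketch n1 s1 lo1 hi1) (Sketch n2 s2 lo2 hi2) =
  Sketch (n1 + n2) (s1 + s2) (min lo1 lo2) (max hi1 hi2)

-- Pure, single-threaded update of a sketch over a batch, using State.
observeBatch :: [Double] -> State Sketch ()
observeBatch = traverse_ (modify' . observe)

-- Shared sketch that ingest threads update and query threads read.
newSketchVar :: IO (TVar Sketch)
newSketchVar = newTVarIO emptySketch

-- Summarise a batch purely, then merge it into the shared sketch atomically.
publishBatch :: TVar Sketch -> [Double] -> IO ()
publishBatch var xs =
  atomically (modifyTVar' var (merge (execState (observeBatch xs) emptySketch)))

-- Query the current state of the shared sketch.
querySketch :: TVar Sketch -> IO Sketch
querySketch = readTVarIO

meanOf :: Sketch -> Maybe Double
meanOf s
  | skCount s == 0 = Nothing
  | otherwise      = Just (skSum s / fromIntegral (skCount s))
```

The split keeps the per-batch work pure and outside the transaction, so the STM transaction only performs a cheap merge. A query can call `querySketch` at any time while batches are still being published.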
Target Audience
People interested in applying FP in day-to-day life to solve interesting problems.
Prerequisites for Attendees
No formal knowledge of Haskell is necessary; starting from simple list processing, we will move on to parallel high-pass/low-pass filters for fast processing of time series data.
It is assumed that the audience has some familiarity with FP and immutability.
Some math fundamentals (linear algebra, convolution) will be very useful.
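
For readers who want a preview of the list processing involved, the following is a minimal sketch of a two-channel filter bank: a low-pass and a high-pass filter applied to the same signal, each followed by downsampling by two. The Haar kernels and all function names are assumptions made for this example. The two branches are evaluated with `par` and `pseq` from `GHC.Conc` (part of `base`); they only run in parallel when the program is built with `-threaded` and run with more than one capability, and otherwise the result is the same but computed sequentially.

```haskell
import Data.List (tails)
import GHC.Conc (par, pseq)

-- Sliding dot product of a kernel with the signal
-- (a convolution with the kernel reversed).
applyFilter :: [Double] -> [Double] -> [Double]
applyFilter kernel xs =
  [ sum (zipWith (*) kernel window)
  | window <- map (take n) (tails xs)
  , length window == n
  ]
  where
    n = length kernel

-- Haar kernels: averaging (low-pass) and differencing (high-pass).
lowPass, highPass :: [Double]
lowPass  = [0.5, 0.5]
highPass = [0.5, -0.5]

-- Keep the samples at even positions (downsample by two).
everyOther :: [a] -> [a]
everyOther (x:_:rest) = x : everyOther rest
everyOther xs         = xs

-- Fully evaluate a list of Doubles.
forceList :: [Double] -> ()
forceList = foldr seq ()

-- Run both filter branches, sparking the low-pass branch in parallel.
filterBank :: [Double] -> ([Double], [Double])
filterBank xs = lowDone `par` (highDone `pseq` (lowDone `pseq` (low, high)))
  where
    low      = everyOther (applyFilter lowPass xs)
    high     = everyOther (applyFilter highPass xs)
    lowDone  = forceList low
    highDone = forceList high
```

For example, `filterBank [4, 6, 10, 12, 8, 6, 5, 5]` yields `([5, 11, 7, 5], [-1, -1, 1, 0])`: the first list is a half-size smoothed version of the signal and the second holds the details needed to reconstruct it.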