AIOps - Prediction of Critical Events
With the rise of cloud computing, distributed architectures, containers, and microservices, data overload is on the rise. Growing numbers of DevOps processes, alerts, and repetitive mundane jobs have created new demands both to synthesize meaning from this influx of information and to connect it to broader business objectives.
AIOps is the application of artificial intelligence to IT operations. AIOps uses machine learning and data science to give IT operations teams a real-time understanding of any issues affecting the availability or performance of the systems under their care. Rather than reacting to issues as they arise in the application environment, AIOps platforms let IT operations teams manage performance challenges proactively and in real time.
This case study focuses on solving the following business needs:
1. With an ever-increasing volume of alerts, a large number of incidents was being generated. There was a need for a framework that could generate correlations and identify correlated events, thereby reducing overall incident volume.
2. For many incidents, a reactive strategy does not work and can lead to a loss of reputation; there was a need for predictive capabilities that detect anomalous events and flag critical events well in advance.
3. Given the pressure to reduce resolution time and the short window of opportunity available to analysts, there was a need for search capabilities that give analysts a head start by showing how similar incidents were resolved in the past.
Data from multiple alerting systems, including traditional IT monitoring, log events in text format, and application and network performance data, was made available for the PoC.
The solution framework had a discovery phase, in which the base data was visualized and explored, and an NLP-driven text-mining layer, in which log data in text format was pre-processed and clustered and correlations between related events were identified using machine learning algorithms. Topic mining was used to get a quick overview of the large volume of event data. Next, a temporal-mining layer explored the temporal relationships between nodes and cluster groups, and features were engineered on top of the associations generated by this layer. Machine learning models were then trained on these features to predict critical events almost 12 hours in advance. Finally, a search layer computed the similarity of any incident with those in the ServiceNow database, giving analysts ready access to similar past incidents and how they were solved, so that they do not have to reinvent the wheel.
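As a flavour of the text-mining and search layers, here is a minimal sketch, assuming a scikit-learn stack and invented event texts (the actual framework, pre-processing, and models are more involved):

```python
# Minimal sketch of the text-mining and search layers (assumed stack:
# scikit-learn; the event texts below are invented stand-ins).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

log_events = [
    "disk latency threshold exceeded on node-12",
    "disk io wait high on node-12",
    "service auth-api restarted unexpectedly",
]

# Vectorize raw event text so related events land close together.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(log_events)

# Group related events; the number of clusters would be tuned on the
# real alert stream rather than fixed like this.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Search layer: rank historical incidents by similarity to a new one.
new_incident = ["high disk latency alerts on node-12"]
scores = cosine_similarity(vectorizer.transform(new_incident), X).ravel()
print(clusters, scores.argsort()[::-1])  # most similar past events first
```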
Outline/Structure of the Case Study
1. Overview of AIOps and challenges in the space - 10 minutes
2. Problem Introduction and Business Needs - 10 minutes
3. Overview of the Analytical Framework adopted - 5 minutes
4. Results of Data Discovery and Correlation of Log Events - 5 minutes
5. Results of Temporal Analysis and Feature Engineering - 5 minutes
6. Prediction of Critical Events and Search Capabilities - 5 minutes
7. Q&A - 5 minutes
Learning Outcome
Gain an understanding of AIOps and the business use cases it addresses
Develop know-how on processing log event data for event correlation and prediction of critical events
Target Audience
AI and machine learning enthusiasts, DevOps practitioners, developers
Prerequisites for Attendees
Familiarity with Machine Learning Algorithms
Links
1. Failure Prediction in IBM BlueGene/L Event Logs
https://ieeexplore.ieee.org/document/4470294
2. A Statistical Machine Learning approach for Ticket Mining in IT Service Delivery
People who liked this proposal also liked:
Dr. Vikas Agrawal - Non-Stationary Time Series: Finding Relationships Between Changing Processes for Enterprise Prescriptive Systems
45 Mins
Talk
Intermediate
It is tedious to keep asking questions, seeking explanations, or setting thresholds for trends and anomalies. Why not find problems before they happen, find explanations for the glitches, and suggest the shortest paths to fixing them? Businesses are always changing, along with their competitive environment and processes; no static model can handle that. Using dynamic models that find time-delayed interactions between multiple time series, we need to make proactive forecasts of anomalous trends in risks and opportunities across operations, sales, revenue, and personnel, based on multiple factors influencing each other over time. We need to know how to define what is "normal" and determine when the business processes from six months ago no longer apply, or apply to only 35% of the cases today, while explaining the causes of risk and sources of opportunity, their relative directions and magnitudes, in the context of decision-making and transactional applications, using state-of-the-art techniques.
Real-world processes and businesses keep changing, with one moving part changing another over time. Can we capture these changing relationships? Can we use multiple variables to flag risks on the key ones of interest? We will take a fun journey culminating in the most recent developments in the field. Which methods work well and which break? What can we use in practice?
For instance, we can show a CEO that they would miss their revenue target by over 6% for the quarter, and tell them why, i.e. in what ways their business has changed over the last year. Then we provide prioritized lists of the quickest, cheapest, and least risky paths to help turn the tide, with estimates of relative costs and expected probability of success.
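One simple building block behind such time-delayed relationships is a lagged-correlation scan. A minimal pandas sketch (synthetic series; the dynamic models in the talk go well beyond this):

```python
# Lagged-correlation scan: for each lag, correlate today's target with
# the driver series shifted back in time. Illustrative building block
# only, on synthetic data with a known 7-step delay.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
driver = pd.Series(rng.normal(size=500))
target = driver.shift(7).fillna(0) + 0.5 * pd.Series(rng.normal(size=500))

def best_lag(x, y, max_lag=30):
    corrs = {lag: x.shift(lag).corr(y) for lag in range(1, max_lag + 1)}
    return max(corrs, key=lambda lag: abs(corrs[lag])), corrs

lag, corrs = best_lag(driver, target)
print(f"strongest time-delayed relationship at lag {lag}")  # expect ~7
```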
Badri Narayanan Gopalakrishnan / Shalini Sinha / Usha Rengaraju - Lifting Up: How AI and Big data can contribute to anti-poverty programs
Badri Narayanan Gopalakrishnan, Founder and Director, Infinite-Sum Modelling Inc. / Shalini Sinha, VP - Data Science, Mobileum / Usha Rengaraju, Principal Data Scientist, Mysuru Consulting Group
45 Mins
Case Study
Intermediate
Ending poverty and zero hunger are the top two goals the United Nations aims to achieve by 2030 under its sustainable development program. Hunger and poverty are byproducts of multiple factors, and fighting them requires a multi-fold effort from all stakeholders. Artificial intelligence and machine learning have transformed the way we live, work, and interact; however, business economics has limited their application to a few segments of society. A conscious effort is needed to bring the power of AI to those who actually need it the most: people below the poverty line. Here we present our thoughts on how deep learning and big data analytics can be combined to enable effective implementation of anti-poverty programs. Advancements in deep learning and micro-diagnostics, combined with effective technology policy, are the right recipe for the progressive growth of a nation.
Deep learning can help identify poverty zones across the globe from night-time imagery, where the level of light correlates with economic growth. Once areas of lower economic growth are identified, geographic and demographic data can be combined to establish micro-level diagnostics of these underdeveloped areas. The insights from the data can help plan an effective intervention program. Machine learning can further be used to identify potential donors, investors, and contributors across the globe based on their skill sets, interests, history, ethnicity, purchasing power, and native connection to the location of the proposed program.
Adequate resource allocation and efficient program design alone will not guarantee success unless project execution is supervised at the grass-roots level. Data analytics can be used to monitor project progress and effectiveness and to detect anomalies in case of fraud or mismanagement of funds.
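As a toy illustration of the night-lights idea (not the authors' pipeline; the array shapes and threshold are invented), candidate low-development tiles could be flagged by their mean radiance before any deeper model is applied:

```python
# Toy night-lights screen: flag image tiles whose mean radiance falls
# in the bottom quintile as candidate low-development zones. Shapes,
# data, and threshold are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
tiles = rng.random((100, 64, 64))      # 100 grayscale night-light tiles

mean_radiance = tiles.mean(axis=(1, 2))
low_light = mean_radiance < np.quantile(mean_radiance, 0.2)
print(f"{low_light.sum()} tiles flagged for micro-level diagnostics")
```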
Juan Manuel Contreras - How to lead data science teams: The 3 D's of data science leadership
45 Mins
Talk
Advanced
Despite the increasing number of data scientists who are asked to take on leadership roles as they grow in their careers, there are still few resources on how to lead data science teams successfully.
In this talk, I will argue that an effective data science leader has to wear three hats: Diplomat (understand the organization and their team, and liaise between them), Diagnostician (figure out which organizational needs can be met by their team, and how), and Developer (grow their own and their team's skills, as well as the organization's understanding of data science, to maximize the value their team can drive).
Throughout, I draw on my experience as a data science leader both at a political party (the Democratic Party of the United States of America) and at a fintech startup (Even.com).
Talk attendees will learn a framework for how to manage data scientists and lead a data science practice. In turn, attendees will be better prepared to tackle new or existing roles as data science leaders or be better able to identify promising candidates for these roles.
Tanuj Jain - Taming the Spark beast for Deep Learning predictions at scale
45 Mins
Talk
Intermediate
Predicting at scale is a challenging pursuit, especially with deep learning models, which tend to have high inference times. At idealo.de, Germany's biggest price comparison platform, the Data Science team was tasked with carrying out image tagging to improve our product galleries.
One of the biggest challenges we faced was generating predictions for more than 300 million images within a short time while keeping costs low. Moreover, solving the scaling problem became critical, since we intended to apply other deep learning models to the same big dataset. We ended up formulating a batch-prediction solution using an Apache Spark setup running on an AWS EMR cluster.
Spark is notorious for being difficult to configure and tune. As a result, we had to carry out several optimisation steps to meet the scale requirements within our time and financial constraints. In this talk, I will present our Spark setup and focus on the journey of optimising the Spark tagging solution. I will also talk briefly about the underlying deep learning model used to predict the image tags.
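A minimal sketch of the batch-prediction pattern, assuming PySpark 3.x with a Series-to-Series pandas UDF (the model here is a dummy stand-in; the talk's actual setup and EMR tuning are not shown):

```python
# Batch prediction with a pandas UDF: amortize model start-up cost by
# loading once per batch and predicting over whole partitions.
# (PySpark 3.x assumed; DummyTagger stands in for the real DL model.)
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("image-tagging").getOrCreate()

def load_model():
    class DummyTagger:                  # stand-in for the real model
        def predict(self, url):
            return "sofa" if "sofa" in url else "other"
    return DummyTagger()

@pandas_udf(StringType())
def tag_images(urls: pd.Series) -> pd.Series:
    model = load_model()                # once per batch, not per image
    return urls.map(model.predict)

df = spark.createDataFrame([("img_001.jpg",), ("sofa_042.jpg",)], ["url"])
df.withColumn("tag", tag_images(df["url"])).show()
```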
Pankaj Kumar / Abinash Panda / Usha Rengaraju - Quantitative Finance: Global macro trading strategy using Probabilistic Graphical Models
Pankaj Kumar, Quantitative Research Associate, Statestreet Global Advisors / Abinash Panda, CEO, Prodios Labs / Usha Rengaraju, Principal Data Scientist, Mysuru Consulting Group
90 Mins
Workshop
Advanced
{ This is a hands-on workshop on the pgmpy package. The creator of pgmpy, Abinash Panda, will do the code demonstration. }
Crude oil plays an important role in macroeconomic stability, and it heavily influences the performance of the global financial markets. Unexpected fluctuations in the real price of crude oil are detrimental to the welfare of both oil-importing and oil-exporting economies. Global macro hedge funds view the oil price forecast as one of the key variables in generating macroeconomic projections, and it also plays an important role for policy makers in predicting recessions.
Probabilistic graphical models can help improve the accuracy of existing quantitative models for crude oil price prediction, as they take into account many different macroeconomic and geopolitical variables.
Hidden Markov Models are used to detect underlying regimes in the time series by discretising the continuous data. In this workshop we use the Baum-Welch algorithm for learning the HMMs, and the Viterbi algorithm to find the sequence of hidden states (i.e. the regimes) given the observed states (i.e. monthly differences) of the time series.
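As a quick illustration of the regime-detection step, a minimal sketch on synthetic monthly differences (hmmlearn is an assumed stand-in here; the workshop itself demonstrates pgmpy):

```python
# Regime detection with a Gaussian HMM: fit() runs Baum-Welch (EM) and
# predict() recovers the most likely state path via Viterbi.
# (hmmlearn assumed; the monthly-difference data is synthetic.)
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
# Two volatility regimes: calm first, turbulent second.
diffs = np.concatenate([rng.normal(0, 1, 120), rng.normal(0, 4, 120)])
X = diffs.reshape(-1, 1)

hmm = GaussianHMM(n_components=2, n_iter=100, random_state=0).fit(X)
regimes = hmm.predict(X)    # hidden-state (regime) sequence
print(regimes[:5], regimes[-5:])
```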
Belief networks are used to analyse the probability of a crude-oil regime given evidence in the form of regimes in the macroeconomic factors. A greedy hill-climbing algorithm is used to learn the belief network structure, and the parameters are then learned via Bayesian estimation with a K2 prior. Inference is then performed on the belief network to obtain a forecast of the crude oil markets, and the forecast is tested on real data.
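And a minimal sketch of the belief-network step with pgmpy (synthetic regime data; class and argument names as in recent pgmpy releases, which may differ across versions):

```python
# Structure via greedy hill climbing with a K2 score, parameters via
# Bayesian estimation with a K2 prior, then inference. The discretised
# regime columns below are synthetic stand-ins for the real series.
import pandas as pd
from pgmpy.estimators import BayesianEstimator, HillClimbSearch, K2Score
from pgmpy.inference import VariableElimination
from pgmpy.models import BayesianNetwork

data = pd.DataFrame({
    "usd_regime": [0, 1, 1, 0, 1, 0, 0, 1] * 25,
    "gdp_regime": [0, 1, 0, 0, 1, 0, 1, 1] * 25,
    "oil_regime": [0, 1, 1, 0, 1, 0, 1, 1] * 25,
})

structure = HillClimbSearch(data).estimate(scoring_method=K2Score(data))
model = BayesianNetwork(structure.edges())
model.add_nodes_from(data.columns)   # keep isolated variables, if any
model.fit(data, estimator=BayesianEstimator, prior_type="K2")

# Forecast the crude-oil regime given evidence on the macro factors.
posterior = VariableElimination(model).query(
    ["oil_regime"], evidence={"usd_regime": 1, "gdp_regime": 1})
print(posterior)
```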
Shalini Sinha / Ashok J / Yogesh Padmanaban - Hybrid Classification Model with Topic Modelling and LSTM Text Classifier to identify key drivers behind Incident Volume
45 Mins
Case Study
Intermediate
Incident volume reduction is one of the top priorities for any large-scale service organization, along with timely resolution of incidents within the specified SLA parameters. AI and machine learning solutions can help IT service desks manage the incident influx as well as resolution cost by
- Identifying major topics from incident descriptions and planning resource allocation and skill-sets accordingly
- Producing knowledge articles and resolution summaries of similar incidents raised earlier
- Analyzing root causes of incidents and introducing processes and automation frameworks to predict and resolve them proactively
We will look at different approaches for combining standard document clustering algorithms such as Latent Dirichlet Allocation (LDA) and K-means clustering on doc2vec embeddings with text classification, to produce easily interpretable document clusters with semantically coherent text representations. This helped the IT operations of a large FMCG client identify the key drivers/topics contributing to incident volume and take the necessary action.
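A minimal sketch of the two clustering views (gensim and scikit-learn assumed; the incident texts are invented, and the LSTM classifier stage is omitted for brevity):

```python
# Two views of the same incident corpus: LDA topics over bag-of-words,
# and K-means clusters over doc2vec embeddings. (gensim/scikit-learn
# assumed; the four incident texts are invented stand-ins.)
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

incidents = [
    "vpn connection drops for remote users",
    "vpn tunnel unstable after patch",
    "invoice report job failed overnight",
    "report scheduler error in invoice module",
]

# View 1: LDA topic mixtures over word counts.
counts = CountVectorizer(stop_words="english").fit_transform(incidents)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
topic_mix = lda.fit_transform(counts)

# View 2: K-means clusters over doc2vec document embeddings.
tagged = [TaggedDocument(t.split(), [i]) for i, t in enumerate(incidents)]
d2v = Doc2Vec(tagged, vector_size=16, min_count=1, epochs=50, seed=0)
vecs = [d2v.dv[i] for i in range(len(incidents))]
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vecs)
print(topic_mix.argmax(axis=1), labels)
```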
Gaurav Shekhar - Adversarial Learning challenges for Cybersecurity
45 Mins
Talk
Intermediate
According to the European Union Agency for Network and Information Security (ENISA) Threat Landscape report 2017, firms face millions of cyber threats including malware, web-based attacks, phishing, ransomware, botnets, etc. Detecting advanced persistent threats is a tough task since the real goals of these attacks stay undetected for a long period of time.
To overcome the ever-increasing new threats designed by attackers, firms increasingly augment their security systems with advanced machine learning and deep learning techniques to protect their data and networks from malicious attacks. However, there is a growing realization that to defend your systems you must first learn how to attack them, and adversarial machine learning algorithms are finding use in this space.
Adversarial machine learning is the study of machine learning vulnerabilities in adversarial environments. Much like a hacker might take advantage of a firewall vulnerability to gain access to a web server, a machine learning system can itself be targeted to serve the goals of an attacker. Hence, before putting such solutions into production, it is crucial that machine learning system designers build safeguards to preempt these attacks.
In this talk we start with a quick overview of the overall landscape of cyber threats, look at some commonly used threat-hunting methodologies, and focus on real-world uses of machine learning solutions to augment security. The second part of the talk develops an understanding of adversarial machine learning algorithms and how they can be used to bypass security solutions built using machine learning. In the third section, we will demonstrate how adversarial techniques can be developed to subvert such solutions, and we will also cover some countermeasures that can help protect machine-learning-based security systems.
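To make the attack idea concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), one standard adversarial technique; PyTorch is assumed, and the untrained linear model stands in for a deployed detector:

```python
# FGSM: perturb an input in the direction of the sign of the loss
# gradient so the model's loss on the true label increases, nudging
# the sample toward the decision boundary. (PyTorch assumed; the
# model and data are stand-ins, not a real security system.)
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 2))   # stand-in detector
x = torch.randn(1, 20)
y = torch.tensor([0])                      # true label, e.g. "benign"

def fgsm(model, x, y, eps=0.25):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()

x_adv = fgsm(model, x, y)
print(model(x).argmax(1).item(), model(x_adv).argmax(1).item())
```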