Containers are all the rage in the DevOps arena.

This session is a live demonstration of how the data team at Milliman uses containers at each step of its data science workflow:

1) How do containerized environments speed up data scientists at the data exploration stage?

2) How do containers enable rapid prototyping and validation at the modeling stage?

3) How do we put containerized models into production?

4) How do containers make it easy for data scientists to do DevOps?

5) How do containers make it easy for data scientists to host a data science dashboard with continuous integration and continuous delivery?

 
 

Outline/Structure of the Case Study

1) How do containerized environments speed up data scientists at the data exploration stage - We'll talk about basic data exploration in a data science environment, followed by a survey and discussion of exploration tools and technologies, and then a basic overview of how containers can speed up the process. This will be followed by questions from the attendees - 12 mins

2) How do containers enable rapid prototyping and validation at the modeling stage - We'll then proceed to talk about modeling environments and how containers enable rapid prototyping and testing. Followed by questions - 10 mins

3) How do we put containerized models into production - We'll talk about how we leverage cloud environments to enable parallelized model runs and predictions. Again followed by questions - 8 mins

4) How do containers make it easy for data scientists to do DevOps - We'll start with basic DevOps, data science workflows, agile principles and a discussion of container support in Travis CI and Azure DevOps. Followed by questions - 5 mins

5) How do containers make it easy for data scientists to host a data science dashboard with continuous integration and continuous delivery - Using containers to enable a microservice architecture for a dashboard, and container orchestration using Docker Swarm/Kubernetes. Followed by a final round of questions - 10 mins

Learning Outcome

1) Attendees will have a basic understanding of the container/serverless computing paradigm

2) Attendees can have a reasoned discussion about whether containerized environments suit their workflows

3) Data Scientists and Data Engineers will be able to make a case for using containers within their teams

4) Data Science and Engineering managers can clearly reason how containers aid in the agile workflow

Target Audience

Data Scientists, Data Science Managers, Product Managers, Data Engineers, Data Architects, Data Product teams

Prerequisites for Attendees

This is a beginner-level introduction to containers and their specific use cases in data science. Attendees are expected to be familiar with the basic concepts of agile teams, rapid prototyping and data science. The concepts discussed are tailored for folks in data teams but can be applied to other software engineering teams as well.

Submitted 3 months ago

Public Feedback

  • Dipanjan Sarkar  ~  3 months ago

    While the overall topic looks good, this reads as a broad overview of the 'what' with respect to containers + ML.

    Given that the conference is applied and aimed at practitioners, the intent is also to focus on the 'how'. Will you be showcasing any actual demos, for example how to take a machine learning model, put it behind a web service, containerize it, and maybe even scale it out using Kubernetes?

    • Kshitij Srivastava  ~  3 months ago

      Hi Dipanjan,

      Thanks for your interest in our topic.

      Absolutely. We intend to make it a live demonstration. In each subsection, we will have slides and then a live demo.

      Regards,

      Kshitij


  • Liked Viral B. Shah

    Viral B. Shah - Growing a compiler - Getting to ML from the general-purpose Julia compiler

    45 Mins
    Keynote
    Intermediate

    Since we originally proposed the need for a first-class language, compiler and ecosystem for machine learning (ML), a view that is increasingly shared by many, there have been plenty of interesting developments in the field. Not only have the tradeoffs in existing systems, such as TensorFlow and PyTorch, not been resolved, but they are clearer than ever now that both frameworks contain distinct "static graph" and "eager execution" interfaces. Meanwhile, the idea of ML models fundamentally being differentiable algorithms – often called differentiable programming – has caught on.

    Where current frameworks fall short, several exciting new projects have sprung up that dispense with graphs entirely, to bring differentiable programming to the mainstream. Myia, by the Theano team, differentiates and compiles a subset of Python to high-performance GPU code. Swift for TensorFlow extends Swift so that compatible functions can be compiled to TensorFlow graphs. And finally, the Flux ecosystem is extending Julia’s compiler with a number of ML-focused tools, including first-class gradients, just-in-time CUDA kernel compilation, automatic batching and support for new hardware such as TPUs.

    This talk will demonstrate how Julia is increasingly becoming a natural language for machine learning, the kind of libraries and applications the Julia community is building, the contributions from India (there are many!), and our plans going forward.

  • Liked Dr. Vikas Agrawal

    Dr. Vikas Agrawal - Non-Stationary Time Series: Finding Relationships Between Changing Processes for Enterprise Prescriptive Systems

    45 Mins
    Talk
    Intermediate

    It is too tedious to keep asking questions, seeking explanations or setting thresholds for trends or anomalies. Why not find problems before they happen, find explanations for the glitches and suggest the shortest paths to fixing them? Businesses are always changing along with their competitive environment and processes. No static model can handle that. Using dynamic models that find time-delayed interactions between multiple time series, we need to make proactive forecasts of anomalous trends in risks and opportunities in operations, sales, revenue and personnel, based on multiple factors influencing each other over time. We need to know how to define what is “normal” and determine when the business processes from six months ago no longer apply, or apply to only 35% of the cases today, while explaining the causes of risk and sources of opportunity, their relative directions and magnitude, in the context of the decision-making and transactional applications, using state-of-the-art techniques.
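
    As a rough illustration of what a "time-delayed interaction" between two series means, the sketch below scans a few candidate lags and keeps the one with the strongest Pearson correlation. This is a deliberately simple stand-in, not the dynamic models the talk refers to, and it ignores non-stationarity altogether. The series names `ad_spend` and `orders` and their values are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def best_lag(driver, target, max_lag):
    """Return (lag, correlation) for the lag at which driver[t] best tracks target[t + lag]."""
    scores = {}
    for lag in range(max_lag + 1):
        # Align driver[t] with target[t + lag] by trimming both ends.
        scores[lag] = pearson(driver[: len(driver) - lag], target[lag:])
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]

# Hypothetical data: `orders` repeats `ad_spend` with a two-period delay.
ad_spend = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
orders = [0, 0] + ad_spend[:-2]

lag, corr = best_lag(ad_spend, orders, max_lag=3)
print(f"strongest relationship at lag {lag} (correlation {corr:.2f})")
```

    On real business data the relationship itself drifts over time, so a scan like this would have to be repeated over moving windows at the very least.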

    Real-world processes and businesses keep changing, with one moving part changing another over time. Can we capture these changing relationships? Can we use multiple variables to find risks in the key variables of interest? We will take a fun journey culminating in the most recent developments in the field. Which methods work well and which break? What can we use in practice?

    For instance, we can show a CEO that they would miss their revenue target by over 6% for the quarter, and tell them why, i.e. in what ways their business has changed over the last year. Then we provide prioritized lists of the quickest, cheapest and least risky paths to help them turn the tide, with estimates of relative costs and expected probability of success.

  • Liked Subhasish Misra

    Subhasish Misra - Causal data science: Answering the crucial ‘why’ in your analysis.

    Subhasish Misra
    Staff Data Scientist
    Walmart Labs
    4 months ago
    45 Mins
    Talk
    Intermediate

    Causal questions are ubiquitous in data science. For example, questions such as whether changing a feature on a website led to more traffic, or whether digital ad exposure led to incremental purchases, are deeply rooted in causality.

    Randomized tests are considered the gold standard when it comes to getting to causal effects. However, experiments are in many cases unfeasible or unethical. In such cases one has to rely on observational (non-experimental) data to derive causal insights. The crucial difference between randomized experiments and observational data is that in the former, test subjects (e.g. customers) are randomly assigned a treatment (e.g. digital advertisement exposure). This helps curb the possibility that user response (e.g. clicking on a link in the ad and purchasing the product) differs across the treated and non-treated groups owing to pre-existing differences in user characteristics (e.g. demographics, geo-location etc.). In essence, we can then attribute divergences observed post-treatment in key outcomes (e.g. purchase rate) to the causal impact of the treatment.

    This treatment assignment mechanism, which makes causal attribution possible via randomization, is absent when using observational data. Thankfully, there are scientific (statistical and beyond) techniques available that let us circumvent this shortcoming and get to causal reads.

    The aim of this talk will be to offer a practical overview of the above aspects of causal inference, which as a discipline lies at the fascinating confluence of statistics, philosophy, computer science, psychology, economics, and medicine, among others. Topics include:

    • The fundamental tenets of causality and measuring causal effects.
    • Challenges involved in measuring causal effects in real world situations.
    • Distinguishing between randomized and observational approaches to measuring the same.
    • An introduction to measuring causal effects from observational data using matching and its extension, propensity score based matching, with a focus on a) the intuition and statistics behind it, b) tips from the trenches, based on the speaker's experience with these techniques, and c) practical limitations of such approaches.
    • A walk-through of an example of how matching was applied to get to causal insights regarding the effectiveness of a digital product for a major retailer.
    • Finally, a conclusion on why having a nuanced understanding of causality is all the more important in the big data era we are in.
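
    The propensity score matching mentioned in the topics above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the speaker's method: a single covariate, a hand-rolled logistic regression for the propensity score, and 1-nearest-neighbour matching with replacement. The data are synthetic, built so that the true treatment effect is exactly 1 while treated users skew towards higher covariate values. All names (`fit_propensity`, `att_by_matching`, `xs`) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_propensity(xs, treated, lr=0.1, steps=2000):
    """Fit P(treated = 1 | x) = sigmoid(a + b * x) by gradient ascent on the log-likelihood."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = sum(t - sigmoid(a + b * x) for x, t in zip(xs, treated))
        grad_b = sum((t - sigmoid(a + b * x)) * x for x, t in zip(xs, treated))
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def att_by_matching(scores, treated, outcomes):
    """Average effect on the treated: match each treated unit to the control
    with the closest propensity score (with replacement) and average the gaps."""
    controls = [(s, y) for s, t, y in zip(scores, treated, outcomes) if t == 0]
    effects = []
    for s, t, y in zip(scores, treated, outcomes):
        if t == 1:
            _, y_control = min(controls, key=lambda c: abs(c[0] - s))
            effects.append(y - y_control)
    return sum(effects) / len(effects)

# Synthetic data (assumption): outcome = 2 * x + 1 * treated, so the true effect is 1.
xs       = [0, 1, 2, 3, 4, 0, 1, 1, 2, 3, 4, 3, 4]
treated  = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
outcomes = [2.0 * x + 1.0 * t for x, t in zip(xs, treated)]

a, b = fit_propensity(xs, treated)
scores = [sigmoid(a + b * x) for x in xs]

treated_y = [y for y, t in zip(outcomes, treated) if t == 1]
control_y = [y for y, t in zip(outcomes, treated) if t == 0]
naive = sum(treated_y) / len(treated_y) - sum(control_y) / len(control_y)
att = att_by_matching(scores, treated, outcomes)
print(f"naive difference: {naive:.2f}, matched estimate: {att:.2f}")
```

    The naive difference in means is inflated because treated users already had higher covariate values; matching on the propensity score compares like with like. Matching only helps with confounders that are actually observed, which is one of the practical limitations the talk refers to.
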
  • Liked Badri Narayanan Gopalakrishnan

    Badri Narayanan Gopalakrishnan / Shalini Sinha / Usha Rengaraju - Lifting Up: Deep Learning for implementing anti-hunger and anti-poverty programs

    45 Mins
    Case Study
    Intermediate

    Ending poverty and achieving zero hunger are the top two goals the United Nations aims to achieve by 2030 under its sustainable development program. Hunger and poverty are byproducts of multiple factors, and fighting them requires a multi-fold effort from all stakeholders. Artificial Intelligence and Machine Learning have transformed the way we live, work and interact. However, the economics of business have limited their application to a few segments of society. A much more conscious effort is needed to bring the power of AI to the benefit of those who need it the most: people below the poverty line. Here we present our thoughts on how deep learning and big data analytics can be combined to enable effective implementation of anti-poverty programs. Advancements in deep learning and micro diagnostics, combined with effective technology policy, are the right recipe for the progressive growth of a nation. Deep learning can help identify poverty zones across the globe based on night-time images, where the level of light correlates with economic growth. Once the areas of lower economic growth are identified, geographic and demographic data can be combined to establish micro-level diagnostics of these underdeveloped areas. The insights from the data can help plan an effective intervention program. Machine learning can further be used to identify potential donors, investors and contributors across the globe based on their skill set, interests, history, ethnicity, purchasing power and their native connection to the location of the proposed program. Even adequate resource allocation and efficient program design will not guarantee success unless project execution is supervised at the grass-roots level. Data analytics can be used to monitor project progress and effectiveness, and to detect anomalies in case of fraud or mismanagement of funds.

  • Liked Gaurav Godhwani

    Gaurav Godhwani / Swati Jaiswal - Fantastic Indian Open Datasets and Where to Find Them

    45 Mins
    Case Study
    Beginner

    With the big boom in the Data Science and Analytics industry in India, a lot of data scientists are keen on learning a variety of learning algorithms and data manipulation techniques. At the same time, there is growing interest among data scientists in giving back to society, harnessing their acquired skills and helping fix some of the major burning problems in the nation. But how does one go about finding meaningful datasets connected to societal problems and planning data-for-good projects? This session will summarize our experience of working in the Data-for-Good sector over the last 5 years, sharing a few interesting datasets and associated use cases of employing machine learning and artificial intelligence in the social sector. The Indian social sector is replete with a good volume of open data such as annotated images, geospatial information, time series, Indic languages, satellite imagery, etc. We will dive into the journey of a Data-for-Good project, getting essential open datasets, and the insights from certain data projects in the development sector. Lastly, we will explore how we can work with various communities and scale our algorithmic experiments into meaningful contributions.

  • Liked Antrixsh Gupta

    Antrixsh Gupta - Creating Custom Interactive Data Visualization Dashboards with Bokeh

    90 Mins
    Workshop
    Beginner

    This will be a hands-on workshop on how to build a custom interactive dashboard application on your local machine or on any cloud service provider. You will also learn how to deploy this application with both security and scalability in mind.

    Powerful data visualization software solutions are extremely useful when building interactive data visualization dashboards. However, these types of solutions might not provide sufficient customization options. For those scenarios, you can use open source libraries like D3.js, Chart.js, or Bokeh to create custom dashboards. These libraries offer a lot of flexibility for building dashboards with tailored features and visualizations.

  • Liked Indranil Basu

    Indranil Basu - Machine Generation of Recommended Image from Human Speech

    45 Mins
    Talk
    Advanced

    Introduction:

    Synthesizing audio for specific domains has many practical applications in creative sound design for music and film, but the applications are not restricted to the entertainment industry. We propose an architecture that converts audio (a human voice) into the speaker's preferred image. For the time being, we restrict the intended images to two domains: object design and the human body. People are often unable to convey a design (say, a PowerPoint presentation or the interior decoration of a house) or a known person through verbally described attributes, even though they can visualise that design or person themselves. The listener, in turn, may be unable to interpret the object or person from the speaker's verbal description because he/she is not visualising the same thing. Complete communication thus needs a lot of trial and error, and is overall risky and time-consuming. Examples of such situations are:

    1) While preparing a presentation, an executive or manager can visualise something and ask an employee to produce it, but the slides made from the manager's description may not match. Similarly, a house or office owner may want the premises to have a certain design that he/she can visualise and describe to the vendor, but the vendor may not be able to produce it, and trial and error in this case is highly expensive. Having an automatically generated image recommended to him/her can address this problem.

    2) A verbal description of a terrorist or criminal suspect (facial description and/or attributes) may not always be available to all security personnel at airports, railway stations or other sensitive areas. A software system that produces machine-generated images with ranked recommendations for such a suspect can immediately point to one or very few people in a crowded airport, railway station or other sensitive place. Security agencies can then frisk only those people or match their attributes against an existing database. This avoids hazardous manual checking of every person and helps the security agencies adequately check the recommended individuals.

    We can use a sequential architecture consisting of simple NLP and more complex Deep Learning algorithms, primarily based on Generative Adversarial Networks (GAN) and Neural Personalised Ranking (NPR), to help object designers and security personnel with their specific purposes.

    The idea to combat the problem:

    I propose a combination of Deep Learning and Recommender System approaches to tackle this problem. The architecture of the solution model consists of 4 major components: 1) Speech to Text; 2) Text Classification into Person or Design; 3) Text to Image Formation; 4) Recommender System.

    We address these four steps through consecutive applications of effective Machine Learning and Deep Learning algorithms. The Deep Learning community has already made significant progress in Text to Image generation and in ranking-based Recommender Systems.

    Brief Details about the four major pillars of this problem:

    Deep Learning based Speech Recognition – The primary technique for speech to text could be Baidu’s DeepSpeech, for which a TensorFlow implementation is readily available. Also, Google Cloud Speech-to-Text enables developers to convert voice to text. The user's voice needs to be converted into a .wav file. Our steps for Deep-Speech-2 are: fixing GPU memory, adding batch normalization to the RNN, implementing a row convolution layer, and generating text.

    Nowadays, quite a few free speech-to-text tools are available, e.g. Google Docs Voice Typing, Windows Speech Recognition, Speech-notes, etc.

    Text Classification of Content – This is needed to classify the converted text into two classes, a) Design Description or b) Human Attribute Description, because these two applications, and therefore their image types, are different. This may be the statistically easier part, but its importance is immense. A dictionary of words related to designs and personal attributes can be built from resources available online. Then, a supervised algorithm using tf-idf and Latent Semantic Analysis (LSA) should be able to classify the text into two classes: Object and Person. These are traditional, proven techniques in NLP research.
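
    A minimal sketch of this classification step, under loud assumptions: the text above does not name the supervised algorithm, so a nearest-centroid classifier over tf-idf vectors is swapped in here, and the LSA step is omitted for brevity. The four training sentences and all function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_idf(docs):
    """Smoothed inverse document frequency for every term in the training docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf(text, idf):
    """Sparse tf-idf vector; terms unseen during training are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

# Hypothetical labelled descriptions for the two classes named in the text.
training = {
    "Object": ["blue sofa near large window", "wooden table with round legs"],
    "Person": ["tall man with short black hair", "young woman with brown eyes"],
}
idf = build_idf([doc for docs in training.values() for doc in docs])
centroids = {label: centroid([tfidf(d, idf) for d in docs]) for label, docs in training.items()}

def classify(text):
    vec = tfidf(text, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

print(classify("man with black hair and brown eyes"))
print(classify("round wooden table near window"))
```

    In a real pipeline the transcribed speech would replace the toy sentences, and a dimensionality reduction such as LSA would sit between the tf-idf vectors and the classifier.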

    Text to Image Formation – This is the main component of this proposal. Today, one of the most challenging problems in Computer Vision is synthesizing high-quality images from text descriptions. In recent years, GANs have been found to generate good results. Samples generated by existing text-to-image approaches can roughly reflect the meaning of the given descriptions, but they fail to contain necessary details and vivid object parts. There have been a few approaches to address this problem, all using GANs. One of these is Stacked Generative Adversarial Networks (StackGAN). At the heart of such approaches is the Conditional GAN, an extension of the GAN in which both the generator and the discriminator receive additional conditioning variables c, yielding G(z, c) and D(x, c). This formulation allows G to generate images conditioned on variables c.
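
    For reference, the conditional GAN described above is usually trained on the standard minimax objective with the condition c passed to both networks. The form below follows the common textbook formulation; the distribution symbols p_data and p_z are notation assumed here, not taken from the proposal.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{(x, c) \sim p_{\text{data}}}\big[\log D(x, c)\big]
  + \mathbb{E}_{z \sim p_z,\; c \sim p_{\text{data}}}\big[\log\big(1 - D(G(z, c), c)\big)\big]
```

    Here x is a real image, c its text condition and z the noise vector, so D is rewarded for accepting real (image, text) pairs and rejecting generated ones.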

    In our case, we train a deep convolutional generative adversarial network (DC-GAN) conditioned on text features. These text features are encoded by a hybrid character-level convolutional-recurrent neural network. Overall, the DC-GAN uses text embeddings in which the context of a word is of prime importance. The class label determined in the earlier step helps here: it lets the DC-GAN generate more relevant images than irrelevant ones. Details will be discussed during the talk.

    The most straightforward way to train a conditional GAN is to view (text, image) pairs as joint observations and train the discriminator to judge pairs as real or fake. The discriminator has no explicit notion of whether real training images match the text embedding context. To account for this, in GAN-CLS, in addition to the real/fake inputs to the discriminator during training, a third type of input consisting of real images with mismatched text is added, which the discriminator must learn to score as fake. By learning to optimize image/text matching in addition to the image realism, the discriminator can provide an additional signal to the generator. (details are in talk)

    Image Recommender System – In the last step, we propose personalised image recommendation for the user from the set of images generated by the GAN-CLS architecture. Image recommendation brings the number of candidate images down to a top N (ideally N = 3, 5 or 10), each with a rank, so the user finds it easier to choose. Here we propose Neural Personalized Ranking (NPR), a personalized pairwise ranking model over implicit feedback datasets that is inspired by Bayesian Personalized Ranking (BPR) and recent advances in neural networks. NPR has since been improved to contextually enhanced NPR. This enhanced model depends on implicit feedback from users and its context, and incorporates the idea of generalized matrix factorization. Contextual NPR significantly outperforms its competitors.

    In the presentation, we shall describe the complete sequence in detail.
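
    The pairwise ranking idea behind BPR, which NPR is said to build on, can be sketched as follows. This shows plain BPR with dot-product scores and no regularization, not NPR or its contextual variant. The two-dimensional embeddings and the learning rate are made-up values for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bpr_loss(user, pos, neg):
    """Pairwise loss -log(sigmoid(score(user, pos) - score(user, neg)))."""
    diff = dot(user, pos) - dot(user, neg)
    return math.log(1.0 + math.exp(-diff))

def bpr_sgd_step(user, pos, neg, lr=0.1):
    """One gradient-descent step on the pairwise loss (no regularization)."""
    diff = dot(user, pos) - dot(user, neg)
    g = 1.0 / (1.0 + math.exp(diff))  # sigmoid(-diff): size of the push
    new_user = [u + lr * g * (p - n) for u, p, n in zip(user, pos, neg)]
    new_pos = [p + lr * g * u for p, u in zip(pos, user)]
    new_neg = [n - lr * g * u for n, u in zip(neg, user)]
    return new_user, new_pos, new_neg

# Hypothetical embeddings: the user picked image `pos` and skipped image `neg`.
user, pos, neg = [0.1, 0.2], [0.3, -0.1], [0.2, 0.4]
loss_before = bpr_loss(user, pos, neg)
for _ in range(200):
    user, pos, neg = bpr_sgd_step(user, pos, neg)
loss_after = bpr_loss(user, pos, neg)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

    Training only needs to know which image the user preferred over which other image, which is why this family of models suits implicit feedback.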

  • Liked Shrutika Poyrekar

    Shrutika Poyrekar / kiran karkera / Usha Rengaraju - Introduction to Bayesian Networks

    90 Mins
    Workshop
    Advanced

    { This is a hands-on workshop. The use case is traffic analysis. }

    Most machine learning models assume independent and identically distributed (i.i.d.) data. Graphical models can capture almost arbitrarily rich dependency structures between variables. They encode conditional independence structure with graphs. A Bayesian network, a type of graphical model, describes a probability distribution over all variables by placing edges between the variable nodes, where the edges represent the conditional probability factors in the factorized probability distribution. Bayesian networks thus provide a compact representation for dealing with uncertainty using an underlying graphical structure and probability theory. These models have a variety of applications, such as medical diagnosis, biomonitoring, image processing, turbo codes, information retrieval, document classification and gene regulatory networks, amongst many others. These models are interpretable, as they are able to capture the causal relationships between different features. They can work efficiently with small data and can also deal with missing data, which gives them more power than conventional machine learning and deep learning models.

    In this session, we will discuss the concepts of conditional independence, d-separation, the Hammersley-Clifford theorem, Bayes' theorem, Expectation Maximization and Variable Elimination. There will be a code walk-through of a simple case study.
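
    To make the factorization concrete, here is a small sketch of a three-node network in the spirit of the traffic use case: Rain and Accident are independent causes of heavy Traffic. The structure and every probability are invented for illustration and are not the workshop's case study. Inference is done by summing the unobserved variable out of the factorized joint, which is variable elimination in its simplest form.

```python
from itertools import product

# Hypothetical network: Rain -> Traffic <- Accident (all variables binary).
P_RAIN = {1: 0.3, 0: 0.7}
P_ACCIDENT = {1: 0.1, 0: 0.9}
# P(traffic = 1 | rain, accident), keyed by (rain, accident).
P_HEAVY = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.9}

def joint(rain, accident, traffic):
    """Factorized joint: P(rain) * P(accident) * P(traffic | rain, accident)."""
    p_heavy = P_HEAVY[(rain, accident)]
    p_traffic = p_heavy if traffic == 1 else 1.0 - p_heavy
    return P_RAIN[rain] * P_ACCIDENT[accident] * p_traffic

def posterior_accident(traffic, rain=None):
    """P(accident = 1 | traffic[, rain]); Rain is summed out when unobserved."""
    rains = [0, 1] if rain is None else [rain]
    weight = {a: sum(joint(r, a, traffic) for r in rains) for a in (0, 1)}
    return weight[1] / (weight[0] + weight[1])

total = sum(joint(r, a, t) for r, a, t in product((0, 1), repeat=3))
print(f"joint sums to {total:.3f}")
print(f"P(accident | heavy traffic)       = {posterior_accident(1):.3f}")
print(f"P(accident | heavy traffic, rain) = {posterior_accident(1, rain=1):.3f}")
```

    Observing rain lowers the probability of an accident given heavy traffic. Rain and Accident are independent a priori but become dependent once their common effect is observed, which is exactly the kind of behaviour d-separation predicts.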

  • Liked Akash Tandon

    Akash Tandon - Traversing the graph computing and database ecosystem

    Akash Tandon
    Data Engineer
    SocialCops
    3 months ago
    45 Mins
    Talk
    Intermediate

    Graphs have long held a special place in computer science’s history (and codebases). We're seeing the advent of a new wave of the information age; an age that is characterized by great emphasis on linked data. Hence, graph computing and databases have risen to prominence rapidly over the last few years. Be it enterprise knowledge graphs, fraud detection or graph-based social media analytics, there are a great number of potential applications.

    To reap the benefits of graph databases and computing, one needs to understand the basics as well as the current technical landscape and offerings. Equally important is understanding whether a graph-based approach suits your problem.
    These realizations are a result of my involvement in an effort to build an enterprise knowledge graph platform. I also believe that graph computing is more than a niche technology and has potential for organizations of varying scale.
    Now, I want to share my learning with you.

    This talk will touch upon the above points with the general premise being that data structured as graph(s) can lead to improved data workflows.
    During our journey, you will learn fundamentals of graph technology and witness a live demo using Neo4j, a popular property graph database. We will walk through a day in the life of data workers (engineers, scientists, analysts), the challenges that they face and how graph-based approaches result in elegant solutions.
    We'll end our journey with a peek into the current graph ecosystem and high-level concepts that need to be kept in mind while adopting an offering.

  • Liked Shankar Somayajula

    Shankar Somayajula - Revisiting Market Basket Analysis (MBA) with the help of SQL Pattern Matching

    45 Mins
    Case Study
    Intermediate

    Market Basket Analysis, or Affinity Analysis, using an Association Rules based model is a cross-domain solution framework used in Retail Analytics (Shopping Baskets), Clickstream/Web Traffic Analytics, Customer Behaviour Analytics, Fraud Analytics, etc.

    Market Basket Analysis (MBA) is used to discover/identify patterns from transactional data (a master-detail transactional set of line items) and serves many down-stream Business processes like Recommendations, Merchandising/Inventory Planning, Product Assortments etc.

    MBA is extensively used in the industry. There are quite a few extensions possible to MBA like (a) Multi-Level Association Rules by allowing the core item/product hierarchy level to be flexible, (b) Multi-Dimensional Association Rules by including additional nuggets of information 'tags' along additional dimensions of interest, (c) Sequential Association Rules by considering the order of events within the transaction and eliciting signals relating to directionality of the Rule including possible causal indicators.

    MBA is typically performed as an offline batch/ETL/analytic process, with the results of the modeling extracted and saved for subsequent perusal by the Domain/Business Analyst.

    In this solution/revisiting of the MBA process, we decouple the Rule/Pattern identification/discovery phase (finding patterns/rules via Association Rules model build) from the Rule/Pattern KPI calculation phase related to the usefulness evaluation of the patterns (scoring patterns/rules via KPIs).

    MBA Rules/Patterns are typically evaluated via the Support, Confidence and Lift KPIs. Some experts have advocated for the definition of additional KPIs like Conviction, Imbalance Ratio (IR), Kulc factor (Kulczynski) to identify interesting Rule/Patterns. We define these KPIs as well as many custom KPIs which help qualify the Rule/Patterns and aid in Rule/Pattern Discovery/Exploration phase.
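
    The KPIs named above have standard textbook definitions, sketched below for a single rule "antecedent => consequent". The talk computes them in SQL; Python is used here purely for illustration. The five baskets and the function name `rule_kpis` are hypothetical, and the sketch assumes both sides of the rule occur at least once.

```python
def rule_kpis(baskets, antecedent, consequent):
    """Standard interestingness measures for the rule antecedent => consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(baskets)
    supp_a = sum(a <= basket for basket in baskets) / n
    supp_c = sum(c <= basket for basket in baskets) / n
    supp_ac = sum((a | c) <= basket for basket in baskets) / n

    confidence = supp_ac / supp_a  # P(consequent | antecedent)
    lift = confidence / supp_c
    conviction = float("inf") if confidence == 1.0 else (1.0 - supp_c) / (1.0 - confidence)
    kulczynski = 0.5 * (supp_ac / supp_a + supp_ac / supp_c)
    imbalance_ratio = abs(supp_a - supp_c) / (supp_a + supp_c - supp_ac)
    return {
        "support": supp_ac,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
        "kulczynski": kulczynski,
        "imbalance_ratio": imbalance_ratio,
    }

# Hypothetical transactions.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
kpis = rule_kpis(baskets, {"bread"}, {"milk"})
for name, value in kpis.items():
    print(f"{name}: {value:.4f}")
```

    In this toy data the rule has high confidence (0.75) but a lift below 1, a reminder of why confidence alone can mislead and why the additional KPIs are worth computing.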

    The SQL approach to MBA allows us to

    => Include the pattern matching capability within an offline ETL workflow (match and pre-calculate results), within a view (match on demand, dynamic calculation), or a combination of both (pre-calculated as well as on-demand) for regular BI tools to leverage.

    => Cover special/edge cases of interest in special domains, such as fraud patterns, that have insufficient coverage (very low support) but need to be identified nevertheless. The pattern space can be very voluminous, but in certain cases we can identify/analyze user-defined seeded patterns using SQL without having to build the MBA model.

    => Address Sequential Rules/Patterns, where the transaction order of items is considered during the matching process. For example, if the Market Basket Rule is "b,p,r => c", we can use SQL sequential logic to derive that the most dominant sequential pattern within the antecedents "b", "p" and "r" is "p,b,r" (67% of the time), and that overall, including both the antecedents (b, p, r) and the consequent (c), the dominant sequential pattern amongst the 4 basket products is "p,b,c,r" (50% of the time). This acts as a nudge to the domain analyst/business user to perhaps approve an update/transform workflow process that changes the business rule from "b,p,r => c", initially sorted by product(s)/(ids) and indicating pure association by the Apriori model, to "p,b,r => c", indicating the influence of the dominant sequential pattern amongst the antecedents.

    => Allow the Domain Analyst/Business User to perform ad hoc reporting via standard BI operations like slice and dice on the dataset, recalculating the Rule/Pattern KPIs.

    => Re-evaluate a Rule/Pattern against a dataset different from the one in which it was identified (say, against a recent/streaming input data stream). See how patterns discovered during the "Big Sale" period are doing in the current Promotion/Campaign.

    => Establish a Rule/Pattern lifecycle beyond that of an MBA 'model': establish a Rules curation process to determine how a discovered Rule/Pattern can be designated as an 'Insight' for further use in related (downstream) systems.
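
    The dominant sequential pattern idea from the list above can be illustrated outside SQL as well. The sketch below is in Python rather than the SQL pattern matching the talk uses. It counts, over the transactions that contain all antecedent items, the order in which those items first appear. The transactions and the function name `dominant_order` are hypothetical.

```python
from collections import Counter

def dominant_order(transactions, items):
    """Most frequent first-appearance order of `items` among the transactions
    that contain all of them, together with its share of those transactions."""
    wanted = set(items)
    orders = Counter()
    for basket in transactions:
        if not wanted <= set(basket):
            continue  # the rule's items are not all present
        seen = []
        for item in basket:
            if item in wanted and item not in seen:
                seen.append(item)
        orders[tuple(seen)] += 1
    if not orders:
        return None, 0.0
    order, count = orders.most_common(1)[0]
    return order, count / sum(orders.values())

# Hypothetical ordered baskets for the rule "b,p,r => c".
transactions = [
    ["p", "b", "x", "r", "c"],
    ["p", "b", "c", "r"],
    ["b", "p", "r", "c"],
    ["p", "c"],
]
order, share = dominant_order(transactions, ["b", "p", "r"])
print(order, f"{share:.0%}")
```

    With these baskets the antecedents appear as p, b, r in two of the three qualifying transactions, mirroring the 67% figure in the example above.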

  • Liked Maryam Jahanshahi

    Maryam Jahanshahi - Applying Dynamic Embeddings in Natural Language Processing to Analyze Text over Time

    Maryam Jahanshahi
    Research Scientist
    TapRecruit
    6 months ago
    45 Mins
    Case Study
    Intermediate

    Many data scientists are familiar with word embedding models such as word2vec, which capture semantic similarity of words in a large corpus. However, word embeddings are limited in their ability to interrogate a corpus alongside other context or over time. Moreover, word embedding models either need significant amounts of data, or tuning through transfer learning of a domain-specific vocabulary that is unique to most commercial applications.

    In this talk, I will introduce exponential family embeddings. Developed by Rudolph and Blei, these methods extend the idea of word embeddings to other types of high-dimensional data. I will demonstrate how they can be used to conduct advanced topic modeling on medium-sized datasets that are specialized enough to require significant modifications of a word2vec model and that contain more general data types (including categorical, count and continuous). I will discuss how my team implemented a dynamic embedding model using TensorFlow and our proprietary corpus of job descriptions. Using both categorical and natural language data associated with jobs, we charted the development of different skill sets over the last 3 years. I will specifically focus the description of results on how tech and data science skill sets have developed, grown and pollinated other types of jobs over time.

  • Liked Sunil Jacob

    Sunil Jacob - Automated Recognition of Handwritten Digits in Indian Bank Cheques

    Sunil Jacob
    Sr. Architect
    Philips
    3 months ago
    Sold Out!
    45 Mins
    Case Study
    Beginner

    Handwritten digit recognition and pattern analysis are among the active research topics in digital image processing. Automatic handwritten digit recognition is of great technical and academic interest.

    In today’s digital realm, bank cheques are widely used around the world for various financial transactions. A rough estimate says that 120+ billion cheques move around the world. In the Indian banking scenario, the CTS cheque clearance system has been introduced. Even though a cheque is cleared quickly, manual intervention is still needed to validate the date and amount fields, and this takes a lot of manual effort.

    This case study, followed by a demo, will show how handwritten date and amount fields were extracted and validated. By adopting this automated way of recognising handwritten digits, banks can cut down manual time and speed up their process. Although the work is still in the proof of concept phase, it was achieved using computer vision and image processing techniques.

    This case study will briefly cover:

    • Detection of bounding boxes and extraction of the region of interest
    • Fragment and Identify technique
    • Checking the accuracy of bounding box using Intersection over Union technique
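
    The Intersection over Union check in the last bullet can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max), and the function name and sample coordinates are hypothetical.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max).

    The box format is an assumption made for this sketch.
    """
    # Corners of the overlapping rectangle, if the boxes overlap at all
    x_left = max(box_a[0], box_b[0])
    y_top = max(box_a[1], box_b[1])
    x_right = min(box_a[2], box_b[2])
    y_bottom = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give an empty intersection
    inter_width = max(0.0, x_right - x_left)
    inter_height = max(0.0, y_bottom - y_top)
    intersection = inter_width * inter_height

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) boxes
    return intersection / union if union > 0 else 0.0


# Hypothetical detected box versus a hand-labelled reference box
predicted = (10, 10, 50, 40)
ground_truth = (20, 15, 60, 45)
iou = intersection_over_union(predicted, ground_truth)
```

    A score of 1.0 means the detected box matches the reference exactly and 0.0 means the two do not overlap; the cut-off for an acceptable detection is a project-specific choice.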

    This case study/approach can be extended to other operating environments where handwritten digit recognition is needed.

  • Liked Dr. Neha Sehgal

    Dr. Neha Sehgal - Open Data Science for Smart Manufacturing

    45 Mins
    Talk
    Intermediate

    Open Data offers a tremendous opportunity to transform today’s manufacturing sector into smarter manufacturing. Smart Manufacturing initiatives include digitalising production processes and integrating IoT technologies that connect machines to collect data for analysis and visualisation.

    In this talk, I will illustrate the linkages between various industries within the manufacturing sector through the lens of Open Data Science. Data on manufacturing companies, including company profiles, officers and financials, will be scraped from UK Open Data APIs. The work I plan to showcase at ODSC is part of the UK Made Smarter Project, where it has helped major aerospace alliances identify the champions and strugglers (SMEs) within the manufacturing sector based on open data gathered from multiple sources. The talk covers data extraction, data cleaning, data transformation (turning raw financial information about companies into key metrics of interest) and further analytics to cluster manufacturing companies into "Champions" and "Strugglers". It will also showcase examples of powerful R Shiny dashboards of interest to suppliers, manufacturers and other key stakeholders in the supply chain network.

    Further analysis includes network analysis for industries, clustering, and deploying the model as an API using Google Cloud Platform. The presenter will discuss the necessity of an 'Analytical Thinking' approach as an aid in handling complex big data projects, and how to overcome challenges in real-life data science projects.

  • Liked Krishna Sangeeth

    Krishna Sangeeth - The last mile problem in ML

    Krishna Sangeeth
    Data Scientist
    Ericsson
    3 months ago
    Sold Out!
    45 Mins
    Talk
    Intermediate

    “We have built a machine learning model. What next?”

    There is quite a journey to cover between building a model in a Jupyter notebook and taking it to production.
    I like to call it the “last mile problem in ML”. This last mile can be an easy walk if we embrace some good ideas.

    This talk covers some opinionated ideas on how to get around the pitfalls of deploying ML models in production.

    We will go over the questions below in detail and think about solutions for them.

    • How do we fix the zombie model apocalypse, a state in which nobody knows how the model was trained?
    • In science, experiments are considered valid only if they are reproducible. Should this be the case in data science as well?
    • Training a model on your local machine and waiting an eternity for it to complete is no fun. What are some better ways of doing this?
    • How do you package your machine learning code in a robust manner?
    • Does an ML project have the luxury of not following good software engineering principles?
  • Liked Amit Baldwa

    Amit Baldwa - Predicting and Beating the Stock Market with Machine Learning and Technical Analysis

    Amit Baldwa
    Director
    Finastra Financial Software
    3 months ago
    Sold Out!
    45 Mins
    Demonstration
    Intermediate

    Machine learning provides systems the ability to automatically learn and improve from experience without being explicitly programmed.

    Technical analysis shows investor sentiment, both greed and fear, in graphic form. It attempts to use past stock price and volume information to predict future price movements. Technical analysis of various indicators has been a time-tested strategy for seasoned traders and hedge funds, who have used these techniques to effectively turn out profits in the securities industry.
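
    As a concrete illustration of an indicator built purely from past prices, the sketch below computes a simple moving average and a crossover signal. It is an assumed example, not the presenter's method: the function names, window sizes and price series are hypothetical, and it makes no claim about profitability.

```python
def simple_moving_average(prices, window):
    """Trailing simple moving average; None until `window` prices are available."""
    averages = []
    running_sum = 0.0
    for i, price in enumerate(prices):
        running_sum += price
        if i >= window:
            # Drop the price that just left the trailing window
            running_sum -= prices[i - window]
        averages.append(running_sum / window if i >= window - 1 else None)
    return averages


def crossover_signals(prices, short_window=3, long_window=5):
    """Return 1 when the short average is above the long one, -1 when below, else 0."""
    short_avg = simple_moving_average(prices, short_window)
    long_avg = simple_moving_average(prices, long_window)
    signals = []
    for short_value, long_value in zip(short_avg, long_avg):
        if short_value is None or long_value is None:
            signals.append(0)  # not enough history yet
        elif short_value > long_value:
            signals.append(1)
        elif short_value < long_value:
            signals.append(-1)
        else:
            signals.append(0)
    return signals


# Hypothetical closing prices, for illustration only
closing_prices = [1, 2, 3, 4, 5, 6, 7]
signals = crossover_signals(closing_prices)
```

    Signals like these could serve as input features for a machine learning model; whether they carry any predictive value is exactly the question the talk sets out to evaluate.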

    Some researchers claim that stock prices conform to the theory of random walk, which holds that the future path of a stock's price is no more predictable than a sequence of random numbers. However, stock prices do not follow random walks.

    We will evaluate whether stock returns can be predicted based on historical information.

    Using machine learning, we further try to decipher the correlation between the various indicators and identify the set of indicators that best predicts the value.

  • Liked Deepthi Chand

    Deepthi Chand / Shreya Agrawal - Samantar, an open assistive translation framework for Indic Languages

    45 Mins
    Case Study
    Beginner

    India is a land of many languages. There are 23 official languages and many more unofficial ones in common day-to-day use. Unfortunately, disseminating information in low-resource languages is difficult because of geographic distances. Popular translation platforms have helped fill this gap for major languages, but their effectiveness is limited by the lack of proper datasets and by their generic nature. The problem becomes especially evident when domain-specific information is involved.

    We present Samantar, an open translation suggestion framework targeted at Indian languages. Samantar is built with open parallel corpora and open-source technologies. Its translation suggestions can be tuned to different target domains.

  • Liked Vidhya Veeraraghavan

    Vidhya Veeraraghavan - Story Teller - Analytics in Banking & Financial Sector

    45 Mins
    Case Study
    Beginner

    As kids, we always enjoyed stories. Some were scary, some holy, some instilled moral values and some were just for fun.

    Analytics is fun when you approach it with passion and curiosity. I know this because I have done it. With a few case studies, I hope to shed light on analytics and how it is actively used in the banking and financial sector.

    Come join me for a fun ride.