Prediction of Wilful Default using Machine Learning

Over the last few years, banks and financial institutions in India have faced a growing number of corporate defaults. NBFC stocks have suffered heavy losses recently, triggering a contagion that spilled over to other financial stocks and dragged down benchmark indices, resulting in short-term bearishness. This makes it imperative to find ways to prevent such situations rather than cure them. Banks, however, face a twofold challenge: identifying probable wilful defaulters among borrowers, and moral hazard among bank employees, who are often found acting at the behest of the promoters of defaulting firms. The first challenge is aggravated by the fact that due diligence before a loan is extended is time-consuming; the second points to the need for automated safeguards that reduce malpractice arising from human behaviour. Automating the loan sanctioning process is one possible solution to both. To that end, we identified important firmographic variables, viz. financial ratios and their historical patterns, by studying the firms listed as the dirty dozen by the Reserve Bank of India. Next, we used k-means clustering to segment these firms and label them as normal, distressed defaulter or wilful defaulter. We also used text and sentiment analysis to examine the annual reports of all BSE- and NSE-listed firms over the last 10 years. From this, we identified word tags that correlate with the occurrence of default and indicate the financial performance of these firms. A rigorous analysis of these word tags (unigrams, bi-grams and co-located words) over 10 years for more than 100 firms indicates a relation between the frequency of word tags and firm default.
Lift estimation of firmographic financial ratios, namely the Altman Z-score, and of the frequency of word tags uncovers for the first time the importance of text analysis in predicting the financial performance of firms and their default. Our investigation also shows that neural networks can serve as predictors of firm default. The neural network we developed is built on open source machine learning libraries, which opens up the possibility of banks deploying such a model with a small one-time investment. In short, our work demonstrates that machine learning can address the challenges of preventing wilful default. We envisage that neural network based prediction models and text analysis of firm-specific financial reports could help the financial industry save millions in the recovery and restructuring of loans.
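The clustering step mentioned in the abstract could look roughly like the sketch below. It is only an illustration under stated assumptions, not our actual pipeline: the two ratios, the six firms and all values are invented, the farthest-point seeding is chosen here purely to keep the run deterministic, and real ratios would normally be standardized before clustering.

```python
import math

def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then repeatedly add the point farthest from all
    centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each firm joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, labels

# Hypothetical firms as (debt-to-equity, interest coverage); values are invented.
ratios = [
    (0.5, 6.0), (0.6, 5.5),
    (2.5, 1.0), (2.8, 0.8),
    (2.6, 5.8), (2.9, 6.2),
]
centroids, labels = kmeans(ratios, k=3)
print(labels)
```

The algorithm only returns numbered clusters. Naming a cluster normal, distressed defaulter or wilful defaulter is a separate interpretive step that relies on domain knowledge of the firms in it.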

 
 

Outline/Structure of the Case Study

1. Introduction: Wilful default by firms in the last 10 years and challenges faced by the financial industry in preventing it

2. How data speaks for different firms:

2a. Clustering of 3500 firms based on 12 different financial ratios and evolution of clusters over the last 10 years

2b. Analysis of financial reports of BSE- and NSE-listed firms and the dirty dozen: Findings from Sentiment and Lift analysis

3. Deep learning basics and some tips on how to use open source Python libraries for data cleaning and analysis

4. Development of the neural network model for default prediction: feature selection, selection of transfer function and number of hidden layers

5. Conclusion: Implications of machine learning techniques viz. text analysis and neural networks for the financial industry. Short-term and long-term perspectives on the importance of machine learning in the fintech space.
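To make the lift analysis in item 2b concrete, here is a minimal sketch. It assumes the usual association-rule definition, lift = P(default | tag present) / P(default), which may differ in detail from the estimate used in the study. The four report snippets, their default flags and the helper names `bigrams` and `lift` are all invented for illustration.

```python
import re

def bigrams(text):
    """Return the set of two-word tags (bi-grams) found in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {" ".join(pair) for pair in zip(words, words[1:])}

def lift(records, tag):
    """Lift of a word tag with respect to default.

    records: list of (set_of_tags, defaulted) pairs, one per firm report.
    Returns None when the tag never occurs or no firm defaulted.
    """
    base_rate = sum(defaulted for _, defaulted in records) / len(records)
    with_tag = [defaulted for tags, defaulted in records if tag in tags]
    if not with_tag or base_rate == 0:
        return None
    return (sum(with_tag) / len(with_tag)) / base_rate

# Hypothetical annual-report snippets with an invented default flag.
reports = [
    ("Debt restructuring approved by lenders", True),
    ("Record revenue growth this year", False),
    ("Lenders agreed to debt restructuring again", True),
    ("Steady revenue growth and lower debt", False),
]
records = [(bigrams(text), defaulted) for text, defaulted in reports]

print(lift(records, "debt restructuring"))  # 2.0
print(lift(records, "revenue growth"))      # 0.0
```

A lift above 1 means the tag appears disproportionately often in the reports of defaulting firms, and a lift below 1 means the opposite. With real data, tags seen in only a handful of reports give unstable estimates and would need a minimum-support threshold.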

Learning Outcome

1. Understanding of current challenges faced by the financial industry and how technology can alleviate some of these challenges

2. Understanding of drivers behind wilful corporate default and how machine learning can make the prediction using these drivers

3. Understanding of various machine learning techniques viz. sentiment analysis, lift analysis, feature selection and deep learning

4. Understanding of how the financial industry will be disrupted by increasingly powerful machine learning techniques

5. Understanding of methods of data collection, data cleaning and imputation techniques

6. Understanding of how to leverage an academic research environment to maximize learning from an MBA programme

7. Financial engineers and analysts will be able to learn how they can leverage machine learning in their day to day work

8. Data scientists will learn how to gather, clean and analyse financial data
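As a small taste of the imputation techniques in outcome 5, the sketch below fills missing financial ratios with the column median. Median imputation is an assumption made for this example and is only one of several possible techniques; the column names, the firms and the helper `impute_median` are likewise invented.

```python
from statistics import median

def impute_median(rows, columns):
    """Return a copy of rows with None values in the given columns
    replaced by the median of the observed values in that column."""
    filled = [dict(row) for row in rows]  # leave the input untouched
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        if not observed:
            continue  # nothing to learn a median from
        col_median = median(observed)
        for row in filled:
            if row[col] is None:
                row[col] = col_median
    return filled

# Hypothetical firms with gaps in their reported ratios.
firms = [
    {"firm": "A", "current_ratio": 1.2, "debt_to_equity": 0.5},
    {"firm": "B", "current_ratio": None, "debt_to_equity": 2.0},
    {"firm": "C", "current_ratio": 0.8, "debt_to_equity": None},
    {"firm": "D", "current_ratio": 1.6, "debt_to_equity": 1.0},
]
clean = impute_median(firms, ["current_ratio", "debt_to_equity"])
print(clean[1]["current_ratio"], clean[2]["debt_to_equity"])  # 1.2 1.0
```

The median is used here because financial ratios often contain extreme values that would distort a mean. In practice it can also help to keep a flag recording which values were imputed.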

Target Audience

Fintech firms, data scientists, financial engineers, academic researchers, regulators, banks.

Prerequisites for Attendees

The presentation assumes a general awareness of machine learning, of the challenges faced by fintech companies and banks, and of the recent performance of financial institutions. It will cover the basics of finance and machine learning before moving on to the more in-depth technical analysis. Hence, someone exposed to these areas through news, print media and blogs should be able to follow the material as well as a seasoned financial expert or data scientist.

Submitted 4 months ago

Public Feedback

  • Vidushi sharma  ~  4 months ago

    The type of application of ML to the corporate finance problem is very interesting and different.

    • Anupam Purwar  ~  4 months ago

      Thank you so much for your comments. Please join me during the presentation at ODSC.


  • Anupam Purwar - Prediction of Wilful Default using Machine Learning

    45 Mins
    Case Study
    Intermediate


  • Anupam Purwar - An Industrial IoT system for wireless instrumentation: Development, Prototyping and Testing

    45 Mins
    Talk
    Intermediate

    Next-generation machinery such as turbines, aircraft and boilers will rely heavily on smart data acquisition and monitoring to meet performance and reliability requirements. These systems require accurate real-time acquisition of parameters such as pressure, temperature and heat flux for structural health monitoring, automation and intelligent control. This calls for sophisticated instrumentation that measures these parameters and transmits them in real time. In the present work, a wireless sensor network (WSN) based on a novel high-temperature thermocouple cum heat flux sensor is proposed. The architecture of this WSN was developed with robustness, safety and affordability in mind. WiFi communication based on the IEEE 802.11 b/g/n specification is used to create a secure, low-power WSN. The thermocouple cum heat flux sensor and the instrumentation enclosure were designed using rigorous finite element modelling. The sensor and wireless transmission unit are housed in an enclosure capable of withstanding pressure and temperature in the range of 100 bar and 2500 K respectively. The sensor signal is conditioned before being passed to the ESP8266 based ESP12E wireless transmitter, which transmits the data to a web server. The system uploads the data to a cloud database in real time, providing seamless data availability to decision makers anywhere in the world, without time lag and with ultra-low power consumption. The real-time data is envisaged to be used for structural health monitoring of hot structures, using machine learning (ML) to identify patterns of temperature rise that have historically resulted in damage. Such an ML application can save millions of dollars otherwise spent on replacing and maintaining industrial equipment by alerting engineers in real time.