With the rise of advanced machine learning techniques such as neural networks and decision trees, the demand for data has increased dramatically. Thousands, if not millions, of observations are required to build a satisfactory model with these algorithms. But for reasons such as operational challenges, cost, and time constraints, we may not have enough observations. In such cases, we are forced either to fall back on other statistical models or to collect more data, which is usually time-consuming and expensive. Generative Adversarial Networks (GANs, a class of neural networks) offer a way out: they create synthetic data by learning the distribution of the smaller dataset you already have.
GANs have been very popular for creating synthetic images and other unstructured data, but they have seen little success on structured datasets. At American Express, we gather multiple data points about our customers in a structured format, which is used widely for assessing customer credit risk. One of the key issues we face is a lack of data, in both quantity and quality, about our newly acquired customers: because their tenure is short, we have no historical behavior from which to assess their risk. GANs offer an interesting solution to this problem: can we use a GAN to create synthetic customers who look like our newly acquired portfolio, and use them to augment our datasets and build superior, more stable credit risk models?
We have seen interesting and promising results in this application. In this talk, we will share our story of working with GANs on structured data in the financial services domain, covering data pipelines, architectures, and the key changes needed, and we will delve into applying GANs to improve risk models.