"In God we trust; all others must bring data." - W. E. Deming, Author & Professor
This philosophy is ingrained in the very core of American Express: as a data-driven company, we make all strategic decisions based on numbers. But who ensures that those numbers are correct? That is the work of Data Quality and Governance. Given this dependency on data, ensuring data quality is one of our prime responsibilities.
At American Express, data is generated and stored across multiple platforms. In a market like the US, for example, we process more than 200 transactions every second and make an authorization decision on each one. Given this speed and scale of data generation, ensuring data quality is imperative and a unique challenge in itself. Hundreds of models run on production platforms within AMEX, consuming thousands of variables. Many of these variables originate in legacy systems (or have components derived from them) and are then passed to downstream systems that manipulate them and create new attributes. A tech glitch or a logic issue can corrupt any variable at any point in this pipeline, with disastrous consequences for model outputs that translate into real-world customer impact and financial and reputational risk for the bank. So how do we catch these anomalies before they adversely affect our processes?
Traditional approaches to anomaly detection rely on measuring a variable's deviation from its mean; fancier ones employ time-series-based forecasting. Both approaches, however, are prone to high false positive rates. Because every alert generated has to be analyzed by the business, which carries a cost, a high level of accuracy is essential. In this talk, we will discuss how AMEX has approached and solved this problem.
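For illustration only, and not a description of AMEX's method, the sketch below shows the kind of deviation-from-the-mean check referred to above. The function name, data, and threshold are hypothetical. It catches an obvious spike, but a single static mean and standard deviation cannot account for trend or seasonality, which is one reason such checks generate the false positives noted above.

import numpy as np

def zscore_alerts(daily_values, threshold=3.0):
    # Hypothetical helper: flag days whose value deviates from the
    # historical mean by more than `threshold` standard deviations.
    values = np.asarray(daily_values, dtype=float)
    mean, std = values.mean(), values.std(ddof=1)
    if std == 0:
        return []  # constant series: nothing to flag
    z = (values - mean) / std
    return [i for i, score in enumerate(z) if abs(score) > threshold]

# A genuine spike on the last day is caught ...
daily_counts = [100] * 30 + [180]
print(zscore_alerts(daily_counts))  # -> [30]

# ... but on a series with real growth or weekly seasonality, the same
# static mean/std misreads legitimate movement as anomalous, which is
# where the false positives come from.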