Automating Data Integration with Machine Learning
Real-world data is messy and often unstructured, which makes it hard to extract value from it. Things get worse when the data is spread across different sources or systems. Before we can run any analytics in such a case, we need to combine the sources and build a unified view of the data. To do this, a data scientist typically goes through each data source, identifies which data is of interest, and defines the transformations and mappings that unify it with the other sources. This process usually involves writing lots of scripts with potentially overlapping code: a real headache in the everyday life of a data scientist! In this talk we will discuss how machine learning techniques and semantic modelling can be applied to automate the data integration process.
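To make the manual process concrete, here is a minimal sketch of the kind of hand-written mapping script described above. Everything in it is a made-up illustration rather than material from the talk: the two sources (`crm_rows`, `billing_rows`), their column names, and the unified schema (`name`, `birth_date`) are all assumptions.

```python
from datetime import datetime

# Hypothetical sample data: two sources describing the same person
# with different column names and value formats.
crm_rows = [{"FullName": "Ada Lovelace", "DOB": "1815-12-10"}]
billing_rows = [{"customer": "LOVELACE, ADA", "birth_date": "10/12/1815"}]


def from_crm(row):
    """Map a CRM row onto the (assumed) unified schema."""
    return {
        "name": row["FullName"],
        "birth_date": datetime.strptime(row["DOB"], "%Y-%m-%d").date(),
    }


def from_billing(row):
    """Map a billing row onto the same schema.

    Note how the parsing and renaming logic overlaps with from_crm.
    """
    last, first = [part.strip().title() for part in row["customer"].split(",")]
    return {
        "name": f"{first} {last}",
        "birth_date": datetime.strptime(row["birth_date"], "%d/%m/%Y").date(),
    }


# Unified view: every record now follows one schema.
unified = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

Each new source needs another function like these, written by hand after inspecting its columns. That per-source mapping step is what the talk proposes to automate.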