Videos account for about 75% of Internet traffic today. Enterprises are creating more and more videos and using them for informational purposes such as marketing; training customers, partners, and employees; and internal communications. However, videos are often described as the black holes of the Internet because it is very hard to see what is inside them. This opacity also affects end users, who spend a lot of time navigating to their point of interest, and it leads to severe underutilization of video as a powerful medium of information.
In this talk, we will describe the visual processing pipeline of the VideoKen platform, which includes:
- a graph-based algorithm, combined with deep scene text detection, to identify key visual frames in the video,
- a fully convolutional network (FCN) based algorithm for semantic segmentation of screen content in visual frames,
- a transfer-learning-based visual classifier that assigns screen content to categories such as slides, code walkthrough, demo, and handwritten content, and
- an algorithm to detect visual coherency and select indices from the video.
We will also discuss the challenges we faced and the lessons we learned while implementing and iterating on these algorithms, drawing on our experience of processing more than 100K hours of video content.
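
To make the idea of graph-based key-frame identification more concrete, here is a minimal sketch. It is not the VideoKen algorithm, whose details are not given here; it only illustrates one generic way such a step could work. Frames are treated as graph nodes, an edge joins two nearby frames whose feature vectors are similar, and each connected component is taken as one visual segment with its middle frame as the representative. The function names, the cosine-similarity measure, the per-frame feature vectors, and the `threshold` and `window` parameters are all assumptions made for illustration, and scene text detection is left out entirely.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_key_frames(features, threshold=0.9, window=2):
    """Group similar nearby frames; return one representative index per group.

    features: per-frame feature vectors (hypothetical, e.g. color histograms).
    threshold, window: illustrative values, not taken from the talk.
    """
    n = len(features)
    parent = list(range(n))  # union-find forest over frame indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Add an edge between frames at most `window` apart when they look alike.
    for i in range(n):
        for j in range(i + 1, min(i + window + 1, n)):
            if cosine_similarity(features[i], features[j]) >= threshold:
                parent[find(j)] = find(i)

    # Each connected component is one visual segment; keep its middle frame.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(members[len(members) // 2] for members in groups.values())


if __name__ == "__main__":
    # Toy features: three similar frames, two similar frames, one isolated frame.
    toy = [[1, 0], [1, 0.05], [0.98, 0.1], [0, 1], [0.05, 1], [1, 0]]
    print(select_key_frames(toy))
```

Limiting edges to a short temporal window keeps visually similar but distant frames (for example, a slide that reappears later) in separate segments, which is usually what an index into the video needs.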