The role API-agnostic platforms play in India's evolving media intelligence industry

In July this year, Amazon quietly launched a glossy, square box that promised to make your TV smarter. Aptly called the Fire TV Cube, it connects to your TV and makes it ready for voice interaction. The initial reviews were terrible: it understood only 80% of voice commands, and everyone thought all it could do was turn the volume up and down. It turns out the Cube is not only far smarter than that, but also a glimpse into the future of how we interact with media.

-By Anup Gosavi, Co-Founder, Spext

The Cube can identify all the devices connected to your TV's ports and, based on what you ask it to do, will switch intelligently between them. For example, when you say "Alexa, play ESPN", it will turn on the TV, know that ESPN is available on Tata Sky, which is connected to HDMI 2, switch to HDMI 2, and then start the ESPN channel. The experience, when it works, is absolutely magical. For now, it works only with a limited set of streaming services such as Hulu and PlayStation Vue, but it is clear that more companies will support it. It is also clear that Alexa wants to become the voice OS for media interaction.

The adoption of Alexa and voice-based devices has been extraordinary, and we have seen in the past that changes in interaction always create major shifts in applications and the broader tech ecosystem. The touchscreen was one such shift in how we interacted with computers. Hardware moved from keyboards to touchscreens almost instantaneously, and applications shifted to simpler designs, faster load times and large buttons you could press.

Similarly, it’s not a stretch to imagine that very soon you will be able to say things like “Alexa, show me when Kohli scored a century.” While Alexa can identify what the user wants to do, it is up to the app and the media to actually fulfill this request. This is a big, big opportunity for startups. With the amount of training data they have, big players like Google and Amazon will dominate the voice infrastructure and interaction layer, but API- and platform-agnostic startups will drive innovation at the application layer by building on top of this infrastructure.

Going back to the request, “Show me when Kohli scored a century”, look at everything that needs to be figured out: the name ‘Kohli’, the sport in which a century can be scored – cricket, the time at which this century was scored, and the channel where the match is available. This request can come via Google Assistant, Alexa, Cortana or Siri. This simple request requires the media to be intelligent, and to serve a large audience, the solution needs to be platform agnostic.
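To make this concrete, here is a minimal sketch of the slot-filling step: mapping the recognised utterance to the entities an application must resolve. The player table, patterns and function names are purely illustrative assumptions, not any assistant's real API.

```python
import re

# Hypothetical lookup tables for illustration only.
KNOWN_PLAYERS = {"kohli": {"full_name": "Virat Kohli", "sport": "cricket"}}
EVENT_PATTERNS = {"century": r"\bcentury\b|\b100\b"}

def parse_request(utterance: str) -> dict:
    """Extract the slots an application needs to fulfil a media search."""
    text = utterance.lower()
    slots = {"player": None, "sport": None, "event": None}
    for name, info in KNOWN_PLAYERS.items():
        if name in text:
            slots["player"] = info["full_name"]
            slots["sport"] = info["sport"]  # the sport is implied by the player
    for event, pattern in EVENT_PATTERNS.items():
        if re.search(pattern, text):
            slots["event"] = event
    return slots

slots = parse_request("Show me when Kohli scored a century")
# {'player': 'Virat Kohli', 'sport': 'cricket', 'event': 'century'}
```

A real system would use trained NLP models rather than keyword tables, but the output – a set of resolved slots – is the same contract the media layer has to satisfy.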

Making media intelligent is a tough task, but we have the building blocks. Vision algorithms can analyze the media frame by frame to identify objects and famous faces. Speech-to-text algorithms can accurately convert the spoken words into text. Natural Language Processing (NLP) algorithms can analyze this text to create a knowledge graph. Microsoft, Google and Amazon all have APIs that provide these algorithms, but they are not enough to realize the potential of intelligent media. All they provide is a solid foundation.
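Chained together, these building blocks form an indexing pipeline. The sketch below shows the shape of such a pipeline; the three analyser functions are stand-ins for real vision, speech-to-text and NLP services and simply return canned results here.

```python
from dataclasses import dataclass, field

@dataclass
class MediaIndex:
    faces: list = field(default_factory=list)   # from vision analysis
    transcript: str = ""                        # from speech-to-text
    tags: list = field(default_factory=list)    # from NLP over the transcript

def detect_faces(video_path: str) -> list:      # vision stand-in
    return ["Virat Kohli"]

def transcribe(video_path: str) -> str:         # speech-to-text stand-in
    return "what an innings that is his century"

def extract_tags(transcript: str) -> list:      # NLP stand-in
    return [w for w in ("century", "wicket") if w in transcript]

def index_media(video_path: str) -> MediaIndex:
    """Run the three building blocks and combine their output."""
    transcript = transcribe(video_path)
    return MediaIndex(detect_faces(video_path), transcript, extract_tags(transcript))

idx = index_media("match.mp4")
```

In production each stand-in would call out to a cloud API, but the combined index – faces, transcript, tags – is what makes the media searchable.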

For example, vision algorithms can detect faces in a video but cannot accurately tell you the exact time at which they appear. These APIs are also not synchronous, making instant responses difficult. API-agnostic platforms will make the most of this opportunity by building on top of these algorithms, inventing new algorithms and helping companies make their media compatible with voice interactions.

Going back to our Kohli example, here is what an API-agnostic solution might look like: the platform can use Microsoft Video Indexer to analyze the media, identify Kohli and tag the media as cricket, Test match and so on. It can then use Google Speech-to-Text to convert the commentary to text and identify the segment in which the word “century” was spoken.

After this, it needs to innovate – develop algorithms that accurately align speech with text to find the exact moment when Kohli scored. The commentators will say ‘century’ many times, especially during replays, so the platform can analyze the audio to check whether there was clapping around words like ‘century’, ‘100’ or ‘what an innings’. If there was, that moment is more likely to be when the century was actually scored. With all this information, it needs to automatically create a clip and intelligently tag it as a response to the search query “Show me when Kohli scored a century”. It then needs to send this clip to the streaming server so that only the clip is played and not the entire 8-hour match.
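The disambiguation step described above can be sketched as a simple heuristic: among several "century" mentions, prefer the one with the loudest crowd reaction around it. The energy values below are made up for illustration; a real system would compute them from the match's audio track.

```python
def pick_century_moment(mentions, crowd_energy, window=10):
    """mentions: timestamps (sec) at which 'century' was spoken.
    crowd_energy: {second: loudness} samples from the audio track.
    Returns the mention with the most nearby crowd energy."""
    def score(t):
        return sum(e for sec, e in crowd_energy.items() if abs(sec - t) <= window)
    return max(mentions, key=score)

mentions = [5401.5, 5460.0, 5520.0]   # live call plus two replay mentions
crowd_energy = {5400: 0.9, 5405: 0.95, 5460: 0.3, 5520: 0.25}
best = pick_century_moment(mentions, crowd_energy)
# best == 5401.5 – the live moment, surrounded by applause
```

From `best`, clipping a window around the moment (say, 30 seconds either side) and tagging it with the original query gives the streaming server exactly the segment to serve.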

Finally, to create a great experience, all of this has to happen very quickly, and the clip has to be streamable across a variety of devices. There is a lot of work to be done in many areas, creating multi-billion-dollar opportunities for API-agnostic platforms. Devices like the Fire TV Cube are revolutionizing how we interact with media. The revolution of creating intelligent, conversant and interactive media has just begun.

 
