What does a television station or film studio have in common with the typical business email in-box these days? Both are portals to seemingly limitless hours of video content.
It was that similarity that caught my attention when someone from a new company called CLIPr approached me to discuss how the company is making video searchable.
In a nutshell, CLIPr leverages machine learning tools in the Amazon cloud to wade through hours of uploaded video so people can find and watch what they need to see, rather than spending countless hours waiting for the good part.
CLIPr comes at this with the mass market in mind—the millions upon millions of hours of video generated during video conference calls, virtual conference sessions, virtual keynotes and virtual classroom instruction that have taken the place of in-person experiences since the COVID-19 pandemic was declared a year ago.
But why couldn’t the same technology help reporters and video editors at television stations who need to search raw video for the right clip to use in a story or their counterparts at studios looking for a particular shot?
I interviewed Humphrey Chen, co-founder and CEO of CLIPr, to find out.
TVTech: CLIPr makes it easier for people to find the video clips they’re searching for. How?
Humphrey Chen: The way to think about CLIPr is we are a video analysis and management platform.
The way we think about things is that not all video moments are created equal. Some are more valuable, and some are less valuable. The problem right now is that when we hit the play button, we are all at the full mercy of everything behind it.
So what CLIPr does is create an automated index to surface content, which allows you to then pick and choose what you want to watch and what you don’t.
If you only care about 5% of a three-hour meeting, we’re saving you 95%. The tools just don’t exist today for you to very efficiently find what you need.
TVT: So initially you are targeting CLIPr at business-type applications, right?
HC: During the pandemic, everything became video. Everything became digital, and my co-founder came to me and said, “Man, I need something to help me work through my video backlog.”
That was an aha moment. We realized we could build at scale something that applies to the masses because there are 15 million meetings every day, and they’re all remote. I’m hearing anecdotally that 30-40% are getting recorded.
Before CLIPr it wasn’t easy to get caught up. With CLIPr, you click to send it through, and we basically index it and do what we call “enriching” it. We’ll provide topics and labels to describe the content.
TVT: Have you been approached by any television broadcasters or film studios looking to use this technology to search for desired footage?
HC: The funny thing is that the roots of everything we’re doing were originally designed to serve Hollywood. They have thousands and thousands of hours of content, and they also have full-time people whose only job is to go through it.
Since that Amazon blog post came out, we have had customers reaching out to ask us to help with the post-processing involved in creating trailers, because they are searching for specific moments.
The tools we’re making, designed to serve the masses, can also help studios.
We are already seeing movie trailers uploaded and gamers uploading moments from their video games on Twitch.
TVT: Tell me about how AI and machine learning in the cloud power CLIPr.
HC: We’ve basically been building it with scale in mind, using the Amazon AI stack behind the scenes.
What’s also really important to realize is that although we feel machine learning is great, it’s still imperfect. It’s not perfect by any stretch at all.
With that in mind, we have humans in the loop who actually help annotate and improve what we see. So in the structured experience, there are humans in the loop who are actually helping to make the descriptions more precise and crisper.
Right now, if you simply were to fully rely on automation, you wouldn’t get a crisp summary of 10 sentences. The state of the art doesn’t allow 10 sentences to become three words. That just doesn’t exist yet.
The only way to do that is to get this into the hands of people—get them to use it as much as possible. They get utility. When they are happy [with the results] or when they’re unhappy, we learn and our models can improve.
We can build on top of that machine learning stack—the platform that Amazon has—and keep getting better.
TVT: What’s the difference between structured and unstructured content in this context?
HC: Step one of CLIPr was focused on structured content. That typically means there’s a slide [such as in a meeting or a keynote speech] that corresponds to the talk track. Unstructured content means there are no visual cues that are telling us what’s happening.
We’ve all been in meetings where the conversation can go all over the place, right? While it’s actually hard for a human to organize that, it’s impossible for a machine to organize that.
So our unstructured experience is going to be more equivalent to a word cloud or a topic cloud. At that point, what CLIPr will pre-create is effectively a treasure map or splash page, because now we’re surfacing all the major things that were discussed. You roll over those things, and that will then take you to those parts inside of the video.
TVT: How did this all start for you?
HC: My background is with Amazon in the computer vision team. We basically were empowering developers to see and hear at scale. So, that was fun and exciting. But it also was frustrating because I gave developers the tools and then they would have to build the solutions. Now we are developing the solution.

TV Technology