
Simple software translates videos into other languages

Thanks to AI, I speak Arabic like a native speaker.

In this video, John Jeffay speaks Japanese like a native speaker thanks to AI. Courtesy of D-ID

I uploaded a video from my phone and within minutes it had cloned my voice, translated my words and lip-synced them into a lifelike moving image of me. I also created versions in French and Japanese.

This is the latest in a series of technological advances from Israeli startup D-ID, which uses generative AI to create “photorealistic digital humans.”

It allows anyone who creates online content – from a guide to fixing your washing machine to a professional marketing video – to overcome language barriers and reach a global audience.

I found the Video Translate software very easy to use and the results are remarkably lifelike. The translation is largely correct, but there are moments where the AI didn’t fully understand my point.

What I actually said in my 30-second video was, “I don’t speak French, Arabic, or Japanese.”

This came out in the French version as “Je ne parle [made-up word] no Arab, no [repeats the made-up word].” Then a random “Bibliothèque” (library) is inserted, which I definitely didn’t say.

We all know AI is great, except when it isn’t. There is currently no way to iron out such wrinkles, says D-ID, short of re-recording. However, the company is working on an upgrade that will allow you to review and change the translation before the video is produced.

This small error does not change the fact that it is an extremely clever technology.

Better than subtitles or dubbing

“The idea is to give you the ability to create videos in languages you don’t speak,” Ron Friedman, head of content and creative marketing at D-ID, tells ISRAEL21c, “by uploading a single video and having it automatically translated, with your voice cloned as an exact replica and the lip sync and facial movements matched to the original video.

“The result is a video of you speaking a variety of languages that you don’t necessarily speak.”

It is aimed at both private users who want to send a birthday greeting in one of the 30 available languages, and corporate customers who want to reach many countries with a single campaign.

Until now, the options for translating a video have been subtitling, which isn’t particularly appealing; dubbing, ditto; or recording multiple presenters in multiple languages, which can be very expensive.

D-ID’s solution surpasses them all. It opens new markets for business users by providing fast, efficient, affordable and compelling versions of a video in many languages.

So how convincing is it? “I think that, as with any AI feature, some people – those who look closely at every detail – might notice it [that it’s AI-generated],” says Tal Ron-Pereg, product director at D-ID.

“But for a normal audience watching the footage, most won’t know it’s AI, and there are tips and tricks on how to make it look even better. Some angles will look better than others, but most of the time it will look very, very real.”

Gets better with time

As with all AI, the AI that powers Video Translate learns and improves.

“Over time, the translation and lip-syncing capabilities will get better,” says Ron-Pereg.

“We will also be adding additional features that will allow you to check the translation before submission – by reading the translation aloud before actually creating the video, so you can edit a specific word that you think we mistranslated.”

“We will also add the ability for the user to provide instructions – for example, the pronunciation of certain words or a little more detail about what the video is about and who you are addressing, so that the tone and style fit better.”

Over time, Video Translate will be able to cope with multiple speakers (at the moment it gets confused by more than one person and may end up cloning two voices into one).

Other languages will also be available. For example, Video Translate currently understands Hebrew as an input language, but does not yet offer it for output.

De-identification

D-ID was founded in 2017 as a pioneering company in the field of de-identification (D-ID) – which means outsmarting facial recognition technology.

Founders Gil Perry (CEO), Sella Blondheim (COO) and Eliran Kuta (CTO) are all veterans of the Israel Defense Forces’ elite Signals Intelligence Unit 8200.

They altered the photos of people so that they were still recognizable to humans but could not be identified by facial recognition algorithms.

The Tel Aviv-based company developed avatars – lifelike digital humans – that can have real-time natural language conversations, narrate videos instead of actors, and more. The company has raised $48 million in funding, has 90 employees and a very exciting future, says Friedman.

“We first interacted with computers by typing green text commands on a black screen. Then came the graphical user interface [GUI] – the mouse, drag-and-drop and scrolling features we use today.

“We believe the future is NUI [natural user interface], where you interact with your device – your laptop, your phone, your fridge, your car, anything – in a natural way that means face-to-face conversations.

“Instead of going to a website and scrolling through it, you have a conversation with that website’s avatar. You can ask it how to solve a problem with a vehicle.

“You can ask it to open a bank account or make a doctor’s appointment, all through a natural conversation with an avatar who can see and hear you.

“It recognizes your tone of voice and body language and responds to you in the most natural and stimulating way.”

