by Bob Yirka, Tech Xplore

We propose EMO, an expressive audio-driven portrait-video generation framework. Given a single reference image and vocal audio (e.g., talking or singing), our method generates vocal avatar videos with expressive facial expressions and various head poses. It can also generate videos of any duration, depending on the length of the input audio. Credit: arXiv (2024). DOI: 10.48550/arxiv.2402.17485

A small team of artificial intelligence researchers at the Institute for Intelligent Computing, Alibaba Group, has demonstrated, through videos it created, a new AI application that can take a single photograph of a person's face and an audio track of someone speaking or singing, and use them to create an animated version of that person speaking or singing the track. The group has published a paper describing their work on the arXiv preprint server.

Earlier research has produced AI applications that can process a photograph of a face and use it to create a semi-animated version. In this new effort, the team at Alibaba has taken this a step further by adding sound. Perhaps just as importantly, they have done so without using 3D models or even facial landmarks. Instead, the team used diffusion modeling, training an AI on large datasets of audio and video. In this instance, the team used approximately 250 hours of such data to create their app, which they call Emote Portrait Alive (EMO).

By directly converting the audio waveform into video frames, the researchers created an application that captures subtle human facial gestures, quirks of speech and other characteristics that identify an animated image of a face as human-like. The videos faithfully recreate the likely mouth shapes used to form words and sentences, along with expressions typically associated with them.

Character: Mona Lisa. Vocal source: Shakespeare's Monologue II, As You Like It: Rosalind, "Yes, one; and in this manner." Credit: https://humanaigc.github.io/emote-portrait-alive/

The team has posted several videos demonstrating the strikingly accurate performances they generated, and claims that its results outperform other applications in realism and expressiveness. The researchers also note that the length of the finished video is determined by the length of the original audio track. In the videos, the original picture is shown alongside the animated result, in which the person appears to speak or sing in the voice recorded on the original audio track.

Credit: Emote Portrait Alive

The team concludes by acknowledging that the use of such applications will need to be restricted or monitored to prevent unethical use of the technology.

More information:
Linrui Tian et al, EMO: Emote Portrait Alive—Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions, arXiv (2024). DOI: 10.48550/arxiv.2402.17485

EMO: humanaigc.github.io/emote-portrait-alive/

Journal information:
arXiv


AI system can convert voice track to video of a person speaking using a still image
