What Is the InfiniteTalk ComfyUI Integration?
InfiniteTalk is a new talking avatar framework from the MultiTalk team that turns a reference image and an audio track into a talking avatar video. One of its headline features is the ability to generate videos of effectively unlimited length.
This means you're no longer limited to short 10–15 second clips: you can generate minutes of content, or even longer, as long as your machine has enough RAM and VRAM. The model remains an audio-driven image-to-video model, animating the input image with natural lip sync and enhanced body motion while the character speaks.
Why ComfyUI?
The ComfyUI integration gives you a node-based visual workflow instead of command-line usage. That makes it ideal for experimenting, chaining post-processing nodes, and building repeatable pipelines.
Setting Up InfiniteTalk in ComfyUI
Follow these three steps to get InfiniteTalk running inside ComfyUI.
Update the Wan Video Wrapper
If you already use ComfyUI, update the Wan Video Wrapper custom node (ComfyUI-WanVideoWrapper) to its latest version, which now ships with InfiniteTalk support built in. New users can download it directly from GitHub.
Download InfiniteTalk Model Files
Go to the official Hugging Face repository for InfiniteTalk. Under the Files and versions tab you'll find a ComfyUI folder containing model files exported specifically for ComfyUI. Inside are two files: InfiniteTalk Single (one person) and InfiniteTalk Multi (multiple people). Start with the single version for initial testing.
Install Model Files
Drop the downloaded .safetensors files into the diffusion_models subfolder inside your ComfyUI models/ directory. You can create a dedicated subfolder (e.g. InfiniteTalk/) for better organisation.
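If you would rather script this step than drag files by hand, the following is a minimal sketch. It assumes the standard ComfyUI layout described above (models/diffusion_models/); the function name install_models and the example paths are hypothetical, so adjust them to your own installation.

```python
from pathlib import Path
import shutil


def install_models(download_dir, comfyui_root, subfolder="InfiniteTalk"):
    """Move every .safetensors file from download_dir into
    <comfyui_root>/models/diffusion_models/<subfolder>/ and return the new paths."""
    source = Path(download_dir).expanduser()
    target = (Path(comfyui_root).expanduser()
              / "models" / "diffusion_models" / subfolder)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing

    installed = []
    for src in sorted(source.glob("*.safetensors")):
        dest = target / src.name
        if dest.exists():
            # Never overwrite a model file that is already in place.
            continue
        shutil.move(str(src), str(dest))
        installed.append(dest)
    return installed


if __name__ == "__main__":
    # Hypothetical paths: replace with your download and ComfyUI folders.
    for path in install_models("~/Downloads", "~/ComfyUI"):
        print("installed", path)
```

Restart ComfyUI (or refresh the node list) afterwards so the model loader picks up the new files.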
Creating Your First Workflow
Using the Example Workflow
The easiest starting point is the example workflow bundled with the Wan Video Wrapper. After updating the custom node, you'll notice the MultiTalk nodes have been renamed: they now appear as MultiTalk and Infinite MultiTalk.
Model Selection
In the MultiTalk / InfiniteTalk model loader, select the InfiniteTalk model file you downloaded. For single-person use cases, choose the single variant. The surrounding node setup (block swap, Torch compile, VAE, CLIP text encoder) is identical to the previous MultiTalk workflow, so existing setups require minimal changes.
Optimisation Settings
By default the workflow uses the image-to-video LightX2V model to speed up sampling. Lowering the sampling step count reduces generation time at a small quality cost. 480p resolution is recommended for most machines — 720p requires significantly more VRAM and was unstable in early tests.
Advanced Features
Multiple People & Audio Tracks
InfiniteTalk inherits the multi-speaker capability from MultiTalk. You can pass in separate audio tracks and assign reference target masks to each person you want animated — ideal for dialogue scenes or podcast-style content.
Text-to-Speech Integration
Connect a TTS node (such as Chatterbox SRT Voice) upstream of the InfiniteTalk node. Type or load your script and the TTS node generates the audio automatically, removing the need to prepare audio files externally.
Long-Form Content Generation
The system calculates required video length from the audio duration automatically, making it straightforward to produce full podcast episodes or long explainer videos without manual trimming.
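To get a feel for how long a render will be before you queue it, you can estimate the frame count yourself. This is only a sketch of the arithmetic, not the wrapper's own code: it assumes an uncompressed WAV input and an output rate of 25 frames per second, and both function names are hypothetical. Check the frame rate set in your workflow and pass it in if it differs.

```python
import math
import wave


def wav_duration_seconds(path):
    """Read the duration of an uncompressed WAV file from its header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def frames_for_audio(duration_seconds, fps=25):
    """Return the number of video frames needed to cover the audio.

    fps=25 is an assumed default, not a value confirmed by the workflow.
    """
    if duration_seconds < 0 or fps <= 0:
        raise ValueError("duration must be >= 0 and fps must be > 0")
    # Round up so the final fraction of a second of audio still has video.
    return math.ceil(duration_seconds * fps)
```

Under these assumptions, a 90-second narration needs frames_for_audio(90) = 2250 frames.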
Frame Interpolation
After generation, run a frame interpolation node to double the FPS. This meaningfully improves perceived smoothness and reduces minor artefacts like rapid eye blinking that can appear at the native frame rate.
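To illustrate what doubling the FPS means in terms of frames, here is a deliberately naive sketch that inserts the average of each pair of neighbouring frames. Interpolation nodes in ComfyUI typically use learned models (RIFE is a common example) that estimate motion rather than blending pixels, so treat this as a stand-in for the idea only; the function name interpolate_2x and the flat-list frame format are assumptions made for the example.

```python
def interpolate_2x(frames):
    """Insert one blended frame between each pair of neighbouring frames.

    Each frame is a flat list of pixel values. N input frames give 2N - 1
    output frames, so playing them at double the FPS keeps roughly the
    same duration.
    """
    if not frames:
        return []
    out = []
    for current, following in zip(frames, frames[1:]):
        out.append(current)
        # Midpoint blend: a crude substitute for a learned interpolator.
        out.append([(a + b) / 2 for a, b in zip(current, following)])
    out.append(frames[-1])
    return out
```

For example, three input frames become five output frames, with each new frame sitting halfway between its neighbours.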
Performance & Quality
Chunk-Based Processing
During sampling you'll see the video processed in chunks — for example, 81 frames per chunk with 25 overlapping frames carried into the next segment. This overlap is what keeps the animation smooth and consistent across the full video duration.
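The chunk arithmetic can be sketched as follows. With 81 frames per chunk and 25 frames of overlap, each chunk after the first contributes 81 - 25 = 56 new frames. The function below is an illustration of that sliding-window layout, not the sampler's actual implementation, which may pad or align the final chunk differently; chunk_windows is a hypothetical name.

```python
def chunk_windows(total_frames, chunk_size=81, overlap=25):
    """Return (start, end) frame ranges, end exclusive, covering total_frames."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if total_frames <= 0:
        return []
    stride = chunk_size - overlap  # new frames contributed by each later chunk
    windows = []
    start = 0
    while True:
        end = min(start + chunk_size, total_frames)
        windows.append((start, end))
        if end >= total_frames:
            break
        start += stride
    return windows


print(chunk_windows(200))
# -> [(0, 81), (56, 137), (112, 193), (168, 200)]
```

Each window starts 25 frames before the previous one ends, which is the overlap the sampler reuses to keep motion continuous from one chunk to the next.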
Hardware Requirements
For 480p generation, most modern GPUs with 6 GB+ VRAM are sufficient. 720p or very long videos require more VRAM and system RAM. Torch compile support is recommended for best throughput on CUDA devices.
InfiniteTalk vs MultiTalk
InfiniteTalk Advantages
- Unlimited video length
- More natural body language
- Better lip sync accuracy
- Fewer artefacts and distortions
- Improved stability for long clips
MultiTalk Limitations
- Limited to short clips
- Occasional exaggerated or unnatural motion
- Less natural body language
- More artefacts in longer sequences
- Inconsistent quality over time
Tips & Best Practices
Audio Quality
Use clean, high-quality audio without background noise. Better audio directly improves lip sync accuracy and facial expression fidelity.
Image Selection
Choose a well-lit, high-resolution photo with clear facial features. Input image quality has a direct impact on the output video quality.
Sampling Steps
Start with 4–8 steps for fast iteration. Increase steps for final renders when you're happy with the composition.
Post-Processing
Always apply frame interpolation after generation. Doubling FPS significantly smooths motion and reduces flickering artefacts.
Try InfiniteTalk Online — No Setup Required
Don't have a high-end GPU? Use our cloud-based InfiniteTalk tool directly in the browser. Upload an image, provide audio, and generate professional talking avatar videos in minutes.