AI video generators have made huge strides in 2025. What was still characterised by stuttering, shaky transitions and inconsistent logic back in the spring has now given way to a significantly more mature state. Leading video models from Google and OpenAI now not only generate physically accurate movements, but also produce sound and image simultaneously and in sync. This brings us noticeably closer to the so-called ‘General World Models’.
How is AI changing video production?
AI video generators represent the next generation of image generation, combining images with motion, physics and, increasingly, sound. Modern systems such as OpenAI’s Sora 2 or Google’s Veo 3.1 aim to understand the physical world and replicate its laws. This means that generated videos can appear very lifelike.
Technically, these models combine transformer and diffusion models and supplement them with a spacetime component: the transformer processes your original prompt into a technically feasible director’s script, which the diffusion model then implements frame by frame. To ensure these images can be joined together logically and harmoniously, the AI requires a basic understanding of physics. It gains this through so-called spacetime patches.
The AI develops an understanding of such spacetime patches by breaking down billions of videos into their smallest units and analysing individual frames, colour areas and pixel values. In this way, it learns the physical laws of our world. And can ultimately calculate them itself.
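The idea of spacetime patches can be sketched in a few lines of code. The following is a minimal, illustrative example of how a video array might be cut into small blocks spanning both space and time; the patch sizes are assumptions for illustration, not the values any particular model uses.

```python
import numpy as np

def spacetime_patches(video, pt=2, ph=16, pw=16):
    """Split a video array (frames, height, width, channels) into
    non-overlapping spacetime patches, the basic unit described above.
    Patch sizes (2 frames x 16 x 16 pixels) are illustrative."""
    t, h, w, c = video.shape
    # Trim so each dimension divides evenly into whole patches.
    t, h, w = t - t % pt, h - h % ph, w - w % pw
    video = video[:t, :h, :w]
    patches = video.reshape(t // pt, pt, h // ph, ph, w // pw, pw, c)
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)
    # Flatten: one row per patch, the token-like input a transformer sees.
    return patches.reshape(-1, pt * ph * pw * c)

clip = np.zeros((8, 64, 64, 3))   # 8 frames of 64x64 RGB video
tokens = spacetime_patches(clip)
print(tokens.shape)               # (64, 1536): 64 patches of 1536 values
```

Each patch is a small chunk of "what happens where, and when", which is what lets the model learn motion and physics rather than treating every frame in isolation.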
In the early versions, AI video generators produced only silent images; the sound had to be added separately afterwards. This changed fundamentally in 2025: leading models such as Veo 3.1, Sora 2 and Kling 3.0 now generate sound and image simultaneously and in sync, using the same model. Dialogue, sound effects and ambient noise are created in a single step alongside the visual composition. This represents a major leap in quality for AI video generation.
Early systems such as AnimateDiff strung individual frames together, but without any underlying understanding of physics. The results often appeared slightly psychedelic. AnimateDiff remains available as an open-source project, but is now regarded more as a historical starting point for the technology.
Would you like to generate a video yourself? You can find the most popular models currently available and what sets them apart here:
| Model | Provider | Clip length | Special characteristics |
| --- | --- | --- | --- |
| Veo 3.1 | Google DeepMind | 8s (up to 4K) | Native audio (dialogue, sound effects, atmosphere), integration with Gemini, Flow and Vertex AI, watermarking with SynthID |
| Sora 2 | OpenAI | 10–15s (1080p) | Synchronised audio and dialogue, storyboard editor, ChatGPT integration*, watermarking with C2PA |
| Runway Gen-4.5 | Runway | 10s | Benchmark leader, high cinematic quality, consistent characters, 4K export |
| Kling 3.0 | Kuaishou | 15s (1080p) | Simultaneous audio-video generation, precise motion transfer from reference videos |
| Luma Ray3 | Luma AI | up to 20s (1080p/4K HDR) | First ‘reasoning’ video model, native HDR export, Adobe Firefly integration |
| Pika 2.5 | Pika | 3–15s (1080p) | Fast generation, creative Pika effects for scene effects and transitions |
| WAN 2.6 | Wan AI, Alibaba | up to 15s (1080p) | Open source, multi-shot storytelling, Chinese and English |
| Midjourney Video V1 | Midjourney | 5–21s | Image-to-video only, distinctive stylised look, seamless integration with Midjourney |
Due to high demand, the market for video generators is also developing rapidly. There are already numerous video AIs available, and more are being added every day.
But with so much choice, it can be difficult to decide, right? That's why we recommend finding out about the specific capabilities and typical areas of application of the different models (you can also ask AI chatbots such as ChatGPT or Perplexity for advice) and then choosing the model that's right for you.
Incidentally, the Video Generation Arena Leaderboard provides an ongoing performance comparison.
When generating videos, you proceed in a similar way to prompting images. However, there are a few additional things to consider to ensure that you end up with the videos you want.
The videos generated can be used for a variety of purposes:
When using video generators such as OpenAI Sora or Runway Gen-4, describe the desired scene in detail. Inform the AI about:
Tip: You can also enlist the help of a text AI and ask it to optimise your prompt for the video conversion.
Think of your video as a series of mini-scenes with transitions in between. To ensure the AI knows exactly what you expect of it, create a storyboard with clear instructions for each mini-scene and transition. The storyboard feature in Sora 2 helps you with this scene breakdown. Google also offers a scene builder called Flow in Veo 3, which helps with editing, expanding and transitioning between scenes.
Tip: Describe only one movement per scene. AI will adhere better to your specifications if you don't specify too many changes at once. If a lot is happening in the scene, ask yourself: Can I subdivide the scene further? This makes it easier for the AI and, in return, you get better results.
An example? Let's take our apple example again:
Scene 1: Summer atmosphere
Scene 2: Camera movement along the tree
Scene 3: The apple comes loose
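A storyboard like the one above can also be kept as simple structured data before it is handed to a generator. The sketch below shows one possible way to do this; the scene descriptions are invented placeholders fleshing out the scene titles from the example, and the prompt format is an assumption, not any generator's required input.

```python
# A storyboard as plain data: one movement per scene, as recommended
# above. The action texts are illustrative placeholders.
storyboard = [
    {"scene": "Summer atmosphere",
     "action": "A sunlit apple orchard, leaves moving gently in the wind."},
    {"scene": "Camera movement along the tree",
     "action": "The camera pans slowly up the trunk towards a ripe red apple."},
    {"scene": "The apple comes loose",
     "action": "The apple detaches from the branch and falls into the grass."},
]

def to_prompt(board):
    """Join the scenes into one numbered prompt string, the kind of
    text input most video generators accept."""
    return "\n".join(
        f"Scene {i}: {s['scene']}. {s['action']}"
        for i, s in enumerate(board, start=1)
    )

print(to_prompt(storyboard))
```

Keeping scenes as data makes it easy to regenerate a single scene without retyping the whole prompt, which fits the "two or three attempts per scene" workflow described below.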
Depending on the model, different aspect ratios (e.g. 9:16 or 16:9) are available. Since subsequent editing of the video can reduce its quality, it is best to consider the final format at the outset. Then, allow the AI to generate it directly.
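Choosing the format upfront is mostly a matter of pixel arithmetic. This small helper, an illustrative sketch not tied to any generator's API, converts an aspect ratio into concrete frame dimensions:

```python
def frame_size(aspect, short_side=1080):
    """Return (width, height) in pixels for a given aspect ratio,
    keeping the shorter side at `short_side` pixels.
    Illustrative helper, not any generator's actual API."""
    w, h = (int(x) for x in aspect.split(":"))
    if w >= h:  # landscape or square, e.g. 16:9
        return (short_side * w // h, short_side)
    return (short_side, short_side * h // w)  # portrait, e.g. 9:16

print(frame_size("16:9"))   # (1920, 1080)
print(frame_size("9:16"))   # (1080, 1920)
```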
AI video generation is not an exact science, but rather a creative process. And creative processes rarely run smoothly. So if it takes two or three attempts per scene to get the video to meet your expectations, be patient with the AI – and with yourself.
Tip: Small changes to the prompt can sometimes have a big impact. Here's another example:
If you’re generally happy with the video that’s been generated, you can edit it further. There are plenty of tools available to help you do this. One example is Recut: this tool helps you automatically remove pauses and dead air, allowing you to get the most out of your clips.
For targeted post-production work on content (transitions, subtitles, merging multiple clips), use video editing tools such as CapCut, Adobe Premiere or DaVinci Resolve.
Are you a visual learner or want to learn more? Then we recommend the AI tutorials from Futurepedia.
Even if you did not generate a video yourself, you are still part of its distribution chain once you share it. Always be aware of this responsibility.
Author Dan Taylor Watt has compared numerous AI video generators in his blog, always using the same prompt to test the capabilities of the different systems. Here is an overview of eight of the most popular generators.
Video generator: Runway Gen 4.5.
Video generator: Pika 2.2.
Video generator: Kling v3.
Video generator: Ray 3.14.
Video generator: Sora 2.
Video generator: Wan 2.6.
Video generator: Midjourney v1.
Video generator: Veo 3.
The shared test prompt: ‘A woman pushing a buggy across a zebra crossing whilst talking on her phone and walking her whippet.’
Newer models achieve higher quality through physical understanding. Both images and videos in photorealistic style can appear deceptively real as a result. This brings with it both opportunities and risks.
We also consider ethical and social issues in our digital guide to generative image AI.
Video deepfakes are videos that have been manipulated using AI. This involves falsifying statements or misusing personal data to superimpose one face onto another. Celebrities are particularly affected, as a lot of digital data for face generation is available on the internet.
What exactly is a deepfake? Datenschutzgesetze.eu defines deepfakes as follows:
The term ‘deepfake’ refers to AI-generated or manipulated image, audio or video content that resembles real people, objects, places, facilities or events and would falsely appear to a person to be genuine or truthful.
Deepfakes are characterised by the use of AI for manipulation. Shallowfakes are conceptually distinct from deepfakes. They include fakes created using traditional editing and image processing programmes.
Given the current quality of AI video models, it is now virtually impossible, even for a trained eye, to detect deepfakes with complete certainty. The models produce videos with synchronised audio, fluid movements and deceptively real faces. Classic tell-tale signs such as out-of-sync lip movements or unnatural hand movements are no longer reliable. Added to this, videos consumed quickly in social media feeds leave hardly any time for critical scrutiny; spotting a fake takes a conscious effort.
Therefore, the better the video models become, the fewer technical ‘errors’ remain as identifying features. Contextual reasoning thus becomes the most important skill when dealing with deepfakes.
Technical features are no guarantee when it comes to detecting deepfakes. However, if you want to critically examine a video, technical features can still provide valuable clues. Watch the video in question in full-screen mode and look out for:
Does the light fall evenly on the face, neck and background, and from the same direction? Are reflections in glass realistic and accurate? Inconsistent shadows are one of the most reliable indicators, as many models still struggle with this.
Hair, fabrics, liquids, smoke or crowds in the background – such complex physical interactions remain a weakness for many models. Pay particular attention to hair contours and transitions between people and their surroundings. The more that is happening in the image and the more that is moving, the more likely artefacts will appear there.
Illogical camera angles, sudden jumps in the image or changes in lighting and image quality may indicate subsequent manipulation.
Some AI video generators now embed C2PA metadata into their videos; Google uses SynthID. These invisible watermarks can identify the origin of a video and facilitate verification. The method is slowly gaining acceptance but is not yet in widespread use. And even here there is no absolute certainty: such metadata does not survive screen recordings.
Deepware Scanner or Deepfake-o-meter are two examples. They can provide you with useful clues, but they do not guarantee a reliable result, as they cannot always keep pace with developments in AI.
Especially when a video appears visually authentic, the most effective weapon for detecting deepfakes is not your eyes, but your common sense.
Ask about the context and assess the video:
Was it shared by a verified account, a reputable media outlet or an unknown source? Credibility is determined not by the number of likes or shares, but by the source.
Is a person saying something that is typical or atypical of them? If a video stirs up emotions or shocks you, it is rarely a coincidence – deepfakes often aim to provoke strong reactions.
Are reputable media outlets reporting on the same event? If not, scepticism is warranted.
Rule of thumb: if you’re unsure whether a video is genuine, it’s better not to share it. Once you share it, you share responsibility for its spread.
Test yourself in SRF’s Deepfake Quiz: How good are you at spotting deepfakes?
As AI continues to improve, it is becoming increasingly difficult to detect deepfakes. A few characteristics you can look out for to expose video deepfakes are:
Look at the proportions of the face and head – are they in proportion? With deepfakes, the head is sometimes slightly twisted or sits unnaturally on the body. The transitions from face to neck may also be worth a second look.
Pay attention to sudden jumps in the image, illogical camera angles or abrupt cuts. Look closely, especially during scene changes.
Are the image and sound synchronised? Especially in earlier deepfakes, the lip movements often do not match the spoken text perfectly. Check whether the mouth is forming correctly (especially for difficult words).
Our body language is complex and context-dependent. Deepfakes lack the natural connection between mind and body that intuitively controls our movements. The movements in deepfakes can therefore appear uniform or simply not quite match what is being said or a particular emotion.
A person's gaze reveals a lot, because even a glance can be a form of communication. So check: do the eyes appear lively? In deepfakes, the eyes are often fixed, empty or unnaturally shiny. Sometimes the blinking is also irritating because it is robotic or completely absent.
Are the light sources in the image logical and consistent? Do the shadows fall correctly and in the same direction everywhere on the face and body? This can be a valuable clue, as deepfakes can often be exposed by inconsistencies in the shadows.
The representation of hands is still a weak point in many models. Therefore, take a close look at the hands and fingers of the people in the video: are there any strange finger positions or unrealistic situations, such as fingers overlapping or appearing to pass through an object?
As with fake news, check the source of the video. Watch the video in full screen mode to see as many details as possible. And always remain sceptical and cautious: if you are unsure whether the content is true, it is better not to share the video.
Incidentally, there are now platforms that can help you expose deepfakes, such as Deepware Scanner or Deepfake-o-meter. However, depending on the technical sophistication of the platform, the results should be treated with caution (see this study from February 2025). Ultimately, the best tool is and remains common sense.
We have compiled further information and content on the topic of ‘AI video generators’ here.