AI video generators have made huge strides in 2025. What was still characterised by stuttering, shaky transitions and inconsistent logic back in the spring has now given way to a significantly more mature state. Leading video models from Google and OpenAI now not only generate physically accurate movements, but also produce sound and image simultaneously and in sync. This brings us noticeably closer to the so-called ‘General World Models’.
How is AI changing video production?
AI video generators represent the next generation of image generation, combining images with motion, physics and, increasingly, sound. Modern systems such as OpenAI’s Sora 2 or Google’s Veo 3.1 aim to understand the physical world and replicate its laws. This means that generated videos can appear very lifelike.
Technically, these models combine transformer and diffusion models and supplement them with a spacetime component: the transformer processes your original prompt into a technically feasible director’s script, which the diffusion model then implements frame by frame. To ensure these images can be joined together logically and harmoniously, the AI requires a basic understanding of physics. It gains this through so-called spacetime patches.
The AI develops an understanding of such spacetime patches by breaking down billions of videos into their smallest units and analysing individual frames, colour areas and pixel values. In this way, it learns the physical laws of our world. And can ultimately calculate them itself.
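The idea of spacetime patches can be sketched in a few lines of code. The following is a minimal, illustrative example of how a video array might be cut into small blocks spanning both space and time; the patch sizes are assumptions for illustration, not the values any particular model uses.

```python
import numpy as np

def spacetime_patches(video, pt=2, ph=16, pw=16):
    """Split a video array (frames, height, width, channels) into
    non-overlapping spacetime patches, the basic unit described above.
    Patch sizes (2 frames x 16 x 16 pixels) are illustrative."""
    t, h, w, c = video.shape
    # Trim so each dimension divides evenly into whole patches.
    t, h, w = t - t % pt, h - h % ph, w - w % pw
    video = video[:t, :h, :w]
    patches = video.reshape(t // pt, pt, h // ph, ph, w // pw, pw, c)
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)
    # Flatten: one row per patch, the token-like input a transformer sees.
    return patches.reshape(-1, pt * ph * pw * c)

clip = np.zeros((8, 64, 64, 3))   # 8 frames of 64x64 RGB video
tokens = spacetime_patches(clip)
print(tokens.shape)               # (64, 1536): 64 patches of 1536 values
```

Each patch is a small chunk of "what happens where, and when", which is what lets the model learn motion and physics rather than treating every frame in isolation.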
In the early versions, AI video generators produced only silent images; the sound had to be added separately afterwards. This changed fundamentally in 2025: leading models such as Veo 3.1, Sora 2 and Kling 3.0 now generate sound and image simultaneously and in sync, using the same model. Dialogue, sound effects and ambient noise are created in a single step alongside the visual composition. This represents a major leap in quality for AI video generation.
Early systems such as AnimateDiff strung individual frames together, but without any underlying understanding of physics. The results often appeared slightly psychedelic. AnimateDiff remains available as an open-source project, but is now regarded more as a historical starting point for the technology.
Would you like to generate a video yourself? You can find the most popular models currently available and what sets them apart here:
| Model | Provider | Clip length | Special characteristics |
| --- | --- | --- | --- |
| Veo 3.1 | Google DeepMind | 8s (up to 4K) | Native audio (dialogue, sound effects, atmosphere), integration with Gemini, Flow and Vertex AI, watermarking with SynthID |
| Sora 2 | OpenAI | 10–15s (1080p) | Synchronised audio and dialogue, storyboard editor, ChatGPT integration*, watermarking with C2PA |
| Runway Gen-4.5 | Runway | 10s | Benchmark leader, high cinematic quality, consistent characters, 4K export |
| Kling 3.0 | Kuaishou | 15s (1080p) | Simultaneous audio-video generation, precise motion transfer from reference videos |
| Luma Ray3 | Luma AI | up to 20s (1080p/4K HDR) | First ‘reasoning’ video model, native HDR export, Adobe Firefly integration |
| Pika 2.5 | Pika | 3–15s (1080p) | Fast generation, creative Pika effects for scene effects and transitions |
| WAN 2.6 | Wan AI, Alibaba | up to 15s (1080p) | Open source, multi-shot storytelling, Chinese and English |
| Midjourney Video V1 | Midjourney | 5–21s | Image-to-video only, distinctive stylised look, seamless integration with Midjourney |
Due to high demand, the market for video generators is also developing rapidly. There are already numerous video AIs available, and more are being added every day.
But with so much choice, it can be difficult to decide, right? That's why we recommend finding out about the specific capabilities and typical areas of application of the different models (you can also ask AI chatbots such as ChatGPT or Perplexity for advice) and then choosing the model that's right for you.
Incidentally, the Video Generation Arena Leaderboard provides an ongoing performance comparison.
When generating videos, you proceed in a similar way to prompting images. However, there are a few additional things to consider to ensure that you end up with the videos you want.
The videos generated can be used for a variety of purposes:
When using video generators such as OpenAI Sora or Runway Gen-4, describe the desired scene in detail. Inform the AI about:
Tip: You can also enlist the help of a text AI and ask it to optimise your prompt for the video conversion.
Think of your video as a series of mini-scenes with transitions in between. To ensure the AI knows exactly what you expect of it, create a storyboard with clear instructions for each mini-scene and transition. The storyboard feature in Sora 2 helps you with this scene breakdown. Google also offers a scene builder called Flow in Veo 3, which helps with editing, expanding and transitioning between scenes.
Tip: Describe only one movement per scene. AI will adhere better to your specifications if you don't specify too many changes at once. If a lot is happening in the scene, ask yourself: Can I subdivide the scene further? This makes it easier for the AI and, in return, you get better results.
An example? Let's take our apple example again:
Scene 1: Summer atmosphere
Scene 2: Camera movement along the tree
Scene 3: The apple comes loose
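A storyboard like the one above can also be kept as simple structured data before it is handed to a generator. The sketch below shows one possible way to do this; the scene descriptions are invented placeholders fleshing out the scene titles from the example, and the prompt format is an assumption, not any generator's required input.

```python
# A storyboard as plain data: one movement per scene, as recommended
# above. The action texts are illustrative placeholders.
storyboard = [
    {"scene": "Summer atmosphere",
     "action": "A sunlit apple orchard, leaves moving gently in the wind."},
    {"scene": "Camera movement along the tree",
     "action": "The camera pans slowly up the trunk towards a ripe red apple."},
    {"scene": "The apple comes loose",
     "action": "The apple detaches from the branch and falls into the grass."},
]

def to_prompt(board):
    """Join the scenes into one numbered prompt string, the kind of
    text input most video generators accept."""
    return "\n".join(
        f"Scene {i}: {s['scene']}. {s['action']}"
        for i, s in enumerate(board, start=1)
    )

print(to_prompt(storyboard))
```

Keeping scenes as data makes it easy to regenerate a single scene without retyping the whole prompt, which fits the "two or three attempts per scene" workflow described below.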
Depending on the model, different aspect ratios (e.g. 9:16 or 16:9) are available. Since subsequent editing of the video can reduce its quality, it is best to consider the final format at the outset. Then, allow the AI to generate it directly.
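Choosing the format upfront is mostly a matter of pixel arithmetic. This small helper, an illustrative sketch not tied to any generator's API, converts an aspect ratio into concrete frame dimensions:

```python
def frame_size(aspect, short_side=1080):
    """Return (width, height) in pixels for a given aspect ratio,
    keeping the shorter side at `short_side` pixels.
    Illustrative helper, not any generator's actual API."""
    w, h = (int(x) for x in aspect.split(":"))
    if w >= h:  # landscape or square, e.g. 16:9
        return (short_side * w // h, short_side)
    return (short_side, short_side * h // w)  # portrait, e.g. 9:16

print(frame_size("16:9"))   # (1920, 1080)
print(frame_size("9:16"))   # (1080, 1920)
```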
AI video generation is not an exact science, but rather a creative process. And creative processes rarely run smoothly. So if it takes two or three attempts per scene to get the video to meet your expectations, be patient with the AI – and with yourself.
Tip: Small changes to the prompt can sometimes have a big impact. Here's another example:
If you’re generally happy with the video that’s been generated, you can edit it further. There are plenty of tools available to help you do this. One example is Recut: this tool helps you automatically remove pauses and dead air, allowing you to get the most out of your clips.
For targeted post-production work on content (transitions, subtitles, merging multiple clips), use video editing tools such as CapCut, Adobe Premiere or DaVinci Resolve.
Are you a visual learner or want to learn more? Then we recommend the AI tutorials from Futurepedia.
Even if you did not generate a video yourself, you are still part of its distribution chain once you share it. Always be aware of this responsibility.
Author Dan Taylor Watt has compared numerous AI video generators in his blog, always using the same prompt to test the capabilities of the different systems. Here is an overview of eight of the most popular generators.
Video generator: Runway Gen 4.5.
Video generator: Pika 2.2.
Video generator: Kling v3.
Video generator: Ray 3.14.
Video generator: Sora 2.
Video generator: Wan 2.6.
Video generator: Midjourney v1.
Video generator: Veo 3.
The shared test prompt: ‘A woman pushing a buggy across a zebra crossing whilst talking on her phone and walking her whippet.’
Newer models achieve higher quality through physical understanding. Both images and videos in photorealistic style can appear deceptively real as a result. This brings with it both opportunities and risks.
We also consider ethical and social issues in our digital guide to generative image AI.
Video deepfakes are videos that have been manipulated using AI. This involves falsifying statements or misusing personal data to superimpose one face onto another. Celebrities are particularly affected, as a lot of digital data for face generation is available on the internet.
What exactly is a deepfake? Datenschutzgesetze.eu defines deepfakes as follows:
The term ‘deepfake’ refers to AI-generated or manipulated image, audio or video content that resembles real people, objects, places, facilities or events and would falsely appear to a person to be genuine or truthful.
Deepfakes are characterised by the use of AI for manipulation. Shallowfakes are conceptually distinct from deepfakes. They include fakes created using traditional editing and image processing programmes.
Given the current quality of AI video models, it is now virtually impossible, even for a trained eye, to detect deepfakes with complete certainty. The models produce videos with synchronised audio, fluid movements and deceptively real faces. Classic tell-tale signs such as out-of-sync lip movements or unnatural hand movements are no longer reliable. Added to this, videos consumed quickly in social media feeds leave hardly any time for critical scrutiny; spotting a fake takes a conscious effort.
Therefore, the better the video models become, the fewer technical ‘errors’ remain as identifying features. Contextual reasoning thus becomes the most important skill when dealing with deepfakes.
Technical features are no guarantee when it comes to detecting deepfakes. However, if you want to critically examine a video, technical features can still provide valuable clues. Watch the video in question in full-screen mode and look out for:
Does the light fall evenly on the face, neck and background, and from the same direction? Are reflections in glass realistic and accurate? Inconsistent shadows are one of the most reliable indicators, as many models still struggle with this.
Hair, fabrics, liquids, smoke or crowds in the background – such complex physical interactions remain a weakness for many models. Pay particular attention to hair contours and transitions between people and their surroundings. The more that is happening in the image and the more that is moving, the more likely artefacts will appear there.
Illogical camera angles, sudden jumps in the image or changes in lighting and image quality may indicate subsequent manipulation.
Some AI video generators now embed C2PA metadata into their videos; Google uses SynthID. These invisible watermarks can identify the origin of a video and facilitate verification. The method is slowly gaining acceptance but is not yet in widespread use. And even here there is no absolute certainty: such metadata does not survive screen recordings.
Deepware Scanner or Deepfake-o-meter are two examples. They can provide you with useful clues, but they do not guarantee a reliable result, as they cannot always keep pace with developments in AI.
Especially when a video appears visually authentic, the most effective weapon for detecting deepfakes is not your eyes, but your common sense.
Ask about the context and assess the video:
Was it shared by a verified account, a reputable media outlet or an unknown source? Credibility is determined not by the number of likes or shares, but by the source.
Is a person saying something that is typical or atypical of them? If a video stirs up emotions or shocks you, it is rarely a coincidence – deepfakes often aim to provoke strong reactions.
Are reputable media outlets reporting on the same event? If not, scepticism is warranted.
Rule of thumb: if you’re unsure whether a video is genuine, it’s better not to share it. Once you share it, you share responsibility for its spread.
Test yourself in SRF’s Deepfake Quiz: How good are you at spotting deepfakes?
As AI continues to improve, it is becoming increasingly difficult to detect deepfakes. A few characteristics you can look out for to expose video deepfakes are:
Look at the proportions of the face and head – are they in proportion? With deepfakes, the head is sometimes slightly twisted or sits unnaturally on the body. The transitions from face to neck may also be worth a second look.
Pay attention to sudden jumps in the image, illogical camera angles or abrupt cuts. Look closely, especially during scene changes.
Are the image and sound synchronised? Especially in earlier deepfakes, the lip movements often do not match the spoken text perfectly. Check whether the mouth is forming correctly (especially for difficult words).
Our body language is complex and context-dependent. Deepfakes lack the natural connection between mind and body that intuitively controls our movements. The movements in deepfakes can therefore appear uniform or simply not quite match what is being said or a particular emotion.
A person's gaze reveals a lot, because even a glance can be a form of communication. So check: do the eyes appear lively? In deepfakes, the eyes are often fixed, empty or unnaturally shiny. Sometimes the blinking is also irritating because it is robotic or completely absent.
Are the light sources in the image logical and consistent? Do the shadows fall correctly and in the same direction everywhere on the face and body? This can be a valuable clue, as deepfakes can often be exposed by inconsistencies in the shadows.
The representation of hands is still a weak point in many models. Therefore, take a close look at the hands and fingers of the people in the video: are there any strange finger positions or unrealistic situations, such as fingers overlapping or appearing to pass through an object?
As with fake news, check the source of the video. Watch the video in full screen mode to see as many details as possible. And always remain sceptical and cautious: if you are unsure whether the content is true, it is better not to share the video.
Incidentally, there are now platforms that can help you expose deepfakes, such as Deepware Scanner or Deepfake-o-meter. However, depending on the technical sophistication of the platform, the results should be treated with caution (see this study from February 2025). Ultimately, the best tool is and remains common sense.
We have compiled further information and content on the topic of ‘AI video generators’ here.