

AI video generators: Take control

First text, then images – and now videos too. AI video generators are still in their infancy in spring 2025. And yet, despite occasional glitches, shaky transitions and sometimes inconsistent logic, they are already producing impressive results. Generated videos are considered the next major milestone in AI because of their potential: models such as Runway Gen-4 and OpenAI Sora are expected to enable so-called ‘general world models’. These are AI systems that not only generate content, but also have a deep, physics-based understanding of the world.

You will find these topics on this page:

  • How do AI video generators work?
  • What are the top video generators in 2025?
  • How do I generate videos?
  • Examples: what generated videos look like in 2025
  • What are the opportunities and risks?
  • Recognising video deepfakes

Topic

How do AI video generators work?

AI video generators are the next generation of image generators: they combine images with movement. You can either create AI videos from scratch or modify existing videos. But how does this work technically, and what mechanisms are behind it?

Early approaches to AI video generation are based on image generation and simply string individual images together. Modern systems, on the other hand, strive for a physical understanding of the world and generate videos that adhere to its physical principles.

AnimateDiff: one of the early approaches. It is a further development of text-to-image diffusion models such as Stable Diffusion, in which the generated still images are set in motion. This is how it works:

By training on real video data, the AI learns how to derive subsequent images from an initial image. It then strings the generated series of images together to create a video.

The catch? Image 2 is simply appended to image 1 without conforming to an overarching script or the physical principles of our world. This often results in slightly psychedelic effects: objects transform into one another or irritate the viewer with their slight jerkiness. The look varies from image to image, and the movement often simply does not feel real when viewed.
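If you want to try this early approach yourself, AnimateDiff is available as open source. The following is a minimal sketch using the Hugging Face diffusers library; the pipeline classes exist in current diffusers releases, but the checkpoint names are examples and may need to be swapped for whatever is available to you (a GPU is assumed).

```python
# Minimal AnimateDiff sketch with Hugging Face diffusers.
# Checkpoint names are examples, not a recommendation; check the diffusers docs
# for the version you have installed.
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter
from diffusers.utils import export_to_gif

# The motion adapter carries the learned "how do frames follow each other"
# weights and is plugged into an ordinary text-to-image model.
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2")
pipe = AnimateDiffPipeline.from_pretrained(
    "SG161222/Realistic_Vision_V5.1_noVAE",
    motion_adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")

# Generate a short clip: 16 individual images that are strung together.
result = pipe(
    prompt="a red apple falling from a tree in an orchard, warm summer light",
    num_frames=16,
    num_inference_steps=25,
    guidance_scale=7.5,
)
export_to_gif(result.frames[0], "apple.gif")
```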

Video examples and further information about AnimateDiff

Sora (OpenAI): belongs to the category of ‘world building’. This means that AI video generators such as Sora combine transformer and diffusion models and, on top of that, incorporate the space-time component (we explain what this is below). Here's how it works: with the help of the transformer model, the AI predicts the most likely next words. This allows it to develop your original prompt into a detailed, filmable prompt for the video. It could look something like this:

  1. Your prompt: Apple falls from tree.
  2. Prompt generated by Transformer: A warm summer day in an idyllic orchard. Golden sunlight floods the scene, the air shimmers slightly. The camera gently pans upwards along the trunk of an apple tree (tracking shot). In slow motion, a bright red apple detaches itself from a branch. The camera switches to a follow shot as the apple slowly falls towards the ground. As it rotates through the air, the light reflects off its shiny surface. Finally, it lands with a soft ‘plop’ in a woven basket full of other red apples. The camera zooms in close, showing the impact in ultra-slow motion. Gentle wind sounds and birdsong accompany the scene.
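This prompt-expansion step can also be reproduced by hand with any text AI before you hand the result to a video generator. Below is a minimal sketch using the OpenAI Python client; the model name and the instruction text are illustrative assumptions, and this is of course not Sora's internal transformer, just the same idea applied manually.

```python
# Sketch: expand a terse idea into a detailed video prompt with a text model.
# The model name and the instruction are placeholders, not Sora's internals.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def expand_video_prompt(idea: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable text model will do
        messages=[
            {
                "role": "system",
                "content": (
                    "Rewrite the user's idea as a detailed video prompt: "
                    "describe setting, lighting, camera movement, motion "
                    "and sound in two or three sentences."
                ),
            },
            {"role": "user", "content": idea},
        ],
    )
    return response.choices[0].message.content


print(expand_video_prompt("Apple falls from tree."))
```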

The diffusion model then implements this new prompt. It works through the individual images in several steps, starting from noise – up to this point, the process is the same as for image generation.    

In order for these generated images to be assembled logically and harmoniously, the AI needs a basic understanding of physics. Sora achieves this through so-called space-time patches.  

What are space-time patches?    
The AI develops its understanding of space-time by breaking down billions of videos into their smallest units (tokens) and analysing them: videos become individual images, individual images become areas of colour, areas of colour become colour pixels, and colour pixels become numbers. From the patterns in these numbers, the AI learns the laws of our physical world – and how to reproduce them itself.

Too abstract? The AI learns that when an apple (like other objects) falls, it always moves in a straight line towards the ground due to gravity. With this trained knowledge, OpenAI Sora can now drop the apple in the video onto the ground in a deceptively realistic manner.  
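To make the idea of space-time patches a little more concrete, here is a small sketch that cuts a video tensor into such patches with PyTorch. The patch sizes are arbitrary assumptions; real models choose their own and feed the flattened patches into a transformer as tokens.

```python
# Sketch: turning a video into space-time patches (tokens).
# All shapes and patch sizes are illustrative assumptions.
import torch

frames, height, width, channels = 16, 256, 256, 3
video = torch.rand(frames, height, width, channels)  # stand-in for a real clip

t, p = 4, 32  # one patch spans 4 frames and a 32x32 pixel area

# Cut the video into blocks of (t frames) x (p x p pixels) ...
patches = video.reshape(frames // t, t, height // p, p, width // p, p, channels)
patches = patches.permute(0, 2, 4, 1, 3, 5, 6)  # group the patch grid first

# ... and flatten every block into one token vector.
tokens = patches.reshape(-1, t * p * p * channels)
print(tokens.shape)  # torch.Size([256, 12288]): 4*8*8 patches, 4*32*32*3 numbers each
```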

Video examples and further information about Sora from OpenAI

The two approaches in a nutshell:

AnimateDiff: ‘I look at this one image, think about how the dog in it will move on, and generate similar, sequential images.’

Sora: ‘I have learned the physical principles that govern how the world works and generate a video based on my knowledge of how a dog moves when jumping.’

Topic

What are the top video generators in 2025?  

Has this whetted your appetite to generate a video yourself? You can find the most popular models and what makes them special here:

  • Veo 2 (Google DeepMind) – 8 s clips (720p–4K); best motion physics, detailed scenes and variable styles, integration with Gemini and Vertex AI
  • OpenAI Sora (OpenAI) – 20 s clips (1080p); storyboard editor, ChatGPT integration
  • Runway Gen-4 (Runway) – 10 s clips (30 s render); high cinematic quality, fast processing, 4K export, consistent characters
  • Pika 2.2 (Pika) – 3–15 s clips; inpainting functions with creative effects for scene transitions
  • WAN 2.1 (Wan AI, Alibaba) – 2–3 s clips (720p); open-source, free model that can display Chinese and English text well in videos

And that's not all

Due to high demand, the market for video generators is also developing rapidly. There are already numerous video AIs available, and more are being added every day.

But with so much choice, it can be difficult to decide, right? That's why we recommend finding out about the specific capabilities and typical areas of application of the different models (you can also ask AI chatbots such as ChatGPT or Perplexity for advice) and then choosing the model that's right for you.

Incidentally, the Video Generation Arena Leaderboard provides an ongoing performance comparison.

Topic

How do I generate videos?

When generating videos, you proceed in a similar way to prompting images. However, there are a few additional things to consider to ensure that you end up with the videos you want.

The videos generated can be used for a variety of purposes:  

  • In a private setting: Short videos for TikTok, Reels or Stories; personal greeting or invitation videos; memories
  • For learning: Explanatory videos; virtual excursions; bringing historical images to life; teaching media literacy by deliberately creating deepfake examples
  • If you want to be creative: experimental video projects; music projects, storytelling
  • At work: content marketing; training videos; brainstorming and prototyping

Here's how to proceed:

When using video generators such as OpenAI Sora or Runway Gen-4, describe the desired scene in detail. Inform the AI about: 

  • Content: What can be seen?
  • Style: cinematic, animated, 3D, surreal, retro, documentary, etc.
  • Movement: From what perspective does the camera film, and how does it move? Is there zooming, slow motion or a change of perspective in the scene?
  • Details: Atmospheric details such as lighting, weather, colours, etc.

Tip: You can also enlist the help of a text AI and ask it to optimise your prompt for video generation.
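If you generate videos regularly, it can also help to keep the four elements above separate and assemble the prompt programmatically before (optionally) handing it to a text AI for polishing, as in the earlier Sora sketch. Here is a small sketch; the class and field names are just one possible convention.

```python
# Sketch: assemble a video prompt from the four elements described above.
# Field names and wording are one possible convention, not a fixed format.
from dataclasses import dataclass


@dataclass
class VideoPrompt:
    content: str   # what can be seen
    style: str     # cinematic, animated, 3D, surreal, retro, documentary, ...
    movement: str  # camera perspective and movement
    details: str   # lighting, weather, colours, sound

    def to_prompt(self) -> str:
        return (
            f"{self.content} Style: {self.style}. "
            f"Camera: {self.movement}. Details: {self.details}."
        )


prompt = VideoPrompt(
    content="A bright red apple detaches from a branch and falls into a woven basket.",
    style="cinematic, warm summer light",
    movement="slow tracking shot up the trunk, then a follow shot of the falling apple",
    details="golden sunlight, slight shimmer in the air, birdsong and soft wind",
)
print(prompt.to_prompt())
```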

More prompt tips for image AIs

Think of your video as a series of mini scenes with transitions in between. To ensure that the AI knows exactly what you expect from it, create a storyboard with clear directing instructions for each mini scene and transition. The storyboard function in OpenAI Sora helps you with this scene division.  

Tip: Describe only one movement per scene. The AI will stick to your specifications better if you don't demand too many changes at once. If a lot is happening in the scene, ask yourself: can I subdivide it further? This makes the task easier for the AI and, in return, you get better results.

An example? Let's take our apple example again:  

Scene 1: Summer atmosphere

  • Setting: Wide shot of a sun-drenched orchard.
  • Details: Grass swaying gently, sunbeams breaking through the treetops.
  • Sound: Birds chirping, soft rustling of the wind.
  • Duration: 2 seconds

Scene 2: Camera movement along the tree

  • Shot: Slow tracking shot from bottom to top along the tree trunk.
  • Details: Focus on bark, light reflections flickering through the foliage.
  • Sound: Calm natural atmosphere remains.
  • Duration: 2 seconds

Scene 3: The apple comes loose

  • Shot: Close-up of a plump, red apple.
  • Details: In slow motion, it slowly comes loose from the branch – the stem visibly breaks off.
  • Light: Shine on the apple skin, sun reflections dance across the surface.
  • Sound: Slight crackling as it comes loose.
  • Duration: 2 seconds
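A storyboard like the one above can also be kept as structured data, so that each mini scene becomes one clearly scoped prompt for the video generator. The following sketch simply mirrors the example scenes; it is not tied to any particular tool.

```python
# Sketch: a storyboard as structured data -- one prompt per mini scene.
# The fields mirror the example scenes above; no specific tool is assumed.
scenes = [
    {
        "title": "Summer atmosphere",
        "shot": "wide shot of a sun-drenched orchard",
        "details": "grass swaying gently, sunbeams breaking through the treetops",
        "sound": "birds chirping, soft rustling of the wind",
        "seconds": 2,
    },
    {
        "title": "Camera movement along the tree",
        "shot": "slow tracking shot from bottom to top along the tree trunk",
        "details": "focus on bark, light reflections flickering through the foliage",
        "sound": "calm natural atmosphere",
        "seconds": 2,
    },
    {
        "title": "The apple comes loose",
        "shot": "close-up of a plump red apple detaching from the branch in slow motion",
        "details": "shine on the apple skin, sun reflections dancing across the surface",
        "sound": "slight crackling as the stem breaks off",
        "seconds": 2,
    },
]

for number, scene in enumerate(scenes, start=1):
    prompt = (
        f"Scene {number} ({scene['seconds']} s): {scene['shot']}. "
        f"Details: {scene['details']}. Sound: {scene['sound']}."
    )
    print(prompt)
```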

Depending on the model, different aspect ratios (e.g. 9:16 or 16:9) are available. Since subsequent editing can reduce the video's quality, it is best to decide on the final format at the outset and let the AI generate the video in that format directly.
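If you want to settle the format up front, a tiny helper like the one below can translate an aspect ratio into pixel dimensions. The function name and the default height are purely illustrative.

```python
# Sketch: translate an aspect ratio into pixel dimensions so the video can be
# generated in its final format from the start. Names and defaults are illustrative.
def resolution(aspect_ratio: str, height: int = 1080) -> tuple[int, int]:
    w, h = (int(part) for part in aspect_ratio.split(":"))
    width = round(height * w / h / 2) * 2  # keep dimensions even for video codecs
    return width, height


print(resolution("16:9"))  # (1920, 1080) -- landscape, e.g. YouTube
print(resolution("9:16"))  # (608, 1080)  -- portrait, e.g. Reels or TikTok
```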

AI video generation is not an exact science, but rather a creative process. And creative processes rarely run smoothly. So if it takes two or three attempts per scene to get the video to meet your expectations, be patient with the AI – and with yourself.

Tip: Small changes to the prompt can sometimes have a big impact. Here's another example:

  • Original prompt: A red apple falls from a tree on a summer's day into a basket full of apples.
  • Prompt variant 1 – more emotion: In dramatic slow motion, a shiny red apple falls from a tree while dark clouds gather in the background. The apple lands in the basket with a resounding plop.
  • Prompt variant 2 – more fairy tale: A completely red apple (like the apple in Snow White) falls from a tree in a fairy-tale summer landscape bathed in warm light and lands gently in a woven basket.

If you are satisfied with the generated video, you can give it a final edit. Additional tools can help here, e.g. Recut lets you shorten AI-generated videos or export specific sections, while Remix AI Video & Images from Google lets you edit specific elements in your video – for example, replace a person, change the background or generate a new movement.

Are you a visual learner or want to learn more? Then we recommend the AI tutorials from Futurepedia.

Checklist: Sharing AI videos

How do you proceed responsibly when sharing AI-generated videos? 
  1. Are real people recognisable in the video (voice, appearance)? Make sure that you are not violating anyone's personal rights or exposing anyone to embarrassment by sharing AI videos.
  2. Are there any copyrighted elements in the video? It is best to keep copyrighted elements such as logos, music or artwork out of your videos so as not to provoke disputes.
  3. Could other people misunderstand the video? Put yourself in other people's shoes and ask yourself: could the video lead to misunderstandings or the spread of fake news? If so, it is better not to share it.
  4. Have I labelled the AI video as AI-generated? Out of ethical responsibility towards others, we recommend that you always declare AI-generated works as AI-generated.

Even if you did not generate a video yourself, you are still part of its distribution chain once you share it. Always be aware of this responsibility.

Topic

Examples: what generated videos look like in 2025

Author Dan Taylor Watt has compared numerous AI video generators in his blog, always using the same prompt to test the capabilities of the different systems. Here is an overview of five of the most popular generators.

Video generator: VEO 2.

Video generator: Sora.

Video generator: Runway Gen-4.

Video generator: PIKA 2.

Video generator: WAN 2.

The prompt used:

A woman pushing a buggy across a zebra crossing whilst talking on her phone and walking her whippet.

Source of the videos

Topic

What are the opportunities and risks?

Newer models achieve higher quality through physical understanding. Both images and videos in photorealistic style can appear deceptively real as a result. This brings with it both opportunities and risks.

Opportunities

  • Efficient video production: Elaborate shoots, expensive visual effects or hours of 3D rendering – much of this could soon be superfluous. AI can significantly speed up the process, especially in animation, because the models generate a 2D output that looks like 3D in a fraction of the computing time.
  • Anything is possible: AI can realise any image sequence – dreams, surreal images and fantasy worlds. What used to require a whole team of artists can now be done with a single creative prompt.

Risks

  • Deepfakes & manipulation: Deepfakes are videos that look real but have been altered to contain false information. The technology behind deepfakes is not new, but video generators are making it even more accessible. The viral ‘Trump Gaza’ video(opens in new tab) impressively demonstrates how quickly fiction can become perceived reality or blur the lines between the two. To combat this, leading tech companies and publishers have launched the C2PA initiative. This aims to make the source of digital media recognisable by means of invisible watermarks. 
  • Danger of uniformity: Different AI models are trained on similar data. Because they predict the most likely outcome, they keep reinforcing patterns and themes that are already widespread, and over time the results become more and more alike. Original creativity therefore shifts to the conception stage: how you conceive and word the storyboard largely determines how creative the videos turn out.

We also consider ethical and social issues in our digital guide to generative image AI.

Topic

Recognising video deepfakes

Video deepfakes are videos that have been manipulated using AI. This involves falsifying statements or misusing personal data to superimpose one face onto another. Celebrities are particularly affected, as a lot of digital data for face generation is available on the internet.

What exactly is a deepfake? Datenschutzgesetze.eu defines deepfakes as follows:

The term ‘deepfake’ refers to AI-generated or manipulated image, audio or video content that resembles real people, objects, places, facilities or events and would falsely appear to a person to be genuine or truthful.

Deepfakes are characterised by the use of AI for manipulation. Shallowfakes, by contrast, are fakes created using traditional editing and image-processing programs.

Distinguishing features: How to spot video deepfakes

As AI continues to improve, it is becoming increasingly difficult to detect deepfakes. A few characteristics you can look out for to expose video deepfakes are:

  • Face and head: Look at the proportions of the face and head – are they in proportion? In deepfakes, the head is sometimes slightly twisted or sits unnaturally on the body. The transitions from face to neck may also be worth a second look.

  • Cuts and camera angles: Pay attention to sudden jumps in the image, illogical camera angles or abrupt cuts. Look closely, especially during scene changes.

  • Lip sync: Are the image and sound synchronised? Especially in earlier deepfakes, the lip movements often do not match the spoken text perfectly. Check whether the mouth forms the words correctly (especially difficult ones).

  • Body language: Our body language is complex and context-dependent. Deepfakes lack the natural connection between mind and body that intuitively controls our movements. Movements in deepfakes can therefore appear uniform or simply not quite match what is being said or a particular emotion.

  • Eyes: A person's gaze reveals a lot, because even a glance can be a form of communication. So check: do the eyes appear lively? In deepfakes, the eyes are often fixed, empty or unnaturally shiny. Sometimes the blinking is also irritating because it is robotic or absent altogether.

  • Light and shadow: Are the light sources in the image logical and consistent? Do the shadows fall correctly and in the same direction everywhere on the face and body? This can be a valuable clue, as deepfakes can often be exposed by inconsistencies in the shadows.

  • Hands: The representation of hands is still a weak point in many models. Take a close look at the fingers of the people in the video: are there any strange finger positions or unrealistic situations, such as fingers overlapping or appearing to pass through an object?

As with fake news, check the source of the video. Watch the video in full screen mode to see as many details as possible. And always remain sceptical and cautious: if you are unsure whether the content is true, it is better not to share the video.

Incidentally, there are now platforms that can help you expose deepfakes: Deepware scanner, Deepfake-o-meter, etc. However, depending on the technical sophistication of the platform, the results should be treated with caution (see this study from February 2025). Ultimately, the best tool is and remains common sense.

Test yourself in SRF's deepfake quiz: How good are you at recognising deepfakes?

Teaching materials: Deepfakes explained for children 

In 2020, SRF school provided teaching materials for secondary levels I and II (media and IT, society, ethics): Explained for children – What are deepfakes?

This is important

  • So-called space-time patches enable models such as OpenAI Sora to gain a basic physical understanding of our world. This allows AI to generate videos that look deceptively real.
  • When generating videos, it helps the AI if you think in scenes and create a storyboard with precise directing instructions for each mini-scene.
  • Deepfakes are becoming increasingly difficult to detect. Our list of distinguishing features can help.
