Newsletter

Jan 10, 2025

Game Engines & Synthetic Data

Game engines will help with the AI data shortage

AI/ML

Data Science

We have written in the past about how a game engine is not a single piece of software but an amalgamation of software applications that come together in one fluid package to help game developers move from ideation to distribution. These “engines” are not simply pre-packaged sets of tools; they are often collections of custom-made first- and third-party tools and integrated open-source code. Developers choose an engine and its components based on the output they are trying to achieve. While game engines like Unity and Unreal Engine are best known for the games they power, they have recently been used to create movies, digital twins, and real-life simulations. This week, we want to cover another major use case for game engines: synthetic data creation.

Benefits of Synthetic Data

Synthetic data is data created by a computer rather than captured from a real-world event. For example, a generative AI model could create dozens of images of a stop sign to help train self-driving cars. Gartner believes this type of data will likely overshadow “real data” in AI model training by 2030 (Gartner).

Many of today's most powerful AI models are text-based, hence the name large language models (LLMs). These models train on trillions of words to learn patterns and infer responses to prompts. The text is broken down into digestible components called tokens, and one token usually equates to ~0.8 words. It is estimated that over 3,100 trillion tokens are available for training across the whole web, and that public text data will be exhausted by the end of the decade (arXiv). Given this impending scarcity, the industry is considering synthetic data as a potential solution.
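To put the token figures in more familiar terms, here is a minimal back-of-the-envelope sketch that converts the quoted token estimate into words. Both inputs are the rough estimates cited above, and the function name is ours, chosen only for illustration.

```python
# Rough arithmetic only: both constants are the estimates quoted in the text.
TOKENS_ON_WEB = 3_100 * 10**12   # ~3,100 trillion tokens on the whole web
WORDS_PER_TOKEN = 0.8            # ~0.8 words per token (rule of thumb)


def tokens_to_words(tokens: float, words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert a token count into an approximate word count."""
    return tokens * words_per_token


approx_words = tokens_to_words(TOKENS_ON_WEB)
print(f"~{approx_words / 1e12:,.0f} trillion words")  # ~2,480 trillion words
```

In other words, the whole-web estimate works out to roughly 2,480 trillion words, give or take the uncertainty in both inputs.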

Depending on the use case, “real data”, which naturally occurs in the world, can be slow, expensive, or dangerous to capture. Additionally, this data can be noisy (polluted with less helpful or irrelevant data), have privacy or access concerns, or sometimes be impossible to capture. For example, there are no real datasets for landing humans on Mars.

On the other hand, synthetic data can have multiple benefits:

Can be created on-demand in limitless quantities
Is customizable
Is cheaper to acquire
Is produced pre-labeled (i.e., has been designated specific categories or tags that provide context for training)
Is not “real” (which mitigates security or ethical concerns)

One of the key benefits of synthetic data is augmentation: supplementing existing real data to make it more accurate (by reducing biases) or to fill in gaps with synthetic data points that may not have occurred in nature. In financial fraud detection, for instance, data on a specific new market or demographic may be scarce. Augmenting scarce data with synthetic data has improved model accuracy: Experian, a multinational data analytics and consumer credit reporting company, improved its model accuracy for credit risk decisioning by over 10% (Experian).
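As a rough illustration of augmentation, the sketch below creates extra data points by blending pairs of real records, a simplified interpolation loosely in the spirit of SMOTE. It is not Experian's method; the records, field meanings, and function name are hypothetical.

```python
import random


def augment_by_interpolation(samples, n_new, seed=0):
    """Create n_new synthetic rows, each a random blend of two real rows."""
    if len(samples) < 2:
        raise ValueError("need at least two real samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)   # pick two distinct real rows
        t = rng.random()                # blend weight in [0, 1)
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Hypothetical scarce fraud records: (transaction amount, account age in days)
real_fraud = [(950.0, 12.0), (1200.0, 30.0), (870.0, 5.0)]
new_rows = augment_by_interpolation(real_fraud, n_new=6)
augmented = real_fraud + new_rows
print(len(augmented))  # 9
```

Every synthetic row falls between two real rows, so this fills gaps inside the observed range but cannot invent patterns the real data never contained.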

Some industries where data security is crucial, such as finance and healthcare, take a different approach: they use synthetic data to replicate real-life datasets with fake information while preserving the patterns within the data. The synthetic version can then be shared, interpreted, and analyzed more broadly while the real, individual records stay safe. This opens up model training on what have historically been sensitive private datasets. Approaches like these could help fill the gap left by publicly available data, which has largely been used up in training today’s models (MIT).
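A minimal sketch of the idea, assuming a toy dataset with two numeric columns: summarize the real records by their means, spreads, and correlation, then draw entirely new records from a Gaussian with the same summary. The records and function names are hypothetical, real systems use far richer generative models, and matching summary statistics does not by itself guarantee privacy.

```python
import math
import random
import statistics


def fit_two_columns(rows):
    """Summarize two numeric columns by mean, standard deviation, and correlation."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in rows) / (len(rows) - 1)
    return mx, my, sx, sy, cov / (sx * sy)


def sample_synthetic(params, n, seed=0):
    """Draw n fake rows from a bivariate Gaussian with the fitted summary."""
    mx, my, sx, sy, rho = params
    rng = random.Random(seed)
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))  # guard against rounding
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append((mx + sx * z1, my + sy * (rho * z1 + residual * z2)))
    return rows


# Hypothetical patient records: (age, systolic blood pressure)
real = [(34, 118), (51, 131), (46, 125), (62, 140), (29, 115), (57, 129)]
params = fit_two_columns(real)
synthetic = sample_synthetic(params, n=1000)
```

None of the synthetic rows corresponds to a real patient, yet the relationship between the two columns is approximately preserved.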

Despite the potential benefits, synthetic data is not a perfect representation of the natural world and can overlook criteria or situations that affect a model's outcome. This makes data quality and validation some of the largest concerns around synthetic data. Additionally, privacy is not guaranteed: synthetic datasets must be generated carefully to ensure that privacy standards are upheld.
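One simple way to check synthetic data against real data is to compare their distributions. The sketch below computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical cumulative distributions, for a single numeric column. The sample values are hypothetical, and a real validation process would look at many columns and at downstream model performance as well.

```python
import bisect


def ks_statistic(real, synthetic):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    a, b = sorted(real), sorted(synthetic)
    gap = 0.0
    for v in a + b:  # the maximum gap always occurs at an observed value
        cdf_a = bisect.bisect_right(a, v) / len(a)
        cdf_b = bisect.bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical measurements of one feature
real_values = [10.2, 11.5, 9.8, 10.9, 10.4]
close_synthetic = [10.0, 11.2, 10.1, 10.7, 10.6]
shifted_synthetic = [14.0, 15.1, 13.7, 14.6, 14.9]

print(f"{ks_statistic(real_values, close_synthetic):.2f}")    # 0.20
print(f"{ks_statistic(real_values, shifted_synthetic):.2f}")  # 1.00
```

A small statistic suggests the synthetic column resembles the real one, while a value near 1 flags a generator that has drifted away from reality.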

From Textbooks to Game Engines

Game engines provide a unique environment in which synthetic data can be created. Today's engines are capable of immense calculations, real-time physics and rendering, and rapid iteration. Because they have evolved to build more and more realistic games, they are also well suited to creating lifelike environments that simulate scenarios for AI model training. In this scenario, the goal for the engine is no longer to make the most engaging game but the most realistic simulation of a real-world environment.

Digital twins: Game engines have been increasingly used to simulate the real world. For example, Unreal Engine has been used to create a range of digital twins (digital representations of physical objects or environments):

Changi Airport in Singapore (one of the busiest airports in the world for international travel) was replicated using Unreal Engine. This digital twin incorporates real-time sensors, allowing operators to use live data from the airport, including plane locations, to inform their model. Over time, they plan to monitor arrival and departure times, humidity, and temperature, creating one centralized information hub (Unreal Engine).
Another example is the dynamic, real-time model of Wellington, which incorporates sensors, geospatial data, building infrastructure, and online data. The model is intended to help decision-makers make more informed decisions around things like climate change and economic development (Unreal Engine).

Data augmentation: Because of their high fidelity and breadth of simulation capabilities, game engines are increasingly used to generate synthetic data that augments the datasets used to train AI models or, in some cases, acts as foundational data.

Some examples include:

Computer Vision (CV): Some companies are using game engines to create images and videos that help their computer vision models adapt to edge cases. For example, if you were building a computer vision model to identify a box of cereal, you could use all of the public images of cereal boxes, but those may not include a cereal box in an office setting or on a spaceship, creating a blind spot in your model. In a game engine, you could create such a scene from dozens of perspectives (Duality.ai, IndiaGDC).
Simulation: Scenarios can be varied to capture how the data changes with different variables. For example, Unreal Engine’s Lumen is a dynamic global illumination and reflections system. It could be used to run hundreds of lighting scenarios to understand the impact of weather on solar panels and what that means for energy production. Dozens of other tools, such as MetaHuman (high-fidelity digital characters) and Chaos Destruction (a physics system customized for the destruction of buildings), can be used to create high-quality simulations and derive different data from each variation.
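To make the cereal box example concrete, here is a minimal sketch of how scene variations might be described before being handed to an engine for rendering. It deliberately uses no Unity or Unreal API: the `SceneConfig` fields, background names, and value ranges are all hypothetical. Because we choose what goes into each scene, every rendered image would arrive with its label already attached.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneConfig:
    label: str               # ground truth, known because we placed the object
    background: str
    camera_yaw_deg: float
    camera_pitch_deg: float
    light_intensity: float   # arbitrary units; a real engine defines its own


# Hypothetical settings, including ones unlikely to appear in public photos
BACKGROUNDS = ["kitchen", "office", "supermarket", "spaceship"]


def sample_scenes(label, n, seed=0):
    """Randomly vary setting, viewpoint, and lighting for one labeled object."""
    rng = random.Random(seed)
    return [
        SceneConfig(
            label=label,
            background=rng.choice(BACKGROUNDS),
            camera_yaw_deg=rng.uniform(0.0, 360.0),
            camera_pitch_deg=rng.uniform(-30.0, 60.0),
            light_intensity=rng.uniform(0.2, 1.5),
        )
        for _ in range(n)
    ]


scenes = sample_scenes("cereal_box", n=24)
print(len(scenes), scenes[0].label)  # 24 cereal_box
```

Each configuration would then be rendered by the engine, and the same loop could sweep weather or lighting values for a simulation such as the solar panel example.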

Not only is this exciting because of its impact on the efficiency of training AI models, but this use case is also likely to create a feedback loop that makes Unreal Engine (and other game engines) more accurate across all of their use cases (network effects). As we mentioned earlier, engines are amalgamations of different software, and as new use cases emerge, the community builds new software to better cater to them. Imagine that new software emerges that better predicts the movement of our solar system for various scientific purposes; the same software could also be used to make more realistic space-based adventure games.

One problem with engine-generated synthetic data is the cumbersome and intense setup process. Users not only need to learn how to use a game engine but also need to refine inputs, adjust environmental values, and test the simulation's accuracy against real-world environments. Some simulations may require bespoke software to improve performance or to let the engine properly replicate a real-world scenario. This can be a large challenge, especially for users outside of games who are not familiar with game engines. We are excited to see technologies such as AI-enabled or low-code game engines emerge that could help automate and democratize these simulations, allowing more companies to access synthetic datasets to improve their products and services.

Takeaway: Game engines like Unreal and Unity, originally designed to power immersive gaming experiences, have expanded their utility into filmmaking, digital twins, and synthetic data creation. Synthetic data, generated by computers rather than real-world events, addresses the growing scarcity and limitations of real-world data for AI model training. It offers cost-effective, customizable, and pre-labeled datasets while helping to mitigate privacy and ethical concerns. Game engines excel at creating realistic, high-fidelity simulations, which are helpful for training computer vision models and running dynamic simulations. This evolution not only improves AI models but also fuels a flywheel for the engines themselves, creating new tools and capabilities that enhance their use across both the real world and gaming.

