Apple recently announced Apple Intelligence, their first generative artificial intelligence (AI) offering, which brings local large language model (LLM) inference to an array of Apple devices. Inference, for LLMs and other machine learning (ML) or AI models, is the process by which a model applies what it learned during training to new data (a query to ChatGPT, for example) and produces predictions (or responses).
Apple is not the first to bring LLM inference to end-user devices: Google has their Tensor G4 chips in Pixel 9 phones, Samsung Exynos chips support local inference, and Qualcomm's Snapdragon 8 Gen 3 chips offer local inference to multiple manufacturers, for example. However, Apple Intelligence will be supported on devices with M1 or A17 Pro processors and newer, which means this local AI will work on the iPhone 15 Pro (or newer) and certain Mac/iPad models going back to 2020.
This backward compatibility offers a competitive advantage to Apple, who will immediately have a large installed base of devices capable of running Apple Intelligence on day one of release (Apple sold 38.7m iPhones with A17 Pro chips in 1H24 alone). Microsoft, by comparison, requires new silicon chips to run Copilot+ features locally on PCs (Forbes). Apple was able to do this because they are leveraging their Apple Neural Engine, a type of Neural Processing Unit they have been incorporating in devices since 2017. Neural Processing Units (NPUs), or AI accelerators, are chips specifically designed for AI and ML tasks.
This week, we will look at the evolution of specialized chips for AI and ML applications, the current landscape of innovation and development, and the impact that local inference will have on gaming and other latency-sensitive applications.
For the past two decades, Graphics Processing Units (GPUs) have been favored for inference (and training) of AI models. GPUs were originally developed for rendering graphics, found a strong market in video games, and propelled Nvidia (one of the first GPU chip companies) to its initial success. Prior to GPUs, video games and other graphics-intensive applications relied on the Central Processing Unit (CPU), the main processing unit in any computer, for these computations.
CPUs are flexible and can handle any task, but they are not specialized. Over time, special-purpose accelerators, such as the GPU, were developed to handle specific tasks more quickly and efficiently. Though GPUs were developed for graphics processing, the computations they focus on (parallel arithmetic operations: the ability to run many calculations or processes simultaneously) turned out to be well suited to machine learning training and inference.
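To make "parallel arithmetic" concrete, here is a minimal, illustrative Swift sketch of a matrix-vector multiply, the core operation inside most neural-network layers. It is not tied to any particular chip; the function name and example values are ours, chosen only to show that each output element is an independent dot product that can be computed simultaneously.

```swift
import Foundation

// Illustrative sketch: a matrix–vector multiply is just many independent
// dot products. Here each row is computed concurrently across CPU cores;
// a GPU or NPU does the same thing with thousands of hardware lanes.
func matrixVectorMultiply(_ matrix: [[Float]], _ vector: [Float]) -> [Float] {
    var result = [Float](repeating: 0, count: matrix.count)
    result.withUnsafeMutableBufferPointer { output in
        DispatchQueue.concurrentPerform(iterations: matrix.count) { row in
            // Each iteration reads its own row and writes its own output slot,
            // so the rows can run in parallel with no coordination.
            var sum: Float = 0
            for (weight, x) in zip(matrix[row], vector) { sum += weight * x }
            output[row] = sum
        }
    }
    return result
}

// Example: a 4x3 "weight matrix" applied to a 3-element "input vector".
let weights: [[Float]] = [[1, 0, 2], [0, 1, 0], [3, 1, 1], [2, 2, 2]]
let input: [Float] = [1, 2, 3]
print(matrixVectorMultiply(weights, input))   // [7.0, 2.0, 8.0, 12.0]
```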
As AI models have grown, the volume of data they must process has grown exponentially. GPUs typically have a small amount of fast-access memory on the chip alongside the cores that process data, and larger memory off chip, which is slower to access. When AI models run, they need to store intermediate results to memory as the computation happens; a final inference result is then compiled and returned. The fast-access memory on most standard GPUs is not large enough for the data processing that today's ML models require, so they must shuttle intermediate results to off-chip memory, which takes ~2,000x longer than accessing on-chip memory and uses ~200x as much energy (The Economist). This memory access bottleneck pushed researchers to develop more specialized chips (NPUs) with larger on-chip memory and alternative architectures, making them more efficient for ML tasks.
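To see why on-chip capacity matters so much, here is a rough back-of-the-envelope sketch in Swift. The memory sizes are hypothetical assumptions, not measurements of any real chip, and the ~2,000x and ~200x multipliers are simply the figures cited above.

```swift
import Foundation

// Rough, illustrative arithmetic for the memory bottleneck described above.
let intermediateResultsGB = 4.0   // assumed activations produced during one inference pass
let onChipCapacityGB = 0.05       // assumed fast on-chip memory (tens of MB is typical)

// Fraction of intermediate data that cannot stay on-chip and must
// round-trip to the larger but slower off-chip memory.
let spilled = max(0.0, intermediateResultsGB - onChipCapacityGB) / intermediateResultsGB

// Cost of an off-chip access relative to an on-chip access (per the cited figures).
let latencyMultiplier = 2_000.0
let energyMultiplier = 200.0

// Average cost per access once most of the traffic spills off-chip.
let avgLatency = (1.0 - spilled) + spilled * latencyMultiplier
let avgEnergy = (1.0 - spilled) + spilled * energyMultiplier

print(String(format: "%.0f%% of intermediate data spills off-chip", spilled * 100))
print(String(format: "≈%.0fx average access latency, ≈%.0fx energy vs. staying on-chip",
             avgLatency, avgEnergy))
```

Under these assumed numbers, nearly all intermediate data spills off-chip, and the average memory access ends up close to the full off-chip penalty, which is exactly the pressure that larger on-chip memory on NPUs is meant to relieve.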
Though NPUs are better optimized for their specific ML tasks, there is a downside to that specialization: less flexibility. CPUs are the most flexible processors; they can do anything, but they are not as fast or efficient at certain tasks, especially large, multi-step, real-time tasks, because they process sequentially rather than in parallel. Special-purpose accelerators achieve their efficiency by integrating tightly with the software they run. If the software changes, these accelerators cannot adapt as easily and will likely become less performant at the new tasks. Though GPUs are specialized, they remain fairly generic arithmetic engines that support both graphics processing and a wide array of ML model compute. NPUs are far more specialized, architected with the algorithms and software of specific models in mind.
As we are still early in the generative AI and LLM race, these models will change over time and there are many startups going after the opportunity to supplant Nvidia and the GPU market. What is state of the art today, and potentially most promising from a research perspective, could be entirely different from what is actually adopted one to two years down the road. Producing these specialized chips, and adopting them in devices, carries significant risks of obsolescence. Regardless of the risks, there is a likely future where new chips dedicated to various parts of the AI stack become more efficient and widely adopted; one such area is local inference.
Jay Goldberg of D2D Advisory estimates that, in the future, 15% of AI silicon will be used for training, 45% for data center inference, and 40% for inference on devices, an outlook we agree with. Depending on use case requirements, some inference will run in the cloud, some at edge servers closer to end users, and some directly on local devices.
In gaming today, most games render locally, where they have access to local compute (both CPU and GPU) to execute game code. For multiplayer games, there is typically an additional instance of the game running in the cloud that effectively acts as the referee between the various players in a lobby (making sure everything stays correctly synced). Games that want to leverage AI services today mostly need to go off-device for inference. For game design and development, this does not matter, as that work happens before games (or updates) are released and latency is not a concern.
But AI leveraged in real-time gameplay will likely be latency-sensitive. Use cases like AI NPCs (or agents), AI-aided user-generated content (UGC), and chat-enabled in-game guides will benefit from local NPU inference on devices, where latency is reduced and game developers can maintain the on-device compute economics they currently thrive on (where local compute is essentially free).
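As a rough illustration of what local NPU inference looks like for a developer today, here is a hedged Swift sketch using Core ML, the existing path to the Apple Neural Engine; "SmartNPC" is a hypothetical placeholder model name, not a real model, and this is not a description of Apple Intelligence's own APIs.

```swift
import Foundation
import CoreML

// Hypothetical sketch: loading a (fictional) dialogue model for an AI NPC and
// asking Core ML to prefer the Apple Neural Engine over the CPU/GPU.
func loadNPCModel() throws -> MLModel {
    let config = MLModelConfiguration()
    // Prefer the NPU; Core ML falls back to the CPU for unsupported operations.
    // (.cpuAndNeuralEngine requires iOS 16 / macOS 13 or later; .all lets
    // Core ML choose among CPU, GPU, and Neural Engine itself.)
    config.computeUnits = .cpuAndNeuralEngine

    guard let url = Bundle.main.url(forResource: "SmartNPC", withExtension: "mlmodelc") else {
        throw CocoaError(.fileNoSuchFile)
    }
    return try MLModel(contentsOf: url, configuration: config)
}
```

With compute units set this way, supported operations run on the Neural Engine and inference stays on the device, off the network, which is the economic and latency advantage described above.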
Takeaway: As AI models continue to evolve, so do the hardware and architecture of the processors that support them. Progress on both the hardware and software sides of AI training and inference continues to deliver step-change advancements in AI's capabilities and its accessibility to end users. Specialized processors, like NPUs, are more efficient at certain tasks but are also less adaptable to future software and model changes. Regardless, companies like Apple, Google, and others are pushing ahead with on-device NPU offerings that open up the market to local inference. For gaming, and many other latency-sensitive areas, this enables a more viable economic model for bringing AI use cases to players and users.