OpenAI has a Marshall Plan
Post-WWII style diplomacy is OpenAI's way to try to keep the peace with major data owners
In the wake of World War II, the U.S. realized it needed the strength and alliances of many western European nations to cultivate a more peaceful, prosperous globe, and perhaps to prevent a relapse into the fascism Europe had shown a proclivity for over the previous half century. Thus, the Economic Recovery Plan of 1948 (“The Marshall Plan”) was signed into law, creating about $13 billion in aid to help the continent stabilize. The rationale of the plan was basically:
The U.S. economy was doing great! But,
Europe was devastated, and
A devastated Europe would be bad for global peace, and
It would also be bad for trade as Europe was a major trading partner with the U.S., so
It was in the U.S.’s interests to take the expense in the long-term to sustain a longer-term prosperity and peace
The plan is generally regarded as a success and as the beginning of many decades of shared economic good fortune between most of Europe and the U.S. By taking care of their partners, the U.S. warded off potential future threats by investing in them today. Cynically, you could say they bought peace.
A cynic would also say that what OpenAI is doing right now is buying off potential threats and paying to maintain peace, and their own prosperity, into the future. There is a lot of noise and ire directed at OpenAI right now about the way they’ve acquired and used data from all corners of the internet to train their GPT models. In short, lots of people think their scraping of the internet for training data violates copyrights or terms of use, and that this makes them evil.
However, now they’re enacting their own Marshall Plan, though perhaps more proactively, since they are doing it before the first high-profile public battle. Over the last month, they’ve signed several high-profile deals with organizations that own such data (The AP, ShutterStock, and the American Journalism Project). These deals are mostly structured the same way: OpenAI gets to train its models on data controlled by these organizations, and in return, OpenAI gives them access to technology (and sometimes cash).
If you are OpenAI, these are really great deals! All of the organizations they are partnering with have lots and lots of the types of data that taste delicious to their very data-hungry foundation models. This is very valuable to OpenAI! In return, they give away “priority access” to their newest technology, which basically means The AP gets to be first on the waitlist for ChatGPT PlusMaxPro. This probably doesn’t materially change the data OpenAI is using to train their models - they were already picking up almost anything their crawlers could reach. What it does do is give them legal clearance and peace of mind for these particular datasets. It’s diplomacy with some of the larger data nations scattered around the internet.
But if you are one of these partners, it’s fair to ask “why?” Giving OpenAI permission to use your data to train their models feels inevitable. You own your data, and your data has a lot of value, so of course you’d want to monetize it somehow. However, the risks could be huge. Who knows what direction foundation models are headed, what they’ll be capable of, and how they’ll be used? DALL-E 2, OpenAI’s text-to-image generator, is almost certainly already putting a dent in ShutterStock’s business. So if you are ShutterStock, and you are going to give a potential threat the fuel it needs to build an application that could significantly undermine your business, then you’d better ask for a king’s ransom in return.
I’d imagine the discussion with ShutterStock SHOULD have gone like this:
OpenAI Exec: “We’d like to license your enormous library of images to train our models”
ShutterStock Exec: “Great! We are one of the largest libraries of diverse photographs on the internet. In return, we want you to spin off all your image generation tools into a separate business and we’ll take a 50% stake.”
OpenAI Exec: “No.”
ShutterStock Exec: “No worries - VCs are putting $3 billion a day into Generative AI startups - we’ll just go partner with one of them. Also, we’re going to spike all of our own images with trick pixels to poison any of your models that train on them. Have a good life!”
If I’m the Chairman of ShutterStock and my CEO didn’t have a conversation like that, I’d have some notes.