Over the past few months, the field of artificial intelligence has seen rapid growth, with new models like DALL-E and GPT-4 emerging one after another. Every week brings the promise of new and exciting models, products, and tools. It’s easy to get swept up in the hype, but these shiny capabilities come at a real cost to society and the planet.
Downsides include the environmental toll of mining rare minerals, the human costs of the labor-intensive process of data annotation, and the escalating financial investment required to train AI models as they incorporate more parameters.
Let’s look at the innovations that have fueled recent generations of these models—and raised their associated costs.
In recent years, AI models have been getting bigger, with researchers now measuring their size in the hundreds of billions of parameters. “Parameters” are the internal connections used within the models to learn patterns based on the training data.
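To make the idea of a parameter count concrete, here is a minimal sketch that tallies the weights and biases of a tiny fully connected network. The layer sizes are hypothetical and chosen only for illustration; real LLMs use different architectures, but the counting principle (one number per learned connection) is the same.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of units in each layer, input first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the layers
        total += n_out         # one bias per unit in the receiving layer
    return total


# Hypothetical toy network: 784 inputs, 128 hidden units, 10 outputs.
toy_sizes = [784, 128, 10]
print(count_dense_parameters(toy_sizes))  # 101770
```

Even this toy network has roughly 100,000 parameters, which gives a sense of how far removed it is from models measured in the hundreds of billions.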
For large language models (LLMs) like ChatGPT, we’ve gone from around 100 million parameters in 2018 to 540 billion with Google’s PaLM model, announced in 2022. The theory behind this growth is that models with more parameters should have better performance, even on tasks they were not initially trained on, although this hypothesis remains unproven.
Bigger models typically take longer to train and need more GPUs, which costs more money, so only a select few organizations are able to train them. Estimates put the training cost of GPT-3, which has 175 billion parameters, at $4.6 million, out of reach for the majority of companies and organizations. (It’s worth noting that the cost of training is dropping in some cases, as with LLaMA, the recent model trained by Meta.)
This creates a digital divide in the AI community between those who can train the most cutting-edge LLMs (mostly Big Tech companies and rich institutions in the Global North) and those who can’t (nonprofit organizations, startups, and anyone without access to a supercomputer or millions in cloud credits). Building and deploying these behemoths requires a lot of planetary resources: rare metals for manufacturing GPUs, water to cool huge data centers, energy to keep those data centers running 24/7 on a planetary scale… all of these are often overlooked in favor of focusing on the future potential of the resulting models.
A study from Carnegie Mellon University professor Emma Strubell about the carbon footprint of training LLMs estimated that training a 2019 model called BERT, which has only 213 million parameters, emitted 280 metric tons of carbon, roughly equivalent to the emissions from five cars over their lifetimes. Since then, models have grown and hardware has become more efficient, so where are we now?
In a recent academic article I wrote to study the carbon emissions incurred by training BLOOM, a 176-billion-parameter language model, we compared the power consumption and ensuing carbon emissions of several LLMs, all of which came out in the last few years. The goal of the comparison was to get an idea of the scale of emissions from LLMs of different sizes and what influences them.
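As a rough illustration of how power consumption translates into carbon emissions, here is a minimal sketch that multiplies the energy used in training by the carbon intensity of the electricity grid. This is a common back-of-the-envelope approach and not necessarily the exact method used in the article; the function name and all input values are hypothetical and are not measurements of any real model.

```python
def training_emissions_tonnes(energy_kwh, grid_intensity_g_per_kwh):
    """Estimate emissions in metric tons of CO2-equivalent.

    energy_kwh: electricity consumed during training, in kilowatt-hours.
    grid_intensity_g_per_kwh: grams of CO2-equivalent emitted per kWh
        by the grid that powers the data center.
    """
    grams = energy_kwh * grid_intensity_g_per_kwh
    return grams / 1_000_000  # 1 metric ton = 1,000,000 grams


# Hypothetical inputs: the same training run on two different grids.
energy_kwh = 1_000_000
print(training_emissions_tonnes(energy_kwh, 50))   # low-carbon grid: 50.0
print(training_emissions_tonnes(energy_kwh, 500))  # carbon-heavy grid: 500.0
```

Under these assumed numbers, the same amount of energy yields ten times the emissions on the dirtier grid, which is why where a model is trained can matter as much as how big it is.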