Will Biology Have Its Own ChatGPT?
In a world where ChatGPT helps millions of people with their day-to-day tasks, have you ever wondered what would happen if an AI could understand biology itself? What if AI could decode the mysteries of life?
Recently, a British biotech company ventured beyond the petri dish. Basecamp Research, a startup unknown to many just a few months ago, is now at the heart of a bold scientific gamble: building the “ChatGPT of biology.” Basecamp isn’t just thinking outside the box. Its teams are digging through ice, diving into hot springs, and rewriting the playbook of biological discovery. They have spent years collecting genetic data from microorganisms living in extreme environments around the world.
Their goal? Nothing less than decoding life itself. They aim to train the “ChatGPT of biology,” an AI so deeply informed by the natural world that it could answer some of biology’s most complex questions.
While most AI biology models depend on familiar lab staples like Escherichia coli (E. coli) and mice, Basecamp is bringing microbial outsiders to center stage. The company says it has identified more than a million species and nearly 10 billion genes that are new to science.
But the scientific community is watching this bold move with a mix of excitement and skepticism. The company claims that this massive biodiversity database can help train a “ChatGPT of Biology” that will answer questions about life on Earth. But there’s no guarantee this will work.
Chief science officer Jonathan Finn said the team wanted the kind of genetic information that is not found in lab-bred organisms. He believes that, from deep-sea bacteria to polar-ice viruses, this is biology that remains mostly unexplored.
Working from the belief that Earth hosts as many as a trillion microbial species, most of them still invisible to science, they are training the next generation of biological AI models on the full spectrum of life.
Basecamp’s mega database is designed to close the gap between the biology captured in existing datasets and the biology that exists in nature. The team has collected samples from more than 120 field sites across 26 countries. Their analysis has already revealed massive genetic diversity, even in genes commonly found in already-known organisms. It’s a “more nature, more knowledge” kind of philosophy. But will it actually work?
As noted earlier, the scientific community is watching this move with both excitement and skepticism. Jorg Overmann, a leading microbial researcher at Germany’s Leibniz Institute DSMZ, is worried that Basecamp’s approach might be too much of a brute-force data dump.
Overmann concedes that Basecamp’s database gives researchers more sequences, but says that if we don’t know what those genes do or where they come from, it is tough to extract anything meaningful. He adds that for AI tools to work, the data needs context. Raw diversity may overwhelm more than it enlightens.
UC Berkeley’s Frances Ding points out that while machine learning models like AlphaFold, DeepMind’s protein-folding algorithm, have dazzled the scientific world, newer generative models haven’t made comparable leaps. One big reason is limited and biased datasets.
Ding asks whether Basecamp’s extreme-microbe collection solves that problem. Possibly. But for now it is still potential, not proven performance.
There’s no denying the power of biological AI. AlphaFold’s success in protein structure prediction earned its developers a share of the 2024 Nobel Prize in Chemistry. That achievement has set off a wave of interest in AI for biology, with big players such as Genentech, Google DeepMind, and Meta all diving in.
But most efforts focus on better algorithms or more in-lab data. Basecamp is doing the opposite: dragging the laboratory into the wilderness. “This is one of the most exciting things I’ve seen in a long time,” says Nathan Frey, a machine learning researcher at Genentech. “They’re bringing real-world diversity into the digital domain.”
Even with 10 billion new genes in their digital vault, the real challenge for Basecamp isn’t just what they’ve got. It’s what they can prove.
Can these new genes lead to breakthroughs in drug discovery? In climate resilience? In plastic-eating proteins or next-gen CRISPR tools? Right now, nobody knows. “They have to show this novelty is useful in some way,” warns Leopold Parts from the Wellcome Sanger Institute. “Otherwise, it’s just a very cool collection.”
And there’s another curveball: function prediction. If the newly found genes are so unlike anything we’ve seen, how do you even guess what they do?
Today’s bio-AI models rely on existing databases to predict gene functions. When everything in your dataset is unfamiliar, those predictions may be meaningless.
“It’s like giving an AI a dictionary of made-up words and expecting it to write a novel,” Overmann says. “The creativity is there. But it might not make sense.”
At best, Basecamp’s massive undertaking could transform how we understand life on Earth. A better-trained biological AI could turbocharge research in everything from medicine to sustainability, discovering solutions buried deep in nature’s unexplored corners.
But at worst, the project may turn out to be a spectacular over-collection—tons of data, little actionable insight.
Still, in an era where most companies build on recycled code and recycled genomes, there’s something undeniably refreshing about a team that hikes into volcanoes and polar ice to collect samples in person. It’s the kind of fieldwork-meets-frontier-tech hybrid that science tabloids should be writing about.
So, will Basecamp build the ChatGPT of biology?
Maybe. Maybe not.
But they’ve definitely given the field a serious jolt—and sometimes, that’s exactly what science needs.