As companies pour billions into AI, a ranking system by UC Berkeley students has all eyes on it
Chatbot Arena, which allows users to rank and compare AI models, has quickly grown to 1 million monthly users.

May 6, 2025
In the fast-developing world of AI, Chatbot Arena is king — or at least referee. The website presents visitors with the opportunity to enter a virtual “arena,” enter a prompt of their choosing, and have anonymized AI models “battle” by each generating a response. Users vote for the better of the two answers; a leaderboard tracks where AI giants and startups alike stand in the resultant rankings. And major players in the AI industry — from OpenAI to Meta to DeepSeek — are paying close attention.
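The pairwise "battle" format maps naturally onto a ratings system. Below is a minimal, illustrative sketch of how anonymized head-to-head votes could be turned into a leaderboard with an Elo-style update. The model names, K factor, and starting rating here are hypothetical, and the actual Chatbot Arena leaderboard is computed with a more involved statistical fit over all votes rather than this simple running update.

```python
# Illustrative sketch only: an Elo-style running update over pairwise votes.
# Constants and model names are assumptions, not Chatbot Arena's actual values.
from collections import defaultdict

K = 32            # step size for each update (assumed)
BASE = 1000.0     # starting rating for every model (assumed)

ratings = defaultdict(lambda: BASE)

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def record_battle(model_a: str, model_b: str, winner: str) -> None:
    """Update both ratings after one head-to-head vote (no ties handled here)."""
    e_a = expected_score(ratings[model_a], ratings[model_b])
    s_a = 1.0 if winner == model_a else 0.0
    ratings[model_a] += K * (s_a - e_a)
    ratings[model_b] += K * (e_a - s_a)

# Hypothetical votes: (model A, model B, winner)
votes = [("model-x", "model-y", "model-x"),
         ("model-y", "model-z", "model-z"),
         ("model-x", "model-z", "model-x")]

for a, b, w in votes:
    record_battle(a, b, w)

# Leaderboard: highest rating first
for name, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.1f}")
```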
Run by Anastasios Angelopoulos and Wei-Lin Chiang, recent graduates of UC Berkeley’s computer science doctoral program and now postdoctoral fellows (as well as housemates), Chatbot Arena has roots in Berkeley’s Sky Computing Lab. As students there developed Vicuna in 2023, an early open-source large language model (or LLM), the need arose to evaluate chatbot apps and prove that small independent platforms like theirs could go toe-to-toe with well-known models like ChatGPT. A simple web interface went up, not unlike Craigslist in its barebones aesthetic.
Today, Chatbot Arena fields 1 million monthly unique users, on average, in over 100 languages. Fans sport T-shirts referencing insider code names assigned to the anonymized models. In December, the Wall Street Journal profiled the pair, likening Chatbot Arena to the Billboard Hot 100. The site also draws comparisons to Wikipedia in its embrace of crowdsourcing and an egalitarian ethos.
Not everyone is as elated about the new horizons ushered in by developments in AI. In addition to general fears about the technology’s implications for jobs and how it will affect young people’s educational development, there are corresponding geopolitical tensions. In early February, a Chinese AI company, DeepSeek, spurred what The New York Times termed a “giant AI tech freakout” in the U.S. after it released a competitive LLM at a dramatically lower cost. And in mid-April, Congress released a report describing DeepSeek as “a profound threat” to U.S. security interests.
Angelopoulos and Chiang are now looking to transition Chatbot Arena into its next phase. While the site will remain free to users, they are in the very preliminary stages of building the platform into a company. UC Berkeley News recently spoke with Angelopoulos about what he has learned from his experience and how it pertains to the future of an industry that is quickly reshaping the world we live in.

Rosa Norton: It sounds like it’s been a roller coaster ride since launching the site. Could you share a standout moment when you realized you’d hit on something big?
Anastasios Angelopoulos: Before GPT-4o was released, OpenAI gave us a few different flavors of the model to test so they could see which one was the best to release, or at least measure how our community felt about all three different models. There was so much excitement in the community because they were like, “Oh, my God! Is this OpenAI’s next model? This is such a big deal!” They were coming to us to try out the pre-release versions of the model because they wanted to see what OpenAI’s next model would be. And that was really an exciting time, a highlight moment.
How has collaboration helped shape Chatbot Arena in general?
It’s been a super collaborative project. We’re collaborating with all the top labs — Google and OpenAI and xAI and Meta. We’re in the loop with them on a daily basis. Then there’s so many academic collaborations for research. We have collaborators at Carnegie Mellon University; University of California, San Diego; Mohamed bin Zayed University of Artificial Intelligence; there’s a bunch of different schools that have people who want to contribute to this larger mission of reliable AI and community-driven evaluations.
Is that where comparisons to Wikipedia come in?
That’s exactly right, because the community gets to vote on which models they like, and that has a Wikipedia flavor to it. It’s not determined by a few experts; everybody gets a voice.
You describe Chatbot Arena’s rating system as “battles,” and you have images of crossed swords accompanying them on the website. Do you think that gaming element adds to the buzz over companies jostling for leaderboard placements?
For sure. I think we want to lean more into the game aspect of it, actually, to try and make people really enjoy coming here and battling the models against one another. And on the company side, companies are really competitive to be No. 1 — that really matters to them. AI is definitely the fastest moving area in the world right now. The speed and intensity of competition in AI is really remarkable, and that’s part of what contributes to our growth.
Your method centers human preference, or what The Wall Street Journal refers to as “vibes-based evaluation.” Can you talk about the significance of that choice?
The reality is that human preference really matters — people want to create software that humans like. That’s not to say that the evaluations that we do replace everything. They’re not the only kind of evaluation you might want. But how many companies are there that want to achieve consumer satisfaction? People have opinions, they like what they like. And we should instrument that so that we can understand it. You want to decompose preference into its constituent components. But that’s different from arguing that people’s preferences aren’t relevant, which is clearly a bad argument.
Also related to design, how did you land on user-generated prompts as opposed to presenting people with preset inputs?
We landed on that somewhat randomly, but it has proven extremely important. Model providers and the general community care a lot about organic usage; they don’t want to see contrived prompts. They want to see how people are using AI, and they want to be good at what people are doing. If you and I were to sit here and come up with a thousand prompts, we would not do a good job of capturing the diversity of human interactions. The only way to do that is to allow people to say what they want. And that’s what the platform’s about — that is the main reason why it has become useful.
Does that also come with more risks? Do you want to comment on the fears that people have about AI?
It’s not really my place to say whether those fears are well-placed or not. I will say that everybody who works on engineering systems has to care about safety and reliability. If you’re going to deploy software with AI built into it, then you need to make sure that the effect on the users is as desired, as opposed to buggy software that might recommend users do dumb stuff or that gives users wrong answers. You get that all the time these days, because nobody really knows how to build reliable software with AI. I do think that what we’re building can provide a huge contribution to building safe and reliable AI software. But my perspective on it is totally apolitical and sort of down to brass tacks engineering, which is that I care about making great tools to make people’s lives better and more efficient, as opposed to the more existential risks.
Did you foresee the recent shake-up surrounding DeepSeek and Chinese AI companies?
We didn’t foresee the stock market change, but we’ve been working with DeepSeek for a long time, so we’ve known that they were improving, and open source AI, in general, has been improving for quite a while. So there really wasn’t that much of a surprise that an open source model would eventually do well, and an open source Chinese model, right? Because China has been doing well in the AI world. My hope is that consumers will benefit, because having great open models is just a pure win for the whole world.
Do you have predictions about the near future of developments in AI?
My expectation is that in a decade, every major business is going to be deploying AI in significant parts of their pipeline, and capabilities will improve so much that there will be many jobs in the economy that become more productive because of the use of AI.
And make some people’s positions redundant, I imagine?
I don’t know about redundant, but I certainly think that one person will be able to be more productive. My hope personally is that it helps a lot of people with drudgery like paperwork. There’s so much value to be provided there: I don’t want to have to fill out forms, I don’t want to have to do repetitive tasks. Just solving that pain point for people, it might sound small, but that is huge.
What are your hopes for the future of Chatbot Arena?
We are just now starting to work on a company out of Chatbot Arena. The site is always going to be free to use for people — that’s for sure. We just know that we want to keep working on it and keep building it out in a way that requires more resources than we could possibly assemble from an academic context. And we’ll continue to support open research and work closely with Berkeley.
Is there anything else you’d like to add that I haven’t covered?
The Berkeley ecosystem has been so helpful in all of this, it’s such an honor and a privilege to be part of the Berkeley community. Without Skylab and the help of Ion Stoica and other professors in the lab, we would never have gotten started on this. Everything that we’ve done so far basically has been open source, and the fact that Berkeley has this sort of earthy, crunchy commitment to open products and open research has had a huge influence on us. I just feel very grateful for it. And I know that Wei-Lin does, too.