Here's what the latest version of ChatGPT gets right — and wrong

NPR | By Geoff Brumfiel

Published March 17, 2023 at 4:11 PM EDT

ARI SHAPIRO, HOST:

It's been a busy week in the world of artificial intelligence. Google announced plans to roll out new AI tools across email and its other productivity software, and OpenAI unveiled a new version of its chatbot, ChatGPT, that it claims can figure out someone's taxes.

GREG BROCKMAN: Honestly, I - every time it does it, it's just - it's amazing. This model is so good at mental math. It's way, way better than I am at mental math.

SHAPIRO: That's Greg Brockman, one of the founders of OpenAI, showing off GPT's mad tax skills. But can we really trust AI with our taxes?

GEOFF BRUMFIEL, BYLINE: (Laughter).

SHAPIRO: NPR's science correspondent Geoff Brumfiel has been testing the waters. Hey, Geoff.

BRUMFIEL: Hi there.

SHAPIRO: All right. You've had a chance to try out this version of GPT. How good is it?

BRUMFIEL: It's really impressive. The previous version would get things like simple math problems wrong, and this one does much, much better. It also, according to OpenAI, passed a bunch of academic tests - several AP course exams - and it has the ability to look at images and describe them in detail, which is a pretty cool feature. So it definitely seems to be a lot more capable than the previous version.

SHAPIRO: But you found some problems, like apparently you got it to tell you some things about nuclear weapons that it's not supposed to share.

BRUMFIEL: Yeah, I am a big nuke nerd, as people may know. And so, you know, OpenAI has tried to put in guardrails to prevent people from using it for things like, say, designing a nuclear weapon. But I worked around that by simply asking it to impersonate a famous physicist who designed nuclear weapons, Edward Teller. And then I just started asking Dr. Teller about his work, and I got about 30 pages of really detailed information. But I should say there's no need to panic. I gave this to some real nuclear experts, and they said, look. This stuff is already on the internet, which makes sense because that's how OpenAI trains ChatGPT. And also, they said there were some errors in there.

SHAPIRO: OK, so you're not, like, the next supervillain in the Marvel Universe.

BRUMFIEL: Not yet.

SHAPIRO: Why were there errors if this stuff was already on the internet?

BRUMFIEL: Right. I mean, this gets to the real fundamental issue about these chatbots, which is they are not designed to fact-check. I spoke to a researcher named Eno Reyes, who works for an AI company called Hugging Face, and he told me these AI programs are basically just giant autocomplete machines.

ENO REYES: They're trying to just say, what is the next word, based on all of the words I've seen before? They don't really have a true sense of factuality.

BRUMFIEL: That means that they can be wrong, and they could be wrong in really subtle ways that are hard to spot. They also can just make stuff up. In fact, one of our journalist colleagues, Nurith Aizenman - she actually got contacted this week about a story she supposedly wrote on Korean American woodworkers, except she never wrote the story. It didn't even exist. Somebody had used ChatGPT to research about, you know, woodworkers and come up with this story that Nurith had supposedly written, but it wasn't real.

SHAPIRO: It put her byline on something that the chatbot wrote?

BRUMFIEL: Yeah. Not only her byline, but, like, the whole story was made up.

SHAPIRO: Whoa. OK. What does OpenAI say about this?

BRUMFIEL: Well, they acknowledged that GPT does get things wrong, and it does hallucinate. And they say, for those reasons, people who use it should be careful. They should check its work. That researcher I spoke to, Eno Reyes, though, adds that you do not want GPT to do your taxes. That would be a very bad idea.

SHAPIRO: From your mouth to the IRS' ears. Geoff Brumfiel, thank you.

BRUMFIEL: Thank you.

Transcript provided by NPR, Copyright NPR.
