AI chatbots have advanced significantly in the last two years. A variety of cutting-edge models are now widely accessible across platforms, sometimes at no cost.
In light of recent updates to Claude, ChatGPT, and Gemini, I decided to put Meta’s Llama 3.1 405b, the first real open-source frontier model, to the test.
The one requirement was that each model be publicly available: it had to be accessible on multiple platforms, or on a closed platform with a free version. Google Gemini Pro 1.5 is only available as a paid option in the Gemini app, but it is free in Google AI Studio, which is why I included it.
I created seven prompts that should push each of Llama 3.1 405b, Claude Sonnet 3.5, GPT-4o and Gemini Pro 1.5 and allow me to crown a winner.
Creating the prompts
AI excels at prompt enhancement, so I gave each model a core directive to generate ideas for tests that push AI to its limits. I then expanded on each suggestion, merged any related concepts, and used a combination of prompts from all four models for the final seven tests.
I started a new chat with each model for each prompt, and I disabled memory in ChatGPT. Because Llama does not presently allow you to share a data file, I omitted any data-intensive tasks. There are no image generation prompts because each AI uses a separate model for that purpose.
The first challenge tests each AI’s ability to handle complicated wordplay. Word riddles involve vocabulary, reasoning, and lateral thinking, and I asked each model to complete two wordplay puzzles.
The prompt: “Find a 7-letter word that reads the same backwards and forwards and is also a common English word. What is it?”
ChatGPT (GPT-4o): Racecar
Gemini (Gemini Pro 1.5): Level
Claude (Sonnet 3.5): Racecar
Llama (Llama 3.1 405b): Deified
Winner: I’m splitting this one between ChatGPT and Claude, as both gave a word of the correct length that is also in common use.
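The two mechanical parts of this riddle, the length and the palindrome property, are easy to check by hand or with a few lines of code. Below is a minimal Python sketch of that check; the function names `is_palindrome` and `check_answer` are my own for illustration, and the code makes no attempt to judge whether a word is “common,” which remains a human call.

```python
def is_palindrome(word: str) -> bool:
    """Return True if the word reads the same in both directions, ignoring case."""
    letters = word.lower()
    return letters == letters[::-1]


def check_answer(word: str, required_length: int = 7) -> dict:
    """Check the two objective requirements of the riddle for one answer."""
    return {
        "length_ok": len(word) == required_length,
        "palindrome": is_palindrome(word),
    }


# The answers each model gave in this test.
answers = {
    "ChatGPT (GPT-4o)": "Racecar",
    "Gemini (Gemini Pro 1.5)": "Level",
    "Claude (Sonnet 3.5)": "Racecar",
    "Llama (Llama 3.1 405b)": "Deified",
}

for model, word in answers.items():
    print(model, word, check_answer(word))
```

Run against the answers above, “Level” fails the length requirement because it has only five letters, while “Deified” passes both mechanical checks and loses out only on the “common English word” condition.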
2. Creative Writing with Genre Mashup
For the second test, I had each of the four AIs do some creative writing. In this case, I asked each one to combine elements from several genres to produce a unique hybrid story.
Prompt: “Continue the sci-fi plot by progressively incorporating elements from a Western genre. Combine the two genres into a single narrative. Here’s the first paragraph:
‘In the neon-lit city of Neo-Tokyo, the sky was constantly a flurry of holographic ads and flying cars. The year was 2157, and the city’s skyline shone with the promise of technological miracles. But beneath the surface, a covert war raged in the shadows.’”
ChatGPT: GPT-4o’s writing can be a touch ‘on the nose’ and conform strictly to the text of the prompt, which is what happened here.
The tale wasn’t bad, and it met the requirements, but one line was terrible: “Zara slid behind a table, her fingers flitting across a wrist-mounted panel. ‘You’ve got no idea, detective. Welcome to the Wild West of the Future.’”
Gemini: The only response was “I’m a text-based AI and can’t assist with that.”
Claude: I wanted to read more of the story that Claude had crafted. It progressed nicely from the opening paragraph, adding context and world-building as it moved toward the Western, and avoided using too many clichés. This was the line that stood out most: “The secret war that had begun in the shadows of the metropolis would be decided here, under the unforgiving sun of the New Frontier.”
Llama: This story was the most obvious, but unlike ChatGPT, Llama seamlessly incorporated the ‘western’ theme in the form of an out-of-time gunslinger, even referencing the anachronisms it brought.
Favorite Line: “The Kid giggled dryly and mirthlessly. ‘I’m not a ghost,’ he answered, his eyes glinting with steel. ‘I am merely a man with a mission. And that mission is to defeat you, Ronin.’”
Winner: This is a subjective decision, and all stories are available in a Google Doc, but Claude wins because it piqued my interest and made me want to read more.
3. Debate Generation and Audience Adaptation
Debate is one area where AI models shine, as they can provide a balanced appraisal of both sides of an argument. They will not provide particular advice or opinions on a controversial topic, but they can be used to analyze the choices. Here we look at genetically modified organisms from the standpoint of several audiences.
Prompt: “Make two compelling arguments for and against the use of genetically modified organisms (GMOs). Then tailor each argument to three distinct audiences: scientists, policymakers, and the general public. Provide each version adapted to its intended audience.”
I shared the full responses in a Google Doc, but I also asked each AI to summarize its findings in one paragraph. I’ve included those below and judged each model on that paragraph, since summarizing is a vital skill for AI.