AI startup Anthropic, backed by Google and hundreds of millions (and soon billions) in venture capital, today announced the latest version of its GenAI technology, Claude. The company claims that it rivals OpenAI's GPT-4 in terms of performance.
Claude 3, as Anthropic's new GenAI is called, is a family of models – Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus, with Opus being the most powerful. All of them show “increased capabilities” in analysis and prediction, Anthropic claims, as well as enhanced performance on select benchmarks versus models like OpenAI's GPT-4 (but not GPT-4 Turbo) and Google's Gemini 1.0 Ultra (but not Gemini 1.5 Pro).
Notably, Claude 3 is Anthropic's first multimodal GenAI, meaning it can analyze text as well as images – similar to some flavors of GPT-4 and Gemini. Claude 3 can process images, charts, graphs, and technical diagrams, and draw from PDF files, slide shows, and other document types.
One step ahead of some GenAI competitors, Claude 3 can analyze multiple images in a single request (a maximum of 20). This allows it to compare and contrast images, Anthropic points out.
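As a rough illustration of how such a multi-image request might be put together for Anthropic's Messages API, here is a minimal sketch. The request shape (base64 image content blocks followed by a text block) follows the API's documented format, but the helper function, the default model string, and the PNG media type are assumptions for illustration, not code from Anthropic.

```python
import base64

MAX_IMAGES_PER_REQUEST = 20  # Anthropic's stated cap for Claude 3


def build_multi_image_request(image_bytes_list, prompt,
                              model="claude-3-opus-20240229"):
    """Assemble a Messages API request body that asks Claude to
    compare several images.

    Each element of `image_bytes_list` (raw PNG bytes) becomes a
    base64-encoded image content block; the text prompt goes last.
    """
    if len(image_bytes_list) > MAX_IMAGES_PER_REQUEST:
        raise ValueError(
            f"Claude 3 accepts at most {MAX_IMAGES_PER_REQUEST} images per request"
        )
    content = [
        {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": base64.b64encode(data).decode("ascii"),
            },
        }
        for data in image_bytes_list
    ]
    content.append({"type": "text", "text": prompt})
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": content}],
    }
```

The resulting dictionary is what would be sent as the JSON body of a `POST /v1/messages` call (or passed to an SDK's equivalent method).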
But there are limitations to Claude 3's image processing.
Anthropic has disabled the models from identifying people – no doubt wary of the ethical and legal implications. The company admits that Claude 3 is prone to making errors with “low-quality” images (under 200 pixels) and struggles with tasks involving spatial reasoning (such as reading an analog clock face) and counting (Claude 3 can't accurately count the objects in images).
Claude 3 also won't generate artwork. The models strictly analyze images – at least for now.
Whether fed text or images, Anthropic says customers can generally expect Claude 3 to follow multi-step instructions better, produce structured output in formats like JSON, and converse in languages other than English more capably than its predecessors. Claude 3 should also refuse to answer questions less often thanks to a “more nuanced understanding of requests,” Anthropic says. And soon, the models will cite the sources of their answers to questions so users can verify them.
“Claude 3 tends to generate more expressive and engaging responses,” Anthropic writes in a supporting article. “[It’s] easier to prompt and steer compared to our legacy models. Users should find that they can achieve the desired results with shorter, more concise prompts.”
Some of these improvements stem from Claude 3's expanded context.
A model's context, or context window, refers to the input data (such as text) that the model considers before generating output. Models with small context windows tend to “forget” the content of even very recent conversations, leading them to veer off topic – often in problematic ways. As an added upside, large-context models can better grasp the narrative flow of the data they take in and generate contextually richer responses (in theory, at least).
Anthropic says that Claude 3 will initially support a 200,000-token context window, equivalent to about 150,000 words, with select customers getting a 1-million-token context window (about 700,000 words). That's on par with Google's newest GenAI model, the aforementioned Gemini 1.5 Pro, which also offers a context window of up to 1 million tokens.
Now, just because Claude 3 is an upgrade over what came before doesn't mean it's perfect.
In a technical whitepaper, Anthropic acknowledges that Claude 3 isn't immune to the problems plaguing other GenAI models, namely bias and hallucination (i.e. making things up). Unlike some GenAI models, Claude 3 can't search the web; the models can only answer questions using data from before August 2023. And while Claude is multilingual, it isn't as fluent in certain “low-resource” languages as it is in English.
But Anthropic is promising frequent updates to Claude 3 in the months to come.
“We don't believe that model intelligence is anywhere near its limits, and we plan to release [enhancements] to the Claude 3 model family over the next few months,” the company wrote in a blog post.
Opus and Sonnet are now available on the web and via Anthropic's console and API, Amazon's Bedrock platform, and Google's Vertex AI. Haiku will follow later this year.
Here are the pricing details:
Opus: $15 per million input tokens, $75 per million output tokens
Sonnet: $3 per million input tokens, $15 per million output tokens
Haiku: $0.25 per million input tokens, $1.25 per million output tokens
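The rates above translate directly into a per-request cost calculation. A small sketch, with the model names and rates hardcoded from the published list (the function itself is illustrative, not part of any Anthropic SDK):

```python
# Price per million tokens (USD), from Anthropic's published Claude 3 rates.
PRICING = {
    "opus":   {"input": 15.00, "output": 75.00},
    "sonnet": {"input": 3.00,  "output": 15.00},
    "haiku":  {"input": 0.25,  "output": 1.25},
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of a single request given its token counts."""
    rates = PRICING[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000
```

For example, filling Opus's standard 200,000-token context window and getting a 1,000-token reply would cost 200,000 × $15/1M + 1,000 × $75/1M, or about $3.08 per request.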
So that's Claude 3. But what's the 30,000-foot view of all this?
Well, as we've mentioned previously, Anthropic's ambition is to create a next-generation algorithm for “AI self-teaching.” Such an algorithm could be used to build virtual assistants that can answer emails, conduct research, generate art, books, and much more – some of which we've already gotten a taste of with the likes of GPT-4 and other large language models.
Anthropic hints at this in the aforementioned blog post, saying it plans to add features to Claude 3 that enhance its capabilities out of the gate by allowing Claude to interact with other systems, program “interactively” and provide “advanced agent capabilities.”
That last part calls to mind OpenAI's stated ambitions to build a software agent to automate complex tasks, such as moving data from a document to a spreadsheet or automatically filling out expense reports and entering them into accounting software. OpenAI already offers an API that allows developers to build “agent-like experiences” into their apps, and Anthropic seems intent on offering comparable functionality.
Could we see an image generator from Anthropic next? That would honestly surprise me. Image generators have become the subject of a lot of controversy these days, mainly for reasons of copyright and bias. Google recently had to disable its image generator after it introduced diversity into images with a comical disregard for historical context. A number of image generator vendors are locked in legal battles with artists who accuse them of profiting from their work by training GenAI on this work without providing compensation or even credit.
I'm curious to see how Anthropic's technique for training GenAI, “constitutional AI,” develops; the company claims it makes GenAI behavior easier to understand, more predictable, and simpler to adjust as needed. Constitutional AI aims to provide a way to align AI with human intentions, with models answering questions and performing tasks using a simple set of guiding principles. For example, for Claude 3, Anthropic said it added a principle – informed by crowdsourced feedback – that guides the models to be understandable and accessible to people with disabilities.
Whatever Anthropic's endgame is, it's in it for the long haul. According to a pitch deck leaked in May of last year, the company aims to raise as much as $5 billion over the next 12 months or so – which may be just the baseline it needs to remain competitive with OpenAI. (Training models isn't cheap, after all.) It's well on its way, with $2 billion and $4 billion in committed capital and pledges from Google and Amazon, respectively, and well over $1 billion combined from other backers.