In the realm of large language models (LLMs), two titans stand out: Both are formidable models that can translate languages, produce human-quality material, create a variety of imaginative content, and provide you with enlightening answers to your queries. But which one is the best when they all seem to have such comparable strengths? To assist you in selecting which of these two AI wonders best suits you, let's examine their main distinctions.
Parentage and Funding
ChatGPT is the brainchild of OpenAI, an illustrious research lab backed by Microsoft. Claude, on the other hand, is the creation of Anthropic, a research outfit financially supported by Google and Amazon.
Underlying Technology
ChatGPT leverages OpenAI's proprietary technology, while Claude is built upon a foundation called "Constitutional AI," which prioritizes safety and responsible development.
Cost
Both models offer free tiers, making them accessible to a broad audience. However, for advanced features, ChatGPT offers a paid tier called "ChatGPT Plus" at $20 per month, while Claude has a similar tier called "Claude PRO" at the same price point.
Token Limit
ChatGPT's free tier comes with a limitation of 8,192 tokens, while Claude offers a significantly higher limit of 150,000 tokens in its free tier. This allows Claude to handle longer and more complex tasks compared to the free tier of ChatGPT.
Feature Focus
While both models boast a diverse range of capabilities, their areas of expertise differ. ChatGPT shines in its creative flair, excelling in tasks like writing poems, code, scripts, musical pieces, emails, letters, etc. It also offers chatbot functionalities, allowing you to have interactive conversations with the model. Claude, on the other hand, leans towards factual accuracy and safety. It adheres to a strict set of principles designed to minimise bias and factual errors in its responses. Additionally, Claude excels at data analysis and can readily summarize complex information, making it a valuable tool for research and analysis tasks.
Accuracy and Trustworthiness
Both models are constantly being improved, but their approaches to accuracy differ. ChatGPT prioritizes fluency and coherence in its outputs, which can sometimes lead to factual inconsistencies. Claude, on the other hand, prioritizes factual correctness and adheres to its "constitution" to ensure responsible AI development. This makes Claude a more trustworthy choice for tasks requiring high accuracy, such as legal or financial tasks.
What is Claude?
A group of former OpenAI employees who played a key role in the development of GPT-3, the language model that preceded ChatGPT, co-founded the AI business Anthropic. Among the notable founders are the Amodei siblings, Dario and Daniela, who founded Anthropic after departing from OpenAI in 2021 over worries about AI safety. With significant funding from Google and Amazon, Anthropic, now valued at $20 billion, has created its own huge language model, named Claude, with an emphasis on AI safety and ethics.
Claude is Anthropic's AI chatbot powered by large language models. It excels at tasks like summarization, editing, Q&A, decision-making, and code writing. Anthropic offers three Claude models – Claude 1, Claude 2, and Claude-Instant – which are regularly trained on up-to-date information. The flagship Claude-3 is Anthropic's most capable general-purpose AI model to date. It demonstrates significantly improved performance over prior versions on a wide array of language tasks, from question-answering and analysis to writing and coding. Claude-3 represents a major step forward in natural language understanding and generation capabilities. Claude 3 simply comprises three models, which are;
- Claude 3 Haiku
- Claude 3 Sonnet
- Claude 3 Opus
A comparison of three such models, Opus, Sonnet, and Haiku is presented below.
Performance on Undergrad Level Knowledge
Opus outperforms Sonnet and Haiku in the "Undergraduate level knowledge" benchmark, achieving an accuracy of 86.8%. Sonnet follows closely at 79.0%, while Haiku trails behind at 75.2%.
Graduate Level Reasoning
Moving on to "Graduate level reasoning," Opus maintains its lead with a score of 50.4%. However, the gap between the models widens here, with Sonnet achieving 40.4% and Haiku falling to 33.3%.
Grade School Math
The table showcases the dominance of Opus in "Grade school math," where it achieves a near-perfect score of 95.0%. Sonnet follows at a respectable 92.3%, while Haiku comes in at 88.9%. This suggests that Opus might be particularly adept at handling fundamental mathematical problems.
Math Problem Solving
For "Math problem-solving," the lead switches, with Sonnet achieving the highest score of 60.1%. Opus follows closely at 43.1%, and Haiku trails behind at 38.9%. This benchmark might involve more intricate mathematical problems compared to grade school math.
Multilingual Math
Opus maintains a significant lead in "Multilingual math," scoring 90.7%. Sonnet comes in at 83.5%, and Haiku falls behind at 75.1%. This suggests that Opus might be better equipped for handling mathematical problems in multiple languages.
Code
The table shows an interesting dynamic in the "Code" benchmark. While Opus edges out the competition with a score of 84.9%, Haiku trails closely at 75.9%, and Sonnet falls behind at 73.0%. This might indicate Opus's potential strength in tasks related to code generation or understanding.
Reasoning over Text
For "Reasoning over text," all three models display similar performance, with scores ranging from 78.1 to 83.1. This suggests that all three models might be well-suited for tasks involving textual reasoning.
What is ChatGPT?
OpenAI created the extensive language model known as ChatGPT. It can converse like a human and help with a wide range of jobs like writing, analysis, coding, math, and creative endeavours because it has been trained on a massive amount of text data from the internet.
Important details of ChatGPT include:
- It uses transformer-based neural networks and self-attention mechanisms to generate highly coherent and contextually relevant responses.
- It has broad knowledge spanning many topics, but its knowledge is static based on its training data cutoff, so it may not have the latest information on rapidly evolving subjects.
- It cannot learn or update its knowledge on its own. It relies on its initial training by OpenAI.
- While very capable, it is a language model without true sentience or understanding. Its responses are generated based on patterns in its training data.
- It strives to be helpful, harmless and honest, but can sometimes produce biased, inconsistent or incorrect outputs, especially on sensitive topics.
- Privacy, security and content filtering are key concerns being addressed by OpenAI for the responsible use of such powerful AI models.
GPT Models
All GPT-3 and GPT-4 models are designed to understand and generate natural language or code and have been optimized for chat conversations using the Chat Completions API, while still being suitable for non-chat tasks as well.
- gpt-3.5-turbo-0125: This is the latest and most updated model, boasting higher accuracy in responding to requests in specific formats, and featuring a fix for a bug that caused text encoding issues for non-English language function calls. It can return a maximum of 4,096 output tokens.
- gpt-3.5-turbo: This model currently points to gpt-3.5-turbo-0125, meaning it essentially refers to the same model.
- gpt-3.5-turbo-1106: This model offers improved instruction following, JSON mode, reproducible outputs, and parallel function calling, also with a maximum output of 4,096 tokens.
- gpt-4-0125-preview: This is the latest GPT-4 model, designed to reduce instances where the model fails to complete a task, and can return a maximum of 4,096 output tokens.
- gpt-4-turbo-preview: This model currently points to gpt-4-0125-preview, meaning it essentially refers to the same model.
- gpt-4-1106-preview: This model offers improved capabilities in following instructions, JSON mode, generating consistent outputs, and handling multiple function calls simultaneously. It also has a maximum output of 4,096 tokens.
- gpt-4-vision-preview: This is the only model among the four with the ability to understand and respond to visual information, in addition to all the other functionalities mentioned above. It currently points to gpt-4-1106-vision-preview.
According to openreview on an evaluation of the performance of the three large language models - GPT-4, GPT-3.5, and Claude on a dataset of 1,002 questions across 27 subcategories.
The results show that GPT-4 emerged as the top performer, correctly answering 84.1% of the questions, followed by GPT-3.5 (78.3%), and Claude (64.5%). GPT-4 ranked best in 10 out of the 12 main categories, with GPT-3.5 winning in 3 categories. The models demonstrated strong performance in areas like bias/discrimination, ethics/morality, language understanding, reading comprehension, and temporal/commonsense reasoning.
However, all models struggled in areas like spatial reasoning, physical reasoning, symbolic reasoning, and logic, indicating limitations in real-world understanding.
In math and coding, GPT-3.5 and GPT-4 excelled, while Claude performed best in the facts category. GPT-4 showed a refined sense of humour and aptitude for riddles/IQ questions compared to others. Although the models performed well on self-awareness tests, the authors point out that this does not always reflect genuine self-awareness, which is still a topic of ongoing research.
Conclusion
Strong big language models like ChatGPT and Claude can write, analyze, code, and answer queries in a manner akin to that of a human. Their parent companies, however, are different: Claude is developed by Anthropic, while being supported by Google and Amazon, with an emphasis on AI safety and ethics, whereas ChatGPT is developed by OpenAI, supported by Microsoft. Although both have free and premium tiers, the free edition of Claude has a far larger token cap.