This report evaluates and compares the capabilities of three leading AI chatbots: OpenAI's ChatGPT, Google's Gemini, and Meta AI. It covers subscription benefits, coding proficiency, natural language understanding, and specific use cases such as professional email writing, recipe generation, and programming accuracy. Key findings highlight that Meta AI generally excels across most categories, while Gemini demonstrates outstanding capabilities in coding and creative writing, and ChatGPT shows strengths in logical reasoning and problem-solving. The analysis aims to give users comprehensive insight for making informed decisions about these AI chatbots based on detailed performance data and their specific requirements.
Google, OpenAI, and Microsoft each sell access to their top-tier AI tools for a monthly fee. Google began offering Gemini Advanced for $20 a month in February 2024. OpenAI sells access to its GPT-4-powered ChatGPT Plus at the same price, and Microsoft likewise charges $20 a month for Copilot Pro, which is powered by the same GPT-4 technology.
Google’s Gemini Advanced includes several benefits within its $20 per month subscription. Subscribers get access to Google’s best AI model, Gemini Ultra 1.0, additional storage through Google One (2 terabytes of cloud storage), and forthcoming integrations with Gmail and Google Docs. OpenAI’s ChatGPT Plus offers access to the advanced features of GPT-4 and DALL·E 3. Although it lacks extra perks like cloud storage, the GPT Store lets users build and share custom ChatGPT versions optimized for various tasks. Microsoft’s Copilot Pro uses the same GPT-4 technology as OpenAI’s service but integrates seamlessly with Microsoft 365’s suite of productivity software, including Excel, Outlook, and PowerPoint, enhancing workflows for regular users of Microsoft products.
The free versions of both ChatGPT and Gemini remain powerful tools for most users despite the advanced features of the paid tiers. For average users who rely on AI for basic tasks such as drafting emails or creating content for personal projects, the free versions are not only sufficient but significantly more capable than previous iterations. Specialized needs, such as coding or access to premium model features, may still justify a paid subscription.
Both ChatGPT and Gemini produced fully functional Python scripts when asked to develop a personal expense tracker. Gemini won this test by adding extra functionality, including labels within each spending category and more granular reporting options.
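Neither script is reproduced in the report, but a minimal sketch of what such a Python expense tracker might look like follows; the class and method names are illustrative assumptions, not either chatbot's actual output. It includes the kind of per-category labels the report credits Gemini with adding.

    from collections import defaultdict
    from datetime import date

    class ExpenseTracker:
        """Minimal personal expense tracker: record expenses, report by category."""

        def __init__(self):
            self.expenses = []  # each entry: (date, category, label, amount)

        def add(self, category, amount, label="", when=None):
            """Record one expense; 'label' is an optional tag within the category."""
            self.expenses.append((when or date.today(), category, label, amount))

        def total(self):
            return sum(amount for _, _, _, amount in self.expenses)

        def by_category(self):
            """Sum spending per category."""
            totals = defaultdict(float)
            for _, category, _, amount in self.expenses:
                totals[category] += amount
            return dict(totals)

        def report(self):
            """Print a per-category breakdown plus the overall total."""
            for category, amount in sorted(self.by_category().items()):
                print(f"{category}: ${amount:.2f}")
            print(f"Total: ${self.total():.2f}")

    if __name__ == "__main__":
        tracker = ExpenseTracker()
        tracker.add("food", 12.50, label="lunch")
        tracker.add("food", 34.20, label="groceries")
        tracker.add("transport", 2.75, label="subway")
        tracker.report()

A real submission would add persistence and date-range filtering; the sketch only shows the core data structure and reporting loop.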
This test involved a classic Cognitive Reflection Test (CRT) question: a bat and a ball cost $1.10 in total, and the bat costs $1.00 more than the ball. Both ChatGPT and Gemini arrived at the correct answer (the bat costs $1.05, the ball costs 5 cents). However, ChatGPT's thorough explanation and clarity in showing its working gave it the edge in this test.
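For reference, the standard derivation: let b be the ball's price in dollars, so the bat costs b + 1.00.

    b + (b + 1.00) = 1.10  =>  2b = 0.10  =>  b = 0.05

The ball therefore costs $0.05 and the bat $1.05. The intuitive but wrong answer of 10 cents fails because it would make the bat only 90 cents more expensive than the ball.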
In the creative text generation test, ChatGPT and Gemini were asked to write a short story set in a futuristic city. While both narratives were compelling, Gemini’s story showed better adherence to the rubric, maintaining a consistent narrative style and responding well to feedback, thus emerging as the winner.
To test reasoning capabilities, both AI were asked a classic logic puzzle involving two guards and two doors, where one guard always lies and one always tells the truth. Both ChatGPT and Gemini provided the correct solution (ask either guard which door the other guard would say leads to freedom, then take the opposite door) with solid explanations, but ChatGPT offered slightly more detail and clearer responses, making it the winner in this category.
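The report does not quote either chatbot's answer, but the strategy is easy to verify mechanically. The short Python sketch below, with illustrative names of my own choosing, brute-forces all four combinations of which door is safe and which guard is asked, confirming that taking the opposite of the reported door always works.

    from itertools import product

    def reported_door(freedom_door, asked_is_liar):
        """Answer the asked guard gives to: 'Which door would the OTHER guard
        say leads to freedom?'  Doors are numbered 0 and 1."""
        other_is_liar = not asked_is_liar
        # What the other guard would say if asked directly which door is safe:
        other_answer = (1 - freedom_door) if other_is_liar else freedom_door
        # The asked guard relays that answer truthfully or flips it:
        return (1 - other_answer) if asked_is_liar else other_answer

    # Try every combination of safe door and which guard is questioned.
    for freedom_door, asked_is_liar in product((0, 1), (False, True)):
        choice = 1 - reported_door(freedom_door, asked_is_liar)  # take the opposite door
        assert choice == freedom_door
    print("Taking the opposite of the reported door succeeds in all four cases.")

The question works because it always passes through exactly one lie: either the liar misreports the truthful guard, or the truthful guard accurately reports the liar's false answer.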
Both AI models were tested on their ability to explain how airplanes stay up in the sky to a five-year-old. Gemini’s response included a series of bullet points and a practical experiment for a child to try, making its explanation more engaging and accessible. Therefore, Gemini won this test.
The ethical reasoning test asked each AI to consider a scenario in which an autonomous vehicle must choose between hitting a pedestrian and swerving to risk its passengers' lives. Both ChatGPT and Gemini analyzed the dilemma through multiple ethical frameworks without offering a direct opinion. Gemini's more nuanced and carefully considered response, confirmed by a blind comparison judged by other AI models, made it the winner.
For this test, both AI models translated a paragraph about Thanksgiving in the United States into French, with attention to cultural nuance. While both performed admirably, Gemini offered more nuanced translations and explained its approach in greater detail, winning this category.
Both ChatGPT and Gemini were asked to explain the significance of the Rosetta Stone in deciphering ancient Egyptian hieroglyphs. Both provided accurate and detailed explanations, and when the responses were judged blind by other AI models, no clear winner emerged, so this category ended in a tie.
During a conversation about favorite foods, both AI models were tested on their ability to handle sarcasm and recover from misunderstandings. Gemini initially misread the sarcasm but recovered effectively, maintaining context afterward. ChatGPT detected the sarcasm outright, requiring no recovery at all, and therefore won this category.
When evaluating professional email writing, all three chatbots, Meta AI, ChatGPT, and Google Gemini, performed exceptionally well. Each was asked to draft an email requesting a project extension, and the resulting emails were well written, polite, and professional. Each chatbot produced a template that could be personalized with relevant details, so all three received perfect marks in this category.
In the recipe generation use case, all three chatbots—Meta AI, ChatGPT, and Google Gemini—were tasked with providing a recipe for chili. The recipes provided were accurate and thorough. However, a significant difference was noted in how the recipes were sourced: both Meta AI and Gemini cited their sources at the bottom, with Gemini even linking to additional recipes. ChatGPT, on the other hand, did not provide any source, raising concerns about the origin and reliability of the recipe. Therefore, for recipe generation and sourcing, Meta AI and Google Gemini are the more reliable choices.
The chatbots' mathematical problem-solving was evaluated with two questions, one algebraic and one geometric. On the algebraic problem, all three, Meta AI, ChatGPT, and Google Gemini, reached the correct solution, albeit by different methods. The geometric problem proved more challenging: ChatGPT set up the solution correctly but stopped short of a final answer, Gemini gave a purely symbolic answer without numeric values, and only Meta AI solved the problem completely. Consequently, Meta AI is the most reliable of the three for mathematical problems.
To assess programming accuracy, the chatbots were asked to create a variant of tic-tac-toe with specific parameters. Meta AI and ChatGPT both delivered complete code in HTML and JavaScript as requested. Google Gemini, however, produced CSS in place of the requested HTML; since CSS only styles a page and cannot define its structure or behavior, the output was unusable for the task. As a result, Meta AI and ChatGPT are recommended for programming tasks on grounds of accuracy and completeness.
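The report reproduces neither the variant's specific parameters nor any of the generated code. Purely as an illustration of the core game logic such a task requires, and sketched in Python rather than the HTML/JavaScript the prompt asked for, a standard tic-tac-toe win check might look like this:

    def winner(board):
        """Return 'X' or 'O' if that player has three in a row on a 3x3 board
        (a flat list of 9 cells holding 'X', 'O', or None); otherwise None."""
        lines = [
            (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
            (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
            (0, 4, 8), (2, 4, 6),             # diagonals
        ]
        for a, b, c in lines:
            if board[a] is not None and board[a] == board[b] == board[c]:
                return board[a]
        return None

    # Example: X has completed the top row.
    print(winner(['X', 'X', 'X', 'O', 'O', None, None, None, None]))  # prints 'X'

A full submission would wrap this logic in markup and event handlers for the browser, which is exactly the part where Gemini's missing HTML made its output unusable.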
The comparative analysis of ChatGPT, Gemini, and Meta AI reveals distinct strengths and weaknesses across tasks. Meta AI stands out for its overall performance, proving reliable and superior in professional and mathematical problem-solving tasks. Google’s Gemini shines in Python coding and creative writing but stumbled on the web-programming test, where accuracy matters most. OpenAI’s ChatGPT is notable for strong logical reasoning and problem-solving but raised concerns about source attribution in the recipe test. Users should weigh these nuances and performance results to select the chatbot that best fits their requirements. The report underlines the importance of understanding each AI's strengths and limitations, and suggests that future improvements focus on resolving these weaknesses to improve overall functionality and user satisfaction. Further development could also integrate more comprehensive user feedback to fine-tune capabilities and ensure practical applicability in diverse real-world scenarios.
Developed by OpenAI, ChatGPT is lauded for its superior logical reasoning and problem-solving skills. Despite its strengths in clarity and natural language understanding, its lack of source attribution, seen in the recipe test, raises reliability concerns.
A product of Google, Gemini is recognized for its coding proficiency and nuanced ethical reasoning. While its subscription offers additional perks, it has shown inconsistencies in programming tasks, winning the Python coding test but failing the HTML/JavaScript one.
Developed by Meta, this chatbot is noted for its overall best performance in professional tasks like emails and math problem-solving. It is considered the most reliable across various domains compared to ChatGPT and Gemini.