Best Practices

Selecting the Right AI Model for Every Task

There is a plethora of foundational AI models available today, with more being developed every month. From OpenAI to Google, there is no shortage of AI capabilities to choose from.

MindStudio developers can select from 13 different models created by 5 leading companies: Mistral, Google, OpenAI, Anthropic, and Meta. Advanced users can also integrate fine-tuned models from cloud services via API requests.

With all of these options, how can developers determine the model that is most suitable for their specific use case?

In this article, we provide an overview of the models accessible in MindStudio and outline their optimal and suboptimal use cases.

Summary

  • For creative tasks like copywriting, Claude 2.1 sounds more human and can articulate its thoughts clearly.
  • For reasoning, GPT-4 Turbo outperforms the others and leads the LLM Arena, a leaderboard that ranks the best models for broad usage.
  • To save on AI costs, Mistral’s 7B model offers great value. It's cheaper, performs well, and is less restrictive than most others.
  • Google’s Gemini Ultra struggles with slightly controversial topics, including politics or storytelling, yet it is faster and matches GPT-4 in output quality.
  • Meta’s Llama can be fine-tuned for coding and is partially open source, making it a popular base for customizing models to specific codebases.

AI Alignment & Policies

Many companies are starting to implement safeguards and outright ban specific content, such as NSFW material or disruptive imagery. 

Among the models available in MindStudio, Google and Anthropic are the most controlled. Anthropic will avoid discussing controversial topics, and Google's AI restricts many requests. 

OpenAI is relatively open and imposes fewer restrictions. While every model inherently contains bias, GPTs are not overly censored for business use cases.

Finally, Mistral does not have stringent safeguards beyond basic decency rules, such as avoiding advice on creating weapons. It's the only model that permits NSFW interactions - using any of the others could trigger a warning or a ban from the platform.

Speed

The Perplexity AI playground is a great tool for testing model speed, and it lets anyone try open source models. For example, Mixtral 7b returns 192 tokens per second and takes 0.29 seconds to start.
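To see what those two numbers mean in practice, here is a minimal sketch that turns them into a rough response-time estimate. It assumes the 0.29 start time is in seconds and that tokens stream at a constant rate; the function name is ours, not part of any tool.

```python
def estimated_response_seconds(output_tokens: int,
                               tokens_per_second: float,
                               time_to_first_token: float) -> float:
    """Rough wall-clock estimate: startup delay plus streaming time."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return time_to_first_token + output_tokens / tokens_per_second


# Figures quoted above; the 0.29 start time is assumed to be in seconds.
seconds = estimated_response_seconds(1000, 192, 0.29)
print(f"A 1,000-token answer would take roughly {seconds:.1f} seconds")
```

Real generations vary with load and prompt size, so treat the result as a ballpark figure rather than a guarantee.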

Using the MindStudio debugger, you can check the usage and speed of all models in your AI application. Navigate to the debugger, click on a thread, and you’ll see something like this under “performance” on the right side: 

This example AI averaged 64.76 tokens per second, for a total usage of 7,927.

Generally speaking, model speed can be ranked as follows: 

  1. GPT-3.5: by far the fastest AI model. Whenever you need a quick execution and don’t necessarily need top-tier quality outputs, go with GPT-3.5.
  2. Gemini Pro: Google also focuses on speed. While not as speedy as GPT-3.5 Turbo, Gemini Pro typically outperforms the other models available in MindStudio.
  3. Mistral & Meta’s Llama: similar performance, but it depends on how big the input is.
  4. GPT-4 Turbo: best model for most reasoning tasks, but not the fastest AI.
  5. GPT-4: incredibly slow. You’ll need to wait 2-3 minutes for many long-form generations.

These aren’t formal benchmarks, but the MindStudio team tried every model, and this ranking reflects our overall experience. We’ll continue to provide updates on new models and model changes in Discord.

Reliability

OpenAI raised $10 billion from Microsoft, Anthropic raised $2 billion from Google, and Mistral quickly became an AI unicorn in Europe.

All of these companies are well-funded, supported by large cash reserves, and are working to augment the future of work.

That being said, some are definitely more reliable than others.

OpenAI and Microsoft are highly reliable. Other than the 2023 scandal where the board fired OpenAI’s CEO, the company has been steadily growing and shipping amazing products for developers and end users. 

Google and Meta are reliable. However, both have a tendency to quickly change their product offerings, naming, and so on. Google is infamous for killing products that don’t reach mass adoption.

Mistral and Anthropic are the smallest, and communication with the developers is sparse. Currently, they seem unable to develop their models as quickly as others. 

Mistral is the only truly open source player on the list, and most LLMs on Hugging Face are fine-tuned from a base Mistral model. Technical users love it, and it has the added benefit of self-hosting on any server.

Output Quality by Use Case

The LLM Arena on Hugging Face, basically the GitHub of AI, ranks models by user votes between two competing outputs for the same prompt.

You can participate here. The system will display two responses to a given prompt: choose one, and you'll see which model you selected.
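To give a feel for how pairwise votes can become a ranking, here is a minimal sketch of an Elo-style rating update. This is an assumption on our part: the article's source does not say which formula the leaderboard uses, and the starting ratings and the K factor of 32 below are illustrative values only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A wins the vote, under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_ratings(rating_a: float, rating_b: float,
                   a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return both ratings after one user vote between two responses."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    # B's change mirrors A's, so the total rating pool stays constant.
    new_b = rating_b - k * (score_a - exp_a)
    return new_a, new_b


# Two models start level; a user prefers the first model's response.
print(update_ratings(1000, 1000, a_won=True))
```

An upset (a low-rated model beating a high-rated one) moves the ratings more than an expected win, which is why a handful of votes is not enough to settle a ranking.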

According to LLM Arena: 

  • GPT-4 Turbo ranks #1 among the publicly available APIs, and it’s included in our MindStudio packages. The standard GPT-4 ranks #2. OpenAI dominates the leaderboard.
  • Gemini Pro ranks slightly below GPT-4.
  • Claude ranks below Mistral, showing that an open-source model can outperform closed models.
  • GPT-3.5 ranks last, and Llama is #24 on the list.

Remember that these scores don’t consider specific use cases. In our experience, GPT-4 Turbo is indeed the best model overall, but Claude 2.1 performs significantly better at creative copywriting. 

A good combination of GPT-4 Turbo (reasoning, logic, coding) and Claude 2.1 (writing) can generate incredible results for your business.

Cost

Mistral pricing highly depends on your hosting. It’s typically the cheapest option available, usually standing below the $1 per million input tokens mark. 

Gemini Pro is technically free for now while in beta, with a limit of 60 requests per minute. Once live, it will cost $0.000125 / 1K in input and $0.000375 / 1K tokens in output.

GPT-3.5 can be very cost effective. At $0.0005/1k tokens in input and $0.0015/1k tokens in output, it’s the cheapest closed model available. Gemini Pro will be cheaper when it launches officially.

GPT-4 Turbo currently stands at $0.01/1k tokens in input and $0.03/1k tokens in output. The more expensive GPT-4 starts at $0.03/1k tokens in input and $0.06/1k tokens in output.

Meta’s Llama pricing depends on which host you use. On Microsoft Azure, it costs $3.09 per hour.

Long story short: choose Gemini Pro or GPT-3.5 for quick savings. Mistral can be much cheaper if you know how to use it. GPT-4 Turbo is the best value for a closed model.
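To compare these prices on a concrete request, here is a minimal sketch of a cost calculator. The prices are the ones quoted in this section and will change over time; the dictionary keys are our own labels rather than official API identifiers, and we assume the Gemini Pro figures are also per 1K tokens.

```python
# USD per 1,000 tokens as (input, output), copied from the figures above.
# Keys are illustrative labels, not official model identifiers.
PRICES_PER_1K = {
    "gpt-3.5": (0.0005, 0.0015),
    "gpt-4-turbo": (0.01, 0.03),
    "gpt-4": (0.03, 0.06),
    "gemini-pro": (0.000125, 0.000375),  # assumed to be per 1K tokens
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, given its input and output token counts."""
    price_in, price_out = PRICES_PER_1K[model]
    return input_tokens / 1000 * price_in + output_tokens / 1000 * price_out


# Example: a 2,000-token prompt that produces a 500-token answer.
for model in PRICES_PER_1K:
    print(f"{model}: ${request_cost(model, 2000, 500):.5f}")
```

Output tokens cost more than input tokens on every model listed, so long generations affect the bill more than long prompts of the same size.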

Context Size

If you want to learn more about the context size and release date of models we support, visit our models page (no signup required).

Most models support a context size up to 16k or 32k, which is very low compared to state-of-the-art models like GPT-4 Turbo and Claude 2.1. 

For AI-powered applications that require large context, such as long books, CSV files, or documents, opt for a model with a larger context size.

The context size determines how much of the current conversation is stored in the AI memory. 

When ChatGPT starts “forgetting”, it’s a sign that there is too much context. The context size of GPT-3.5, which stands at 16,385 tokens, is approximately 12,000 words.
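The tokens-to-words ratio above can be used for a quick check of whether a document is likely to fit in a model's context. This is a minimal sketch that assumes the ratio implied by the GPT-3.5 figures (16,385 tokens for about 12,000 words) holds for your text; real tokenizers vary by language and content, and the function names are ours.

```python
# Ratio implied by the figures above: 16,385 tokens is roughly 12,000 words.
TOKENS_REF = 16_385
WORDS_REF = 12_000


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the word count, rounded up."""
    words = len(text.split())
    return (words * TOKENS_REF + WORDS_REF - 1) // WORDS_REF


def fits_in_context(text: str, context_tokens: int,
                    reserved_for_output: int = 0) -> bool:
    """True if the text, plus room for the answer, fits in the context."""
    return estimate_tokens(text) + reserved_for_output <= context_tokens


document = "word " * 50_000  # stand-in for a 50,000-word book
print(estimate_tokens(document))
print(fits_in_context(document, 16_385))   # GPT-3.5
print(fits_in_context(document, 128_000))  # GPT-4 Turbo
```

Reserving some tokens for the answer matters because the prompt and the generated output share the same context window.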

If you want a large memory, the current best options are: 

  • GPT-4 Turbo (128k tokens), short max output size so not ideal for very long outputs
  • Claude 2.1 (200k tokens), gigantic max output size, ideal for very long outputs

Gemini 1.5 is supposed to come out with 1M token context size, but it’s not live yet.

With MindStudio, you can use vector databases to give the AI access to your data without a large context window. If the model you want doesn’t allow for large injections, you might be able to circumvent the issue with MindStudio’s RAG.
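The idea behind RAG is to retrieve only the few passages relevant to a question and inject those into the prompt, instead of the whole document. Here is a minimal, self-contained sketch of that idea. It is not MindStudio's implementation: the word-count "embedding" is a toy stand-in for the learned vectors a real vector database stores, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (real systems use learned vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, chunks: list[str], top_k: int = 2) -> str:
    """Inject only the retrieved chunks into the prompt sent to the model."""
    context = "\n".join(retrieve(query, chunks, top_k))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "Refunds are processed within 14 days.",
    "Our office is closed on Sundays.",
    "Shipping takes 3 to 5 business days.",
]
print(build_prompt("How long do refunds take?", chunks, top_k=1))
```

Because only the best-matching chunks reach the model, the prompt stays small even when the underlying data source is far larger than the context window.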

Language

Unfortunately, the only models with good performance in languages other than English are OpenAI’s state-of-the-art GPT-4 and GPT-4 Turbo.

Claude and Mistral can perform decently well in major languages like Spanish, Portuguese, Italian, German, and French, but the performance won’t be on par with English.

________________________________________________________________________________________

We hope this article helped you pick the best model for your next AI. Remember the decision isn’t final.

In MindStudio, all AIs are model agnostic. You can build your AI with GPT-3.5, remix it 5 times, and test it with 4 other models. Then, you can keep the one that performs the best.

We don’t restrict your creativity or freedom in any way, letting you choose custom temperatures, output size, and adding features like RAG and API endpoints.

Learn more about MindStudio on our YouTube channel or join the next webinar.

If you’re ready to get started, sign up for a free account here.
