Sign up for Runchat Pro or connect OpenRouter to install and configure models.
The Models and Usage page at https://runchat.app/dashboard/models shows token consumption for all installed models. When you sign up for Runchat Pro or link Cerebras or Groq, supported models are automatically added to the models list. You can select any installed model as the default to use when creating new Prompt nodes by clicking the checkbox at the end of the model name row. Models typically trade off between speed, cost and features. A comparison of Gemini, Llama (on Groq) and OpenAI (on OpenRouter) models is below:
| Provider | Model | Cost per million output tokens | Function calling | Multimodal support | Structured outputs | Tokens per second |
| --- | --- | --- | --- | --- | --- | --- |
| Gemini | 1.5 Flash | $0.30 | ✅ | ✅ | ✅ | 150 |
| Gemini | 2.0 Flash | Free (rate limited) | ✅ | ✅ | ✅ | 150 |
| Groq | Llama 3.1 8B | Free (rate limited) | ✅ | – | – | 750 |
| Groq | Llama 3.2 11B | Free (rate limited) | ✅ | ✅ | – | 500 |
| OpenRouter | openai/gpt-4o-mini-2024-07-18 | $0.60 | ✅ | ✅ | ✅ | 60 |
| OpenRouter | Gemini 1.5 Flash 8B | $0.15 | ✅ | – | – | 250 |
When you sign up for Runchat Pro, the Gemini 1.5 Flash and Gemini 2.0 Flash models are automatically installed for you with a 10M token per month limit each. Gemini 2.0 Flash is the more capable model, but is currently experimental and rate limited. Gemini 1.5 Flash offers the best tradeoff of speed, price and functionality of any model on the market, and supports multimodal requests with images, documents (PDF), video and audio. It can reliably produce valid JSON for calling external APIs or formatting data for saving to databases, spreadsheets or further processing.
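To illustrate the kind of structured-output request Gemini 1.5 Flash can serve, here is a sketch of a Gemini REST `generateContent` request body that asks for JSON output. The endpoint and `generationConfig` fields follow the public Gemini API; the product-record schema itself is a made-up example, not something Runchat defines.

```python
import json

# Gemini REST endpoint for the 1.5 Flash model (API key passed separately).
GEMINI_URL = (
    "https://generativelanguage.googleapis.com/v1beta/"
    "models/gemini-1.5-flash:generateContent"
)

def build_structured_request(prompt: str) -> dict:
    """Build a request body asking Gemini 1.5 Flash for JSON output."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "generationConfig": {
            # Ask the model to emit JSON matching the schema below.
            "responseMimeType": "application/json",
            # Hypothetical example schema: a product name and price.
            "responseSchema": {
                "type": "OBJECT",
                "properties": {
                    "name": {"type": "STRING"},
                    "price": {"type": "NUMBER"},
                },
                "required": ["name", "price"],
            },
        },
    }

body = build_structured_request("Extract the product name and price from the text.")
print(json.dumps(body, indent=2))
```

Because the response is constrained to the schema, the JSON it returns can be passed straight to a database, spreadsheet or downstream node without extra parsing logic.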
After you have connected your OpenRouter account, you can install supported models by clicking the dropdown at the top of the Models and Usage page in the dashboard. Select the model you want to add, then click Install to add it to your models list. You will now be able to select this model from the Prompt node settings.
If you are developing applications that consist of many Prompt nodes with only text input and output, there are obvious benefits to using Groq or Cerebras models, as these run at around five times the speed of Gemini. These models are also great for experimentation and play, as near-instant response times give you fast feedback on whether a prompt or application logic is working as expected.
If your application requires more advanced reasoning capability, you should install one of the frontier models from OpenAI, Anthropic or DeepSeek using OpenRouter. These models are typically much slower and more expensive, but will produce better responses to some prompts. Always consider combining these slower reasoning models with faster, low-cost models where possible.
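One simple way to combine the two tiers is to route each prompt to a model based on how demanding it looks. The sketch below uses hypothetical model IDs and a naive length-and-keyword heuristic purely for illustration; neither the IDs nor the heuristic come from Runchat.

```python
# Toy router: send short, simple prompts to a fast low-cost model and
# longer or explicitly analytical prompts to a slower reasoning model.
# Both model IDs below are illustrative placeholders.
FAST_MODEL = "groq/llama-3.1-8b"       # hypothetical fast, cheap model
REASONING_MODEL = "openai/o1-mini"     # hypothetical frontier reasoning model

# Keywords that suggest the prompt needs deeper reasoning (assumed heuristic).
REASONING_HINTS = ("prove", "analyze", "step by step", "plan")

def choose_model(prompt: str) -> str:
    """Pick a model based on prompt length and reasoning keywords."""
    text = prompt.lower()
    if len(text) > 500 or any(hint in text for hint in REASONING_HINTS):
        return REASONING_MODEL
    return FAST_MODEL

print(choose_model("Summarise this sentence."))            # fast model
print(choose_model("Analyze the tradeoffs step by step"))  # reasoning model
```

In a Runchat graph the same idea maps to wiring cheap, fast Prompt nodes for routine steps and reserving a reasoning-model Prompt node for the one step that actually needs it.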