Articles (4)
- How to Choose the Right AI Model
- Model Comparison and Selection Guide
- Multi-Model Conversations in SecureAI
- Understanding Token Usage and Costs
FAQs (4)
Can SecureAI run with private models?
Yes. SecureAI supports private, self-hosted models alongside cloud-hosted providers. You can connect models running on your own infrastructure using Ollama, vLLM, llama.cpp, LocalAI, or any OpenAI-compatible server, so that no data leaves your network.
What counts as a "private model"?
A private model is any LLM that runs on infrastructure you control rather than calling an external provider's API. Common setups include:
- Ollama running open-weight models like Llama, Mistral, or CodeLlama on a local server or GPU cluster.
- vLLM or llama.cpp serving fine-tuned or specialized models behind an OpenAI-compatible endpoint.
- Azure OpenAI deployed in your own Azure tenant, where data stays within your controlled cloud environment.
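All of the self-hosted options above expose the same OpenAI-compatible chat endpoint, which is why SecureAI can talk to any of them interchangeably. The sketch below builds the request body such a connection sends; the base URL assumes Ollama's default port (11434) and `llama3` is a hypothetical locally served model, not a guaranteed part of your setup:

```python
import json

# Hypothetical private endpoint -- Ollama's default OpenAI-compatible base URL.
BASE_URL = "http://localhost:11434/v1"

def build_chat_request(model: str, user_message: str) -> dict:
    """Build the JSON body for a POST to {BASE_URL}/chat/completions."""
    return {
        "model": model,  # an open-weight model served on your own hardware
        "messages": [
            {"role": "user", "content": user_message},
        ],
    }

payload = build_chat_request("llama3", "Summarize our internal incident report.")
print(json.dumps(payload, indent=2))
```

Because the request format is identical across Ollama, vLLM, llama.cpp, and LocalAI, switching servers only means changing `BASE_URL`.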
How do I connect a private model?
An administrator configures private models in Admin Panel > Settings > Connections. The setup requires only the model server's endpoint URL — no API key is needed if the server is on the same network as SecureAI.
For step-by-step instructions, see Adding Custom Model Providers.
Can I use private and cloud models at the same time?
Yes. SecureAI aggregates models from all configured providers into a single model selector. Users can choose between private models (for sensitive data) and cloud models (for tasks where speed or capability matters) on a per-conversation basis.
Why use private models?
| Benefit | Details |
|---|---|
| Data residency | Prompts and responses never leave your infrastructure |
| Compliance | Meet regulatory requirements that prohibit sending data to third-party APIs |
| Cost control | No per-token charges — only your infrastructure costs |
| Custom fine-tuning | Run models fine-tuned on your organization's proprietary data |
Are there any limitations?
Private models depend on your hardware. Response speed and quality vary based on the model size and the GPU/CPU resources available. Smaller open-weight models may not match the capability of large cloud models for complex reasoning tasks. Monitor performance in Admin Panel > Dashboard > Usage Statistics.
Can we use OpenAI, Anthropic, or Azure OpenAI?
Yes. SecureAI supports multiple AI model providers. You can connect OpenAI, Anthropic, Azure OpenAI, and local models -- all accessible through a single unified interface.
Supported providers
| Provider | Connection method | Example models |
|---|---|---|
| OpenAI | API key | GPT-4o, GPT-4 Turbo, o1, o3 |
| Anthropic | API key | Claude Opus, Claude Sonnet, Claude Haiku |
| Azure OpenAI | Endpoint URL + API key | GPT-4o, GPT-4 (deployed in your Azure tenant) |
| Local models | Ollama, vLLM, or any OpenAI-compatible server | Llama, Mistral, CodeLlama, Gemma |
Administrators can enable any combination of these providers. Users see all available models in a single model selector dropdown.
How are providers configured?
An administrator adds providers in Admin Panel > Settings > Connections:
- OpenAI -- Enter your OpenAI API key. All models available on your OpenAI account appear automatically.
- Anthropic -- Enter your Anthropic API key. Claude models appear in the model selector.
- Azure OpenAI -- Enter your Azure endpoint URL and API key. Only models deployed in your Azure tenant are listed.
- Local models -- Enter the endpoint URL of your Ollama, vLLM, or compatible server. No API key is needed for servers on the same network.
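Conceptually, what SecureAI does with these connections is flatten every provider's model list into one dropdown. A minimal sketch of that aggregation, using a hypothetical provider registry (the model names are illustrative, not a live listing):

```python
# Hypothetical registry mirroring Admin Panel > Settings > Connections.
providers = {
    "OpenAI":    ["gpt-4o", "gpt-4o-mini"],
    "Anthropic": ["claude-sonnet", "claude-haiku"],
    "Local":     ["llama3", "mistral"],
}

def model_selector(providers: dict) -> list:
    """Flatten every configured provider's models into one sorted dropdown list."""
    return sorted(m for models in providers.values() for m in models)

print(model_selector(providers))
```

Users never see the provider boundary; they pick a model, and SecureAI routes the request to whichever connection serves it.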
For step-by-step instructions, see Adding Custom Model Providers.
Can I use multiple providers at the same time?
Yes. SecureAI aggregates models from all configured providers into one model selector. You can switch between providers on a per-conversation basis. For example, use Claude for analysis tasks and a local Llama model for conversations involving sensitive internal data.
Which provider should I choose?
It depends on your priorities:
- OpenAI or Anthropic -- Best model quality and speed. Data is sent to the provider's API.
- Azure OpenAI -- Enterprise-grade cloud models with data residency in your Azure tenant. Useful for compliance requirements.
- Local models -- No data leaves your network. Best for strict data privacy requirements, but performance depends on your hardware.
Your administrator may restrict which providers and models are available to you. See Can admins restrict models and integrations? for details.
Do I need my own API keys?
No. Administrators configure provider API keys centrally. Individual users do not need their own keys -- they simply select a model from the dropdown and start chatting.
What AI models are supported?
SecureAI supports a wide range of AI models from multiple providers. Your administrator controls which models are available in your organization's instance.
Cloud-hosted model providers
SecureAI includes built-in support for the following cloud-hosted providers:
| Provider | Example models | Strengths |
|---|---|---|
| OpenAI | GPT-4o, GPT-4, GPT-4o-mini | General-purpose reasoning, code generation, creative writing |
| Anthropic | Claude 4 Sonnet, Claude 4 Opus, Claude 3.5 Haiku | Long-context analysis, nuanced reasoning, document comprehension |
| Google | Gemini 2.5 Pro, Gemini 2.5 Flash | Multimodal tasks, large context windows, fast responses |
Model availability depends on your organization's subscription and the API keys your administrator has configured. You will only see models that your administrator has enabled.
Self-hosted and local models
SecureAI also supports self-hosted model providers, giving your organization full control over data residency and model selection:
- Ollama -- run open-source models locally. Popular choices include Llama 3, Mistral, and Phi-3.
- vLLM -- high-performance inference server for hosting large models on your own GPU infrastructure.
- Any OpenAI-compatible API -- SecureAI can connect to any endpoint that implements the OpenAI API format, including custom fine-tuned models and specialized inference servers.
With self-hosted models, prompts never leave your infrastructure. This is the preferred option for organizations with strict data residency requirements.
How to see which models are available to you
- Open a new chat in SecureAI.
- Click the model selector dropdown at the top of the chat area.
- The list shows all models your administrator has enabled for your account.
If a model you need is not listed, ask your administrator to enable it. Administrators manage model availability from Admin Panel > Settings > Models.
Choosing the right model
Different models are suited for different tasks:
| Task | Recommended approach |
|---|---|
| Quick questions and simple tasks | Use a smaller, faster model (e.g., GPT-4o-mini, Claude 3.5 Haiku, Gemini 2.5 Flash) for lower latency and cost. |
| Complex analysis and reasoning | Use a larger model (e.g., GPT-4o, Claude 4 Opus, Gemini 2.5 Pro) for tasks requiring deeper reasoning or multi-step problem solving. |
| Working with long documents | Choose a model with a large context window. Claude and Gemini models support context windows up to 200K+ tokens. |
| Sensitive or regulated data | Use a self-hosted model (via Ollama or vLLM) to keep all data within your infrastructure. |
Your organization may have guidelines on which models to use for specific types of work. Check with your administrator if you are unsure.
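If your team wants to encode guidelines like the table above, the routing logic is simple enough to sketch. The model names below are examples only; substitute whatever your administrator has actually enabled:

```python
# Hypothetical task-to-model routing based on the guidance table above.
ROUTES = {
    "quick":     "gpt-4o-mini",    # small/fast tier for simple tasks
    "complex":   "claude-4-opus",  # large-model tier for deep reasoning
    "long-doc":  "gemini-2.5-pro", # large context window for big documents
    "sensitive": "llama3-local",   # self-hosted, data stays in-network
}

def pick_model(task_type: str) -> str:
    """Return a default model for a task type, falling back to the fast tier."""
    return ROUTES.get(task_type, ROUTES["quick"])
```

Falling back to the small/fast tier keeps costs low when a task does not clearly need a larger model.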
Can my administrator restrict model access?
Yes. Administrators can set each model's visibility to "All users" or "Admins only." Models set to "Admins only" are hidden from standard users entirely. For details, see Can admins restrict models and integrations?.
Related articles
- Can admins restrict models and integrations? -- model visibility and access controls
- How is SecureAI different from ChatGPT? -- multi-model access as a key differentiator
- Adding Custom Model Providers -- connecting additional model providers
- How is data encrypted in SecureAI? -- encryption for data sent to model providers
What is a token and how is usage measured?
A token is the basic unit AI models use to process text. Understanding tokens helps you estimate costs, choose the right model for a task, and stay within your organization's usage limits.
What is a token?
AI models do not read text word by word. Instead, they break text into smaller pieces called tokens. A token can be a whole word, part of a word, a punctuation mark, or a space.
As a rough guide:
- 1 token is approximately 4 characters or 0.75 words in English.
- A short sentence like "How do I reset my password?" is about 8 tokens.
- A full page of text (roughly 500 words) is about 650--700 tokens.
- A 10-page document is roughly 6,500--7,000 tokens.
The exact number of tokens depends on the specific model's tokenizer. Different providers (OpenAI, Anthropic, Google) use slightly different tokenization methods, so the same text may produce slightly different token counts across models.
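The 4-characters-per-token rule of thumb is easy to turn into a quick estimator. This is a ballpark only; real counts come from each model's tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate for English text: ~4 characters per token."""
    return max(1, round(len(text) / 4))

page = "word " * 500           # ~500 words, ~2,500 characters
print(estimate_tokens(page))   # lands near the 650-700 range cited above
print(estimate_tokens("How do I reset my password?"))
```

For exact counts, use the provider's own tokenizer library rather than a character heuristic.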
How is usage measured?
Every time you send a message in SecureAI, usage is measured in two parts:
| Component | What it counts |
|---|---|
| Input tokens | Your message, any system prompt, conversation history sent for context, and any documents or knowledge base content retrieved via RAG |
| Output tokens | The AI model's response |
Total tokens per message = input tokens + output tokens.
A few things that affect your token count:
- Conversation history -- as a conversation grows longer, each new message includes prior messages as context, increasing input tokens. Starting a new chat resets this.
- Knowledge base (RAG) retrieval -- when SecureAI pulls relevant documents to answer your question, those retrieved passages count as input tokens.
- Attachments and uploads -- files you attach to a message (PDFs, text files, images) are converted to tokens and included in the input.
- System prompts -- your organization's system prompt is included with every message as input tokens. Longer system prompts increase per-message costs.
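The conversation-history effect is worth seeing numerically: because each turn resends everything before it, cumulative input tokens grow roughly quadratically over a chat. A simplified sketch with illustrative numbers (it ignores the system prompt and assumes equal-length messages):

```python
def cumulative_input_tokens(per_message: int, turns: int) -> int:
    """Total input tokens across a chat where turn n resends n prior messages."""
    return sum(per_message * turn for turn in range(1, turns + 1))

# A 10-turn chat with ~150-token messages resends far more than 10 * 150:
print(cumulative_input_tokens(150, 10))
```

This is why starting a fresh chat for an unrelated question is the single most effective way to keep input tokens down.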
How many tokens does a typical message use?
Token usage varies widely depending on the task:
| Scenario | Approximate tokens |
|---|---|
| Simple question and short answer | 200--500 total |
| Question with a few paragraphs of context | 1,000--2,000 total |
| Analyzing an uploaded document (10 pages) | 7,000--10,000 total |
| Long conversation (20+ back-and-forth messages) | 10,000--30,000+ total |
These are estimates. Actual usage depends on the model, the length of your prompts and responses, and how much context is included.
Why do costs vary by model?
Different models charge different rates per token. Generally:
- Smaller, faster models (e.g., GPT-4o-mini, Claude 3.5 Haiku, Gemini 2.5 Flash) cost less per token and are suited for routine tasks.
- Larger, more capable models (e.g., GPT-4o, Claude 4 Opus, Gemini 2.5 Pro) cost more per token but handle complex reasoning and analysis better.
- Self-hosted models (via Ollama or vLLM) have no per-token API cost -- you pay only for the infrastructure to run them.
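To make the price gap concrete, here is the per-message arithmetic with placeholder rates. The numbers below are hypothetical (check your provider's current pricing); only the formula matters:

```python
# Hypothetical per-token rates, in USD per 1M tokens.
RATES = {
    "small-model": {"input": 0.15, "output": 0.60},
    "large-model": {"input": 5.00, "output": 15.00},
}

def message_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one message: input and output tokens are billed at separate rates."""
    r = RATES[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# The same 2,000-token-in / 500-token-out message on each tier:
print(round(message_cost("small-model", 2000, 500), 6))
print(round(message_cost("large-model", 2000, 500), 6))
```

Even with made-up rates, the pattern holds: the large-model tier costs an order of magnitude more per message, which is why routing routine tasks to smaller models adds up.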
Your administrator may set usage limits per user or per model to manage costs. If you hit a limit, you will see a message indicating that your usage quota has been reached.
Tips for managing token usage
- Start new chats for unrelated questions instead of continuing a long conversation. This keeps input token counts low.
- Use smaller models for simple tasks like summarization, formatting, or quick lookups.
- Be specific in your prompts. Clear, focused questions produce shorter, more relevant responses.
- Limit attachments to the relevant pages or sections rather than uploading entire large documents when possible.
Where can I see my usage?
Usage visibility depends on your role:
- Users -- check your usage in Settings > Account > Usage. You can see token counts per conversation and overall totals for the current billing period.
- Administrators -- view organization-wide usage in Admin Panel > Usage. This includes per-user breakdowns, per-model costs, and trend data.
For billing details, see How is SecureAI billed?.
Related articles
- How is SecureAI billed? -- pricing tiers and billing cycles
- What AI models are supported? -- available models and their strengths
- How does RAG work in SecureAI? -- how knowledge base retrieval affects token usage
- What are workspaces, models, tools, and knowledge bases? -- core concepts overview