Using SecureAI

Can SecureAI run with private models?

Yes. SecureAI supports private, self-hosted models alongside cloud-hosted providers. You can connect models running on your own infrastructure using Ollama, vLLM, llama.cpp, LocalAI, or any OpenAI-compatible server, so that no data leaves your network.

What counts as a "private model"?

A private model is any LLM that runs on infrastructure you control rather than calling an external provider's API. Common setups include:

  • Ollama running open-weight models like Llama, Mistral, or CodeLlama on a local server or GPU cluster.
  • vLLM or llama.cpp serving fine-tuned or specialized models behind an OpenAI-compatible endpoint.
  • Azure OpenAI deployed in your own Azure tenant, where data stays within your controlled cloud environment.

How do I connect a private model?

An administrator configures private models in Admin Panel > Settings > Connections. The setup requires only the model server's endpoint URL — no API key is needed if the server is on the same network as SecureAI.

For step-by-step instructions, see Adding Custom Model Providers.
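Before adding a connection, you can sanity-check that a server speaks the OpenAI API format by listing its models. A minimal Python sketch, using only the standard library and a hypothetical host name; the `/v1/models` route is part of the OpenAI-compatible format that servers such as Ollama and vLLM expose:

```python
import json
from urllib.request import urlopen

def parse_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-format /v1/models response."""
    return [m["id"] for m in payload.get("data", [])]

def list_models(base_url: str) -> list[str]:
    """List the models served by an OpenAI-compatible endpoint."""
    with urlopen(f"{base_url.rstrip('/')}/v1/models") as resp:
        return parse_model_ids(json.load(resp))

# Example (hypothetical internal host, no API key needed on your own network):
# list_models("http://llm-server.internal:11434")
```

If the call returns a list of model IDs, the same base URL is what you enter in the Connections settings.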

Can I use private and cloud models at the same time?

Yes. SecureAI aggregates models from all configured providers into a single model selector. Users can choose between private models (for sensitive data) and cloud models (for tasks where speed or capability matters) on a per-conversation basis.

Why use private models?

  • Data residency -- prompts and responses never leave your infrastructure.
  • Compliance -- meet regulatory requirements that prohibit sending data to third-party APIs.
  • Cost control -- no per-token charges, only your infrastructure costs.
  • Custom fine-tuning -- run models fine-tuned on your organization's proprietary data.

Are there any limitations?

Private models depend on your hardware. Response speed and quality vary based on the model size and the GPU/CPU resources available. Smaller open-weight models may not match the capability of large cloud models for complex reasoning tasks. Monitor performance in Admin Panel > Dashboard > Usage Statistics.

Can we use OpenAI, Anthropic, or Azure OpenAI?

Yes. SecureAI supports multiple AI model providers. You can connect OpenAI, Anthropic, Azure OpenAI, and local models -- all accessible through a single unified interface.

Supported providers

  • OpenAI -- connects with an API key. Example models: GPT-4o, GPT-4 Turbo, o1, o3.
  • Anthropic -- connects with an API key. Example models: Claude Opus, Claude Sonnet, Claude Haiku.
  • Azure OpenAI -- connects with an endpoint URL and API key. Example models: GPT-4o, GPT-4 (deployed in your Azure tenant).
  • Local models -- connect via Ollama, vLLM, or any OpenAI-compatible server. Example models: Llama, Mistral, CodeLlama, Gemma.

Administrators can enable any combination of these providers. Users see all available models in a single model selector dropdown.

How are providers configured?

An administrator adds providers in Admin Panel > Settings > Connections:

  1. OpenAI -- Enter your OpenAI API key. All models available on your OpenAI account appear automatically.
  2. Anthropic -- Enter your Anthropic API key. Claude models appear in the model selector.
  3. Azure OpenAI -- Enter your Azure endpoint URL and API key. Only models deployed in your Azure tenant are listed.
  4. Local models -- Enter the endpoint URL of your Ollama, vLLM, or compatible server. No API key is needed for servers on the same network.

For step-by-step instructions, see Adding Custom Model Providers.

Can I use multiple providers at the same time?

Yes. SecureAI aggregates models from all configured providers into one model selector. You can switch between providers on a per-conversation basis. For example, use Claude for analysis tasks and a local Llama model for conversations involving sensitive internal data.

Which provider should I choose?

It depends on your priorities:

  • OpenAI or Anthropic -- Best model quality and speed. Data is sent to the provider's API.
  • Azure OpenAI -- Enterprise-grade cloud models with data residency in your Azure tenant. Useful for compliance requirements.
  • Local models -- No data leaves your network. Best for strict data privacy requirements, but performance depends on your hardware.

Your administrator may restrict which providers and models are available to you. See Can admins restrict models and integrations? for details.

Do I need my own API keys?

No. Administrators configure provider API keys centrally. Individual users do not need their own keys -- they simply select a model from the dropdown and start chatting.

What AI models are supported?

SecureAI supports a wide range of AI models from multiple providers. Your administrator controls which models are available in your organization's instance.

Cloud-hosted model providers

SecureAI includes built-in support for the following cloud-hosted providers:

  • OpenAI -- GPT-4o, GPT-4, GPT-4o-mini. Strengths: general-purpose reasoning, code generation, creative writing.
  • Anthropic -- Claude 4 Sonnet, Claude 4 Opus, Claude 3.5 Haiku. Strengths: long-context analysis, nuanced reasoning, document comprehension.
  • Google -- Gemini 2.5 Pro, Gemini 2.5 Flash. Strengths: multimodal tasks, large context windows, fast responses.

Model availability depends on your organization's subscription and the API keys your administrator has configured. You will only see models that your administrator has enabled.

Self-hosted and local models

SecureAI also supports self-hosted model providers, giving your organization full control over data residency and model selection:

  • Ollama -- run open-source models locally. Popular choices include Llama 3, Mistral, and Phi-3.
  • vLLM -- high-performance inference server for hosting large models on your own GPU infrastructure.
  • Any OpenAI-compatible API -- SecureAI can connect to any endpoint that implements the OpenAI API format, including custom fine-tuned models and specialized inference servers.

With self-hosted models, prompts never leave your infrastructure. This is the preferred option for organizations with strict data residency requirements.
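Because these endpoints implement the OpenAI API format, any HTTP client can send them a chat request. A minimal sketch of that request shape, using only the Python standard library; the host name and model name are hypothetical placeholders:

```python
import json
from urllib.request import Request, urlopen

def build_chat_request(model: str, user_message: str) -> dict:
    """Build an OpenAI-format chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

def chat(base_url: str, model: str, user_message: str) -> str:
    """Send one chat turn to an OpenAI-compatible server and
    return the assistant's reply text."""
    body = json.dumps(build_chat_request(model, user_message)).encode()
    req = Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# Example (hypothetical host and model):
# chat("http://llm-server.internal:8000", "llama3", "Summarize this policy.")
```

The same request works unchanged against a cloud provider's OpenAI-compatible endpoint; only the base URL and authentication differ.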

How to see which models are available to you

  1. Open a new chat in SecureAI.
  2. Click the model selector dropdown at the top of the chat area.
  3. The list shows all models your administrator has enabled for your account.

If a model you need is not listed, ask your administrator to enable it. Administrators manage model availability from Admin Panel > Settings > Models.

Choosing the right model

Different models are suited for different tasks:

  • Quick questions and simple tasks -- use a smaller, faster model (e.g., GPT-4o-mini, Claude 3.5 Haiku, Gemini 2.5 Flash) for lower latency and cost.
  • Complex analysis and reasoning -- use a larger model (e.g., GPT-4o, Claude 4 Opus, Gemini 2.5 Pro) for tasks requiring deeper reasoning or multi-step problem solving.
  • Working with long documents -- choose a model with a large context window. Claude and Gemini models support context windows up to 200K+ tokens.
  • Sensitive or regulated data -- use a self-hosted model (via Ollama or vLLM) to keep all data within your infrastructure.

Your organization may have guidelines on which models to use for specific types of work. Check with your administrator if you are unsure.

Can my administrator restrict model access?

Yes. Administrators can set each model's visibility to "All users" or "Admins only." Models set to "Admins only" are hidden from standard users entirely. For details, see Can admins restrict models and integrations?

What is a token and how is usage measured?

A token is the basic unit AI models use to process text. Understanding tokens helps you estimate costs, choose the right model for a task, and stay within your organization's usage limits.

What is a token?

AI models do not read text word by word. Instead, they break text into smaller pieces called tokens. A token can be a whole word, part of a word, a punctuation mark, or a space.

As a rough guide:

  • 1 token is approximately 4 characters or 0.75 words in English.
  • A short sentence like "How do I reset my password?" is about 8 tokens.
  • A full page of text (roughly 500 words) is about 650--700 tokens.
  • A 10-page document is roughly 6,500--7,000 tokens.

The exact number of tokens depends on the specific model's tokenizer. Different providers (OpenAI, Anthropic, Google) use slightly different tokenization methods, so the same text may produce slightly different token counts across models.
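Since exact counts are tokenizer-specific, a rough estimate is usually enough for planning. A minimal sketch of the ~4-characters-per-token heuristic described above (an approximation, not any provider's actual tokenizer):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token in English.
    Real counts depend on the specific model's tokenizer."""
    return max(1, round(len(text) / 4))

# "How do I reset my password?" is 27 characters -> estimate of 7 tokens,
# close to the ~8 tokens a real tokenizer produces for this sentence.
```

For precise counts, use the tokenizer published by the model's provider; the heuristic is only for ballpark budgeting.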

How is usage measured?

Every time you send a message in SecureAI, usage is measured in two parts:

  • Input tokens -- your message, any system prompt, conversation history sent for context, and any documents or knowledge base content retrieved via RAG.
  • Output tokens -- the AI model's response.

Total tokens per message = input tokens + output tokens.

A few things that affect your token count:

  • Conversation history -- as a conversation grows longer, each new message includes prior messages as context, increasing input tokens. Starting a new chat resets this.
  • Knowledge base (RAG) retrieval -- when SecureAI pulls relevant documents to answer your question, those retrieved passages count as input tokens.
  • Attachments and uploads -- files you attach to a message (PDFs, text files, images) are converted to tokens and included in the input.
  • System prompts -- your organization's system prompt is included with every message as input tokens. Longer system prompts increase per-message costs.

How many tokens does a typical message use?

Token usage varies widely depending on the task:

  • Simple question and short answer: 200--500 tokens total.
  • Question with a few paragraphs of context: 1,000--2,000 tokens total.
  • Analyzing an uploaded document (10 pages): 7,000--10,000 tokens total.
  • Long conversation (20+ back-and-forth messages): 10,000--30,000+ tokens total.

These are estimates. Actual usage depends on the model, the length of your prompts and responses, and how much context is included.

Why do costs vary by model?

Different models charge different rates per token. Generally:

  • Smaller, faster models (e.g., GPT-4o-mini, Claude 3.5 Haiku, Gemini 2.5 Flash) cost less per token and are suited for routine tasks.
  • Larger, more capable models (e.g., GPT-4o, Claude 4 Opus, Gemini 2.5 Pro) cost more per token but handle complex reasoning and analysis better.
  • Self-hosted models (via Ollama or vLLM) have no per-token API cost -- you pay only for the infrastructure to run them.

Your administrator may set usage limits per user or per model to manage costs. If you hit a limit, you will see a message indicating that your usage quota has been reached.

Tips for managing token usage

  • Start new chats for unrelated questions instead of continuing a long conversation. This keeps input token counts low.
  • Use smaller models for simple tasks like summarization, formatting, or quick lookups.
  • Be specific in your prompts. Clear, focused questions produce shorter, more relevant responses.
  • Limit attachments to the relevant pages or sections rather than uploading entire large documents when possible.

Where can I see my usage?

Usage visibility depends on your role:

  • Users -- check your usage in Settings > Account > Usage. You can see token counts per conversation and overall totals for the current billing period.
  • Administrators -- view organization-wide usage in Admin Panel > Usage. This includes per-user breakdowns, per-model costs, and trend data.

For billing details, see How is SecureAI billed?
