Every time you send a message in SecureAI, the AI model processes your text by breaking it into small units called tokens. Understanding how tokens work helps you use SecureAI more efficiently and keeps costs predictable for your organization.
What Are Tokens?
A token is a chunk of text that the AI model reads and generates. Tokens are not the same as words — they are pieces that the model's vocabulary recognizes. Some rough guidelines:
- 1 token ≈ 4 characters of English text (about ¾ of a word).
- Short common words like "the", "is", "a" are usually one token each.
- Longer or uncommon words get split into multiple tokens. "Aftermarket" might be two or three tokens.
- Numbers, punctuation, and special characters each consume tokens.
- Part numbers like AC-DELCO-PF48 may use 5–8 tokens because they combine letters, hyphens, and digits.
Example: The sentence "What is the OEM oil filter for a 2023 Toyota Camry?" is roughly 14 tokens.
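The guidelines above can be turned into a quick back-of-the-envelope estimator. This is a rough sketch based only on the ~4-characters-per-token rule of thumb — real counts come from the model's own tokenizer and will differ, especially for part numbers and special characters.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token rule of thumb.

    This is an approximation only; the model's actual tokenizer may count
    more or fewer tokens, particularly for hyphenated part numbers.
    """
    # Round up, and never estimate zero tokens for a non-empty message.
    return max(1, -(-len(text) // 4))

question = "What is the OEM oil filter for a 2023 Toyota Camry?"
print(estimate_tokens(question))  # 51 characters -> estimates 13 tokens
```

The estimate (13) lands close to the actual rough count of 14 from the example above — good enough for budgeting, not for billing.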
How Usage Is Measured
SecureAI tracks token usage in two directions for every exchange:
| Direction | What It Counts | Why It Matters |
|---|---|---|
| Input tokens | Your message + the conversation history the model re-reads | Longer conversations cost more because the model re-processes previous messages each turn |
| Output tokens | The model's response | Detailed answers with tables, lists, or long explanations use more output tokens |
Total tokens per exchange = input tokens + output tokens.
Conversation History Adds Up
Each new message in a conversation includes the full history so the model remembers context. A 10-message conversation means message 10 re-sends messages 1–9 as input. This is why long conversations use significantly more tokens than short ones.
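This growth is easy to see with a little arithmetic. The sketch below assumes every message is a flat 50 tokens purely for illustration — real message sizes vary widely.

```python
def cumulative_input_tokens(message_sizes: list[int]) -> list[int]:
    """Input tokens for each turn = this message plus all previous messages,
    since the full history is re-sent every turn."""
    totals, running = [], 0
    for size in message_sizes:
        running += size
        totals.append(running)
    return totals

# A 10-message conversation where every message is ~50 tokens (assumed):
turns = cumulative_input_tokens([50] * 10)
print(turns)  # [50, 100, 150, ..., 500]
```

Turn 1 sends 50 input tokens; by turn 10 the model re-reads all 500. Summed across the whole conversation, that is 2,750 input tokens for only 500 tokens of actual new text.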
Uploaded Files Count Too
When you upload a PDF, image, or parts catalog, SecureAI converts the content into tokens for the model to read. A 10-page PDF might add thousands of input tokens to every subsequent message in that conversation.
Viewing Your Token Usage
Depending on how your administrator has configured SecureAI, you can see token usage in several places:
In the Chat Interface
After the model responds, look for a token count or usage indicator near the response. In OpenWebUI, this typically appears as a small label showing the number of tokens used for that exchange. Click or hover to see the breakdown of input versus output tokens.
In Your Usage Dashboard
If your organization has enabled the usage dashboard:
- Click your profile icon in the lower-left corner of the interface.
- Select Usage or Token Usage (the exact label depends on your organization's configuration).
- You will see a summary of your token consumption over time — daily, weekly, or monthly.
This view helps you spot trends. If usage spikes on a particular day, you can check which conversations drove the increase.
Asking Your Administrator
Administrators have access to organization-wide usage analytics including per-user breakdowns, model-by-model costs, and trend reports. If you cannot see your own usage, ask your admin for a summary.
What Tokens Cost
Your organization pays for token usage based on the model tier:
| Model Tier | Relative Cost | Typical Use Case |
|---|---|---|
| Fast (smaller models) | Lowest | Quick lookups, simple Q&A |
| Balanced (mid-size models) | Moderate | Cross-referencing, compatibility checks |
| Advanced (large frontier models) | Highest | Complex diagnostics, multi-document analysis |
The exact pricing depends on your organization's contract with the LLM provider. Your administrator can tell you which models cost more and whether there are usage caps or quotas in place.
Key point: Output tokens typically cost more per token than input tokens. A long, detailed response costs more than a short, direct one — even if the question was the same.
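To make the input/output asymmetry concrete, here is a cost sketch. The per-1,000-token prices below are invented for illustration only — your real rates depend on your organization's contract with the LLM provider.

```python
# Hypothetical prices, for illustration only (not SecureAI's actual rates):
INPUT_PRICE_PER_1K = 0.003   # dollars per 1,000 input tokens (assumed)
OUTPUT_PRICE_PER_1K = 0.015  # dollars per 1,000 output tokens (assumed)

def exchange_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one exchange: input and output are priced separately."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + \
           (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# Same 100-token question, short reply vs detailed reply:
print(round(exchange_cost(100, 50), 5))   # 0.00105
print(round(exchange_cost(100, 500), 5))  # 0.0078
```

With these assumed rates, the detailed reply costs over seven times more than the short one, even though the question was identical — the output side dominates.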
Reducing Token Usage (and Costs)
You do not need to ration your usage, but these habits keep costs reasonable and often get you better answers:
1. Start New Conversations for New Topics
When you switch from looking up brake pads to asking about transmission fluid, start a fresh conversation. The old history adds input tokens to every message with no benefit for the new topic.
2. Be Specific in Your Questions
Vague questions produce long, general answers that consume more output tokens. Compare:
- Vague: "Tell me about oil filters." → Long, broad response (300+ tokens).
- Specific: "What is the OEM oil filter part number for a 2023 Toyota Camry 2.5L?" → Short, direct response (30–50 tokens).
3. Choose the Right Model for the Task
Use the fast tier for straightforward lookups and save the advanced tier for complex analysis. See the Model Comparison and Selection Guide for detailed recommendations by task type.
4. Summarize Before Continuing Long Conversations
If a conversation has gone on for many messages and you need to keep going, ask the model to summarize the key findings so far. Then start a new conversation with that summary as your opening message. This resets the input token count while preserving the essential context.
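The savings from this reset are easy to quantify. The token sizes below are assumptions chosen to illustrate the point, not measurements:

```python
history_tokens = 5000   # accumulated conversation history (assumed size)
summary_tokens = 300    # a condensed summary of that history (assumed size)
next_message = 60       # your next question (assumed size)

# Input tokens on the very next turn, with and without the reset:
continuing = history_tokens + next_message   # 5060
restarting = summary_tokens + next_message   # 360
print(continuing - restarting)               # 4700 input tokens saved
```

And because history is re-sent every turn, those savings compound on every message that follows, not just the first one.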
5. Avoid Uploading Unnecessary Files
Only attach documents that are directly relevant to your question. A full 200-page parts catalog uploaded "just in case" adds massive input tokens to every message in that conversation.
6. Use Tables and Structured Requests
When you need data for multiple parts or vehicles, ask for a table in a single message rather than asking one question at a time. One well-structured request is cheaper than five separate exchanges.
```
List the OEM brake pad part numbers for these vehicles in a table:
- 2022 Honda Civic
- 2023 Toyota Camry
- 2021 Ford F-150
```
This produces one response instead of three separate exchanges.
Token Limits and Context Windows
Every model has a context window — the maximum number of tokens it can process in a single exchange (input + output combined). When a conversation exceeds the context window, the model cannot see the oldest messages.
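If you want to reason about how close a conversation is to its limit, a simple budget check looks like this. The 8,000-token window and the helper below are illustrative assumptions, not SecureAI features — your actual models each have their own window size.

```python
CONTEXT_WINDOW = 8000  # assumed window size; each model has its own limit

def remaining_tokens(history_tokens: int, reserved_for_output: int = 1000) -> int:
    """Tokens still available for your next message, after reserving room
    for the model's response (input + output share the same window)."""
    return CONTEXT_WINDOW - history_tokens - reserved_for_output

print(remaining_tokens(6500))  # 500 tokens left -- time to summarize and restart
```

When the remaining budget gets this small, the symptoms listed below start to appear.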
Signs you have hit the context limit:
- The model "forgets" something you discussed earlier in the conversation.
- You see a warning about message length or context limits.
- Responses become less relevant to the earlier parts of your conversation.
When this happens, start a new conversation. Copy over any critical details from the previous one.
Common Questions
Does every message cost the same? No. Cost depends on the model tier, the length of your message, the length of the conversation history, any uploaded files, and the length of the model's response.
Can I see exactly how much each conversation cost in dollars? This depends on your organization's setup. Some administrators expose cost data; others show only token counts. Ask your admin what is available.
Does the model charge tokens for messages I delete? Tokens are consumed at the time of the exchange. Deleting a message afterward does not refund the tokens.
Are tokens shared across my team? Typically yes — your organization has a pool or budget. Your administrator manages allocation and can set per-user quotas if needed.
Tips for Automotive Aftermarket Users
Parts lookup queries tend to be token-efficient because they are short and specific. The biggest token consumers for aftermarket users are:
- Uploading large PDFs (full parts catalogs, service manuals) — keep uploads targeted to relevant pages when possible.
- Long diagnostic conversations where you are troubleshooting a complex issue step by step — summarize and restart when the conversation exceeds 15–20 messages.
- Using advanced models for simple lookups — save budget by using the fast tier for catalog queries and reserving advanced models for multi-vehicle analysis or compatibility reasoning.