Troubleshooting Document Indexing

When you upload a document to a SecureAI knowledge base, the system indexes it — extracting text, splitting it into chunks, and creating vector embeddings so the AI can search and reference the content. When indexing fails or produces incomplete results, the AI cannot answer questions about that document accurately.

This guide covers the most common indexing problems, how to diagnose them, and how to fix them.

How to Check Indexing Status

After uploading a document, check its indexing status in the knowledge base view:

Navigate to Knowledge in the left sidebar.
Select the knowledge base containing your document.
Look at the status indicator next to each document:
- Completed — the document was fully indexed and is searchable.
- Processing — indexing is still in progress. Large files can take several minutes.
- Failed — indexing encountered an error. The document is not searchable.
- Partial — some content was indexed but the process did not complete fully.

If a document shows Failed or Partial, use the sections below to diagnose and resolve the issue.

Common Problems and Solutions

Document Shows "Failed" Status

Symptoms: The document appears in the knowledge base file list but has a failed status. The AI cannot reference any content from it.

Possible causes and fixes:

Cause	How to Identify	Fix
Corrupted file	File cannot be opened in other applications	Re-export or re-download the original file and upload again
Unsupported format	File extension is not in the supported list	Convert to a supported format (PDF, TXT, MD, DOCX, CSV). See Supported File Formats
Password-protected PDF	PDF requires a password to open	Remove password protection before uploading. In Adobe Acrobat: File > Properties > Security > No Security
File exceeds size limit	File is larger than 50 MB	Split the document into smaller files. See Splitting Large Documents below
Encoding issues	Text file uses a non-UTF-8 encoding	Re-save the file as UTF-8. In most text editors: Save As > Encoding > UTF-8

Quick fix: Delete the failed document from the knowledge base and re-upload it. Transient server errors can cause a one-time failure, and the same file often indexes successfully on retry.
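Several of the causes in the table can be checked locally before uploading. The following is a minimal sketch, not part of SecureAI: the function names `precheck` and `resave_as_utf8` are hypothetical, the format list and 50 MB limit are copied from this guide, and treating 1 MB as 1024 * 1024 bytes is an assumption. It does not detect corrupted files or password-protected PDFs.

```python
from pathlib import Path

# Values copied from this guide; adjust them if your deployment differs.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".docx", ".csv"}
TEXT_EXTENSIONS = {".txt", ".md", ".csv"}
MAX_SIZE_BYTES = 50 * 1024 * 1024  # assumes 1 MB = 1024 * 1024 bytes


def precheck(path):
    """Return a list of likely problems; an empty list means none were found."""
    path = Path(path)
    problems = []
    ext = path.suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(no extension)'}")
    if path.stat().st_size > MAX_SIZE_BYTES:
        problems.append("file is larger than 50 MB")
    if ext in TEXT_EXTENSIONS:
        try:
            # Decoding the whole file is the simplest way to validate UTF-8.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append("text is not valid UTF-8")
    return problems


def resave_as_utf8(src, dst, source_encoding):
    """Re-save a text file as UTF-8. The caller must know the source encoding."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```

For example, `precheck("catalog.csv")` returns `[]` when no problem is found, and `resave_as_utf8("old.csv", "new.csv", "latin-1")` rewrites a Latin-1 file as UTF-8.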

Document Is Indexed but AI Cannot Find Content

Symptoms: The document shows "Completed" status, but the AI returns "I don't have information about that" when you ask questions about content you know is in the document.

This is the most common indexing issue. The document was processed, but the content was not extracted in a usable way.

Cause 1: Scanned PDF without OCR

Scanned PDFs are images of pages, not searchable text. SecureAI extracts text content — if there is no text layer, there is nothing to index.

How to check:

Open the PDF on your computer.
Try to select and copy text from a page.
If you cannot select individual words, the PDF is image-only.

Fix: Run the PDF through an OCR tool before uploading:

Adobe Acrobat Pro: Tools > Scan & OCR > Recognize Text
Free alternative: OCRmyPDF (command line: ocrmypdf input.pdf output.pdf)
Re-upload the OCR-processed PDF to the knowledge base.
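The manual select-and-copy check can be approximated in code once you have the extracted text of each page. This sketch assumes you have already obtained a list of per-page strings from a PDF library of your choice (that step is not shown), and the function names are hypothetical.

```python
def pages_without_text(page_texts):
    """Return 1-based page numbers whose extracted text is empty or whitespace."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if not (text or "").strip()
    ]


def classify(page_texts):
    """Label a document as 'text', 'image-only', or 'mixed'."""
    missing = pages_without_text(page_texts)
    if not missing:
        return "text"
    if len(missing) == len(page_texts):
        return "image-only"
    return "mixed"
```

A result of `image-only` or `mixed` suggests the file needs OCR before upload; the page numbers from `pages_without_text` show where the gaps are.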

Cause 2: Content is in images or diagrams

Parts diagrams, exploded views, wiring schematics, and specification tables stored as images are not indexed. Only text content is searchable.

Fix: Add text descriptions or captions alongside visual content. For critical reference images, create a companion text document that describes the key data points (part numbers, specifications, connections) shown in the diagrams.

Cause 3: Content is in headers, footers, or watermarks

Repeated header/footer text and watermarks are sometimes stripped during indexing to reduce noise. If important content (like part numbers) appears only in headers or footers, it may not be indexed.

Fix: Ensure key information appears in the main body text of the document, not only in headers or footers.

Cause 4: Document uses complex formatting

Multi-column layouts, nested tables, text boxes, and heavily formatted documents may not extract cleanly. The indexer reads the document as a single linear stream of text, so content from side-by-side columns, text boxes, and nested tables can end up interleaved or out of order.

Fix: Simplify the document layout before uploading. Export to plain text or Markdown if possible. For complex catalogs, consider restructuring into a single-column format with clear headings.

Only Part of the Document Is Searchable

Symptoms: The AI can answer questions about the first portion of the document but not the later sections. Or it finds some content but misses other sections entirely.

Cause 1: Document exceeds page limit

Very large documents (over 100 pages) may only be partially indexed. The indexer processes pages sequentially, and extremely long documents may hit processing limits.

Fix: Split the document into smaller files. See Splitting Large Documents below.

Cause 2: Mixed content types

A document that switches between text pages and image-only pages (common in catalogs that mix digitally-created pages with scanned appendices) will have gaps in indexing wherever image-only pages appear.

Fix: Run OCR on the entire document to add a text layer to the scanned pages, or split the document and process the scanned sections separately.
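With OCRmyPDF, a mixed document needs one extra option: by default the tool stops when it finds pages that already contain text, and `--skip-text` tells it to leave those pages alone and OCR only the rest. The sketch below wraps that command; the helper names are hypothetical, and it assumes `ocrmypdf` is installed and on your PATH.

```python
import shutil
import subprocess


def ocr_command(src, dst):
    """Build an OCRmyPDF command that OCRs only pages lacking a text layer."""
    # --skip-text leaves pages that already contain text untouched.
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def run_ocr(src, dst):
    """Run OCRmyPDF; raise if it is not installed or exits with an error."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    subprocess.run(ocr_command(src, dst), check=True)
```

Calling `run_ocr("catalog.pdf", "catalog-ocr.pdf")` writes a new file and leaves the original unchanged, so you can compare the two before re-uploading.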

Cause 3: Encoding changes mid-document

Documents assembled from multiple sources sometimes have encoding inconsistencies. Pages from one source may use a different character set than pages from another.

Fix: Re-export the document from a single source application (e.g., print-to-PDF from a consistent viewer) to normalize the encoding.

Indexing Takes Too Long

Symptoms: The document has been in "Processing" status for more than 30 minutes.

Document Size	Expected Indexing Time
Under 10 pages	Under 1 minute
10-50 pages	1-5 minutes
51-100 pages	5-15 minutes
Over 100 pages	15-30 minutes

If indexing exceeds these times:

Wait and check back — heavy server load can cause delays. Do not re-upload while processing is in progress, as this creates duplicate entries.
Check your network — if the upload did not complete fully, the server may be waiting for data. Verify your connection is stable.
Delete and retry — if indexing has been stuck for over an hour, delete the document from the knowledge base and re-upload.
Split the file — if retrying does not help, the file may be too large or complex for a single indexing pass.
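The timing table and the escalation steps above can be combined into a simple decision helper. This is an illustrative sketch only: the function names are hypothetical, the thresholds are copied from this guide, and a document of exactly 50 pages is counted in the 1-5 minute bucket as an assumption.

```python
def expected_max_minutes(pages):
    """Upper bound of the expected indexing time for a document, in minutes."""
    if pages < 10:
        return 1
    if pages <= 50:  # assumption: exactly 50 pages falls in the lower bucket
        return 5
    if pages <= 100:
        return 15
    return 30


def next_step(pages, elapsed_minutes, already_retried=False):
    """Suggest the next troubleshooting step for a document stuck in Processing."""
    if elapsed_minutes <= expected_max_minutes(pages):
        return "within expected time: keep waiting"
    if elapsed_minutes <= 60:
        return "wait, and check that the upload and network connection are stable"
    if not already_retried:
        return "delete the document and re-upload"
    return "split the file into smaller sections"
```

For instance, `next_step(80, 90)` suggests deleting and re-uploading, while `next_step(80, 90, already_retried=True)` suggests splitting the file.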

Indexed Content Has Errors or Garbled Text

Symptoms: The AI returns answers that contain garbled characters, wrong numbers, or scrambled text when referencing a specific document.

Cause 1: Poor OCR quality

OCR on low-resolution scans or poor-quality prints produces errors. Part numbers like D1222 may be read as D12Z2 or 01222.

Fix: Start with the highest-quality source available. If working from scans, use 300 DPI or higher. Re-run OCR on a cleaner source if possible.
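If you have a list of part numbers that should appear in a document, you can screen the OCR output for likely misreads before uploading. The sketch below is one possible approach, not a SecureAI feature: it maps commonly confused characters to a single form and flags tokens that match an expected part number only after that mapping. The confusion table and function names are illustrative assumptions and are not exhaustive.

```python
# Characters that OCR commonly confuses, mapped to one canonical form.
# Illustrative only; extend it for the fonts and scans you work with.
CONFUSABLE = str.maketrans({
    "O": "0", "D": "0", "Q": "0",
    "Z": "2", "I": "1", "L": "1",
    "S": "5", "B": "8",
})


def ocr_key(part_number):
    """Collapse confusable characters so that look-alike strings compare equal."""
    return part_number.upper().translate(CONFUSABLE)


def possible_misreads(expected, extracted_tokens):
    """Return tokens that differ from `expected` only by confusable characters."""
    key = ocr_key(expected)
    return [
        token
        for token in extracted_tokens
        if token != expected and ocr_key(token) == key
    ]
```

With the example from this section, `possible_misreads("D1222", ["D1222", "D12Z2", "01222", "D1333"])` flags `D12Z2` and `01222` and ignores the exact match and the unrelated number.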

Cause 2: Non-standard fonts

Technical documents sometimes use specialized fonts for symbols, engineering notation, or manufacturer-specific characters that OCR tools cannot recognize.

Fix: Re-export the document using standard fonts. If the original application supports it, export to PDF with fonts embedded and with the text stored as standard Unicode characters, so that symbols can be extracted as text rather than guessed by OCR.

Cause 3: Table extraction issues

Data in tables with complex cell merging, rotated text, or nested subtables may extract in the wrong order. A row might come out as the run-together string Part# D1222 Brake Pad Ceramic, losing the column headings that said which value is the part number, which is the type, and which is the material.

Fix: For critical tabular data, convert the table to a flat text format with clear delimiters before uploading:

Part: D1222
Type: Brake Pad
Material: Ceramic
Application: 2025 Toyota Camry (all trims)
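If the table can be exported as CSV, the conversion to this flat format is easy to script. A minimal sketch, assuming the first CSV row holds the column names; the function name `flatten_rows` is hypothetical.

```python
import csv
import io


def flatten_rows(csv_text):
    """Turn each CSV row into a block of 'Column: value' lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Skip empty cells so the output contains only real data points.
        lines = [f"{column}: {value}" for column, value in row.items() if value]
        records.append("\n".join(lines))
    # A blank line between records keeps each part's data together.
    return "\n\n".join(records)
```

Each record becomes its own labeled block separated by a blank line, which keeps the part number, type, and material of one part together in the extracted text.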

Splitting Large Documents

Large documents are a frequent cause of failed, partial, and slow indexing. Here is how to split them effectively:

By Section (Recommended)

Split at natural content boundaries:

Parts catalogs: Split by product category (brakes, filters, ignition, etc.)
Service manuals: Split by vehicle system (engine, transmission, electrical, etc.)
Multi-brand catalogs: Split by manufacturer

By Page Count

If there are no natural section boundaries, split into files of 50-75 pages each. Ensure splits do not occur in the middle of a table or parts listing.

Tools for Splitting PDFs

Adobe Acrobat Pro: Organize Pages > Split
Free alternatives: PDFsam (desktop), Smallpdf (web-based)
Command line: pdftk input.pdf cat 1-50 output section1.pdf
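For a long document, the page ranges and the matching pdftk commands can be generated rather than typed by hand. This sketch only builds the command strings in the form shown above; the function names and the `section{n}.pdf` output naming are assumptions, and you still need to adjust boundaries so that no split falls in the middle of a table or parts listing.

```python
def page_ranges(total_pages, max_pages=75):
    """Split pages 1..total_pages into consecutive ranges of at most max_pages."""
    if total_pages < 1 or max_pages < 1:
        raise ValueError("total_pages and max_pages must be at least 1")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


def pdftk_commands(src, total_pages, max_pages=75):
    """Build one pdftk command string per page range."""
    return [
        f"pdftk {src} cat {first}-{last} output section{index}.pdf"
        for index, (first, last) in enumerate(
            page_ranges(total_pages, max_pages), start=1
        )
    ]
```

For a 180-page file, `page_ranges(180)` gives three ranges: 1-75, 76-150, and 151-180. The final range can be shorter than the others. Rename the output files descriptively before uploading.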

After Splitting

Name each file descriptively: dorman-brake-components-section1-pads.pdf, not split-1.pdf
Upload all sections to the same knowledge base
Test with queries that span section boundaries to verify coverage

Re-indexing a Document

If you have fixed the underlying issue (added OCR, split the file, simplified formatting), you need to re-upload the document for the changes to take effect:

Navigate to the knowledge base containing the document.
Delete the old version of the document.
Upload the corrected version.
Wait for indexing to complete and verify the status shows "Completed."
Test with a few queries to confirm the content is searchable.

SecureAI does not currently support in-place re-indexing — you must delete and re-upload.

When to Contact Support

Contact support if you have tried the fixes above, the document still will not index correctly, and you have confirmed the following:

The document is in a supported format, under the size limit, and text-selectable
You have deleted and re-uploaded at least once
You have tried splitting the document into smaller sections
The issue persists across multiple documents from the same source

Contact your SecureAI administrator or reach out to support with:

The file format and approximate size
The indexing status shown in the knowledge base
What you have tried so far
A sample query that should return results but does not

Related Articles

How to Upload Parts Catalog PDFs: step-by-step upload guide
Knowledge Base Design Best Practices: organizing documents for optimal retrieval
Supported File Formats: complete list of formats SecureAI can index