When you upload a document to a SecureAI knowledge base, the system indexes it — extracting text, splitting it into chunks, and creating vector embeddings so the AI can search and reference the content. When indexing fails or produces incomplete results, the AI cannot answer questions about that document accurately.
This guide covers the most common indexing problems, how to diagnose them, and how to fix them.
How to Check Indexing Status
After uploading a document, check its indexing status in the knowledge base view:
- Navigate to Knowledge in the left sidebar.
- Select the knowledge base containing your document.
- Look at the status indicator next to each document:
  - Completed: the document was fully indexed and is searchable.
  - Processing: indexing is still in progress. Large files can take several minutes.
  - Failed: indexing encountered an error. The document is not searchable.
  - Partial: some content was indexed but the process did not complete fully.
If a document shows Failed or Partial, use the sections below to diagnose and resolve the issue.
Common Problems and Solutions
Document Shows "Failed" Status
Symptoms: The document appears in the knowledge base file list but has a failed status. The AI cannot reference any content from it.
Possible causes and fixes:
| Cause | How to Identify | Fix |
|---|---|---|
| Corrupted file | File cannot be opened in other applications | Re-export or re-download the original file and upload again |
| Unsupported format | File extension is not in the supported list | Convert to a supported format (PDF, TXT, MD, DOCX, CSV). See Supported File Formats |
| Password-protected PDF | PDF requires a password to open | Remove password protection before uploading. In Adobe Acrobat: File > Properties > Security > No Security |
| File exceeds size limit | File is larger than 50 MB | Split the document into smaller files. See Splitting Large Documents below |
| Encoding issues | Text file uses a non-UTF-8 encoding | Re-save the file as UTF-8. In most text editors: Save As > Encoding > UTF-8 |
Quick fix: Delete the failed document from the knowledge base and re-upload it. Transient server errors can cause one-time failures, and a retry often succeeds.
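Several of the causes in the table can be caught before you upload. The following is a minimal sketch, not a SecureAI tool: the function name `preflight_check` is hypothetical, the extension list and 50 MB limit are copied from the table above, and treating "50 MB" as 50 × 1024 × 1024 bytes is an assumption. It does not detect corrupted files or password-protected PDFs.

```python
from pathlib import Path

# Values copied from the table above; adjust if your deployment differs.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".md", ".docx", ".csv"}
TEXT_EXTENSIONS = {".txt", ".md", ".csv"}
MAX_SIZE_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" read as 50 MiB


def preflight_check(path):
    """Return a list of problems found in a file before upload (empty if none)."""
    path = Path(path)
    problems = []

    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(no extension)'}")

    if path.stat().st_size > MAX_SIZE_BYTES:
        problems.append("file exceeds the 50 MB size limit")

    # Only plain-text formats are checked for encoding.
    if suffix in TEXT_EXTENSIONS:
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append("text is not valid UTF-8")

    return problems
```

An empty list means none of these three checks failed; the file can still fail indexing for the other reasons listed in the table.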
Document Is Indexed but AI Cannot Find Content
Symptoms: The document shows "Completed" status, but the AI returns "I don't have information about that" when you ask questions about content you know is in the document.
This is the most common indexing issue. The document was processed, but the content was not extracted in a usable way.
Cause 1: Scanned PDF without OCR
Scanned PDFs are images of pages, not searchable text. SecureAI extracts text content — if there is no text layer, there is nothing to index.
How to check:
- Open the PDF on your computer.
- Try to select and copy text from a page.
- If you cannot select individual words, the PDF is image-only.
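For a long PDF, the manual check above can be approximated in code. This sketch assumes the third-party pypdf library is installed; the helper names and the 20-character threshold are illustrative choices, not SecureAI settings. A page whose extracted text is nearly empty is treated as image-only.

```python
def find_image_only_pages(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is empty or nearly empty."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


def extract_page_texts(pdf_path):
    """Extract the text layer of each page with pypdf (pip install pypdf)."""
    # Imported here so find_image_only_pages works without the dependency.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

Calling `find_image_only_pages(extract_page_texts("catalog.pdf"))` lists the pages that likely need OCR. If only some pages are flagged, the document probably mixes digital and scanned pages.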
Fix: Run the PDF through an OCR tool before uploading:
- Adobe Acrobat Pro: Tools > Scan & OCR > Recognize Text
- Free alternative: OCRmyPDF (command line: ocrmypdf input.pdf output.pdf)
- Re-upload the OCR-processed PDF to the knowledge base.
Cause 2: Content is in images or diagrams
Parts diagrams, exploded views, wiring schematics, and specification tables stored as images are not indexed. Only text content is searchable.
Fix: Add text descriptions or captions alongside visual content. For critical reference images, create a companion text document that describes the key data points (part numbers, specifications, connections) shown in the diagrams.
Cause 3: Content is in headers, footers, or watermarks
Repeated header/footer text and watermarks are sometimes stripped during indexing to reduce noise. If important content (like part numbers) appears only in headers or footers, it may not be indexed.
Fix: Ensure key information appears in the main body text of the document, not only in headers or footers.
Cause 4: Document uses complex formatting
Multi-column layouts, nested tables, text boxes, and heavily formatted documents may not extract cleanly. The indexer extracts text in one linear reading order, which can interleave or scramble content that sits side by side in a complex layout.
Fix: Simplify the document layout before uploading. Export to plain text or Markdown if possible. For complex catalogs, consider restructuring into a single-column format with clear headings.
Only Part of the Document Is Searchable
Symptoms: The AI can answer questions about the first portion of the document but not the later sections. Or it finds some content but misses other sections entirely.
Cause 1: Document exceeds page limit
Very large documents (over 100 pages) may only be partially indexed. The indexer processes pages sequentially, so when an extremely long document hits a processing limit, the pages left unindexed are the later ones.
Fix: Split the document into smaller files. See Splitting Large Documents below.
Cause 2: Mixed content types
A document that switches between text pages and image-only pages (common in catalogs that mix digitally-created pages with scanned appendices) will have gaps in indexing wherever image-only pages appear.
Fix: Run OCR on the entire document to add a text layer to the scanned pages, or split the document and process the scanned sections separately.
Cause 3: Encoding changes mid-document
Documents assembled from multiple sources sometimes have encoding inconsistencies. Pages from one source may use a different character set than pages from another.
Fix: Re-export the document from a single source application (e.g., print-to-PDF from a consistent viewer) to normalize the encoding.
Indexing Takes Too Long
Symptoms: The document has been in "Processing" status for more than 30 minutes.
| Document Size | Expected Indexing Time |
|---|---|
| Under 10 pages | Under 1 minute |
| 10-50 pages | 1-5 minutes |
| 50-100 pages | 5-15 minutes |
| Over 100 pages | 15-30 minutes |
If indexing exceeds these times:
- Wait and check back — heavy server load can cause delays. Do not re-upload while processing is in progress, as this creates duplicate entries.
- Check your network — if the upload did not complete fully, the server may be waiting for data. Verify your connection is stable.
- Delete and retry — if indexing has been stuck for over an hour, delete the document from the knowledge base and re-upload.
- Split the file — if retrying does not help, the file may be too large or complex for a single indexing pass.
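The table and steps above can be expressed as a small decision helper. This is a sketch only: the function names and returned labels are hypothetical, and because the table's ranges share their boundary values (10, 50, 100), assigning 50 pages to the 10-50 row and 100 pages to the 50-100 row is an assumption.

```python
def expected_max_minutes(page_count):
    """Upper bound of the expected indexing time from the table, in minutes."""
    if page_count < 10:
        return 1
    if page_count <= 50:   # assumption: 50 pages falls in the 10-50 row
        return 5
    if page_count <= 100:  # assumption: 100 pages falls in the 50-100 row
        return 15
    return 30


def next_action(page_count, minutes_processing):
    """Suggest a next step for a document that is still in Processing status."""
    if minutes_processing > 60:
        return "delete and re-upload"
    if minutes_processing > expected_max_minutes(page_count):
        return "wait and check network"
    return "within expected time"
```

For example, a 40-page document that has been processing for 20 minutes is past its expected 5 minutes but not yet past the one-hour mark, so the helper suggests waiting and checking the network rather than re-uploading.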
Indexed Content Has Errors or Garbled Text
Symptoms: The AI returns answers that contain garbled characters, wrong numbers, or scrambled text when referencing a specific document.
Cause 1: Poor OCR quality
OCR on low-resolution scans or poor-quality prints produces errors. Part numbers like D1222 may be read as D12Z2 or 01222.
Fix: Start with the highest-quality source available. If working from scans, use 300 DPI or higher. Re-run OCR on a cleaner source if possible.
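If you keep a list of valid part numbers, you can flag likely OCR misreads before uploading. The sketch below collapses commonly confused characters to one canonical form and compares the results. The confusion table and function names are illustrative assumptions, not an exhaustive or official list.

```python
# Characters OCR commonly confuses, collapsed to one canonical digit.
# Illustrative assumption; extend it for the misreads you actually see.
CONFUSABLE = str.maketrans({
    "O": "0", "D": "0", "Q": "0",
    "Z": "2",
    "I": "1", "L": "1",
    "S": "5",
    "B": "8",
})


def canonical(token):
    """Uppercase the token and replace confusable characters."""
    return token.upper().translate(CONFUSABLE)


def possible_misreads(tokens, known_parts):
    """Map each token that is not a known part but resembles one to that part."""
    by_canonical = {canonical(part): part for part in known_parts}
    suspects = {}
    for token in tokens:
        if token in known_parts:
            continue  # exact match, nothing to flag
        match = by_canonical.get(canonical(token))
        if match is not None:
            suspects[token] = match
    return suspects
```

With `known_parts = {"D1222"}`, the tokens D12Z2 and 01222 are both flagged as probable misreads of D1222. If two genuine part numbers share a canonical form, only one of them is kept as the suggested match, so treat the output as a list of items to review by hand.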
Cause 2: Non-standard fonts
Technical documents sometimes use specialized fonts for symbols, engineering notation, or manufacturer-specific characters that OCR tools cannot recognize.
Fix: Re-export the document using standard fonts. If the original application supports it, export to PDF with fonts embedded and text stored as standard Unicode characters.
Cause 3: Table extraction issues
Data in tables with complex cell merging, rotated text, or nested subtables may extract in the wrong order. A row might be extracted as the run-together string Part# D1222 Brake Pad Ceramic, losing the column boundaries that showed which value belongs to which field.
Fix: For critical tabular data, convert the table to a flat text format with clear delimiters before uploading:
Part: D1222
Type: Brake Pad
Material: Ceramic
Application: 2025 Toyota Camry (all trims)
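If the table data is available in structured form (for example, exported to CSV and read with Python's `csv.DictReader`), the flat format above can be generated rather than typed by hand. This is a minimal sketch; `flatten_rows` is a hypothetical helper name, and separating records with a blank line is an assumption about what reads cleanly, not a SecureAI requirement.

```python
def flatten_rows(rows):
    """Render each row (a dict of column name -> value) as 'Key: value' lines."""
    blocks = []
    for row in rows:
        lines = [
            f"{column}: {value}"
            for column, value in row.items()
            if str(value).strip()  # skip empty cells
        ]
        blocks.append("\n".join(lines))
    # One blank line between records keeps each part's fields together.
    return "\n\n".join(blocks)
```

Each record keeps its field labels next to its values, so the relationship between a part number and its attributes survives text extraction.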
Splitting Large Documents
Large documents are a common cause of indexing problems. Here is how to split them effectively:
By Section (Recommended)
Split at natural content boundaries:
- Parts catalogs: Split by product category (brakes, filters, ignition, etc.)
- Service manuals: Split by vehicle system (engine, transmission, electrical, etc.)
- Multi-brand catalogs: Split by manufacturer
By Page Count
If there are no natural section boundaries, split into files of 50-75 pages each. Ensure splits do not occur in the middle of a table or parts listing.
Tools for Splitting PDFs
- Adobe Acrobat Pro: Organize Pages > Split
- Free alternatives: PDFsam (desktop), Smallpdf (web-based)
- Command line: pdftk input.pdf cat 1-50 output section1.pdf
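When splitting by page count, the page ranges and the matching pdftk commands can be generated. This sketch follows the command form shown above; the function names, the default of 75 pages per file, and the section1.pdf, section2.pdf output naming are assumptions. It cannot tell whether a boundary falls in the middle of a table, so review the ranges before running the commands.

```python
import shlex


def page_ranges(total_pages, max_pages=75):
    """Split 1..total_pages into consecutive (first, last) ranges."""
    if total_pages < 1 or max_pages < 1:
        raise ValueError("total_pages and max_pages must be positive")
    return [
        (first, min(first + max_pages - 1, total_pages))
        for first in range(1, total_pages + 1, max_pages)
    ]


def pdftk_commands(input_name, total_pages, max_pages=75):
    """Build one pdftk command string per page range (nothing is executed)."""
    quoted = shlex.quote(input_name)  # protects file names containing spaces
    return [
        f"pdftk {quoted} cat {first}-{last} output section{index}.pdf"
        for index, (first, last) in enumerate(
            page_ranges(total_pages, max_pages), start=1
        )
    ]
```

For a 180-page file, `pdftk_commands("input.pdf", 180)` produces three commands covering pages 1-75, 76-150, and 151-180. Rename the output files descriptively afterwards.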
After Splitting
- Name each file descriptively: dorman-brake-components-section1-pads.pdf, not split-1.pdf
- Upload all sections to the same knowledge base
- Test with queries that span section boundaries to verify coverage
Re-indexing a Document
If you have fixed the underlying issue (added OCR, split the file, simplified formatting), you need to re-upload the document for the changes to take effect:
- Navigate to the knowledge base containing the document.
- Delete the old version of the document.
- Upload the corrected version.
- Wait for indexing to complete and verify the status shows "Completed."
- Test with a few queries to confirm the content is searchable.
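If you script the waiting step, a polling loop like the one below is one way to do it. This article does not describe a SecureAI API, so `get_status` is a hypothetical callable that you would supply yourself; it is assumed to return one of the status labels shown in the knowledge base view. The one-hour timeout mirrors the "stuck for over an hour" guidance earlier in this guide.

```python
import time


def wait_for_indexing(get_status, timeout_seconds=3600, poll_seconds=30,
                      sleep=time.sleep):
    """Poll until the status leaves "Processing" or the timeout is reached.

    get_status is a hypothetical callable supplied by the caller; it is
    not a documented SecureAI function.
    """
    waited = 0
    while True:
        status = get_status()
        if status != "Processing":
            return status  # e.g. "Completed", "Failed", or "Partial"
        if waited >= timeout_seconds:
            return "Timed out"
        sleep(poll_seconds)
        waited += poll_seconds
```

Any result other than "Completed" means the document needs another look before you test queries against it.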
SecureAI does not currently support in-place re-indexing — you must delete and re-upload.
When to Contact Support
Escalate if you have tried the fixes above and the document still will not index correctly, for example when:
- The document is in a supported format, under the size limit, and text-selectable
- You have deleted and re-uploaded at least once
- You have tried splitting the document into smaller sections
- The issue persists across multiple documents from the same source
Contact your SecureAI administrator or reach out to support with:
- The file format and approximate size
- The indexing status shown in the knowledge base
- What you have tried so far
- A sample query that should return results but does not
Related Articles
- How to Upload Parts Catalog PDFs — step-by-step upload guide
- Knowledge Base Design Best Practices — organizing documents for optimal retrieval
- Supported File Formats — complete list of formats SecureAI can index