Upload documents
Documents are the second most common knowledge source. Use them for content that doesn’t live on a public website — internal handbooks, exported wikis, product manuals.
In this guide:
- Supported formats and sizes
- Upload a single document
- Upload many at once
- What happens after upload
- Troubleshooting
Supported formats
| Format | Notes |
|---|---|
| PDF | Text-based PDFs only. Scanned or image-only PDFs won’t extract; run OCR on them first. |
| DOCX | Microsoft Word format. Tables, headings, and lists are preserved. |
| TXT | Plain text. Best for raw exports. |
Per-file size limits depend on your plan; on most plans a single file is capped at 25 MB. For larger files, split the document into sections and upload each section as a separate source.
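If you prepare files with a script, a quick local check can catch unsupported formats and oversized files before you upload. The sketch below is illustrative only: `preflight`, `SUPPORTED_EXTENSIONS`, and `MAX_BYTES` are hypothetical names, the 25 MB figure is the common cap mentioned above (your plan may differ), and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumptions: the extensions mirror the formats table, and 25 MB is the
# common per-file cap. Check your plan for the real limit.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}
MAX_BYTES = 25 * 1024 * 1024


def preflight(path: str, size_bytes: int, max_bytes: int = MAX_BYTES) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > max_bytes:
        problems.append("file too large: split it into sections")
    return problems
```

For a file on disk, pass `os.path.getsize(path)` as `size_bytes`. Note that this check cannot tell a text-based PDF from a scanned one; that only shows up in the extraction preview.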
Step 1: Upload
1. Open Knowledge base → Add knowledge source → Document.
2. Drag a file into the dropzone, or click to browse.
   Screenshot: The document upload dropzone with a file selected.
3. Click Add. The file uploads, the backend extracts text, and training begins.
Step 2: Upload many at once
Select multiple files in the file browser (Cmd-click on macOS, Ctrl-click on Windows). Each file becomes its own source and appears as a separate row, so you can manage them independently.
The filename becomes the source title. You can rename it later by editing the source.
Step 3: Verify extraction
After the status reaches trained, click the source row. You’ll see:
- Filename
- Page count (PDFs) or word count
- A preview of the start of the extracted text, which is your sanity check that the document came through readably
Step 4: Update a document
Documents are immutable in storage — to update, delete the old source and upload the new file. There’s no in-place edit (unlike Snippets and articles).
Auto-retraining for documents (New)
When you turn on Auto-retraining for a document source, Hilal Chatbot stores a SHA-256 content hash on upload. If you re-upload the same filename later, the system compares hashes and retrains only when the content actually changed, which saves quota.
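The hash comparison can be pictured with a few lines of Python. This is a minimal sketch of the idea, not Hilal Chatbot's actual implementation: `content_hash` and `needs_retrain` are hypothetical names, and hashing the raw file bytes (rather than the extracted text) is an assumption.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the file's bytes."""
    return hashlib.sha256(data).hexdigest()


def needs_retrain(stored_hash: str, new_data: bytes) -> bool:
    """Retrain only when the re-uploaded bytes differ from the stored hash."""
    return content_hash(new_data) != stored_hash


# Hash stored at first upload (assumed to cover the raw file bytes).
stored = content_hash(b"Handbook v1")

unchanged = needs_retrain(stored, b"Handbook v1")  # same bytes, no retrain
changed = needs_retrain(stored, b"Handbook v2")    # different bytes, retrain
```

Under this assumption any byte-level difference counts as a change, so re-exporting a document with new metadata but identical visible text could still trigger retraining.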
Troubleshooting
- “No text extracted.” Likely a scanned/image PDF. Run it through an OCR tool (Adobe Acrobat, ABBYY, or the open-source ocrmypdf) first.
- Tables look garbled. Complex multi-column or nested tables don’t always extract cleanly. Convert to a simpler format or save the table as a CSV/text export.
- File too large. Split into sections at chapter boundaries.
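For plain-text exports, splitting at chapter boundaries can be scripted. The sketch below is one possible approach under stated assumptions: `split_at_chapters` is a hypothetical helper, and the default pattern assumes chapters start with a line like `Chapter 3`. Adjust the pattern to match your document's headings. PDFs and DOCX files are usually easier to split in the tool that produced them.

```python
import re


def split_at_chapters(text: str, heading_pattern: str = r"^Chapter \d+") -> list[str]:
    """Split text into sections, each starting at a line matching heading_pattern."""
    starts = [m.start() for m in re.finditer(heading_pattern, text, flags=re.MULTILINE)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any front matter before the first heading
    ends = starts[1:] + [len(text)]
    # Drop sections that contain only whitespace.
    return [text[s:e] for s, e in zip(starts, ends) if text[s:e].strip()]
```

Each returned section can be written to its own TXT file and uploaded as a separate source. If a section is still over your plan's limit, split it again at a lower-level heading.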