Content Ingestion
Overview
The Documents section serves as the central hub for all content that your AI Agent uses to answer user queries. By uploading and managing comprehensive documentation here, you equip your Agent to deliver accurate, relevant, and helpful responses to customers, prospects, and employees.
How Content Ingestion Works
When you add content, the ingestion process automatically:
- Extracts images and generates detailed descriptions for use by Image AI.
- Splits documents into chunks for vectorization, producing embeddings, text, and metadata (including associated images) for each chunk.
- Uses JSON-LD structured data, when present, to improve understanding of web content.
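To make the chunking step more concrete, here is a minimal sketch of splitting text into overlapping chunks that carry metadata. It is not the platform's implementation. The character-based windows, the `size` and `overlap` defaults, and the metadata fields (`source`, `chunk_index`, `start`, `images`) are illustrative assumptions. For simplicity, it attaches all of a document's images to every chunk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    """A piece of document text plus metadata (hypothetical schema)."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(text: str, source: str, images: Optional[List[str]] = None,
                   size: int = 500, overlap: int = 50) -> List[Chunk]:
    """Split text into character windows of `size` that overlap by `overlap` characters.

    Every chunk carries the source name and the document's image references
    (a simplification) so results can be traced back to the original file.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    chunks: List[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append(Chunk(
            text=text[start:end],
            metadata={
                "source": source,
                "chunk_index": len(chunks),
                "start": start,
                "images": list(images or []),
            },
        ))
        if end == len(text):
            break  # the final window reached the end of the document
        start = end - overlap  # step back so consecutive chunks share context
    return chunks
```

In a full pipeline, each chunk's `text` would then be sent to an embedding model, and the resulting vector would be stored alongside `metadata`. That step depends on the embedding provider and is left out here.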
Best Practice for Websites: Use a CMS Connector
Connect your CMS directly for automated, ongoing sync of your site’s content. Enable CMS connectors in organization settings, then select the connector in your Agent configuration.
Ways to Add Content
Adding Files
- Click Add a new file to upload documents (e.g., product guides, PDFs, sales decks).
- Supported file types include .pdf, .docx, .pptx, .xlsx, .csv, .txt, .json, .markdown, .md, and more.
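If you are preparing a local folder for a bulk upload, a small script can separate files with one of the extensions listed above from everything else. This is a hypothetical helper, not part of the product. The list above ends with "and more", so a file outside this set may still be supported and should be checked rather than discarded.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Extensions named in this article. The product accepts "more" than these,
# so a file outside this set is not necessarily unsupported.
LISTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".xlsx", ".csv",
                     ".txt", ".json", ".markdown", ".md"}


def partition_by_extension(paths: Iterable[Path]) -> Tuple[List[Path], List[Path]]:
    """Split paths into (listed, needs_check) by case-insensitive file extension."""
    listed: List[Path] = []
    needs_check: List[Path] = []
    for path in paths:
        if path.suffix.lower() in LISTED_EXTENSIONS:
            listed.append(path)
        else:
            needs_check.append(path)
    return listed, needs_check


def scan_folder(folder: str) -> Tuple[List[Path], List[Path]]:
    """Walk a local folder recursively and partition every regular file."""
    files = sorted(p for p in Path(folder).rglob("*") if p.is_file())
    return partition_by_extension(files)
```

For example, `listed, needs_check = scan_folder("./upload-batch")` returns the files you can upload with confidence and a short list to verify first.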
Adding URLs
- Use Add URL to ingest specific web pages or resources (including YouTube videos and public documents).
- Paste the URL, and the content will be fetched and added to your knowledge base.
Adding Entire Websites
- Select Add Website to crawl and ingest a complete site. This is ideal for corporate sites, knowledge bases, and large catalogs.
- The system automatically discovers and pulls in all linked, relevant pages.
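This article does not document how the crawler works internally. The following generic sketch shows how link discovery typically works: a breadth-first traversal that follows `<a href>` links and stays on the starting host. The `fetch` callable, the `max_pages` limit, and the same-host rule are assumptions made for illustration. A production crawler would also respect robots.txt, sitemaps, and rate limits.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def discover_pages(start_url: str, fetch: Callable[[str], Optional[str]],
                   max_pages: int = 100) -> List[str]:
    """Breadth-first crawl restricted to the start URL's host.

    `fetch` returns a page's HTML, or None if the page cannot be retrieved.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    discovered: List[str] = []
    while queue and len(discovered) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it but keep crawling
        discovered.append(url)
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so one page is not queued twice.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return discovered
```

Injecting `fetch` keeps the traversal logic separate from networking. You can test it against an in-memory dictionary of pages, or back it with a real HTTP client in practice.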
Adding and Managing Documents
- In the Documents section, click Add Document and choose whether to add a file, URL, or website.
- Each uploaded document appears in the list with its name, description, and last modified date.
- Use the Action menu (three dots) for each document to:
  - Info: See document details (type, size, upload date, etc.).
  - Continue Ingest: Complete ingestion steps if you enabled features like histogram analysis.
  - Sync: Check for changes and update ingested websites automatically.
  - Delete: Remove obsolete or irrelevant content.
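One common way to implement a check for changes like the one Sync performs is to store a fingerprint of each page at ingestion time and compare it with a fingerprint of the current content. The sketch below uses SHA-256 hashes for this. The platform's actual change-detection method is not described here, and the `stored` mapping and `fetch` callable are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Optional


def content_fingerprint(content: str) -> str:
    """Return the SHA-256 hex digest of a page's text content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_sync(stored: Dict[str, str],
              fetch: Callable[[str], Optional[str]]) -> Dict[str, List[str]]:
    """Compare fingerprints recorded at the last ingestion with live content.

    `stored` maps URL -> fingerprint from the previous ingestion (hypothetical store).
    `fetch` returns the page's current content, or None if it no longer exists.
    """
    plan: Dict[str, List[str]] = {"changed": [], "unchanged": [], "missing": []}
    for url, old_fingerprint in stored.items():
        content = fetch(url)
        if content is None:
            plan["missing"].append(url)  # candidate for deletion from the index
        elif content_fingerprint(content) != old_fingerprint:
            plan["changed"].append(url)  # needs re-chunking and re-embedding
        else:
            plan["unchanged"].append(url)  # can be skipped
    return plan
```

Only the pages under `changed` would need to be re-processed, which keeps syncs of large sites cheap. Detecting pages added since the last ingestion would require re-crawling the site.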
Processing Multi-Step Ingestions
- For some sites and large documents, ingestion may take multiple steps:
  - Select Continue Ingest when prompted (for example, if you opted for advanced features such as histograms).
  - Complete the required steps to ensure all content is fully processed and searchable.
Document Status & Insights
Every document or asset has a Document Information panel with:
- Basic details (IDs, upload status, last sync).
- Tabs for Vector Documents (all processed chunks stored in the vector DB) and Settings (ingestion rules, language filters, etc.).
Best Practices for Document Management
- Relevance: Only upload content that aligns with the questions your Agent will receive.
- Organization: Tag and categorize documents for easier management and retrieval.
- Keep Updated: Regularly sync and update content to ensure users always receive the latest information.
- Review Frequently: Use the Info action to verify content types and details; remove outdated or duplicate files.
A well-maintained Documents section ensures your AI Agent always has access to up-to-date, high-quality information, which improves accuracy and user satisfaction.