
Content Ingestion

Overview

The Documents section serves as the central hub for all content that your AI Agent uses to answer user queries. Content from CMS connectors can also be viewed here. By uploading and managing comprehensive documentation here, you equip your Agent to deliver accurate, relevant, and helpful responses to customers, prospects, and employees.

How Content Ingestion Works

When you add content, the ingestion process automatically:

  • Extracts images and generates detailed descriptions for use by Image AI.
  • Splits documents into chunks for vectorization—creating embeddings, text, and metadata (including associated images).
  • Enhances understanding for web content with JSON-LD structured data, when present.

Ingestion also captures per-page context (title, URL, language, and page metadata) so that Answer AI can cite accurate sources and Image AI can match images from the same pages an answer was drawn from.
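The chunking step can be pictured with a minimal sketch. This is an illustration only, not ai12z's implementation: the `Chunk` class, the `split_into_chunks` function, the word-window strategy, and the `max_words` and `overlap` parameters are all assumptions made for the example. It shows how per-page context (title, URL, language) can travel with every chunk so a later answer can cite its source.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One piece of a document plus the page context it came from (hypothetical shape)."""
    text: str
    metadata: dict = field(default_factory=dict)


def split_into_chunks(text, page_meta, max_words=50, overlap=10):
    """Split text into overlapping word windows and copy page metadata onto each chunk."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        meta = {**page_meta, "chunk_index": len(chunks), "word_count": len(window)}
        chunks.append(Chunk(" ".join(window), meta))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical page context, mirroring the fields named above.
page_meta = {
    "title": "Returns policy",
    "url": "https://example.com/returns",
    "language": "en",
}
chunks = split_into_chunks("word " * 120, page_meta, max_words=50, overlap=10)
for chunk in chunks:
    print(chunk.metadata["chunk_index"], chunk.metadata["word_count"], chunk.metadata["url"])
```

The overlap keeps sentences that straddle a boundary visible in both neighboring chunks. A real pipeline would typically split on tokens or semantic boundaries rather than raw word counts.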

Best Practice for Websites: Use a CMS Connector

Connect your CMS directly for automated, ongoing sync of your site’s content. Enable CMS connectors in organization settings, then select the connector in your Agent configuration.

ai12z Documents tab showing options to upload files, add URLs, or ingest entire websites for AI knowledge base.

Ways to Add Content

Adding Files

  • Click Add a new file to upload documents (e.g., product guides, PDFs, sales decks).
  • Supported file types include: .pdf, .docx, .pptx, .xlsx, .csv, .txt, .json, .markdown, .md, and more.
  • JSON and CSV files must follow specific formats. Review the documentation on uploading JSON and CSV files before adding them.

Adding URLs

  • Use Add URL to ingest specific web pages or resources (including YouTube videos and public documents).
  • Paste the URL and the content will be fetched and added to your knowledge base.

Adding Entire Websites

  • Select Add Website to crawl and ingest a complete site—ideal for corporate sites, knowledge bases, and large catalogs.
  • The system will automatically discover and pull in all linked, relevant pages.

During website ingestion, ai12z detects page languages and stores a per-language breakdown. You can view this distribution later on the Settings tab for the document.
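As a rough illustration of what a per-language breakdown contains, the sketch below tallies pages by detected language. The `language_breakdown` function and the page dictionaries are hypothetical. How ai12z detects languages and stores the result is not described here, so the example assumes each page already carries a `language` code.

```python
from collections import Counter


def language_breakdown(pages):
    """Count pages per language code and compute each language's share of the total."""
    counts = Counter(page.get("language", "unknown") for page in pages)
    total = sum(counts.values())
    return {
        lang: {"pages": n, "share": round(n / total, 3)}
        for lang, n in counts.most_common()
    }


# Hypothetical crawl result: each page already has a detected language code.
pages = [
    {"url": "https://example.com/", "language": "en"},
    {"url": "https://example.com/pricing", "language": "en"},
    {"url": "https://example.com/docs", "language": "en"},
    {"url": "https://example.com/fr/", "language": "fr"},
]
breakdown = language_breakdown(pages)
print(breakdown)
```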

Adding and Managing Documents

  • In the Documents section, click Add Document and choose whether to add a file, URL, or website.
  • Each uploaded document appears in the list with its name, description, and last modified date.
  • Use the search box to quickly filter documents by title, URL, or content preview.
  • Use the Action menu (three dots) for each document to:
    • Info: See document details (type, size, upload date, etc.).
    • Edit: Update the title and description.
    • Continue Ingest: Resume the remaining ingestion steps if you enabled a staged feature such as Show Histogram.
    • Sync: Check for changes and update ingested websites automatically.
    • Delete: Remove obsolete or irrelevant content.
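To make the idea behind Sync concrete, here is one common way to detect changed pages: compare a stored content fingerprint with a freshly computed one. This is a sketch of the general technique, not a description of how ai12z implements Sync. The `content_fingerprint` and `plan_sync` functions and the data shapes are assumptions for illustration.

```python
import hashlib


def content_fingerprint(text):
    """Return a stable SHA-256 hex digest of the page text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(stored, fetched):
    """Compare stored fingerprints (url -> digest) with freshly fetched text (url -> text)."""
    current = {url: content_fingerprint(text) for url, text in fetched.items()}
    return {
        "added": sorted(u for u in current if u not in stored),
        "changed": sorted(u for u in current if u in stored and stored[u] != current[u]),
        "removed": sorted(u for u in stored if u not in current),
    }


# Hypothetical state from the previous ingestion.
stored = {
    "https://example.com/a": content_fingerprint("old text"),
    "https://example.com/b": content_fingerprint("same text"),
    "https://example.com/c": content_fingerprint("retired page"),
}
# Hypothetical result of re-fetching the site.
fetched = {
    "https://example.com/a": "new text",
    "https://example.com/b": "same text",
    "https://example.com/d": "brand new page",
}
plan = plan_sync(stored, fetched)
print(plan)
```

Only pages listed under "added" or "changed" would need to be re-split and re-vectorized, which is why fingerprint comparison is a popular way to keep repeat syncs cheap.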

Processing Multi‑Step Ingestions (Show Histogram enabled)

If you enable Show Histogram, website ingestion runs in stages:

  1. ai12z generates a site map and prepares page statistics.
  2. You can adjust IncludePattern / ExcludePattern filters and languages.
  3. Click Continue Ingest to proceed with splitting and vectorization.
  4. Email notifications let you know when the next step is ready and when the ingestion fully completes.
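The effect of the filters in step 2 can be sketched as follows. The pattern syntax ai12z accepts for IncludePattern and ExcludePattern is not specified here, so this example assumes regular expressions purely for illustration. The `filter_urls` function and the sample URLs are hypothetical.

```python
import re


def filter_urls(urls, include_pattern=None, exclude_pattern=None):
    """Keep URLs that match the include pattern (if given) and do not match the exclude pattern."""
    include = re.compile(include_pattern) if include_pattern else None
    exclude = re.compile(exclude_pattern) if exclude_pattern else None
    kept = []
    for url in urls:
        if include and not include.search(url):
            continue  # outside the included part of the site
        if exclude and exclude.search(url):
            continue  # explicitly excluded
        kept.append(url)
    return kept


# Hypothetical site map produced in step 1.
site_map = [
    "https://example.com/docs/setup",
    "https://example.com/docs/archive/old-setup",
    "https://example.com/blog/news",
]
selected = filter_urls(site_map, include_pattern=r"/docs/", exclude_pattern=r"/archive/")
print(selected)
```

Applying the exclude check after the include check means an exclusion always wins, which matches the usual expectation that excludes carve exceptions out of an included area.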

Document Status & Insights

Every document or asset has a Document Information panel with:

  • Basic details (IDs, upload status, last sync).
  • Tabs for Vector Documents (all processed chunks stored in the vector DB) and Settings (ingestion rules, language filters, etc.).
  • A source Link back to the original URL (for websites) and a Delete Document action.

Information tab

Document status and vector tab UI, showing document info, settings, and processed vector chunks.


Vector tab

Vector tab with columns for Title, Description, Page Content, URL, Word Count, and action menu.

Vector Actions

  • Use the search input to find vector chunks by title, URL, or text.
  • Select rows to perform bulk actions like Delete Selected or Export.
  • Per-row actions: Edit and Delete. Edit lets you modify an individual vector document.
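Conceptually, the search input behaves like a filter over the chunk list. The sketch below assumes a simple case-insensitive substring match across title, URL, and page content. The actual matching logic in the product is not documented here, and the `search_chunks` function and field names are hypothetical.

```python
def search_chunks(chunks, query):
    """Return chunks whose title, URL, or page content contains the query (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        return list(chunks)  # an empty query shows everything
    fields = ("title", "url", "page_content")
    return [
        chunk for chunk in chunks
        if any(needle in str(chunk.get(name, "")).lower() for name in fields)
    ]


# Hypothetical rows, loosely mirroring the Vector tab columns.
rows = [
    {"title": "Returns policy", "url": "https://example.com/returns", "page_content": "Items can be returned."},
    {"title": "Shipping", "url": "https://example.com/shipping", "page_content": "Orders ship in two days."},
]
matches = search_chunks(rows, "RETURN")
print([row["title"] for row in matches])
```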

Settings tab

Settings tab with include/exclude patterns and selected languages.

The Settings tab summarizes ingestion controls and telemetry, including:

  • ExcludePattern / IncludePattern
  • ForceCrawl and crawl statistics (Pages, Posts)
  • SiteMapMeta (e.g., scrape mode used such as wordpress_api)
  • Histogram selector (if enabled)
  • WebPageMetaData with times, token counts, pages processed, and request type
  • Languages chart showing detected language distribution across pages

Best Practices for Document Management

  • Relevance: Only upload content that aligns with the questions your Agent will receive.
  • Organization: Tag and categorize documents for easier management and retrieval.
  • Keep Updated: Regularly sync and update content to ensure users always receive the latest information.
  • Review Frequently: Use the Info action to verify content types and details; remove outdated or duplicate files.

A well-maintained Documents section ensures your AI Agent can always access up-to-date, high-quality information—improving accuracy and user satisfaction.