Image AI

Overview

Image AI extends the responses of Large Language Models (LLMs) with visual context. It recommends images that are relevant to the LLM's textual response, giving users a multimodal experience. This document describes how to use Image AI to match images with text-based answers.

Use Case

Image AI is particularly useful when a user needs a visual representation or confirmation of concepts discussed in the LLM's answers. For example, if the LLM describes a historical event, Image AI can provide relevant images that give visual context to the description.

How It Works

Upon receiving a query and the LLM's answer, Image AI analyzes the available images using their alt text and src attributes. It then selects the image that best matches the answer.
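The matching step can be illustrated with a minimal sketch. The source does not say how relevance is scored, so the token-overlap score below is only an assumption for illustration, and the names `tokenize`, `score_image`, and `best_image` are hypothetical rather than part of the product.

```python
import re
from typing import Optional


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_image(answer: str, alt: str, src: str) -> float:
    """Fraction of the image's alt/src tokens that also appear in the answer.

    This scoring rule is an assumption; the real feature may work differently.
    """
    image_tokens = tokenize(alt) | tokenize(src)
    if not image_tokens:
        return 0.0
    return len(image_tokens & tokenize(answer)) / len(image_tokens)


def best_image(answer: str, images: list[dict]) -> Optional[dict]:
    """Return the highest-scoring image, or None if no image overlaps at all."""
    best, best_score = None, 0.0
    for image in images:
        score = score_image(answer, image.get("alt", ""), image.get("src", ""))
        if score > best_score:
            best, best_score = image, score
    return best


answer = "The Apollo 11 launch took place in 1969."
images = [
    {"alt": "Apollo 11 rocket launch", "src": "/img/apollo-launch.jpg"},
    {"alt": "Company logo", "src": "/img/logo.png"},
]
match = best_image(answer, images)
```

Here `match` is the first image, because its alt text and file name share the tokens "apollo", "11", and "launch" with the answer, while the logo shares none.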

Figure: The Image AI settings in the web application, showing an enabled checkbox, guidelines for selecting images based on textual relevance, and fields for the system's response and image data.

  1. Image Match AI:

    • When enabled, this feature allows the system to display related images alongside bot responses. This can enhance the interaction by providing visual context to the textual information provided by the bot. Note that additional costs may apply when this feature is enabled.
  2. Image Description AI:

    • When enabled, this feature automatically generates descriptions for images with OpenAI Vision as they are ingested. The descriptions improve understanding and accessibility. Additional costs may apply when this feature is enabled.
  3. Minimum Image Width:

    • This setting allows the user to define the smallest width (in pixels) an image must have to be processed. Images narrower than this threshold will be ignored. The default setting is 100 pixels, which helps in filtering out thumbnails or icons that typically do not contain useful information for analysis.
  4. Minimum Image Size in Bytes:

    • This sets the minimum file size (in bytes) for images to be processed. The default threshold is set at 2000 bytes, which helps in avoiding very low-quality images that are unlikely to contribute valuable insights.
  5. Image Quality:

    • Users can select the desired image quality from three options: Low Resolution, Auto Resolution, and High Resolution.
      • Low Resolution: Processes images at a lower resolution (512px x 512px), using fewer tokens for representation. Suitable for quick overview tasks.
      • Auto Resolution: The AI system decides the best detail level based on the input image size and content complexity. Balances detail and performance automatically.
      • High Resolution: Provides the highest level of detail, consuming more tokens but necessary for fine-grained analysis.
  6. Create Image Descriptions Button:

    • This button starts a job that processes images in documents uploaded before the "Enable Image Description AI" setting was turned on, so that those images also receive descriptions. Once the setting is enabled, newly added images in PDFs and web documents are processed automatically.
Clicking the Create Image Descriptions button

The job will only process images not previously processed.
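
The settings and the backfill job described above can be sketched as follows. This is a rough illustration, not the product's implementation: `ImageSettings`, `should_process`, `vision_image_part`, and `backfill_descriptions` are hypothetical names, the image records are plain dictionaries invented for the example, and the assumption that the three quality options map directly onto the `detail` field of an OpenAI image content part is not confirmed by the source.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ImageSettings:
    """Hypothetical container for the settings listed above."""
    min_width_px: int = 100      # default stated in the text
    min_size_bytes: int = 2000   # default stated in the text
    quality: str = "auto"        # "low", "auto", or "high"


def should_process(image: dict, settings: ImageSettings) -> bool:
    """Apply both minimum thresholds; images below either one are ignored."""
    return (image["width"] >= settings.min_width_px
            and image["size_bytes"] >= settings.min_size_bytes)


def vision_image_part(url: str, settings: ImageSettings) -> dict:
    """Build an image content part in the OpenAI Chat Completions format.

    Assumes the quality setting is passed through as the `detail` value.
    """
    if settings.quality not in ("low", "auto", "high"):
        raise ValueError(f"unknown quality: {settings.quality!r}")
    return {"type": "image_url",
            "image_url": {"url": url, "detail": settings.quality}}


def backfill_descriptions(images: list[dict], settings: ImageSettings,
                          describe: Callable[[dict], str]) -> int:
    """Describe eligible images that have no description yet; return the count."""
    processed = 0
    for image in images:
        if image.get("description") or not should_process(image, settings):
            continue  # already processed, or below a threshold
        image["description"] = describe(vision_image_part(image["src"], settings))
        processed += 1
    return processed
```

Because already-described images are skipped, running the job a second time over the same documents does no further work. The `describe` argument stands in for whatever call produces a description, so the sketch can be tried with a stub function instead of a real API request.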