AI Workflow
ReAct agentic workflow, Reasoning Engine, Agents and Forms
ReAct - Reasoning and Action
ai12z is built on the ReAct agentic workflow architecture, which centers on a Reasoning Engine. The LLM (serving as a reasoning engine) is guided by a System Prompt, augmented by context which includes:
- A list of Agents, Tools and Forms that are available.
- The complete history of the conversation.
- Additional context from the bot, such as metadata that can include language, geolocation, URL location, etc.
- Overall goals and constraints defined in the system prompt.
The ReAct LLM creates a plan, possibly a multi-step process involving multiple Agent and Form calls, to accomplish a goal before returning a result.
When the user asks a question, the LLM uses this context to decide on a plan of action. This plan might involve:
- Calling one or more agents/forms in a specific order.
- Asking the user follow-up or clarifying questions.
The LLM knows which agents/forms are enabled and determines how best to use them to achieve the user’s goal. It may perform multiple planning steps—revising or expanding its plan as needed—until the objective is met.
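The planning loop described above can be sketched roughly as follows. This is only an illustration, not ai12z's implementation: `fake_reasoning_engine`, the agent names, and the shape of the step dictionary are hypothetical stand-ins for a real LLM call and real agents.

```python
from typing import Callable

# Hypothetical agents: plain functions keyed by name (not ai12z's API).
AGENTS: dict[str, Callable[[str], str]] = {
    "inventory": lambda arg: f"{arg}: 3 in stock",
    "rag": lambda arg: f"(knowledge-base answer for '{arg}')",
}


def fake_reasoning_engine(question: str, observations: list[str]) -> dict:
    """Stand-in for the LLM: decides the next step from the context so far."""
    if not observations:
        # Route to a specific agent when one fits, otherwise fall back to RAG.
        name = "inventory" if "stock" in question.lower() else "rag"
        return {"action": "call", "agent": name, "input": question}
    return {"action": "finish", "answer": observations[-1]}


def run_react(question: str, max_steps: int = 5) -> str:
    """Alternate between planning and acting until the engine says it is done."""
    observations: list[str] = []
    for _ in range(max_steps):
        step = fake_reasoning_engine(question, observations)
        if step["action"] == "finish":
            return step["answer"]
        result = AGENTS[step["agent"]](step["input"])
        observations.append(result)  # fed back into the next planning step
    return "Could not complete the plan within the step limit."
```

The `max_steps` cap is a design choice of this sketch: it keeps a plan that never converges from looping indefinitely.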
RAG agent
In the ai12z system, RAG is the default agent that ReAct calls whenever a user’s question cannot be answered by another agent. Even if the ReAct LLM believes the question is irrelevant, it should still forward the query to RAG rather than attempting to answer it itself.
By default, RAG streams its response directly to the client, without returning it to the ReAct LLM. However, there are instances where the ReAct LLM may need RAG to return the data to ReAct instead of streaming it to the client. For example, if ReAct needs to compare multiple items by making several parallel calls to RAG, or if it needs to gather data from both RAG and another agent before analyzing and sending a final response to the client, then ReAct instructs RAG to return the results for further processing rather than streaming them.
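The difference between streaming and returning can be illustrated with a small sketch. The function names, the `return_to_react` flag, and the list used as a stand-in for the client stream are assumptions made for this example, not part of the ai12z API.

```python
from typing import Iterator, Optional


def rag_answer(query: str) -> Iterator[str]:
    """Hypothetical RAG agent that yields its answer in chunks."""
    for chunk in ("Answer about ", query, "."):
        yield chunk


def call_rag(query: str, return_to_react: bool, client_out: list[str]) -> Optional[str]:
    """Stream to the client by default; return the text when ReAct needs it."""
    if return_to_react:
        return "".join(rag_answer(query))
    for chunk in rag_answer(query):
        client_out.append(chunk)  # stands in for writing to the client stream
    return None


# Default: streamed straight to the client, nothing returned to ReAct.
streamed: list[str] = []
returned = call_rag("pricing", return_to_react=False, client_out=streamed)

# Comparison case: ReAct collects several RAG results before answering.
results = [call_rag(q, True, []) for q in ("plan A", "plan B")]
```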
Why This Matters
Traditional Bot Frameworks
- Rigid, predefined responses
  Traditional bots typically rely on scripted flows or decision trees. Each possible user request is mapped to a pre-written response. While this approach can handle simple, predictable questions, it becomes challenging to scale or personalize. It’s difficult to create new knowledge and adapt to new scenarios without rebuilding or extensively modifying existing logic.
- Limited contextual understanding
  Because the bot is driven by a fixed set of intents and entities, it often fails to grasp broader conversational context, such as previous user queries, session history, or additional background information. As a result, the conversation can feel repetitive or disconnected, and the bot may struggle to maintain continuity or recall prior inputs.
- Requires frequent manual updates, intents, entities, training
  With traditional frameworks, any new topics, domains, or expansions to the knowledge base often require fresh datasets or manual coding updates. You must define new intents and entities, then retrain and redeploy the bot, making it time-consuming and resource-heavy to scale to meet new business needs.
- Complex, high-code deployment
  Implementing and maintaining these bots often calls for a dedicated development team. You need to account for a wide range of “what if” scenarios in your code, handle error states manually, and build in fallbacks when the bot doesn’t understand a request. This complexity can slow down development cycles and increase operational costs.
- Difficult to expand knowledge base
  As new information becomes available or the company’s product line evolves, you must manually integrate this content into the bot’s flow. This process can be cumbersome, especially if the bot’s architecture is not built to accommodate rapid changes or large-scale content updates.
- Struggles with unexpected inputs due to decision tree reliance
  Traditional bots often follow linear, step-by-step logic. When a user suddenly changes topics or asks something out of scope, the bot doesn’t know how to pivot or handle the new request. This leads to conversational dead ends and a poor user experience.
Copilots (GenAI + Your Content + Reasoning Engine)
- Adaptive, real-time conversations
  Copilots leverage AI models that can interpret user requests on the fly, adapting to the conversation as it unfolds. Rather than being tied to predefined scripts, they can dynamically generate replies, making each interaction feel more natural and human-like.
- Seamless integration with your content
  By connecting directly to your documentation, product catalogs, knowledge bases, or FAQs, Copilots can draw on up-to-date information in real time. This cuts out the need for manual content ingestion or frequent model retraining just to handle new or updated data.
- Advanced decision-making and insights
  Because Copilots can process large volumes of text and understand nuanced questions, they excel at more complex tasks like summarizing content, classifying inquiries, or providing detailed insights. They can go beyond simple Q&A to become strategic assistants for both internal teams and end users.
- Agents to connect with 3rd-party services
  Copilots can be paired with “agents,” which act as bridges to external APIs or services. This allows them to execute tasks, like booking a meeting or checking inventory, directly from the conversation, creating an interactive and efficient user experience.
- Connectors to sync with CMS and other content sources
  With built-in connectors, Copilots can automatically update their knowledge base from a CMS (Content Management System) or other data repositories. This ensures they’re always reflecting the most current information without the need for manual refreshes.
- Fast, low-code deployment
  Modern platforms for Copilots often emphasize low-code or no-code solutions, reducing the technical barrier to entry. Businesses can quickly deploy conversational interfaces, experiment with new features, and iterate without exhaustive development resources.
The Components of the System
This section drills into some of the components of the system.
1. Query and Image Upload Description
The process begins with the user's input, known as a 'query,' which could include an image. The system needs to process the user's question in the context of the images.
2. Word Replace
Following the 'query,' there's a 'Word Replace' function. This may involve substituting certain words or phrases for various reasons, such as using synonyms for better understanding.
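A word-replace step of this kind could look like the sketch below. The replacement table and the whole-word, case-insensitive matching rule are assumptions chosen for illustration; the source does not specify how ai12z matches or stores replacements.

```python
import re

# Hypothetical replacement table: site-specific terms mapped to canonical wording.
REPLACEMENTS = {"PTO": "paid time off", "k8s": "Kubernetes"}


def word_replace(query: str, table: dict[str, str]) -> str:
    """Replace whole-word matches, ignoring case, before the query is processed."""
    if not table:
        return query
    lookup = {key.lower(): value for key, value in table.items()}
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(key) for key in table) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: lookup[m.group(0).lower()], query)


rewritten = word_replace("How much pto do I get?", REPLACEMENTS)
```

Matching on word boundaries avoids rewriting fragments inside longer words, which a plain substring replace would do.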
3. Reasoning Engine and Vision AI
When a user uploads images with their query, the AI generates a description of these images and updates the vector query accordingly. This vector query is then used by the vector database to find relevant content.
When the Copilot feature is enabled, the language model reads the list of tools and agents, including their descriptions and the parameters that can be passed to them. The platform supports various out-of-the-box agents, Python function calls, and REST API calls, allowing seamless integration with any third-party service. It also reads the available Forms and decides when they should be pushed to the Bot or Search control.
4. Vector Database (DB)
In RAG-only mode, Vision AI, Context AI, and History AI output a modified version of the query called the vector_query, which is sent to the Vector DB. When Copilot AI is enabled, history and context are handled by the Reasoning Engine, i.e. the Copilot AI. The Vector DB stores vectorized data representations for more efficient searching and matching. A record consists of vectors, text, content, and metadata. An example of metadata would be the URL of the content's source or the images related to that content.
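To make the record structure and lookup concrete, here is a toy in-memory sketch. The `Record` class, the two-dimensional vectors, and the example URLs are invented for illustration, and cosine similarity is assumed as the similarity measure; a real vector database uses high-dimensional embeddings and an approximate index.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Record:
    vector: list[float]
    text: str
    metadata: dict = field(default_factory=dict)  # e.g. source URL, image URLs


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vector: list[float], records: list[Record], k: int = 2) -> list[Record]:
    """Return the k records whose vectors are most similar to the query vector."""
    ranked = sorted(records, key=lambda r: cosine(query_vector, r.vector), reverse=True)
    return ranked[:k]


records = [
    Record([1.0, 0.0], "Pricing overview", {"url": "https://example.com/pricing"}),
    Record([0.0, 1.0], "Careers page", {"url": "https://example.com/careers"}),
    Record([0.9, 0.1], "Plan comparison", {"url": "https://example.com/plans"}),
]
top = search([1.0, 0.0], records)
```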
5. Rerank
The results from the Vector DB are then passed to a 'Rerank' process, which prioritizes them according to relevance, accuracy, and possibly other metrics to ensure the best matching result is selected. Embeddings do a good job of finding relevant documents; reranking ensures the best ones come first, which matters because only 3-6 documents are passed to the Answer AI.
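The rerank-then-truncate idea can be sketched as follows. The term-overlap score is a deliberately simple stand-in for a real reranking model, and the function names and sample documents are hypothetical.

```python
def overlap_score(query: str, document: str) -> float:
    """Toy relevance score: fraction of query terms found in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(document.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def rerank(query: str, documents: list[str], top_n: int = 3) -> list[str]:
    """Order candidates by score and keep only the top_n for the answer prompt."""
    ranked = sorted(documents, key=lambda d: overlap_score(query, d), reverse=True)
    return ranked[:top_n]


candidates = [
    "Office locations and opening hours",
    "How to reset your account password",
    "Password policy for new accounts",
    "Reset a forgotten password by email",
]
best = rerank("reset password", candidates, top_n=2)
```

Because Python's sort is stable, documents with equal scores keep the order in which the vector search returned them.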
6. Answer AI LLM
The reranked results become part of the prompt for the Answer AI LLM. The prompt can contain other context (see below) such as history data. This AI is responsible for interpreting the reranked results and formulating an appropriate response, guided by the prompt and system prompt for this LLM.
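One way to picture how reranked results and history end up in the prompt is the sketch below. The section labels and layout are assumptions for illustration only; the actual prompt format used by the Answer AI is not described in this document.

```python
def build_answer_prompt(query: str, documents: list[str], history: list[str]) -> str:
    """Assemble a prompt from reranked documents, prior turns, and the query."""
    context = "\n\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    past = "\n".join(history) if history else "(no prior turns)"
    return f"Context:\n{context}\n\nHistory:\n{past}\n\nQuestion: {query}"


prompt = build_answer_prompt(
    "How do I reset my password?",
    ["Reset a forgotten password by email", "How to reset your account password"],
    ["User: Hi", "Bot: Hello, how can I help?"],
)
```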
7. Image AI Match
Image AI Match matches images to both the query and the answer from the LLM.
8. Form Components (Bot Controls)
The output can either be a rich text bubble or a bot component. The bot components include:
- Validation Text Fields: Fields for validating specific formats such as email addresses and phone numbers.
- Multi-Select: Allows users to select multiple options from a list.
- Single Select: Allows users to select a single option from a list.
- Image Upload: Enables users to upload images as part of their query.
- File Upload: Enables users to upload files in various formats.
- Date Picker: Allows users to select a date from a calendar.
- Time Picker: Allows users to select a specific time.
- Checkboxes: Allows users to select multiple items by checking boxes.
- Radio Buttons: Allows users to select one option from a set of options.
- Text Area: Provides a larger text input field for longer responses or comments.
- Slider: Allows users to select a value from a range by sliding a handle.
These components enhance user interaction by providing a variety of input methods tailored to different types of data and user needs.
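As an illustration of what a validation text field checks, the sketch below validates email and phone input. The regular expressions are intentionally loose and are assumptions for this example; they are not the rules ai12z applies.

```python
import re

# Loose, illustrative patterns (assumed, not ai12z's validation rules).
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[0-9][0-9\s\-()]{6,18}$"),
}


def validate(field_type: str, value: str) -> bool:
    """Return True if the value matches the pattern for the given field type."""
    if field_type not in PATTERNS:
        raise ValueError(f"Unknown field type: {field_type}")
    return PATTERNS[field_type].match(value.strip()) is not None
```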
System Prompts and Dynamic Tokens
In a Large Language Model (LLM) such as OpenAI GPT, a prompt and a system prompt are the instructions or input provided to the model that set the context or request a specific type of response. They act as a directive for the model, guiding it on what information to generate, how to structure its response, or what kind of task to perform. System prompts are crucial because they can significantly influence the model's outputs, ensuring they are relevant and valuable for the intended application or user query. In the system prompt, you can insert dynamic tokens that will be replaced with their values.
Dynamic Tokens Supported by Each AI Module
The AI modules accept a variety of dynamic tokens that allow for a dynamic and responsive interaction with the user. Each dynamic token is replaced with the real data. These dynamic tokens include:
- {query}: Directly represents the user's inputted question.
- {vector-query}: The query as adjusted by the Context AI, Vision AI, or History AI. The Vector DB uses it to retrieve the most relevant information to feed to the Answer AI.
- {history}: Captures the conversation's history, which the AI uses to maintain context over the interaction.
- {title}: Reflects the webpage's title where the search is taking place, often providing critical context for the query.
- {origin}: Indicates the original URL of the webpage, which may be relevant to the search.
- {language}: Specifies the language of the page, which is essential to return results in the user's language.
- {referrer}: Points to the URL that led the user to the current page, which might affect the user's search intention.
- {attributes}: Allows for the insertion of supplementary context, generally injected via JavaScript into the search control.
- {org_name}: Denotes the name of the organization for which the AI is configured, which helps customize the AI's functionality to suit the organization's needs.
- {purpose}: Outlines the AI bot's intended purpose, aiding the AI in focusing and streamlining the search results.
- {org_url}: Represents the organization's domain URL, enabling the bot to give precedence to content from that specific domain.
- {tz=America/New_York}: Inserts the current date and time in the given timezone as text for the LLM to use.
- {image_upload_description}: Descriptions, generated by Vision AI, of the images the site visitor uploaded into the search box.
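Token replacement of this kind can be sketched with a small template renderer. The `render_prompt` function, the sample values, and the choice to leave unknown tokens untouched are assumptions for illustration; parameterized tokens such as {tz=America/New_York} are not handled in this sketch.

```python
import re

# Matches simple tokens such as {query} or {vector-query}.
TOKEN = re.compile(r"\{([a-z_-]+)\}")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Replace each {token} with its value; leave unknown tokens untouched."""
    return TOKEN.sub(lambda m: values.get(m.group(1), m.group(0)), template)


template = (
    "You are the assistant for {org_name}. Answer in {language}.\n"
    "History: {history}\nQuestion: {query}"
)
prompt = render_prompt(
    template,
    {
        "org_name": "Example Corp",
        "language": "en",
        "history": "(none)",
        "query": "What are your hours?",
    },
)
```

Leaving unrecognized tokens in place, rather than replacing them with an empty string, makes a misspelled token easy to spot in the rendered prompt.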
Enhancing AI Performance with Context
By utilizing these dynamic tokens, the AI modules can be provided with additional context, allowing them to operate at their fullest potential. Including dynamic content ensures that responses are not just based on static programming but are adapted to the user's real-time needs and environment.
These dynamic tokens are crucial for maintaining a nuanced and relevant dialogue between the AI and the user, allowing for a more personalized and effective user experience. Whether retaining the conversation thread through {history} or providing localized responses via {language}, these dynamic tokens are indispensable for a sophisticated AI interaction.
Conclusion
This system is designed to provide accurate and context-aware answers by considering the current session's context and the user's interaction history. It represents an advanced approach to managing AI prompts that could be employed in various applications where user interaction and historical data play a significant role in the quality of the AI's responses.