Unlock Powerful AI Capabilities with Qwen-Agent: Function Calling, Code Interpreter, and RAG

Unlock powerful AI capabilities with Qwen-Agent, an open-source multi-agent framework that integrates Qwen 2 LLM for function calling, code interpretation, and retrieval augmented generation. Discover how it outperforms RAG and native long-context models.

July 27, 2024

Unlock the power of AI with Qwen-Agent, a cutting-edge multi-agent framework that seamlessly integrates the advanced Qwen 2 large language model. Discover how this framework's capabilities, including function calling, code interpretation, and retrieval-augmented generation, can elevate your AI-driven projects to new heights.

Powerful Multi-Agent Framework: Function Calling, Code Interpreter, and RAG
Generating Data for Training New Long Context Quin Models
Building the Agent: Three Levels of Complexity
Retrieval Augmented Generation (RAG)
Chunk by Chunk Reading
Step-by-Step Reasoning with Tool Calling Agents
Experiments and Performance Improvements
Getting Started with Quin Agent

Powerful Multi-Agent Framework: Function Calling, Code Interpreter, and RAG

The Quen Agent is a new and advanced AI agent framework built on top of the Quen 2 large language model. It integrates several powerful capabilities, including function calling, code interpreter, retrieval augmented generation (RAG), and a Chrome extension.

This framework aims to create sophisticated AI agents that can outperform other multi-agent systems. One of the key features of the Quen Agent is its ability to handle complex tasks with a large context size. The framework has been able to understand documents with up to 1 million tokens, surpassing the performance of RAG and native long-context models.

The Quen Agent uses a four-step approach to generalize the large language model from an 8K context size to a million-token context:

Initial Model: The framework starts with a weak 8K context chat model.
Agent Development: The model is used to build a relatively strong agent capable of handling the 1 million-token context.
Data Synthesis: The agent is used to synthesize high-quality fine-tuning data, with automated filtering to ensure quality.
Model Fine-tuning: The synthetic data is used to fine-tune a pre-trained model, resulting in a strong 1 million-token chatbot.

The Quen Agent's capabilities are organized into three levels of complexity:

Retrieval Augmented Generation (RAG): This is a simple approach that processes 1 million-token contexts, dividing them into shorter chunks and retaining the most relevant ones within the 8K context.
Chunk-by-Chunk Reading: This brute-force strategy checks each 512-token chunk for relevance to the query, retrieves the most relevant chunks, and generates the final answer.
Step-by-Step Reasoning: This approach uses multi-hop reasoning and tool-calling agents to answer complex questions that require understanding across multiple steps.

The Quen Agent's impressive performance and its ability to handle long-context tasks make it a powerful open-source AI agent framework. Developers can get started with the Quen Agent by installing the framework from the Pi website and following the available tutorials to deploy their own agents and utilize the Quen 2 large language model.

Generating Data for Training New Long Context Quin Models

The Quin agent was used to generate data for training new long context Quin models. This is a significant achievement, as preparing sufficiently long fine-tuning data has been a challenge in the research on large language models that can natively process sequences of millions of tokens.

The approach used by the Quin agent involves a four-step process:

Initial Model: The process starts with a weak 8K context chat model as the initial model.
Agent Development: In this phase, the Quin agent is used to build a relatively strong agent capable of handling 1 million context.
Data Synthesis: The agent is then used to synthesize the fine-tuning data, with automated filtering to ensure quality.
Model Fine-tuning: Finally, the synthetic data is used to fine-tune a pre-trained model, resulting in a strong 1 million context chatbot.

This approach leverages the capabilities of the Quin agent to overcome the challenge of data preparation for training large language models with long contexts. By using the agent to generate high-quality synthetic data, the researchers were able to fine-tune a model that can effectively process sequences of up to 1 million tokens, surpassing the performance of traditional approaches like RAG and native long-context models.

The success of this approach highlights the power of the Quin agent framework and its ability to enable the development of advanced AI systems that can handle complex tasks and long-form content.

Building the Agent: Three Levels of Complexity

The agent build consists of three levels of complexity, each built upon the previous one:

Retrieval Augmented Generation:
- This is a simple approach that processes a 1 million context length.
- It uses the RAG (Retrieval Augmented Generation) algorithm.
- It divides the context into shorter chunks, each not exceeding 512 tokens.
- It retains only the most relevant chunks within the 8K context.
- It has three sub-steps:
  - Separate instruction and information: Distinguishes between the instruction and non-instruction parts of the user query.
  - Extract keywords: Deduces multilingual keywords from the informational part of the query.
  - Retrieve relevant chunks: Uses the BM25 algorithm to locate the most relevant chunks.
Chunk-by-Chunk Reading:
- This approach addresses the limitations of the RAG approach, which can miss relevant chunks if they don't match a keyword in the query.
- It includes three steps:
  - Assess relevance: A model checks each 512-token chunk for relevance to the query.
  - Retrieve chunks: The relevant sentences are used to retrieve the most relevant chunks within the 8K context limit, using the BM25 algorithm.
  - Generate answer: The final answer is generated based on the retrieved context, similar to the RAG method.
Step-by-Step Reasoning:
- This approach is used for document-based question answering, where multi-hop reasoning is required.
- It utilizes tool-calling agents, which have multiple types of tools, such as "Ask the LV3 agent a question", "Sub-questions", "Update memory", and more.
- This approach allows the model to increase the context to 1 million tokens and improve the quality of various functionalities.

The experiments show that the Quin Agent is able to significantly improve the quality of context length and performance compared to other RAG-based models.

Retrieval Augmented Generation (RAG)

The first level of the agent build consists of a Retrieval Augmented Generation (RAG) approach. This is a simple approach that has been seen many times before. It processes a 1 million context length and uses the RAG algorithm.

The process involves:

Dividing the Context: The context is divided into shorter chunks, with each chunk not exceeding 512 tokens.
Retaining Relevant Chunks: Only the most relevant chunks within the 8K context are retained.
Separate Instruction Transformation: A separate information instruction is used to distinguish between the instruction and the non-instruction parts of the user queries. For example, transforming the query "You should reply in 2,000 words and it should be detailed as possible. My question is when were bicycles invented?" into a prompt structure.
Keyword Extraction: The model is able to deduce multilingual keywords from the informational part of the query.
Relevant Chunk Retrieval: The BM25 algorithm, a traditional keyword-based retrieval method, is used to locate the most relevant chunks.

This RAG approach is fast, but it can miss relevant chunks if they don't match a keyword in the query.

Chunk by Chunk Reading

The second level of the agent build is the "Chunk by Chunk Reading" approach. The researchers found that the initial RAG (Retrieval Augmented Generation) approach was quite fast, but it could miss relevant chunks if they didn't match a keyword in the query. To address this, they introduced a more brute-force strategy with three steps:

Access Relevance: A model that checks each 512-token chunk for its relevance to the query.
Retrieval of Chunks: The relevant sentences from the query are used to retrieve the most relevant chunks within the 8K context limit, using the BM25 algorithm.
Answer Generation: The final answer is generated based on the retrieved context, similar to the RAG method.

This Chunk by Chunk Reading approach is more thorough in ensuring that relevant information is not missed, even if it doesn't match the exact keywords in the query. By checking each chunk individually and then retrieving the most relevant ones, the agent can build a more comprehensive understanding of the context to generate a high-quality answer.

Step-by-Step Reasoning with Tool Calling Agents

In the Quen Agent framework, the step-by-step reasoning approach is used to address the challenge of document-based question answering, where the model needs to perform multi-hop reasoning to arrive at the correct answer.

The key aspects of this approach are:

Multiple Tool Agents: The framework utilizes multiple specialized tool agents, such as "Ask the LV3 Agent a Question", "Sub-Questions", "Update Memory", and others. These agents can be called upon to perform specific reasoning steps.
Iterative Reasoning: The agent starts with the initial question and breaks it down into sub-questions. It then calls the appropriate tool agents to gather the necessary information, update its internal memory, and finally generate the answer.
Context Expansion: By leveraging the tool agents, the agent is able to expand the context beyond the initial 8K token limit, allowing it to handle questions that require information from a larger document corpus.

This step-by-step reasoning approach enables the Quen Agent to tackle complex, multi-hop questions that would be challenging for traditional retrieval-augmented generation models. The ability to call specialized tools and perform iterative reasoning allows the agent to break down the problem, gather relevant information, and arrive at a more accurate and comprehensive answer.

Experiments and Performance Improvements

The Quin agent framework has demonstrated impressive capabilities in handling complex tasks with long-context inputs. Through a series of experiments, the developers have showcased the significant performance improvements achieved by this new agent framework.

One of the key advancements is the ability to generalize the large language model from an 8K context size to a million-token context. This was accomplished by utilizing the Quin agent's multi-level approach, which includes retrieval-augmented generation, chunk-by-chunk reading, and step-by-step reasoning.

The experiments have shown that the Quin agent can outperform traditional RAG (Retrieval-Augmented Generation) algorithms and native long-context models in various capabilities. This includes the quality of the generated responses, the ability to understand and reason about long-form documents, and the overall performance on document-based question-answering tasks.

Furthermore, the Quin agent was used to generate high-quality training data for new long-context Quin models, further enhancing the capabilities of the underlying language model. This approach of leveraging the agent framework to synthesize fine-tuning data has proven to be a valuable strategy in advancing the state-of-the-art in large language models.

The detailed results and comparisons of the Quin agent's performance can be found in the accompanying blog post, which is linked in the description below. This resource provides a deeper dive into the technical aspects and the specific improvements achieved by this new agent framework.

Overall, the Quin agent represents a significant advancement in the field of multi-agent systems and their ability to handle complex, long-form tasks. Developers and researchers interested in exploring the capabilities of this framework are encouraged to refer to the provided resources and tutorials to get started.

Getting Started with Quin Agent

Hey what is up guys, welcome back to another YouTube video at the World of AI. In today's video, we're going to be taking a look at Quin Agent, a new framework built on the Quin 2 large language model. This framework integrates advanced capabilities like function calling, code interpreter, retrieval augments generation, as well as a Chrome extension.

To get started with Quin Agent, you'll first need to go to the Pi website, which I'll leave a link to in the description below. From there, you can install the agent framework onto your desktop. Once you have it installed, you can then start preparing the model services and deploying your own agents using the tutorials they provide.

One of the key features of Quin Agent is its ability to utilize the new Quin 2 model, which is the purpose of this video. This new model is incredibly powerful and is considered the best open-source AI agent framework available. It can handle complex tasks quite well, and what's really impressive is that they were able to generalize the large language model from an 8K context to a million tokens, surpassing the performance of RAG and native long-context models.

To get started with the new Quin 2 model, you can follow the tutorials on the Pi website. They have a lot of great resources that will showcase what you can do with this new framework. I definitely recommend that you check it out, as it's a game-changer in the world of AI agent development.

So, if you're interested in exploring the capabilities of Quin Agent and the new Quin 2 model, be sure to head over to the Pi website, install the framework, and start experimenting. It's a powerful tool that can help you create sophisticated AI agents that can tackle complex tasks with ease.

FAQ

What is Qwen-Agent?

What are the key capabilities of Qwen-Agent?

How did Qwen-Agent achieve the ability to process 1 million token contexts?

What are the different levels of complexity in the Qwen-Agent framework?

How can I get started with Qwen-Agent?