Overview
ABSTRACT
Information retrieval covers many applications, ranging from document retrieval using Boolean queries to the generation and extraction of precise answers to questions in natural language. It applies to text, images, and audio, and can be interactive, in the form of dialogues with a conversational agent. This article focuses on the intersection of information retrieval and generative AI, known as retrieval-augmented generation (RAG). RAG assists in generating responses from a large language model together with information sources that may be private. Large language models and RAG architectures are presented (agentic RAG, GraphRAG, etc.), as are the many strategies that can be followed. This coupling of neural machine learning, natural language processing, and traditional information retrieval requires a rethinking of data search, indexing, and storage processes using data warehouses, APIs, and ad hoc software environments. Although they remain imperfect in certain situations and are rarely easy to deploy, these solutions are now mature. They are discussed here from a scientific and technological perspective.
AUTHOR
- Patrice BELLOT: University Professor, Aix-Marseille University, CNRS, Polytech, Marseille, France
INTRODUCTION
Without generative AI and assisted response generation, information retrieval is carried out either with search engines, which index unstructured documents but answer queries only with ranked lists of documents, or by querying databases, which require complex and costly structuring of the source data. RAG (Retrieval-Augmented Generation) is a third approach, integrating information retrieval into the major uses of conversational agents.
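The first of these approaches can be made concrete with a toy keyword search engine. The sketch below is a simplified TF-IDF ranking (an illustrative stand-in, not any particular engine's implementation): it answers a query only with a ranked list of document identifiers, leaving the reading of the documents to the user.

```python
import math
from collections import Counter

def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(text.lower().split()).items():
            index.setdefault(term, {})[doc_id] = tf
    return index

def search(query, index, n_docs):
    """Rank documents by a simple TF-IDF score over the query terms."""
    scores = Counter()
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rare terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common()]

docs = [
    "neural networks for information retrieval",
    "relational databases and structured queries",
    "retrieval augmented generation with language models",
]
index = build_index(docs)
print(search("retrieval generation", index, len(docs)))  # -> [2, 0]
```

The answer to the user remains a list of documents; extracting or formulating a precise answer is left entirely to the reader, which is exactly the limitation RAG addresses.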
The years 2010-2020 saw the emergence of neural machine learning and natural language processing approaches, which were rapidly exploited for information search and retrieval and, more broadly, for knowledge engineering. Large language models help not only to obtain semantic representations of documents, but also, in their generative form, to produce fluent, comprehensible answers to complex questions expressed in the form of "prompts".
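A sketch of how such semantic representations are compared: documents and queries are mapped to dense vectors, and relevance is estimated by cosine similarity. The toy vectors below stand in for the output of an embedding model (e.g., a sentence encoder), which is assumed here and not shown.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# In practice these vectors would come from an embedding model;
# the values here are purely illustrative.
doc_vectors = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.3],
}
query_vec = [0.8, 0.2, 0.1]
best = max(doc_vectors, key=lambda d: cosine(query_vec, doc_vectors[d]))
print(best)  # doc_a
```

At scale, this nearest-neighbor search over embeddings is what vector databases implement efficiently.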
Unfortunately, the cost of training large language models restricts this operation to a handful of AI players who, in addition to possessing exceptional computing infrastructure, exploit data extending far beyond the public Web alone. Even though fine-tuning large, pre-trained, freely distributed language models is less costly than full training and enables adaptation to specialist fields or private data, it is not enough to deploy secure search engines. Indeed, fine-tuning must remain limited, otherwise the model loses its ability to generate comprehensible text. A model, even fine-tuned, remains dependent on its original training data. Attempting to answer specific questions with a generative model alone carries a high risk of obsolete, erroneous (hallucinated), or confused answers, simply because of the presence of contradictory information in the training data. What's more, when exploited without care, these masses of data reflect major societal biases and indiscriminately mix opinions and false information.
While not a universal miracle solution, RAG helps reduce the risks outlined above. The main idea is to force the large language model to generate responses whose information comes from a data set selected on the fly according to the query. The general knowledge of the model should only provide the linguistic competence needed to generate a comprehensible response.
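This idea can be sketched as a minimal pipeline: retrieve a few passages relevant to the query, then build a prompt that constrains the model to those passages. The retriever below uses simple word overlap as a stand-in for a real lexical or dense retriever, and the final call to an LLM API is deliberately left out; everything here is an illustrative assumption, not a reference implementation.

```python
def retrieve(query, corpus, k=2):
    """Toy retriever: rank passages by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Constrain the model to answer only from the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )

corpus = [
    "RAG couples a retriever with a large language model.",
    "Vector databases store document embeddings.",
    "The 2024 budget report was approved in March.",
]
question = "What does RAG couple together?"
prompt = build_prompt(question, retrieve(question, corpus))
# `prompt` would then be sent to the generative model of your choice.
print(prompt)
```

The instruction to answer "ONLY" from the context is what pushes the model's general knowledge into a purely linguistic role, as described above.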
Most of this article is devoted to describing a RAG system, so as to enable the creation of functional software prototypes based on a good understanding of the theoretical principles and on knowledge of freely available, reusable solutions. The solutions described here focus on RAG for text documents, but the availability of foundation models and multimodal models ensures transfer to data including oral data,...
KEYWORDS
Information retrieval | AI | Retrieval-Augmented Generation (RAG) | Agentic RAG | GraphRAG | Large Language Model | LLM | question-answering | prompt | embedding | vector database | chatbot
This article is included in Software technologies and System architectures.
Optimizing generative AI with RAG
Bibliography
Directory
Manufacturers – Suppliers – Distributors (non-exhaustive list)
Elastic
Emvista
Hugging Face