A Deep Dive into Data Flow and Transformation: Hybrid RAG System in Action Using AWS Serverless Architecture
For data-driven organizations, efficiently managing massive datasets while delivering fast, accurate, and context-aware insights is critical. One of the most innovative solutions emerging in this space is the Hybrid Retrieval-Augmented Generation (RAG) system, which combines retrieval-based AI with generative AI models, enhanced by a Reinforcement Learning from Human Feedback (RLHF) loop. This system not only retrieves data but also generates human-readable insights, continuously improving as it receives feedback from users.
In this article, we will dive into how such a system works, focusing on the data flow and the transformations that occur at each stage. To make this relatable for developers, we’ll show how the process can be set up in an AWS Serverless environment using services like Amazon S3, AWS SageMaker, and pre-trained models from Cohere or Anthropic. Along the way, we’ll use real-world business examples and demonstrate how these components integrate into a pipeline that you could prototype in environments like Google Colab or AWS.
Scenario: Financial Analysis Query
Consider a financial analyst working for a global energy company. The analyst needs to generate a report comparing the company’s Q3 2024 revenue with Q3 2023 revenue and assess current market trends. The query might look like:
Compare the revenue growth in Q3 2024 against Q3 2023 for the energy sector, considering market trends.
We’ll walk through how this query is processed within the Hybrid RAG system, starting from data ingestion to the final fact-checked and refined response.
Step 1: Document Ingestion – Using AWS S3 and SageMaker
The first step is to ingest documents—these could be financial reports, PDFs, spreadsheets, or market trend documents—into an Amazon S3 bucket for storage. As the documents are ingested, metadata is extracted (e.g., document title, date) and embeddings are generated using SageMaker-hosted models, such as Cohere's text embedding models or Anthropic's foundation models.
Here are some example financial documents:
revenue_2023_q3.csv:

Region,Revenue,Costs,Net_Profit
North America,500000,200000,300000
Europe,450000,190000,260000
Asia,300000,150000,150000

revenue_2024_q3.xlsx:

| Region        | Revenue | Costs  | Net_Profit |
|---------------|---------|--------|------------|
| North America | 550000  | 220000 | 330000     |
| Europe        | 500000  | 210000 | 290000     |
| Asia          | 320000  | 160000 | 160000     |

market_report_energy_2024.pdf:

- "The energy sector has seen a 5% overall growth in Q3 2024 due to an increase in demand for renewable energy sources. However, increased competition from new entrants may impact long-term growth."
The AWS Glue service can be used for metadata extraction, and the content of these documents is transformed into embeddings using models hosted on AWS SageMaker. For example, you could use Cohere’s large-scale text embedding model to convert document content into a vectorized form.
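Before any embeddings are generated, the documents first have to land in S3. Below is a minimal sketch of that upload step using boto3; the bucket name and metadata keys are illustrative placeholders, not part of any real pipeline:

import boto3

s3 = boto3.client("s3")

# Upload a financial report to S3, attaching basic metadata to the object itself;
# a Glue crawler (or a Lambda trigger) can then pick the document up downstream
s3.upload_file(
    Filename="revenue_2024_q3.xlsx",
    Bucket="finance-documents",
    Key="reports/revenue_2024_q3.xlsx",
    ExtraArgs={"Metadata": {"title": "Q3 2024 Revenue", "date": "2024-09-30"}},
)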
Tech Insight:
- By leveraging AWS SageMaker and models from Cohere or Anthropic, embeddings can be created efficiently, allowing for semantic matching of documents and queries later in the pipeline.
Simulated Example in AWS SageMaker (Pseudo-Code):
import sagemaker
from sagemaker import get_execution_role

role = get_execution_role()
sagemaker_session = sagemaker.Session()

# Using a Cohere embedding model hosted on SageMaker
# ("cohere-model-image" is a placeholder for the real container image URI)
embedding_model = sagemaker.model.Model(
    image_uri="cohere-model-image",
    role=role,
    predictor_cls=sagemaker.predictor.Predictor,
    sagemaker_session=sagemaker_session,
)

# Deploying creates an endpoint; predictions run against the returned
# Predictor, not the Model object itself
embedder = embedding_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

# Sample document to be converted into an embedding vector
document = "North America revenue 500000, costs 200000, net profit 300000"
embedding = embedder.predict([document])
Business Impact:
- Automating document ingestion and embedding generation allows businesses to handle vast datasets without manual overhead. Leveraging AWS’s scalable infrastructure ensures that this process remains cost-effective as data grows.
Step 2: Pre-processing – Query to Embedding Transformation
Next, the system needs to pre-process the user’s query:
"Compare the revenue growth in Q3 2024 against Q3 2023 for the energy sector, considering market trends."
Just like the documents, this query is transformed into an embedding using the same model in AWS SageMaker. By converting the query into an embedding, the system can perform a semantic search that finds relevant documents even if they don’t contain the exact phrasing of the query.
Simulated Example in AWS SageMaker (Pseudo-Code):
# Query-to-embedding transformation using the same Cohere model on SageMaker
query = "Compare the revenue growth in Q3 2024 against Q3 2023 for the energy sector, considering market trends."
query_embedding = embedder.predict([query])
Tech Insight:
- Pre-processing converts queries into vector embeddings, enabling semantic search across large datasets. This allows for more accurate retrieval compared to traditional keyword-based search.
Business Impact:
- Pre-processing ensures that complex business-specific queries are matched with the most relevant documents, enhancing both speed and precision in retrieving critical insights.
Step 3: Hybrid Retrieval – Powered by Amazon OpenSearch and Vector Databases
At this stage, the system performs Hybrid Retrieval, combining text-based retrieval from Amazon OpenSearch with vector-based retrieval (using the embeddings generated earlier). Vector search allows the system to retrieve semantically relevant documents based on the query embedding, while OpenSearch can retrieve documents based on keywords.
For our financial analysis, the system retrieves:
revenue_2023_q3.csv
revenue_2024_q3.xlsx
market_report_energy_2024.pdf
Simulated Example in AWS (Pseudo-Code):
import numpy as np
from opensearchpy import OpenSearch
from sklearn.metrics.pairwise import cosine_similarity

# OpenSearch retrieval (text-based, keyword matching)
client = OpenSearch(hosts=[{'host': 'search-domain', 'port': 443}])
query_text = "Q3 2024 energy sector revenue"
response = client.search(index="documents", body={"query": {"match": {"content": query_text}}})

# Vector-based retrieval (semantic): rank the document embeddings from Step 1
# by cosine similarity to the query embedding from Step 2
vector_scores = cosine_similarity(np.array([query_embedding]), document_embeddings)
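The two result lists still need to be merged into a single ranking. A common choice is reciprocal rank fusion (RRF); the helper below is a minimal sketch (not a specific library API), assuming each retrieval leg returns an ordered list of document IDs:

# Merge keyword and vector rankings with reciprocal rank fusion (RRF):
# each document scores the sum of 1 / (k + rank) across both lists,
# where k=60 is a conventional smoothing constant
def reciprocal_rank_fusion(keyword_ids, vector_ids, k=60):
    scores = {}
    for ranking in (keyword_ids, vector_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse the OpenSearch hits with the top vector matches
fused = reciprocal_rank_fusion(
    ["revenue_2024_q3.xlsx", "market_report_energy_2024.pdf"],
    ["revenue_2023_q3.csv", "revenue_2024_q3.xlsx"],
)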
Tech Insight:
- Hybrid retrieval ensures the system balances traditional keyword search with deep semantic understanding, improving accuracy and efficiency in pulling relevant documents.
Business Impact:
- The hybrid approach speeds up document retrieval and ensures that analysts get the most contextually relevant data for decision-making, without manually sifting through hundreds of documents.
Step 4: Response Generation – Utilizing Generative Models in SageMaker
Once the system retrieves the relevant documents, it uses generative models hosted on AWS SageMaker, such as Anthropic's Claude or Cohere's Command, to synthesize a coherent response. For example, the system could generate:
- "The revenue in Q3 2024 increased by roughly 10% compared to Q3 2023 in the energy sector. Market conditions show a 5% growth in demand for renewable energy, but competition may impact future growth."
These models are designed to analyze and summarize complex datasets, providing natural language explanations of trends and comparisons.
Simulated Example in AWS SageMaker (Pseudo-Code):
# Generating a report with Anthropic's Claude model hosted on SageMaker
# ("anthropic-claude-model-image" is a placeholder container image URI)
report_model = sagemaker.model.Model(
    image_uri="anthropic-claude-model-image",
    role=role,
    predictor_cls=sagemaker.predictor.Predictor,
    sagemaker_session=sagemaker_session,
)
generator = report_model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")

# The retrieved figures are passed to the model as context for the summary
context = """
Q3 2024 revenue: North America: 550000, Europe: 500000, Asia: 320000.
Q3 2023 revenue: North America: 500000, Europe: 450000, Asia: 300000.
Market trends show 5% overall growth in demand for renewable energy.
"""
generated_report = generator.predict([context])
Tech Insight:
- Using models from Anthropic or Cohere, the system generates contextually aware responses, merging data points into coherent summaries.
Business Impact:
- This enables financial analysts to quickly understand complex data without manually reading and comparing multiple reports. It saves time, allowing decision-makers to focus on strategic action.
Step 5: Fact-Checking – Ensuring Data Integrity Using AWS APIs
Before presenting the final response, the system runs a Fact-Checking process. It cross-references the generated response with external APIs or internal data sources to verify its accuracy. For example, it might verify revenue growth or market trend figures against a trusted financial data provider's API (the MarketData API used below is illustrative).
Simulated Example in AWS (Pseudo-Code):
# Fact-checking against an external data provider (pseudo-code; the
# MarketData API endpoint below is illustrative, not a real service)
import requests

response = requests.get("https://api.marketdata.com/energy-sector?quarter=2024Q3", timeout=10)
market_data = response.json()

# Verify that the generated claim about market trends appears in the provider's summary
assert "5% growth" in market_data['summary']
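External APIs aside, the figures in the generated response can also be cross-checked against the ingested documents themselves. The sketch below recomputes total revenue growth with pandas (file names as in Step 1); the result, about 9.6%, is consistent with the rounded 10% figure in the generated report:

import pandas as pd

# Recompute total revenue growth directly from the source documents
rev_2023 = pd.read_csv("revenue_2023_q3.csv")["Revenue"].sum()     # 1,250,000
rev_2024 = pd.read_excel("revenue_2024_q3.xlsx")["Revenue"].sum()  # 1,370,000

growth = (rev_2024 - rev_2023) / rev_2023
print(f"Q3 revenue growth: {growth:.1%}")  # 9.6%, i.e. roughly 10%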
Tech Insight:
- Integrating fact-checking APIs ensures that the data presented is accurate, reducing the risk of misinformation or errors.
Business Impact:
- Fact-checking builds trust in the system, ensuring that business-critical decisions are made based on verified and reliable information.
Step 6: RLHF Feedback Loop – Continuous Learning and Optimization
Finally, the system uses Reinforcement Learning from Human Feedback (RLHF) to improve continuously. For instance, when users rate a response as helpful or unhelpful, that signal is used to fine-tune the model's parameters, making future responses more precise.
This feedback loop can be implemented in AWS SageMaker using reinforcement learning algorithms that adjust the model based on user input.
Simulated Example in AWS SageMaker (Pseudo-Code):
from sagemaker.rl import RLEstimator, RLToolkit, RLFramework

# Training the RLHF loop with SageMaker's RL estimator
# (train_rlhf.py is the user-supplied training script)
rl_estimator = RLEstimator(
    entry_point="train_rlhf.py",
    toolkit=RLToolkit.RAY,
    toolkit_version="1.6.0",
    framework=RLFramework.TENSORFLOW,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=sagemaker_session,
)

# "inputs" is the S3 location of the collected user-feedback data (see the sketch below)
inputs = "s3://feedback-bucket/rlhf-training-data/"  # illustrative path
rl_estimator.fit(inputs)
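Before the estimator has anything to train on, user feedback must be captured somewhere. The sketch below writes each rating to S3 as a JSON record; the bucket, key layout, and record schema are hypothetical:

import json
from datetime import datetime, timezone
import boto3

s3 = boto3.client("s3")

# Store one feedback record per response; these objects become the
# training input ("inputs") for the RLHF job above
def record_feedback(query, response_text, rating, bucket="feedback-bucket"):
    record = {
        "query": query,
        "response": response_text,
        "rating": rating,  # e.g., thumbs up = 1, thumbs down = 0
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    key = f"rlhf-training-data/{record['timestamp']}.json"
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(record))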
Tech Insight:
- RLHF enables continuous improvement of the system, ensuring that it adapts to changing business needs and user preferences over time.
Business Impact:
- Continuous learning reduces the need for manual updates, lowers the total cost of ownership (TCO), and improves the relevance of future responses, enhancing long-term business value.
Transforming Data into Insights with AWS-Powered Hybrid RAG Systems
By walking through this financial analysis scenario, we’ve seen how data is ingested, transformed, retrieved, and synthesized into actionable insights within a Hybrid RAG system powered by AWS Serverless architecture. Each step—from document ingestion with S3 and SageMaker to fact-checking and RLHF feedback loops—ensures that businesses receive fast, accurate, and relevant insights tailored to their needs.
For developers and data scientists, tools like AWS SageMaker provide scalable solutions that can handle real-time query generation and retrieval, offering everything from semantic search to generative models from Anthropic or Cohere. By leveraging these services, businesses can enhance decision-making, reduce manual work, and minimize operational costs.