Democratizing Data Access: How Avahi Built a Secure, GenAI-Powered NL2SQL Engine on AWS to Eliminate SQL Bottlenecks

Client

Confidential

Location

Palo Alto, California

Industry

Mental Health Platform / Data Analytics / Internal BI Tools

Services & Tech

Amazon Bedrock (Claude, Nova, Titan Embeddings) · Amazon Athena · Amazon S3 · AWS Glue · AWS Lambda · Amazon ECS Fargate · PostgreSQL + pgvector · Amazon CloudWatch · sqlglot (AST Validation)

Project Overview

A confidential client is a leading online mental health platform serving millions of users worldwide with on-demand emotional support, therapy, and counseling. As the platform scaled, its internal analytics teams were overwhelmed by a critical bottleneck: non-technical business users (product managers, analysts, and operations staff) had no way to independently access the vast datasets stored in the company’s AWS-based data lake without involving a data engineer. Every insight required writing SQL, and that dependency was slowing down decision-making across the organization.

Avahi designed and delivered a production-grade, AI-powered Natural Language to SQL (NL2SQL) engine on AWS, enabling any user to query complex datasets using plain English. The solution combines Amazon Bedrock’s large language models, a Retrieval-Augmented Generation (RAG) architecture, and an AST-based SQL validation layer to automatically produce accurate, cost-optimized, and secure Athena queries. The result is a fully private, internally deployed system that democratizes data access, reduces engineering overhead, and improves analytics productivity across the organization.

About The Customer

The client is a leading mental health technology platform that connects individuals seeking emotional support with trained listeners, therapists, and counselors through an accessible, on-demand digital service. With millions of active users globally, the platform operates a data-intensive environment that tracks user engagement, session quality, platform performance, and clinical outcomes. The organization’s internal teams, from product to engineering to operations, depend heavily on data to make informed decisions, making reliable and timely access to analytics a core business need.

The Problem

The client had amassed a large and growing data lake on AWS S3, powering critical business intelligence across the platform. But the ability to extract value from that data was locked behind a technical barrier: SQL. Every data request — no matter how simple — required routing through a data engineer, creating a persistent bottleneck that delayed insights and stretched engineering capacity thin.

Non-technical users, including product managers and business analysts, had no way to self-serve their data needs. They were forced to context-switch between stakeholder conversations and engineering queues, often waiting hours or days for answers to questions that should have taken minutes. The compounding effect was organizational drag: slower product decisions, reduced operational agility, and a growing gap between data availability and data utilization.

Beyond access, the team faced real risks around query accuracy and cost. Athena bills by the amount of data scanned, so inefficient or unpruned queries could rapidly inflate cloud costs. There was also a security concern: any system that let users interact with production data infrastructure needed strict guardrails to prevent destructive operations, accidental or otherwise. Without a validated, cost-aware query layer, opening data access to non-engineers carried significant operational risk.

Left unaddressed, these challenges would have continued to strain engineering resources, suppress analytics productivity, and leave valuable data underutilized, limiting the client's ability to make fast, informed decisions at scale.

Why AWS

The customer was already deeply invested in the AWS ecosystem, with its data lake built on Amazon S3 and query workloads running through Amazon Athena. AWS provided the natural foundation for extending that infrastructure into an AI-powered query layer without introducing external dependencies or data egress risk. The breadth of the AWS AI/ML portfolio, particularly Amazon Bedrock’s managed access to frontier language models and Titan embedding models, meant the team could build a sophisticated GenAI system without standing up and maintaining their own model infrastructure.

AWS also gave the solution its most important non-negotiable quality: privacy. With all compute, storage, and AI inference running inside a private VPC — accessible only via VPN through an internal Application Load Balancer — the client could expose natural language data querying to its teams without exposing any data or infrastructure to the public internet. For a mental health platform operating under strict data sensitivity requirements, this architecture was not optional; it was essential.

Why The Customer Chose Avahi

As a Premier Tier AWS Partner, Avahi brought the rare combination of deep cloud-native engineering expertise and hands-on experience building production-grade AI systems. The client needed more than a vendor familiar with language models — they needed a partner who could architect an end-to-end GenAI pipeline that met strict requirements around accuracy, security, cost, and scalability. Avahi’s track record of delivering complex AWS solutions with measurable business outcomes made them the right fit for a project with no margin for error.

What set Avahi apart was its ability to anticipate and engineer around the inherent failure modes of LLM-generated SQL. Rather than simply wiring a language model to a query engine, Avahi designed a multi-phase validation and self-correction pipeline that transformed the system from a prototype into a production-ready tool. The team’s expertise in RAG architecture, SQL semantics, and AWS-native deployment gave the client confidence that the solution would be both intelligent and reliable from day one.

Solution

Avahi designed and delivered a fully integrated GenAI NL2SQL engine on AWS that takes a user’s natural language question and returns an accurate, optimized, and explainable SQL result, all within a secure, private infrastructure. The system is built around a multi-phase pipeline that combines language model reasoning, semantic retrieval, AST-based validation, and automatic self-correction to produce reliable results at scale.

The pipeline begins the moment a user submits a natural language query through the chat-based interface. Before reaching any AI component, the query passes through a pre-LLM security screening layer that checks for injection attempts or unsafe intent. Once cleared, the query is vectorized using Amazon Bedrock’s Titan v2 embedding model. These embeddings are used to perform semantic search against a PostgreSQL database with pgvector, which stores schema metadata and curated few-shot SQL examples. This Retrieval-Augmented Generation (RAG) approach gives the language model the precise schema context and query patterns it needs to generate accurate SQL, dramatically reducing hallucination.
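The retrieval step above can be sketched in a few lines. This is a minimal, illustrative sketch rather than the production code: the `schema_embeddings` table, its columns, and the 1024-dimension setting are assumptions, and the Bedrock call is reduced to the request body it would send, so nothing here touches the network.

```python
import json

def titan_embed_request(text: str) -> dict:
    # Request shape for Titan Text Embeddings V2 via Bedrock invoke_model.
    # The 1024-dimension setting is an assumption; V2 also supports 256/512.
    return {
        "modelId": "amazon.titan-embed-text-v2:0",
        "body": json.dumps(
            {"inputText": text, "dimensions": 1024, "normalize": True}
        ),
    }

def schema_search_sql(embedding: list[float], k: int = 5) -> tuple[str, tuple]:
    # Parameterized pgvector query using the cosine-distance operator (<=>).
    # Table and column names are hypothetical stand-ins for the real catalog.
    sql = (
        "SELECT table_name, ddl, example_query "
        "FROM schema_embeddings "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    vector_literal = "[" + ",".join(f"{x:.6f}" for x in embedding) + "]"
    return sql, (vector_literal, k)
```

In practice the request dict would go to boto3's `bedrock-runtime` `invoke_model`, and the SQL/params pair to a psycopg cursor; both are kept inert here so the shapes are easy to inspect.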

With schema and examples retrieved, a structured prompt is assembled that includes the relevant table definitions, column relationships, historical query context from the user’s session, and the few-shot examples. This prompt is passed to a Claude or Nova model via Amazon Bedrock, which generates a SQL query. The generated SQL is not executed immediately; it is first parsed and validated using sqlglot, an AST-based SQL parsing library. This validation step checks for syntactic correctness, enforces SELECT-only read access, and blocks any DDL or DML operations that could alter or damage data. It also verifies that partition pruning is applied to Athena queries, directly controlling data scan costs.

If validation fails, the system does not surface an error to the user — instead, it triggers an automatic self-correction loop. The validation error and the original SQL are fed back to the LLM with additional context, prompting a retry. This loop significantly improves query reliability without requiring human intervention. For queries flagged as potentially high-cost based on estimated scan volume, the system presents a confirmation step to the user before execution, adding a final layer of cost awareness.
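The self-correction loop reduces to a small control structure. In this sketch, `call_llm` stands in for the Bedrock invocation and `validate` for the AST checks; both are injected so the loop itself stays model-agnostic.

```python
def generate_sql(prompt: str, call_llm, validate, max_attempts: int = 3) -> str:
    """Generate SQL and retry with validation feedback until it passes."""
    for _ in range(max_attempts):
        sql = call_llm(prompt)
        errors = validate(sql)
        if not errors:
            return sql
        # Feed the failing SQL and the validator's errors back to the model.
        prompt = (
            f"{prompt}\n\nYour previous query failed validation.\n"
            f"Query:\n{sql}\nErrors: {'; '.join(errors)}\n"
            "Return a corrected, SELECT-only query."
        )
    raise RuntimeError(f"no valid SQL after {max_attempts} attempts")
```

Capping the attempts keeps a persistently failing question from looping forever; the cap of three here is an illustrative default, not the system's actual setting.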

Once a validated query is executed against Amazon Athena, the results are returned to the user alongside a natural language explanation of what the query retrieved, closing the loop for non-technical users who may not read raw tabular data fluently. The entire system runs on ECS Fargate, with Flask microservices handling the API layer and an nginx-served single-page application as the frontend. AWS Glue maintains the data catalog, an AWS Lambda function powers the schema ingestion pipeline that keeps embeddings current, and Amazon CloudWatch provides full observability across the stack. All infrastructure is deployed in a private VPC with no public endpoints, accessible exclusively through an internal ALB over VPN.
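Execution against Athena is then a single boto3 call. The sketch below only builds the arguments for `start_query_execution` — database and bucket names are placeholders — so the shape is clear without touching AWS.

```python
def athena_execution_args(sql: str, database: str, results_bucket: str) -> dict:
    # Keyword arguments for boto3's athena.start_query_execution.
    # The database and bucket values here are illustrative placeholders.
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {
            "OutputLocation": f"s3://{results_bucket}/athena-results/"
        },
    }
```

A caller would pass this dict as `client.start_query_execution(**args)` and then poll `get_query_execution` until the query reaches a terminal state.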

Key Deliverables

  • NL2SQL query engine powered by Amazon Bedrock (Claude / Nova models)
  • RAG-based prompt system with schema-aware, few-shot example retrieval using Titan v2 embeddings and pgvector
  • AST-based SQL validation engine (sqlglot) with SELECT-only enforcement and DDL/DML blocking
  • Automated schema ingestion pipeline via AWS Lambda for real-time schema synchronization
  • Chat-based user interface with session tracking and query history context
  • Feedback loop mechanism for continuous model and prompt improvement
  • High-cost query detection and user confirmation workflow
  • Self-correction LLM retry loop for automatic query repair
  • Fully private AWS deployment: private VPC, internal ALB, VPN-only access, IAM-based service authorization
  • Observability stack via Amazon CloudWatch for monitoring, logging, and alerting

Project Impact

The NL2SQL engine fundamentally changed how the customer interacts with its data. Business users — product managers, analysts, and operations teams — can now query the company’s data lake directly using plain English, without writing a single line of SQL or waiting on engineering. The democratization of data access has removed a persistent bottleneck from the organization’s analytics workflow, freeing data engineers to focus on higher-value work while enabling faster, more autonomous decision-making across teams.

Beyond access, the system delivers measurable improvements in query quality and operational efficiency. The combination of RAG-based schema retrieval, AST validation, and self-correction has significantly reduced query failure rates and the retries that follow them. Partition-aware query generation controls Athena data scan costs by ensuring queries are optimized before execution, directly reducing infrastructure spend. The system is fully production-grade, secured against destructive operations, and built for scale.

Outcomes at a Glance:

  • Reduced time to generate and retrieve data insights — from multi-hour engineering queues to real-time self-service
  • Reduced query failures and retries through multi-phase AST validation and LLM self-correction
  • Lowered Athena data scan costs via partition pruning enforcement built into every generated query
  • Improved analytics productivity across business, product, and operations teams
  • Enabled safe, production-grade natural language data access with zero public infrastructure exposure
  • Scalable architecture applicable to any enterprise with a cloud data lake and non-technical data consumers

Ready to Transform Your Business with AI?

Let’s explore your high-impact AI opportunities together in a complimentary session