Upcoming Event: The AI Agent Implementation Framework
Nonstop Health
Concord, CA
FinTech
Lambda, EC2, Xray, CloudWatch, IAM, CloudTrail, Bedrock, DynamoDB, S3, ECR, AWS Connect.
Nonstop Health is a healthcare benefits administration and insurance services company providing affordable, first-dollar coverage for organizations with 50 or more employees. The company’s member support operations were strained by approximately 30,000 annual support tickets and wait times that had reached up to 60 minutes during high-volume periods, while its existing IVR system could handle only a single query type. Avahi designed and delivered an AI-powered voice agent integrated with Amazon Connect, leveraging Amazon Bedrock (Anthropic Claude), AWS Lambda, Amazon DynamoDB, and a ChromaDB vector database on Amazon EC2 to automate routine member inquiries across four call categories. The solution provides members with immediate, natural-language answers to questions about claims, card and account status, substantiation requirements, and general plan information, reducing the burden on live agents and establishing a scalable foundation for ongoing call deflection.
Nonstop Health, headquartered in Concord, California, is a healthcare benefits administration and insurance services provider founded in 2012. The company’s mission is to make high-quality healthcare accessible and affordable, starting with the belief that communities thrive when care is available before and whenever help is needed. Nonstop Health’s first-dollar coverage model reduces employer premiums by an average of 8 to 10% while eliminating employee out-of-pocket healthcare costs. Serving organizations across the country, including nonprofits, school districts, and mid-size employers, Nonstop Health provides comprehensive benefits administration through proprietary member and client portals, real-time claims data and financial reporting tools, and dedicated member support services backed by a pre-loaded Visa benefit card.
Nonstop Health’s member services team handled approximately 30,000 support tickets annually, with the majority concentrated during the peak enrollment period from January through April. During high-volume periods, compounded by staffing constraints, member wait times had reached up to 60 minutes, far above the team’s target of under 60 seconds. The company’s existing IVR system was limited to a single query type, balance checks, and performed this function inaccurately, forcing members to wait for a live agent even for the most routine inquiries.
Member inquiries fell into three primary categories: processing and status of reimbursement claims, verification and substantiation of Visa benefit card transactions, and general account and card questions including balance inquiries, card declines, and replacement requests. A fourth, broader category involved general questions about what Nonstop Health is and how its program works, a frequent point of confusion for members who conflated Nonstop Health with their insurance provider.
Compounding these challenges, the organization lacked a centralized knowledge base. Agents operated as generalists, relying on repeated training sessions and real-time escalations via internal chat to maintain consistency in their responses. Without a structured repository of answers, the pressure to achieve first-call resolution was high, yet the tools to support this goal were inadequate. Left unaddressed, these constraints would continue to erode the member experience during the periods when support was needed most.
Nonstop Health was already operating core infrastructure on AWS, including AWS Lambda for backend processing, Amazon ECS for containerized workloads, Amazon S3 for storage, Amazon CloudFront for content delivery, Amazon Bedrock (Anthropic Claude) for AI-powered responses, and Amazon GuardDuty for security monitoring. Building the voice agent solution on AWS allowed the team to extend this existing investment rather than introduce a separate platform, ensuring seamless integration with the company’s established environment and security posture.
Amazon Connect provided a natural entry point for the voice agent, offering a cloud-native contact center platform with built-in telephony, IVR capabilities, and direct integration with AWS Lambda for real-time AI processing. Combined with Amazon Bedrock for generative AI, Amazon Lex for speech-to-text recognition, and Amazon DynamoDB for serverless conversation storage, AWS offered a predominantly serverless architecture, with Amazon EC2 hosting the ChromaDB vector database, that minimized operational overhead while providing the scalability needed to handle peak enrollment volumes
Nonstop Health selected Avahi as its delivery partner based on Avahi’s deep expertise as an AWS consulting partner specializing in generative AI solutions. Avahi brought direct experience in designing and deploying AI-powered conversational agents on AWS, with a proven methodology for rapid delivery through structured sprint cycles. This was critical for Nonstop Health, which needed a working solution validated by internal teams within a compressed timeline.
Avahi’s approach combined technical depth with hands-on collaboration. Rather than delivering a generic chatbot framework, Avahi worked directly with Nonstop Health’s member services, product, and engineering teams through weekly touchpoints to design a purpose-built solution tailored to the company’s specific member inquiry patterns, authentication requirements, and call flow logic. Avahi also committed to a comprehensive knowledge transfer and post-project support model, ensuring Nonstop Health’s internal team could maintain and evolve the system independently after handoff.
Avahi designed and delivered a member-facing AI voice agent to deflect routine inbound calls and reduce agent workload. The solution focused on four call categories: Claims inquiries, Card and Account questions, Substantiation requests, and General Information. The architecture leveraged AWS serverless and managed services orchestrated through Amazon Connect as the voice interface.
Architecture and Core Components
The system was built around five purpose-built AWS Lambda functions, each handling a distinct responsibility: query classification and response generation (the agent Lambda), member authentication, Amazon Lex bot activation, call counter management, and document embedding processing. Amazon Lex V2 provided speech-to-text conversion at the front of the call flow, while Amazon Connect managed the telephony interface, call routing, and text-to-speech output.
Amazon Bedrock powered both the intelligence and knowledge retrieval layers of the solution. Anthropic Claude served as the large language model for query classification and natural-language response generation, configured with deterministic settings for classification (temperature 0) and moderate creativity for member-facing answers (temperature 0.5). Amazon Bedrock Titan Embeddings V2 generated 1,024-dimensional vector representations of both member queries and knowledge base documents, enabling semantic similarity search.
Knowledge Retrieval and Vector Search
The knowledge base was managed through a ChromaDB vector database deployed on an Amazon EC2 instance within the project’s VPC. An automated data ingestion pipeline processed PDF documents and JSON user profile data uploaded to Amazon S3, extracting text, chunking content into 500-character segments with 50-character overlap for context preservation, generating vector embeddings via Bedrock Titan, and storing them in ChromaDB collections. FAQ documents were stored in a shared collection, while personal member data was isolated in per-user collections following a naming convention that prevented cross-user data access.
When a member asked a question, the agent Lambda generated an embedding of the query, performed a vector similarity search against the appropriate ChromaDB collection (FAQ or personal), retrieved the top results, applied a confidence threshold (L2 distance ≤ 1.9), and passed the most relevant document chunks as context to Claude for response generation. Queries with insufficient confidence scores triggered a transfer to a live agent rather than delivering a low-quality answer.
Authentication and Security
For personal account queries requiring authenticated access, the system collected three data points via voice: date of birth, ZIP code, and last four digits of Social Security Number. The authentication Lambda validated these inputs against stored member records in S3, with up to three attempts allowed before transferring to a live agent. AWS Bedrock Guardrails provided an additional content safety layer, filtering inputs for violence, hate speech, sexual content, misconduct, and prompt injection attempts before queries reached the language model.
The network architecture implemented VPC isolation with public and private subnets across two availability zones. Lambda functions operated in private subnets with NAT gateway access, communicating with the ChromaDB EC2 instance via private IP. VPC endpoints provided private connectivity to AWS services including Bedrock, DynamoDB, CloudWatch, and S3, ensuring sensitive data never traversed the public internet. All data at rest was encrypted using AES-256 across DynamoDB, S3, and CloudWatch Logs.
Conversation Management and Call Flow
Conversation history was stored in Amazon DynamoDB with a configurable 30-day time-to-live, enabling context-aware multi-turn interactions within a single call session. The system maintained the last five interactions per session, allowing Claude to reference prior questions and answers when generating responses. After each answer, the system presented a touch-tone menu giving members the option to ask another question, end the call, transfer to a live agent, or hear the answer repeated.
The classification layer distinguished between general FAQ questions answered directly from the shared knowledge base, personal account questions requiring authentication and member-specific data retrieval, and out-of-scope queries that prompted rephrasing or agent transfer. Prompt engineering was iteratively refined based on end-user testing feedback, including the addition of an acknowledgment prompt (“Let me look into that for you”) to reduce perceived response lag, and adjustment of the voice isolation threshold from 0.6 to 1.0 seconds to better capture natural phrasing.
Implementation Approach
The project was executed across four sprints. Sprint 1 delivered query classification, the data ingestion and embeddings pipeline, and routing Lambda functions. Sprint 2 produced the first working demo, showcasing the full Amazon Connect flow with authentication, FAQ answering, and agent transfer capabilities. Sprint 3 focused on end-user testing conducted with two internal Nonstop Health teams over two weeks, generating iterative feedback that drove prompt refinement and user experience improvements. The final sprint addressed bugs identified during quality assurance, delivered the knowledge transfer session, and completed all documentation and infrastructure-as-code handoffs.
The project successfully delivered a fully functional AI voice agent capable of handling member inquiries across all four targeted call categories, with authentication, knowledge retrieval, and agent transfer capabilities operational and validated by internal teams. The system processes queries with 3 to 4 second model response latency using Anthropic Claude, addressing the organization’s 30,000 annual member support ticket volume. The solution established a scalable foundation for call deflection during high-volume periods, when wait times have historically reached up to 60 minutes, providing members with immediate access to answers for routine questions about claims status, card balances, substantiation requirements, and general plan information.
The project completed with comprehensive testing coverage including pipeline validation, end-user acceptance testing with two internal Nonstop Health teams, and formal quality assurance. The entire delivery included iterative refinement of voice isolation thresholds, query classification logic, and prompt engineering based on direct member services feedback. Post-implementation, Nonstop Health’s technical team received full ownership of all code, documentation, and infrastructure-as-code modules, with a 14-business-day email and Slack-based support window to ensure a smooth transition to independent operations.
Let’s explore your high-impact AI opportunities together in a complimentary session