Upcoming Event: The AI Agent Implementation Framework
Primary Health
San Francisco, California
Healthcare Technology
Amazon Textract, AWS Bedrock (Claude Sonnet), AWS Lambda, Amazon API Gateway, Python
Primary Health is a healthcare technology company whose cloud-based platform enables testing, vaccinations, and preventive care programs across the United States. The company’s nurses were manually entering newborn screening (NBS) cards into Electronic Health Record (EHR) systems, a time-consuming, error-prone process that diverted clinical staff from patient care. Avahi delivered a Python-based OCR transcription script combining Amazon Textract with AWS Bedrock for GenAI refinement, automating NBS card data extraction. The solution achieved greater than 95% extraction accuracy, surpassing Primary Health’s prior benchmark of 92–94%, and was delivered as both a REST API endpoint and a CLI tool with a configurable field mapping system for long-term adaptability.
Primary Health is a healthcare technology company headquartered in San Francisco, California. Its cloud-based platform delivers digital health management, data interoperability, and analytics solutions to public health agencies, school districts, community-based organizations, and health systems. The company’s mission is to stop the spread of diseases and reduce illness severity by providing affordable diagnostics and preventive care at scale. Its software streamlines program administration, scheduling, patient communications, and test results reporting, freeing healthcare staff to focus on delivering care.
Primary Health’s nurses were manually entering newborn screening (NBS) cards into their EHR systems. This manual process was time-consuming, error-prone, and diverted clinical staff from patient care. The challenge was compounded by extreme form variability: multiple NBS form types existed with no standard format across states and facilities. Field names, layouts, and optional fields varied widely across variants.
Primary Health had previously attempted to automate this process using AWS Bedrock Data Automation combined with Google Document AI, achieving only 92–94% accuracy. This fell short of the standard required for medical documents, where data integrity is paramount. The company’s CTO emphasized that accuracy was the top concern, errors in extraction would erode trust in the system and create an additional burden for clinical staff who would need to manually verify every result.
Further complicating the challenge, the available training data consisted of printed pictures of actual forms rather than originals, introducing blurriness and noise. Real newborn screening cards are medical documents that are difficult to procure for development purposes, forcing any solution to operate within significant data quality constraints.
Primary Health was already building on AWS and had explored AWS Bedrock Data Automation as part of its initial automation attempts. Amazon Textract provided the OCR extraction capability, and AWS Bedrock offered GenAI refinement for ambiguous fields. AWS Lambda and Amazon API Gateway provided a lightweight deployment model that aligned with the customer’s preference for a simple, script-based solution that their Ruby application could call directly.
Both Amazon Textract and AWS Bedrock provide the ability to opt out of AI training on customer data, an important requirement for processing medical documents. AWS’s breadth of AI and ML services made it the natural platform for combining OCR extraction with GenAI refinement in a single script.
Primary Health’s previous automation attempts had plateaued at 92–94% accuracy using a single-technology approach. The company needed a partner with expertise in AI and machine learning on AWS who could push past the accuracy ceiling that off-the-shelf solutions had not been able to overcome.
Avahi proposed combining Amazon Textract for OCR extraction with AWS Bedrock for GenAI refinement, an approach Primary Health had not previously attempted. Avahi also demonstrated the ability to adapt mid-engagement when the client requested a scope adjustment, delivering a streamlined script-based solution within a compressed timeline without compromising accuracy targets.
Avahi delivered a Python-based OCR transcription script deployed to AWS Lambda and exposed via Amazon API Gateway as a REST API endpoint, allowing Primary Health’s Ruby application to call it directly. The script also runs locally as a CLI tool for development and testing.
The script uses Amazon Textract as its OCR engine, extracting text and field-level data with per-field confidence scores from NBS forms. Textract proved resilient to the low-quality training data, photographed printouts with blurriness and noise, performing well without preprocessing. The team explored computer vision techniques (deskew, binarization, denoising, contrast enhancement) but validation confirmed Textract produced sufficient quality without them, so the preprocessing pipeline was delivered as an optional standalone component.
For ambiguous or missing fields, such as determining whether “Last Name” refers to the baby or the guardian, or disambiguating specimen type/source fields, the script optionally calls AWS Bedrock (Claude Sonnet) for GenAI refinement. The LLM uses the extracted data and field relationships to resolve ambiguities that pure OCR cannot handle, improving accuracy on handwritten text and non-standard layouts. This refinement is toggled via a flag, so the LLM is only invoked when needed, keeping per-image processing costs low.
To handle the ongoing variability of NBS forms, the script uses a configuration file with fuzzy matching for field name mapping. When a new form variant appears, Primary Health can add field name mappings and modify the output schema by editing the config file, no code changes or vendor support required. The script outputs structured JSON aligned to Primary Health’s schema, including per-field and document-level confidence scores, enabling their backend to apply business logic for downstream EHR integration. It supports JPEG, PNG, and PDF inputs, with automatic compression for large files when using the CLI.
Primary Health originally scoped a broader engagement that included infrastructure, a human review interface, workflow orchestration, and multiple backend APIs. In late November 2025, the customer requested a scope adjustment to accelerate delivery. The project pivoted to script-only delivery, reducing the engagement by 139 hours (42% from the original scope) while preserving all core OCR functionality and accuracy targets. The final demonstration occurred December 22, 2025, and the complete codebase was transferred to Primary Health’s GitHub organization.
The combined Textract and Bedrock approach achieved greater than 95% extraction accuracy on tested documents, surpassing Primary Health’s prior benchmark of 92–94% from their AWS Bedrock Data Automation and Google Document AI implementation. The solution automates the manual entry of newborn screening cards that nurses were previously inputting into EHR systems, freeing clinical staff to focus on patient care.
The config-based field mapping provides long-term maintainability, enabling Primary Health to independently adapt to new NBS form types as they emerge. The streamlined scope reduced the engagement by 139 hours (42% from the original scope) while delivering the core extraction capability the customer needed. The complete codebase was transferred to Primary Health’s GitHub organization, enabling self-hosting and customization.
Let’s explore your high-impact AI opportunities together in a complimentary session