MedSchoolCoach Automates Academic Document Extraction and Student Clustering with AWS Generative AI

Client

MedSchoolCoach

Location

Boston, MA

Industry

Education Technology, Medical Admissions Consulting

Services & Tech

Amazon Bedrock, Amazon S3, Amazon EC2, Amazon SageMaker, AWS Lambda, Amazon API Gateway

Project Overview

MedSchoolCoach empowers pre-med students with personalized one-on-one guidance from physician advisors with admissions committee experience. As the business scaled, the team needed a faster, more consistent way to extract key academic metrics from thousands of incoming student documents and turn them into actionable advising insights. Avahi delivered an AWS-based generative AI workflow that performs OCR on unstructured résumés and transcripts, normalizes GPA and MCAT information, and groups students into academic clusters based on historical success signals. This foundation enables advisors to spend more time on student engagement and deliver more specific, data-informed recommendations.

About the Customer

MedSchoolCoach is an education technology and medical admissions consulting organization that supports pre-med students through individualized advising, helping them build stronger applications using structured guidance, coaching, and admissions expertise.

The Problem

MedSchoolCoach receives large volumes of résumés, transcripts, and academic documents every month. These inputs vary widely in structure and quality, including PDFs, scanned images, and screenshots, which makes automated extraction challenging.

The advising team was spending significant time manually extracting GPA, MCAT scores, coursework details, and academic patterns from these documents. This slowed student assessments, created review bottlenecks, and limited how consistently the team could personalize advising at scale.

Without automation, MedSchoolCoach would continue to face rising operational load as document volumes increased, which would limit growth and delay higher-value advising interactions.

Why AWS

MedSchoolCoach selected AWS to support a secure, scalable pipeline for ingesting and processing academic documents while using managed AWS services for model inference and analytics. AWS provided the foundation to store raw and processed documents, run clustering and evaluation workflows, and integrate foundation models through Amazon Bedrock.

Why MedSchoolCoach Chose Avahi

MedSchoolCoach engaged Avahi for its experience delivering AWS-native generative AI systems that combine unstructured data extraction with repeatable analytical workflows. Avahi brought proven delivery patterns for integrating OCR with LLM reasoning, designing measurable validation steps, and packaging the solution behind an API-first backend that can evolve toward production-scale operations.

Solution

  • Avahi designed an end-to-end workflow to transform unstructured student documents into structured academic profiles that could be clustered and analyzed. Documents were ingested and stored in Amazon S3, then processed through an OCR extraction component capable of handling scanned PDFs, photos, and table-heavy transcripts.
  • After extraction, Avahi implemented cleaning and normalization logic to standardize academic signals across inconsistent formats, including multiple GPA conventions and incomplete MCAT fields. Where rules alone were insufficient, Amazon Bedrock foundation models were used to interpret context and produce consistent, structured outputs suitable for downstream analytics.
  • To help MedSchoolCoach move beyond generic advising, Avahi built a clustering framework that groups applicants into cohorts based on critical academic features tied to admissions outcomes. The framework uses unsupervised learning and was validated through quantitative methods (such as silhouette scoring) combined with MedSchoolCoach stakeholder review.
  • For new applicants, the workflow maps intake data into the most relevant cluster, identifies qualification gaps, and generates targeted recommendations using Amazon Bedrock models. The system was exposed through an AWS-hosted FastAPI backend and deployed on Amazon EC2 for end-to-end testing and demonstration, with clear interfaces for future integration and expansion.
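The rule-based part of the normalization step described above can be illustrated with a short sketch. This is not Avahi's implementation: the function names, the regular expressions, and the choice to rescale GPAs linearly onto a 4.0 scale are all assumptions made for illustration. The only external fact used is the MCAT total-score range of 472 to 528. Inputs that fail these rules return None, which is the kind of case the case study says was handed to Amazon Bedrock models for contextual interpretation.

```python
import re
from typing import Optional

# Hypothetical pattern: a number, optionally followed by "/ scale" or "out of scale".
_GPA_RE = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?:(?:/|out of)\s*(?P<scale>\d+(?:\.\d+)?))?",
    re.IGNORECASE,
)

MCAT_MIN, MCAT_MAX = 472, 528  # valid range for an MCAT total score


def normalize_gpa(raw: str, default_scale: float = 4.0) -> Optional[float]:
    """Rescale a GPA string onto a 4.0 scale (linear rescaling is an assumption).

    Returns None when the text cannot be parsed or the value is implausible,
    for example when OCR produced a GPA larger than its stated scale.
    """
    match = _GPA_RE.search(raw)
    if match is None:
        return None
    value = float(match.group("value"))
    scale = float(match.group("scale") or default_scale)
    if scale <= 0 or value > scale:
        return None
    return round(value / scale * 4.0, 2)


def normalize_mcat(raw: Optional[str]) -> Optional[int]:
    """Return the first three-digit number inside the valid MCAT total range."""
    if not raw:
        return None  # missing or empty field
    for match in re.finditer(r"\b(\d{3})\b", raw):
        score = int(match.group(1))
        if MCAT_MIN <= score <= MCAT_MAX:
            return score
    return None
```

For example, `normalize_gpa("8.5 out of 10")` rescales a ten-point GPA to the 4.0 scale, and `normalize_mcat("129/128/129/129 total 515")` skips the section scores and keeps the total. A production pipeline would need more conventions (weighted GPAs, letter grades, percentage scales) than this sketch covers.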

Key Deliverables

– AWS-hosted infrastructure on Amazon EC2 (including Elastic IP) to run the end-to-end pipeline

– Document ingestion and storage in Amazon S3

– OCR extraction pipeline for résumés and transcripts

– Automated parsing and normalization of GPA and MCAT metrics

– Unsupervised student clustering by academic similarity

– Amazon Bedrock integration for LLM-based interpretation and recommendation generation

– FastAPI backend endpoints for workflow execution and results delivery

– Technical documentation, demo handoff, and recommendations to scale to production
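The clustering deliverable mentions silhouette scoring and mapping new applicants to the most relevant cluster. The sketch below shows one way those two pieces could fit together using only the Python standard library. The feature choice (GPA and MCAT only), the min-max style scaling, the nearest-centroid assignment rule, and the sample records are all assumptions for illustration; the case study does not state which clustering algorithm, features, or library Avahi used.

```python
from math import dist
from statistics import mean
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def scale_features(gpa: float, mcat: int) -> Point:
    """Put both features on a comparable 0-1 scale (assumed ranges)."""
    return (gpa / 4.0, (mcat - 472) / (528 - 472))


def silhouette_score(points: List[Point], labels: List[int]) -> float:
    """Mean silhouette coefficient over all points, in the range -1 to 1."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette scoring needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the point's own cluster
        a = mean(dist(points[i], points[j]) for j in own)
        # b: mean distance to the nearest other cluster
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return mean(scores)


def centroids(points: List[Point], labels: List[int]) -> Dict[int, Point]:
    """Average the member points of each cluster."""
    grouped: Dict[int, List[Point]] = {}
    for point, label in zip(points, labels):
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(mean(column) for column in zip(*members))
        for label, members in grouped.items()
    }


def assign_cluster(point: Point, centers: Dict[int, Point]) -> int:
    """Map a new applicant to the cluster with the nearest centroid."""
    return min(centers, key=lambda label: dist(point, centers[label]))


# Hypothetical historical records (GPA, MCAT) with labels from a prior clustering run.
history = [(3.9, 520), (3.8, 518), (3.85, 522), (3.2, 502), (3.1, 500), (3.3, 505)]
labels = [0, 0, 0, 1, 1, 1]
points = [scale_features(gpa, mcat) for gpa, mcat in history]

score = silhouette_score(points, labels)
centers = centroids(points, labels)
new_cluster = assign_cluster(scale_features(3.75, 517), centers)
```

A score close to 1 indicates tight, well-separated cohorts, while values near 0 suggest overlapping ones. That is why the quantitative check is paired with stakeholder review: a numerically clean split still has to make sense to advisors.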

Project Impact

The engagement demonstrated that MedSchoolCoach can automate academic metric extraction and student segmentation workflows that were previously manual and time-intensive. By combining OCR with Bedrock-powered interpretation and clustering, the solution improves processing consistency and increases advisor capacity for higher-value student interactions.

Metrics

  • Greater than 85 percent correctness in GPA and MCAT extraction on the evaluation dataset
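The case study does not define how correctness was measured. One plausible reading is field-level accuracy against hand-labeled values, sketched below. The function name, the record layout, the tolerance parameter, and the sample data are hypothetical and are not results from the engagement.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def field_accuracy(
    extracted: List[Record],
    truth: List[Record],
    field: str,
    tolerance: float = 0.0,
) -> float:
    """Share of documents where the extracted field matches the labeled value.

    A missing value (None) counts as correct only when the label is also None.
    """
    if not truth or len(extracted) != len(truth):
        raise ValueError("extracted and truth must be non-empty and the same length")
    correct = 0
    for got, expected in zip(extracted, truth):
        got_value, expected_value = got.get(field), expected.get(field)
        if got_value is None or expected_value is None:
            if got_value is None and expected_value is None:
                correct += 1
        elif abs(got_value - expected_value) <= tolerance:
            correct += 1
    return correct / len(truth)


# Hypothetical evaluation set: hand-labeled values versus pipeline output.
truth = [
    {"gpa": 3.6, "mcat": 515},
    {"gpa": 3.2, "mcat": None},
    {"gpa": 3.9, "mcat": 520},
    {"gpa": 2.8, "mcat": 498},
]
extracted = [
    {"gpa": 3.6, "mcat": 515},
    {"gpa": 3.2, "mcat": None},
    {"gpa": 3.0, "mcat": 520},  # misread GPA
    {"gpa": 2.8, "mcat": 489},  # transposed MCAT digits
]

gpa_accuracy = field_accuracy(extracted, truth, "gpa")
mcat_accuracy = field_accuracy(extracted, truth, "mcat")
```

Reporting accuracy per field, rather than as one combined number, makes it easier to see whether GPA or MCAT extraction needs more attention.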

