Case Study

How Tech 42 built Health Note’s agentic AI receptionist on AWS

Driven by results:

~50% lower sampled response-processing latency vs legacy VAPI solution
~20k static FAQ tokens removed from repeated prompt injection
Full platform ownership instead of VAPI control-plane lock-in

How Health Note powers the future of AI medical receptionists using AWS

Health Note serves healthcare organizations that depend on responsive, accurate, and compliant patient communication. What sets the platform apart is not the scale of its technology but its design: Health Note reimagined what an AI receptionist can be, a system that handles real clinical workflows responsibly, with patient safety and data privacy at its core.

Health Note's mission is to eliminate the administrative burden that contributes to physician burnout. Answering phones, scheduling appointments, verifying patient identity, and triaging call reasons consume thousands of hours of clinical staff time annually. To pursue this mission at scale, Health Note initially relied on VAPI, a third-party voice AI platform. As patient volume and complexity grew, Health Note recognized that proprietary, cloud-native infrastructure was essential: delivering the reliability, cost efficiency, multi-tenancy, and HIPAA-grade security controls that healthcare organizations demand meant building its own.

Challenge: Scaling an AI receptionist beyond a third-party platform

VAPI provided a rapid path to market and helped Health Note validate its AI receptionist strategy. As Health Note’s healthcare use cases matured, the team identified an opportunity to build a more flexible AWS-native foundation with deeper control over performance, observability, tenant-specific workflows, and long-term scalability.

Opportunity: A production-grade AWS foundation

Health Note's decision to migrate to a fully AWS-native architecture opened the door to a fundamentally different class of capability. By owning every layer of the stack—from telephony ingestion to LLM orchestration to database persistence—Health Note could optimize for the specific demands of healthcare voice AI.

Tech 42 designed the migration in deliberate phases: a comprehensive Assessment phase defined the target architecture; a Proof-of-Concept (POC) phase validated core technical feasibility; and the Migration phase transitioned the system from POC to full production deployment across Health Note's healthcare client base.

Solution: Building an AWS-native AI voice agent

Architecture overview

Figure 1: Current AWS-native, multi-tenant Health Note voice agent high-level design (HLD).

The system is an AWS-native, multi-tenant agent platform. Patient traffic enters through Twilio, Telnyx, or web chat, reaches an internet-facing Application Load Balancer (ALB), and is handled by the FastAPI agent service on Amazon ECS. The service resolves assistant and tenant context from Amazon DynamoDB, persists LangGraph checkpoints in Amazon RDS, retrieves assistant-scoped knowledge through Amazon Bedrock Knowledge Bases, calls the FastMCP tools service to execute Health Note tools, and exports traces and logs to Langfuse and Amazon CloudWatch.

Layer | Technology | Purpose
Voice Communication | Twilio ConversationRelay, Telnyx media streams, SMS webhooks | Real-time voice and SMS ingress across telephony providers
API & Orchestration | FastAPI (Python 3.11+), async/await | Low-latency request handling and LLM streaming
Agent Framework | LangGraph (ReAct agent) | Stateful, multi-step clinical workflow execution
Foundation Models | Configurable Amazon Bedrock models (e.g., Claude Haiku 4.5 or Amazon Nova Pro, depending on environment) | LLM inference for natural-language understanding, response generation, and tool planning
Session Persistence | PostgreSQL on Amazon RDS | LangGraph checkpoint storage for conversation continuity
Infrastructure | Amazon ECS services on AWS Fargate + Terraform IaC | Containerized FastAPI agent and FastMCP tools services with repeatable deployments
Load Balancing | Application Load Balancer (ALB) + AWS Certificate Manager (ACM) + Amazon Route 53 | HTTPS termination and traffic distribution
Secrets Management | AWS Secrets Manager | Secure injection of API keys, database credentials, and Langfuse tokens
Observability | Langfuse V3 on Amazon EKS + Amazon CloudWatch | LLM traces, token usage, tool performance, logs, and operational metrics
CI/CD | GitHub Actions + AWS IAM OIDC | Automated deployments to dev, staging, and production with no long-lived credentials
Control Plane / Multi-tenancy | Amazon DynamoDB | Assistant, phone number, API key, tool, and KB configuration lookup
Tool Execution Boundary | FastMCP tools service on Amazon ECS | MCP-backed execution of Health Note tool APIs with signed assistant context
Knowledge Base | Amazon Bedrock Knowledge Bases, Amazon S3, Amazon OpenSearch Serverless | Assistant-scoped RAG retrieval using mandatory metadata filters
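The DynamoDB control-plane lookup listed in the table can be illustrated with a small sketch. It is not Health Note's code: the record shape (assistant_id, tenant_id, kb_scope, tools) and the InMemoryControlPlane class are hypothetical, and a plain dict stands in for the DynamoDB table so the example runs without AWS credentials. The design point worth noting is that the lookup fails closed when a called number has no configured assistant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssistantContext:
    """Tenant-scoped configuration resolved for one inbound call or chat."""
    assistant_id: str
    tenant_id: str
    kb_scope: str
    enabled_tools: tuple[str, ...] = ()


class InMemoryControlPlane:
    """Hypothetical stand-in for a DynamoDB-backed config table.

    A real deployment would query DynamoDB; a dict keyed by the called
    phone number keeps this sketch self-contained.
    """

    def __init__(self, records: dict[str, dict]) -> None:
        self._by_number = records

    def resolve_by_number(self, called_number: str) -> AssistantContext:
        record = self._by_number.get(called_number)
        if record is None:
            # Fail closed: never fall back to another tenant's assistant.
            raise LookupError(f"no assistant configured for {called_number}")
        return AssistantContext(
            assistant_id=record["assistant_id"],
            tenant_id=record["tenant_id"],
            kb_scope=record["kb_scope"],
            enabled_tools=tuple(record.get("tools", ())),
        )
```

Every downstream step (prompt selection, KB filtering, tool claims) would then receive the resolved AssistantContext rather than re-deriving tenant identity on its own.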

The AI receptionist agent

At the core of the system is a single, highly specialized ReAct agent built on LangGraph and powered by Amazon Bedrock. Rather than a generic chatbot, this agent is purpose-engineered for the exact workflows that define clinical front-desk operations.

The agent manages over 30 state fields via a TypedDict-based state system. Patient information collected early in a call (such as phone number, date of birth, verified identity, and selected location) persists across every subsequent tool call. This eliminates redundant questions, which matters in healthcare, where re-asking for information erodes patient trust.
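A TypedDict-based state of this kind might look like the sketch below. The five field names are hypothetical stand-ins for the 30+ real fields, and apply_update is an illustrative helper: it mirrors LangGraph's default behavior for state keys without a reducer, where a node's partial update overwrites only the keys it returns and every other key carries forward.

```python
from typing import TypedDict


class ReceptionistState(TypedDict, total=False):
    """Illustrative subset of per-call agent state (field names hypothetical)."""
    caller_phone: str
    date_of_birth: str
    identity_verified: bool
    location_id: str
    call_reason: str


def apply_update(state: ReceptionistState, update: ReceptionistState) -> ReceptionistState:
    """Merge a tool's partial update into the running state.

    Keys the tool did not return are kept unchanged, so facts collected
    early in the call stay available to later tools.
    """
    merged: ReceptionistState = {**state, **update}
    return merged


def missing_fields(state: ReceptionistState, required: tuple[str, ...]) -> list[str]:
    """List only the fields still unknown, so the agent never asks twice."""
    return [name for name in required if name not in state]
```

For example, once caller ID extraction has filled caller_phone, a booking step that needs caller_phone, date_of_birth, and location_id would only prompt for the fields missing_fields reports.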

The agent exposes 19 specialized tools, spanning local and MCP-backed execution paths and organized into five domains:

  • Patient Management: Caller ID extraction, EHR patient matching with fuzzy logic and duplicate detection, new patient registration, and date-of-birth verification.
  • Call Reason Triage: LLM-powered intent classification, contextual follow-up question generation, and call reason confirmation before booking.
  • Appointment Management: Real-time availability lookup, new appointment booking, rescheduling with slot availability checking, and appointment cancellation with reason collection.
  • Call Control and Knowledge Retrieval: Intelligent human transfer with conversation context handoff, plus assistant-scoped knowledge-base search for clinic FAQ, policy, and support content.
  • MCP Tools Service: A separate FastMCP service runs on Amazon ECS and exposes Health Note tool capabilities over the Model Context Protocol (MCP). The agent injects signed assistant and tenant claims into each MCP tool call so the tools service can enforce tenant context before forwarding requests to Health Note APIs.
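The case study does not say how the assistant and tenant claims are signed. As one possible shape, the sketch below uses a stdlib HMAC-SHA256 signature over canonical JSON with a short expiry; a JWT library would be an equally plausible choice. The function names, the shared-secret model, and the 60-second TTL are all assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode().rstrip("=")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_claims(claims: dict, secret: bytes, ttl_s: int = 60) -> str:
    """Agent side (hypothetical): add an expiry and an HMAC-SHA256 signature."""
    body = dict(claims, exp=int(time.time()) + ttl_s)
    # Canonical JSON so signer and verifier hash identical bytes.
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    sig = hmac.new(secret, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(sig)}"


def verify_claims(token: str, secret: bytes) -> dict:
    """Tools-service side (hypothetical): reject tampered or expired claims
    before any request is forwarded to upstream APIs."""
    payload_b64, sig_b64 = token.split(".")
    payload = _unb64(payload_b64)
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _unb64(sig_b64)):
        raise PermissionError("invalid signature")
    claims = json.loads(payload)
    if claims["exp"] < time.time():
        raise PermissionError("claims expired")
    return claims
```

Because the tools service trusts only verified claims, a compromised or buggy prompt cannot make the agent act on behalf of a different tenant simply by naming one.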

Voice-First Design for Healthcare

Healthcare voice AI has unique UX requirements that differ sharply from general-purpose conversational AI. Patients calling a clinic are often stressed, elderly, or navigating complex insurance and scheduling scenarios. The system implements several production-hardened design patterns to ensure natural, trustworthy phone interactions:

Intelligent filler words: During tool execution—when the agent is querying the EHR or checking appointment availability—the system immediately emits natural verbal fillers ("Hold on a sec while I search," "Give me a minute, let me see what's available") within 50–100ms of tool invocation. Extended wait fillers ("Still working on that," "Almost there") play every 5 seconds for long-running operations. Fillers are cancelled immediately upon the first streaming token from the LLM or upon user speech, ensuring no awkward overlap.
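The filler behavior described above can be sketched with asyncio. This is an illustrative version, not the production implementation: speak is a hypothetical callback (in a real deployment it would hand text to the telephony provider's TTS stream), and a single asyncio.Event represents "first LLM token or caller speech arrived."

```python
import asyncio
from collections.abc import Callable


async def filler_loop(
    speak: Callable[[str], None],
    stop: asyncio.Event,
    initial: str = "Hold on a sec while I search.",
    extended: tuple[str, ...] = ("Still working on that.", "Almost there."),
    interval_s: float = 5.0,
) -> None:
    """Speak an immediate filler, then rotate extended fillers every
    interval_s seconds until `stop` is set."""
    if stop.is_set():
        return  # the response already started; no filler needed
    speak(initial)
    i = 0
    while not stop.is_set():
        try:
            # Wake when stop is set or when the interval elapses, whichever is first.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            if not stop.is_set():
                speak(extended[i % len(extended)])
                i += 1


async def demo() -> list[str]:
    spoken: list[str] = []
    stop = asyncio.Event()
    fillers = asyncio.create_task(filler_loop(spoken.append, stop, interval_s=0.05))
    await asyncio.sleep(0.13)  # simulated slow EHR lookup
    stop.set()                 # first LLM token (or caller speech) arrives
    await fillers
    return spoken
```

Waiting on the event with a timeout, rather than sleeping and polling, is what lets the loop stop as soon as the event fires instead of finishing a full interval first.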

Low-latency design target: The system targets sub-300ms time to first token for voice interactions, using async request handling, streaming responses, and short conversational turns.
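Checking a time-to-first-token target requires measuring it on the streaming path. The helper below is a hypothetical sketch that times any async token stream; fake_stream is an illustrative stand-in for a streaming Bedrock response, and the 300ms budget is taken from the text.

```python
import asyncio
import time
from collections.abc import AsyncIterator

FIRST_TOKEN_BUDGET_S = 0.300  # sub-300ms target stated in the case study


async def time_to_first_token(stream: AsyncIterator[str]) -> tuple[float, str]:
    """Return (seconds until the first chunk, full text) for an async token stream."""
    start = time.perf_counter()
    first = None
    parts: list[str] = []
    async for chunk in stream:
        if first is None:
            first = time.perf_counter() - start
        parts.append(chunk)
    if first is None:
        raise ValueError("stream produced no tokens")
    return first, "".join(parts)


async def fake_stream(delay_s: float, tokens: list[str]) -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM response."""
    await asyncio.sleep(delay_s)
    for tok in tokens:
        yield tok
```

Time to first token, rather than total response time, is the useful metric for voice, because TTS can begin speaking as soon as the first chunk arrives.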

Multi-language capable design: The agent and telephony configuration are designed to support multiple languages and voices, with actual production language coverage determined by the selected telephony provider, STT/TTS model support, and configured voice settings.

Voice-optimized output formatting: Phone numbers are read digit-by-digit; dates are pronounced in natural language rather than ISO format—small but meaningful details that distinguish a professional AI receptionist from a generic system.
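A minimal sketch of these two formatting rules follows. The function names are hypothetical, and it assumes US-style 10-digit numbers and English output; it spells out month and weekday names from fixed lists so the result does not depend on the server locale.

```python
from datetime import date

_MONTHS = ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"]
_WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def speak_phone_number(raw: str) -> str:
    """Space digits so TTS reads them one by one; commas add natural pauses."""
    digits = [c for c in raw if c.isdigit()]
    if len(digits) == 11 and digits[0] == "1":
        digits = digits[1:]  # drop a leading US country code
    groups = [digits[:3], digits[3:6], digits[6:]] if len(digits) == 10 else [digits]
    return ", ".join(" ".join(g) for g in groups)


def _ordinal(n: int) -> str:
    if 11 <= n % 100 <= 13:
        suffix = "th"  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def speak_date(d: date) -> str:
    """Render a date the way a receptionist would say it, not as ISO 8601."""
    return f"{_WEEKDAYS[d.weekday()]}, {_MONTHS[d.month - 1]} {_ordinal(d.day)}"
```

For example, speak_phone_number("(555) 123-4567") yields "5 5 5, 1 2 3, 4 5 6 7", and speak_date(date(2025, 3, 7)) yields "Friday, March 7th" instead of "2025-03-07".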

Multi-Channel Support

The platform supports several patient engagement channels. Each ingress pathway identifies its channel and applies a tailored prompt overlay on top of the shared agent:

  • Voice Ingestion: Real-time telephony via Telnyx media streams and Twilio ConversationRelay, with traffic routed through dedicated /api/v1/twilio/relay and /api/v1/telnyx/stream endpoints.
  • Web-Based Chat: Interactive Server-Sent Events (SSE) streaming delivered through a React/Vite frontend. Security and tenant segregation are maintained via assistant-scoped authentication at the /api/v1/web/chat/stream route.
  • SMS Messaging: Through /api/v1/twilio/sms and /api/v1/telnyx/sms, the system reuses the core agent graph while applying SMS-optimized logic and routing each message based on its destination phone number.
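Channel detection plus prompt overlays can be reduced to a lookup from ingress path to channel, followed by composing a shared base prompt with a channel-specific overlay. The endpoint paths below come from the text; the overlay wording, BASE_PROMPT, and build_system_prompt are hypothetical, since the actual prompts are not published.

```python
# Ingress paths as listed in the case study.
CHANNEL_BY_PATH = {
    "/api/v1/twilio/relay": "voice",
    "/api/v1/telnyx/stream": "voice",
    "/api/v1/web/chat/stream": "web",
    "/api/v1/twilio/sms": "sms",
    "/api/v1/telnyx/sms": "sms",
}

# Hypothetical overlay text appended to the shared base prompt per channel.
PROMPT_OVERLAYS = {
    "voice": "Keep replies short and speakable. Read numbers digit by digit.",
    "web": "Short lists are fine. Keep formatting minimal.",
    "sms": "Reply in one or two short sentences. Do not use markdown.",
}

BASE_PROMPT = "You are the front-desk receptionist for {clinic}."


def build_system_prompt(path: str, clinic: str) -> tuple[str, str]:
    """Return (channel, system prompt) for a request arriving on `path`."""
    channel = CHANNEL_BY_PATH.get(path)
    if channel is None:
        raise KeyError(f"unknown ingress path: {path}")
    prompt = BASE_PROMPT.format(clinic=clinic) + "\n\n" + PROMPT_OVERLAYS[channel]
    return channel, prompt
```

Keeping channel differences in overlays rather than separate agents is what allows one agent graph to serve voice, web chat, and SMS.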

HIPAA-conscious security architecture

Healthcare deployments require more than encryption—they require architectural decisions at every layer that prevent inadvertent exposure of Protected Health Information (PHI). The system implements several HIPAA-conscious design principles:

  • PHI-conscious logging and tracing: Logging and tracing are structured to minimize sensitive data exposure, redact known identifiers where possible, and keep secrets out of application logs.
  • Secrets Manager injection: All sensitive configuration (EHR API keys, database credentials, Langfuse tokens) is injected at ECS task startup via AWS Secrets Manager and is never stored in environment files or container images.
  • Scoped tool authentication: Health Note tool API access is mediated through configured API key or bearer-token credentials, secret references, signed MCP service claims, and assistant context headers so downstream services can route tenant-specific behavior.
  • LLM observability via Langfuse: Conversation sessions generate LangGraph/LangChain traces in Langfuse with assistant, tenant, channel, token, latency, and tool-call metadata to support quality review, cost analysis, and operational debugging.
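Redacting known identifiers from application logs can be done with a standard logging.Filter. The sketch below is illustrative only: the regular expressions are rough, hypothetical patterns for US phone numbers and common date formats, not the production rules, and pattern-based redaction complements (rather than replaces) keeping PHI out of log calls in the first place.

```python
import logging
import re

# Rough, hypothetical patterns; real PHI redaction needs broader coverage.
_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "[DATE]"),           # 1980-04-02
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),     # 04/02/1980
    (re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"), "[PHONE]"),
]


class RedactPHIFilter(logging.Filter):
    """Rewrite each record's final message with known identifiers masked."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # apply %-style args before redacting
        for pattern, token in _PATTERNS:
            message = pattern.sub(token, message)
        record.msg = message
        record.args = None  # args are already merged into msg
        return True  # keep the record, just redacted
```

Attaching the filter to a handler, rather than to a single logger, also covers records propagated from child loggers.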

CI/CD and Deployment Automation

One of the Migration phase's most significant deliverables is a production-grade CI/CD pipeline that enables Health Note's team to deploy, roll back, and manage the AWS infrastructure independently after knowledge transfer.

The pipeline uses GitHub Actions with AWS IAM OIDC federation, eliminating the need for long-lived AWS access keys stored as GitHub secrets. Instead, GitHub Actions assumes short-lived IAM roles scoped to each environment:

  • CI (ci.yml): Runs backend tests, frontend build checks, MCP import validation, and Terraform validation for the active environment roots.
  • Dev deploy (deploy-dev.yml): Deploys the FastAPI agent service and FastMCP tools service to dev from the dev branch after validation.
  • Stage deploy (deploy-stage.yml): Provides the stage promotion path for the FastAPI and MCP services when Health Note applies the stage/prod infrastructure.
  • Prod deploy (deploy-prod.yml): Provides the production promotion path for the same services with environment-specific Terraform configuration and protected deployment controls.
  • Manual rollback: rollback-fastapi.yml and rollback-mcp.yml deploy previous image tags by git SHA for break-glass recovery of either service.
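OIDC federation works because each environment's IAM role trusts only tokens that GitHub's OIDC provider issues for a specific repository and environment. The sketch below builds such a trust policy as a Python dict; the account ID, repository name, and function name are hypothetical, and in this project the equivalent document would be expressed in Terraform rather than Python. The aud and sub condition keys follow the format GitHub documents for environment-scoped jobs.

```python
OIDC_HOST = "token.actions.githubusercontent.com"


def github_oidc_trust_policy(account_id: str, repo: str, environment: str) -> dict:
    """IAM trust policy letting only `repo` jobs in `environment` assume the role."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{OIDC_HOST}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Token must be minted for AWS STS...
                        f"{OIDC_HOST}:aud": "sts.amazonaws.com",
                        # ...and only for this repository's named environment.
                        f"{OIDC_HOST}:sub": f"repo:{repo}:environment:{environment}",
                    }
                },
            }
        ],
    }
```

Scoping the sub claim to an environment means a workflow running on the dev branch cannot assume the production role, even though both live in the same repository.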

Infrastructure is managed as code via Terraform, with separate live roots per AWS account, ensuring environment isolation and enabling repeatable, auditable infrastructure changes.

Observability with Langfuse V3

Langfuse V3 is deployed on Amazon EKS within Health Note's AWS environment, providing a self-hosted LLM observability platform with security controls aligned to HIPAA requirements. Every patient call generates a complete Langfuse trace capturing:

  • End-to-end conversation latency: Time from patient utterance to agent first response.
  • Token usage: Input and output token counts per Bedrock model call, enabling cost tracking and anomaly detection.
  • Tool execution frequency and success rates: Which local or MCP-backed tools are called, how often, and whether they succeed or return errors.
  • Conversation and graph visibility: Langfuse captures the nested LangGraph/LangChain execution tree, while Amazon RDS stores LangGraph checkpoints for session continuity and debugging.
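Metrics like tool success rates and tail latency come from aggregating span records. The sketch below assumes a simplified, hypothetical ToolSpan record (tool name, success flag, latency) rather than the actual Langfuse export schema, and uses the nearest-rank definition of the 95th percentile with integer arithmetic to avoid floating-point rounding in the rank.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpan:
    """Simplified, hypothetical view of one tool-call span from a trace."""
    tool: str
    ok: bool
    latency_ms: float


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile: the ceil(0.95 * n)-th smallest value."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = -(-95 * len(ordered) // 100)  # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def summarize(spans: list[ToolSpan]) -> dict[str, dict[str, float]]:
    """Per-tool call count, success rate, and p95 latency."""
    by_tool: dict[str, list[ToolSpan]] = defaultdict(list)
    for span in spans:
        by_tool[span.tool].append(span)
    return {
        tool: {
            "calls": len(items),
            "success_rate": sum(s.ok for s in items) / len(items),
            "p95_ms": p95([s.latency_ms for s in items]),
        }
        for tool, items in by_tool.items()
    }
```

A p95 rather than a mean is the usual choice for latency reporting, because a few slow EHR or RAG calls are what callers actually notice.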

This observability layer feeds directly into the LLM-as-judge evaluation framework also implemented during the Migration phase, enabling automated quality monitoring of agent responses against clinical accuracy benchmarks.

RAG Performance and Knowledge Base

The system includes a Bedrock Knowledge Base component for retrieval-augmented generation, supporting clinic FAQ, policy, and support content. The current multi-tenant design uses a shared KB pool with mandatory assistant or tenant metadata filters at retrieval time, so each assistant only retrieves documents assigned to its configured scope. Evaluation artifacts document baseline retrieval latency and response-time methodology.
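A mandatory metadata filter can be enforced at the point where the retrieval request is built. The sketch below targets the Bedrock Agent Runtime Retrieve API (for example, boto3.client("bedrock-agent-runtime").retrieve); the metadata key assistant_id and the helper names are hypothetical. The client is passed in so the request shape can be tested without AWS access.

```python
from typing import Any


def build_retrieve_request(kb_id: str, query: str, assistant_id: str, k: int = 5) -> dict[str, Any]:
    """Build Retrieve API arguments with a mandatory assistant-scope filter."""
    if not assistant_id:
        # Refuse unscoped queries against the shared KB pool.
        raise ValueError("assistant_id is mandatory for scoped retrieval")
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": k,
                # Hypothetical metadata key attached to each ingested document.
                "filter": {"equals": {"key": "assistant_id", "value": assistant_id}},
            }
        },
    }


def retrieve_scoped(client: Any, kb_id: str, query: str, assistant_id: str, k: int = 5) -> list[str]:
    """Run a scoped retrieval; `client` is expected to be a bedrock-agent-runtime client."""
    response = client.retrieve(**build_retrieve_request(kb_id, query, assistant_id, k))
    return [r["content"]["text"] for r in response.get("retrievalResults", [])]
```

Because no code path can build a request without the filter, tenant isolation in the shared pool does not depend on each caller remembering to add it.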

Outcome: Pushing the boundaries of healthcare voice AI

Through its strategic AWS migration executed by Tech 42, Health Note moved from a vendor-dependent POC toward a production-ready, fully owned AI Medical Receptionist platform. The migration delivers:

  • ~50% lower sampled response-processing latency vs legacy VAPI
  • ~50–60% lower sampled model-token load per turn
  • ~20k static FAQ tokens removed from repeated prompt injection
  • ~32.6% lower p95 RAG latency after KB tuning
  • Full platform ownership instead of VAPI control-plane lock-in

"The migration from VAPI to AWS is designed to give us greater flexibility and ownership over the AI receptionist’s core intelligence layer, from conversation flow and algorithm design to tool-call orchestration. As this foundation comes together, it will help us iterate faster, tune performance, and adapt to clients’ workflow, security, and scalability needs." - Aaron Rau

About Health Note

Health Note is a leading pre-visit clinical intake automation platform that uses AI to transform healthcare workflows and reduce physician burnout. With a comprehensive suite of solutions including AI-powered intake management, medical receptionist services, and clinical documentation automation, Health Note serves healthcare organizations seeking to streamline administrative workflows while improving patient experience through seamless EHR integration. The company recently secured $17M in Series A funding led by SignalFire, validating its strong market traction and the critical need for intelligent automation in healthcare administration.

Industry: Healthcare
Services: AI Agent; AI Infrastructure Architecture
Health Note
