Health Note serves healthcare organizations that depend on responsive, accurate, and compliant patient communication. What makes the platform distinct is not the scale of its technology but its rethinking of what an AI receptionist can be: a system that handles real clinical workflows responsibly, with patient safety and data privacy at its core.
Health Note's mission is to eliminate the administrative burden that contributes to physician burnout. Answering phones, scheduling appointments, verifying patient identity, and triaging call reasons consume thousands of hours of clinical staff time annually. To pursue this mission at scale, Health Note initially relied on VAPI, a third-party voice AI platform. As patient volume and complexity grew, Health Note recognized that proprietary, cloud-native infrastructure was essential: delivering the reliability, cost efficiency, multi-tenancy, and HIPAA-grade security controls that healthcare organizations demand required building its own platform.
VAPI provided a rapid path to market and helped Health Note validate its AI receptionist strategy. As Health Note’s healthcare use cases matured, the team identified an opportunity to build a more flexible AWS-native foundation with deeper control over performance, observability, tenant-specific workflows, and long-term scalability.
Health Note's decision to migrate to a fully AWS-native architecture opened the door to a fundamentally different class of capability. By owning every layer of the stack—from telephony ingestion to LLM orchestration to database persistence—Health Note could optimize for the specific demands of healthcare voice AI.
Tech 42 designed the migration in deliberate phases: a comprehensive Assessment phase defined the target architecture; a Proof-of-Concept (POC) phase validated core technical feasibility; and the Migration phase transitioned the system from POC to full production deployment across Health Note's healthcare client base.

Figure 1: Current AWS-native, multi-tenant Health Note voice agent high-level design (HLD).
The system is an AWS-native, multi-tenant agent platform. Patient traffic enters through Twilio, Telnyx, or web chat, reaches an internet-facing ALB, and is handled by the FastAPI agent service on ECS. The service resolves assistant and tenant context from DynamoDB, persists LangGraph checkpoints in Amazon RDS, retrieves assistant-scoped knowledge through Amazon Bedrock Knowledge Bases, calls the FastMCP tools service for Health Note tool execution, and exports traces and logs to Langfuse and CloudWatch.
At the core of the system is a single, highly specialized ReAct agent built on LangGraph and powered by Amazon Bedrock. Rather than a generic chatbot, this agent is purpose-engineered for the exact workflows that define clinical front-desk operations.
The agent manages over 30 state fields via a TypedDict-based state system. Patient information collected early in a call (such as phone number, date of birth, verified identity, and selected location) persists across every subsequent tool call. This eliminates redundant questions, which matters in healthcare because re-asking for information erodes patient trust.
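As a rough illustration of how TypedDict-based state can carry caller details across tool calls, the sketch below merges partial tool updates into a running state and asks only for what is still missing. The field names and the helpers `apply_update` and `fields_to_ask` are hypothetical and are not taken from Health Note's schema; the production agent defines its state in LangGraph, which this standalone sketch does not import.

```python
from typing import List, TypedDict


class CallState(TypedDict, total=False):
    """Hypothetical subset of the per-call state; the real schema has 30+ fields."""

    phone_number: str
    date_of_birth: str
    identity_verified: bool
    selected_location: str


def apply_update(state: CallState, update: CallState) -> CallState:
    """Merge a tool's partial update into the state without dropping earlier fields."""
    merged: CallState = {**state, **update}
    return merged


def fields_to_ask(state: CallState, required: List[str]) -> List[str]:
    """Return only the required fields the caller has not already provided."""
    return [name for name in required if name not in state]


# Details collected early in the call survive a later tool update.
state: CallState = {"phone_number": "555-0100", "date_of_birth": "1950-02-14"}
state = apply_update(state, {"identity_verified": True})
remaining = fields_to_ask(
    state, ["phone_number", "date_of_birth", "selected_location"]
)
```

Because earlier fields are preserved on every merge, a scheduling tool invoked late in the call would only need to prompt for the location, not the phone number or date of birth again.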
The agent exposes 19 specialized tools across local and MCP-backed execution paths, organized across five domains:
Healthcare voice AI has unique UX requirements that differ sharply from general-purpose conversational AI. Patients calling a clinic are often stressed, elderly, or navigating complex insurance and scheduling scenarios. The system implements several production-hardened design patterns to ensure natural, trustworthy phone interactions:
Intelligent filler words: During tool execution—when the agent is querying the EHR or checking appointment availability—the system immediately emits natural verbal fillers ("Hold on a sec while I search," "Give me a minute, let me see what's available") within 50–100ms of tool invocation. Extended wait fillers ("Still working on that," "Almost there") play every 5 seconds for long-running operations. Fillers are cancelled immediately upon the first streaming token from the LLM or upon user speech, ensuring no awkward overlap.
Low-latency design target: The system is designed around a sub-300ms first-token target for voice experiences, using async request handling, streaming responses, and short conversational turns.
Multi-language capable design: The agent and telephony configuration are designed to support multiple languages and voices, with actual production language coverage determined by the selected telephony provider, STT/TTS model support, and configured voice settings.
Voice-optimized output formatting: Phone numbers are read digit-by-digit; dates are pronounced in natural language rather than ISO format—small but meaningful details that distinguish a professional AI receptionist from a generic system.
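The filler-word pattern described above can be sketched with `asyncio` as follows. This is a simplified illustration, not Health Note's implementation: the function `run_with_fillers` and its parameters are hypothetical, the `speak` callback stands in for whatever hands text to the TTS pipeline, and fillers are cancelled when the tool call returns rather than on the first streamed LLM token or on user speech.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


async def run_with_fillers(
    tool_call: Awaitable[str],
    speak: Callable[[str], None],
    first_filler: str = "Hold on a sec while I search.",
    wait_fillers: Sequence[str] = ("Still working on that.", "Almost there."),
    interval_s: float = 5.0,
) -> str:
    """Await a tool call while emitting verbal fillers until it completes."""

    async def filler_loop() -> None:
        # Speak right away, then repeat an extended-wait filler on each interval.
        speak(first_filler)
        index = 0
        while True:
            await asyncio.sleep(interval_s)
            speak(wait_fillers[index % len(wait_fillers)])
            index += 1

    filler_task = asyncio.create_task(filler_loop())
    try:
        return await tool_call
    finally:
        # Stop fillers as soon as the result is ready (or the call fails),
        # so no filler overlaps the agent's actual answer.
        filler_task.cancel()
        try:
            await filler_task
        except asyncio.CancelledError:
            pass
```

In a real voice pipeline the cancellation trigger would more likely be an event set by the streaming layer or the speech detector; awaiting the tool call directly keeps the sketch short.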
The platform supports multiple patient engagement channels through channel-aware ingress paths, each of which applies a tailored prompt overlay:
Healthcare deployments require more than encryption—they require architectural decisions at every layer that prevent inadvertent exposure of Protected Health Information (PHI). The system implements several HIPAA-conscious design principles:
One of the Migration phase's most significant deliverables is a production-grade CI/CD pipeline that enables Health Note's team to deploy, roll back, and manage the AWS infrastructure independently after knowledge transfer.
The pipeline uses GitHub Actions with AWS IAM OIDC federation, eliminating the need for long-lived AWS access keys stored as GitHub secrets. Instead, GitHub Actions assumes short-lived IAM roles scoped to each environment:
Infrastructure is managed as code via Terraform, with separate live roots per AWS account, ensuring environment isolation and enabling repeatable, auditable infrastructure changes.
Langfuse V3 is deployed on EKS within Health Note's AWS environment, providing a self-hosted LLM observability platform with security controls aligned to HIPAA requirements. Every patient call generates a complete Langfuse trace capturing:
This observability layer feeds directly into the LLM-as-judge evaluation framework also implemented during the Migration phase, enabling automated quality monitoring of agent responses against clinical accuracy benchmarks.
The system includes a Bedrock Knowledge Base component for retrieval-augmented generation, supporting clinic FAQ, policy, and support content. The current multi-tenant design uses a shared KB pool with mandatory assistant or tenant metadata filters at retrieval time, so each assistant only retrieves documents assigned to its configured scope. Evaluation artifacts document baseline retrieval latency and response-time methodology.
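One way to make the metadata filter mandatory is to build every retrieval request through a single helper that refuses to run without a scope. The sketch below assumes documents are tagged with a metadata key named `assistant_id`; that key, the helper `build_retrieve_request`, and the placeholder IDs are illustrative assumptions. The request shape follows the Bedrock Agent Runtime `Retrieve` API, but the helper only constructs the arguments and makes no AWS call.

```python
from typing import Any, Dict


def build_retrieve_request(
    knowledge_base_id: str, query: str, assistant_id: str, top_k: int = 5
) -> Dict[str, Any]:
    """Build Retrieve arguments that always carry an assistant-scope filter."""
    if not assistant_id:
        # Fail closed: an unscoped query against a shared KB pool could
        # surface another tenant's documents.
        raise ValueError("assistant_id is required for shared-KB retrieval")
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {
                "numberOfResults": top_k,
                # "assistant_id" is a hypothetical metadata key.
                "filter": {"equals": {"key": "assistant_id", "value": assistant_id}},
            }
        },
    }


request = build_retrieve_request(
    "KB_ID_PLACEHOLDER", "What are your office hours?", "assistant-123"
)
# With boto3, the arguments would be passed along the lines of:
#   boto3.client("bedrock-agent-runtime").retrieve(**request)
```

Centralizing the filter in one place means a caller cannot forget it, which is the property a shared pool depends on for tenant isolation.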
Through its strategic AWS migration executed by Tech 42, Health Note moved from a vendor-dependent POC toward a production-ready, fully owned AI Medical Receptionist platform. The migration delivers:
"The migration from VAPI to AWS is designed to give us greater flexibility and ownership over the AI receptionist’s core intelligence layer, from conversation flow and algorithm design to tool-call orchestration. As this foundation comes together, it will help us iterate faster, tune performance, and adapt to clients’ workflow, security, and scalability needs." - Aaron Rau
Health Note is a leading pre-visit clinical intake automation platform that uses AI to transform healthcare workflows and reduce physician burnout. With a comprehensive suite of solutions including AI-powered intake management, medical receptionist services, and clinical documentation automation, Health Note serves healthcare organizations seeking to streamline administrative workflows while improving patient experience through seamless EHR integration. The company recently secured $17M in Series A funding led by SignalFire, validating its strong market traction and the critical need for intelligent automation in healthcare administration.




