"Sludge.ai runs a high-volume AI-powered video processing platform, and we needed to understand whether models available on Amazon Bedrock could match the quality of our existing Gemini-based pipeline while meaningfully reducing operational costs. Tech 42 designed and delivered a structured model evaluation framework that tested both our transcript-driven and multimodal processing pathways across Amazon Nova Pro, Nova Lite 2, and Pegasus 1.2.
Using sample videos, transcripts, and validated JSON outputs from our own production workflows, Tech 42 built Jupyter-based evaluation notebooks that replicated our highlight extraction and key frame identification logic end to end. Qualitative scoring used a G-Eval framework with Claude Sonnet as the LLM judge, applied against our own scoring rubric, and the engagement included a detailed cost analysis projecting savings at processing volumes of one thousand, ten thousand, and one million videos compared to our Gemini baseline.
The results were promising enough that we plan to roll Pegasus 1.2 and Nova Lite 2 into parts of our production pipeline. Tech 42 delivered the evaluation within scope, communicated consistently throughout via Slack and weekly check-ins, and conducted a thorough knowledge transfer session to walk us through the findings. It was a smooth engagement and a strong foundation for our path to production."
Vlad Munteanu
CTO