From GPT Dependency to Custom AI: How SupportXDR Validated a Smarter, Cost-Effective Security LLM on AWS

Client

TrueIT LLC (SupportXDR)

Location

West Fargo, North Dakota

Industry

IT Services / Managed Security / AI-Powered Cybersecurity

Services & Tech

AWS Bedrock, AWS Lambda, AWS Step Functions, Amazon S3, Amazon RDS, Amazon SageMaker, Claude Sonnet 4.5, Claude Haiku 4.5, Llama 3.3 70B, DeepSeek R1, Hugging Face TRL, LoRA/QLoRA, Python, Flask, Supabase

Project Overview

TrueIT LLC, the company behind SupportXDR, an AI-powered cybersecurity platform, needed to determine whether AWS Bedrock foundation models could match or surpass the performance of OpenAI’s GPT for automated security incident analysis. Locked into a costly commercial AI dependency, SupportXDR partnered with Avahi to design and execute a rigorous multi-model benchmarking framework, followed by fine-tuning a custom security LLM using LoRA/QLoRA techniques. The result was a reproducible, defensible model evaluation framework spanning six AI models and five performance dimensions, giving SupportXDR the validated evidence it needed to accelerate its AWS Marketplace go-to-market strategy.

About The Customer

TrueIT LLC is a West Fargo, North Dakota-based managed IT and cybersecurity services firm recognized on the MSP 500 list. Their flagship product, SupportXDR, is an AI-powered security operations platform built around the AgentX IR incident response engine. With roughly 35–51 employees and approximately $12M in revenue, TrueIT LLC serves enterprise customers who rely on SupportXDR to automate threat detection, investigation, and response at scale. The company operates at the intersection of managed security services and applied AI, making model quality, cost, and reliability mission-critical concerns.

The Problem

SupportXDR’s AgentX IR platform was built on OpenAI’s GPT models to power automated cybersecurity incident analysis. While effective, this dependency carried a growing set of risks: rising API costs, limited control over model behavior, and no clear path toward a proprietary, domain-optimized AI capability. As SupportXDR began positioning itself for the AWS Marketplace, staying on a third-party commercial AI stack posed a strategic bottleneck.

The core question leadership needed answered was straightforward but technically demanding: could AWS Bedrock models, including frontier options like Claude and Llama, perform at GPT-level quality on real security investigation tasks, and at a lower total cost? Without a rigorous, apples-to-apples evaluation, any migration decision would be guesswork. A poor model choice deployed in production could erode the accuracy, reasoning quality, and hallucination resistance that enterprise security customers depend on.

Beyond model selection, SupportXDR’s longer-term ambition was to build and commercialize a fine-tuned security LLM tailored to their specific incident taxonomy and response workflows. Without validated results demonstrating that a custom model could outperform general-purpose commercial alternatives, there was no credible path to productizing that capability or presenting it as a differentiator to enterprise buyers.

Left unaddressed, SupportXDR would remain locked into OpenAI with escalating costs, no validated AWS alternative, and no foundation for building or commercializing a proprietary security AI model, stalling their AWS Marketplace go-to-market strategy entirely.

Why AWS

AWS provided the ideal infrastructure for both the benchmarking and fine-tuning phases of this engagement. AWS Bedrock offered direct access to a curated roster of high-performance foundation models — including Anthropic’s Claude family and Meta’s Llama — through a single, unified API, eliminating the overhead of managing disparate model endpoints. This made it possible to run consistent, controlled comparisons across models within a single cloud environment.
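
As a rough illustration of that unified access pattern, the sketch below sends the same incident prompt to two different Bedrock model families through a single Converse call shape. The model IDs, region, and prompt are illustrative placeholders, not the configuration used in this engagement.

import boto3

# One Bedrock runtime client; the Converse API provides a single request/response
# shape across model families. Model IDs below are placeholders and vary by region.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

MODEL_IDS = [
    "anthropic.claude-sonnet-4-5-20250929-v1:0",  # placeholder Claude Sonnet 4.5 ID
    "meta.llama3-3-70b-instruct-v1:0",            # placeholder Llama 3.3 70B ID
]

def analyze(model_id: str, prompt: str) -> str:
    """Send the same incident prompt to any Bedrock model via the Converse API."""
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]

if __name__ == "__main__":
    scenario = "Given the following EDR alert details, identify the likely attack stage: ..."
    for model_id in MODEL_IDS:
        print(model_id, "->", analyze(model_id, scenario)[:200])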

Beyond model access, AWS’s broader service ecosystem — Lambda, Step Functions, S3, RDS, and SageMaker — enabled Avahi to build a fully automated, scalable evaluation and fine-tuning pipeline without stitching together third-party tools. Running the entire workload within SupportXDR’s own AWS environment also addressed data sensitivity concerns, keeping security incident data off third-party infrastructure and within a governed, auditable cloud boundary.

Why TrueIT LLC Chose Avahi

Avahi brought a rare combination of AWS technical depth and applied AI expertise that made them the right partner for an engagement this specialized. Designing a credible LLM benchmarking framework, one rigorous enough to support internal migration decisions and external customer proof points, required more than cloud architecture skills. It demanded expertise in evaluation methodology, fine-tuning techniques, and the ability to translate model performance data into actionable business strategy.

What further distinguished Avahi was their ability to structure the engagement as two sequential, methodologically linked phases sharing the same evaluation framework. This approach enabled true longitudinal model comparison, a structure rarely seen in partner engagements, and produced results that were reproducible, defensible, and directly usable as go-to-market evidence. SupportXDR’s return for a second engagement is a direct reflection of the confidence Avahi earned in the first.

Solution

Avahi designed and executed a two-phase AI evaluation and model development program, with each phase building directly on the last.

  • Phase 1: Multi-Model Benchmarking. Avahi architected an automated benchmarking pipeline using AWS Lambda and Step Functions as a prompt dispatcher, routing standardized cybersecurity incident scenarios across five foundation models simultaneously. Model outputs were stored in S3 and RDS, then scored using AWS SageMaker with BLEU and cosine similarity metrics. A custom Flask UI provided a clear, side-by-side comparison interface, making results accessible to both technical and non-technical stakeholders. The evaluation rubric scored each model across five dimensions (factual accuracy, incident understanding, actionability, clarity, and hallucination avoidance) on a 1–5 scale using an LLM-as-judge methodology; a simplified scoring sketch follows this list.
  • Phase 2: Custom Security LLM Fine-Tuning. Building directly on Phase 1’s framework, Avahi applied LoRA/QLoRA parameter-efficient fine-tuning techniques to adapt a foundation model to SupportXDR’s specific cybersecurity incident taxonomy. Using a curated dataset of 50–100 validated security incident examples with ground truth outputs, this approach avoided full model retraining, compressing what would typically require months of ML work into a four-week delivery window. The fine-tuned model was then benchmarked against the same six commercial models using the identical prompts, rubric, and scenarios from Phase 1, enabling a direct, longitudinal comparison; a minimal fine-tuning sketch also appears after this list.
    A critical risk in Phase 2 was data quality. The initial dataset provided contained only 11–12 JSON incident examples, a volume insufficient for reliable fine-tuning. Avahi mitigated this by establishing clean, validated security incident data with ground truth outputs as a hard prerequisite before work began, and by framing outcomes as directional benchmarks rather than guaranteed performance thresholds.
    The full six-model benchmark spanned Claude Sonnet 4.5, Claude Haiku 4.5, GPT-OSS 20B and 120B, Llama 3.3 70B, and DeepSeek R1, with cost-per-token analysis layered alongside quality scoring to give SupportXDR a complete picture for migration and commercialization decisions; an illustrative cost comparison follows the list as well.
    All workloads ran within SupportXDR’s own AWS environment, ensuring that sensitive security incident data, including CNC and exploit case files, never left a governed cloud boundary.
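
The sketch below illustrates the two scoring ideas from Phase 1: reference-based metrics (BLEU and cosine similarity) and an LLM-as-judge rubric prompt scored 1–5 per dimension. It assumes nltk and scikit-learn are available; the dimension names mirror the rubric described above, while the prompt wording and helper functions are illustrative rather than the production pipeline.

import json
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

DIMENSIONS = ["factual_accuracy", "incident_understanding",
              "actionability", "clarity", "hallucination_avoidance"]

def reference_metrics(candidate: str, reference: str) -> dict:
    """BLEU and TF-IDF cosine similarity between a model answer and ground truth."""
    bleu = sentence_bleu([reference.split()], candidate.split(),
                         smoothing_function=SmoothingFunction().method1)
    tfidf = TfidfVectorizer().fit_transform([candidate, reference])
    cosine = float(cosine_similarity(tfidf[0], tfidf[1])[0][0])
    return {"bleu": bleu, "cosine": cosine}

def judge_prompt(scenario: str, answer: str) -> str:
    """Rubric prompt for an LLM-as-judge call (sent via a Converse helper like the earlier sketch)."""
    return (
        "Score the following incident analysis from 1 to 5 on each dimension: "
        + ", ".join(DIMENSIONS) + ". Respond with a JSON object only.\n\n"
        f"Incident scenario:\n{scenario}\n\nModel analysis:\n{answer}"
    )

def parse_judge_scores(raw_json: str) -> dict:
    """Parse the judge model's JSON reply into a {dimension: score} dict."""
    scores = json.loads(raw_json)
    return {dim: int(scores[dim]) for dim in DIMENSIONS}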
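For Phase 2, the following is a minimal sketch of LoRA/QLoRA-style parameter-efficient fine-tuning with Hugging Face TRL and PEFT. The base model, dataset path, and hyperparameters are placeholders, and exact argument names vary slightly across TRL versions.

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

BASE_MODEL = "meta-llama/Llama-3.3-70B-Instruct"  # placeholder base model

# 4-bit quantization (the "Q" in QLoRA) keeps GPU memory requirements manageable.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_config, device_map="auto"
)

# Low-rank adapters: only a small set of injected weights is trained, not the full model.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)

# JSONL of curated incident examples; each record holds a "text" field combining the
# incident prompt and the validated ground-truth analysis (placeholder path).
dataset = load_dataset("json", data_files="incidents_train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    args=SFTConfig(
        output_dir="security-llm-lora",
        num_train_epochs=3,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
    ),
)
trainer.train()
trainer.save_model("security-llm-lora")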
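Finally, a simple illustration of how cost-per-token analysis can be layered onto the quality scores. All prices and token counts below are placeholders, not actual Bedrock or OpenAI rates.

# (input, output) USD per 1,000 tokens -- placeholder figures only
PRICING_PER_1K_TOKENS = {
    "claude-sonnet-4-5": (0.0030, 0.0150),
    "llama-3-3-70b":     (0.0007, 0.0007),
    "gpt-baseline":      (0.0025, 0.0100),
}

def scenario_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of running one benchmark scenario through a given model."""
    in_price, out_price = PRICING_PER_1K_TOKENS[model]
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price

if __name__ == "__main__":
    # Example: a 2,000-token incident prompt, an 800-token analysis, 500 incidents/month.
    for model in PRICING_PER_1K_TOKENS:
        monthly = 500 * scenario_cost(model, 2000, 800)
        print(f"{model:>18}: ${monthly:,.2f}/month")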

Key Deliverables

  • Automated multi-model benchmarking pipeline (AWS Lambda + Step Functions)
  • Flask-based testing UI for 5-model side-by-side comparison
  • SageMaker-powered scoring pipeline (BLEU, cosine similarity)
  • Model comparison report with cost/token analysis
  • LoRA/QLoRA fine-tuned security LLM trained on 50–100 domain-specific incident examples
  • Multi-dimensional scoring matrix (5 criteria, 6 models, 1–5 rubric)
  • LLM-as-judge evaluation methodology and rubric
  • Final benchmarking report and executive summary
  • Recommended model migration roadmap for AWS Marketplace positioning

Project Impact

Avahi delivered a reproducible, multi-dimensional benchmarking framework that gave SupportXDR defensible, data-backed evidence to support both internal migration decisions and external customer-facing proof points. By validating that a fine-tuned security LLM could compete with and, in targeted dimensions, outperform commercial foundation models, the engagement removed the primary technical and strategic blocker standing between SupportXDR and a credible AWS Marketplace go-to-market motion.

The framework’s reusability is among its most durable outcomes. Because Phase 1 and Phase 2 shared identical prompts, scenarios, and scoring rubrics, the results are directly comparable across time — a rare capability in partner engagements that SupportXDR can continue to leverage as models evolve and their platform scales.

 

Outcome Highlights:

  • Six foundation models benchmarked across five scored performance dimensions in a single, unified evaluation framework
  • LoRA/QLoRA fine-tuning completed in approximately 4 weeks, a process that typically requires months of ML development
  • Fine-tuning achieved with as few as 50–100 domain-specific security incident examples, demonstrating high data efficiency
  • Cost-per-token analysis was produced across all six models, enabling direct ROI comparison against existing OpenAI spend
  • All sensitive security incident data processed entirely within SupportXDR’s own AWS environment — zero third-party data exposure
  • SupportXDR returned for a second engagement, validating the quality and business value of Phase 1 outcomes

Ready to Transform Your Business with AI?

Let’s explore your high-impact AI opportunities together in a complimentary session.