Priority Software Moves from Reactive to Proactive Operations with New Relic Observability on AWS

Client

Priority Software

Location
Industry

Cloud-based ERP (SaaS)

Services & Tech

Amazon EC2, Amazon RDS, Amazon ElastiCache, AWS Lambda, Amazon CloudFront, Amazon CloudWatch, Amazon EBS, Amazon S3

Project Overview

Priority Software operates a cloud-based ERP platform that serves thousands of business customers globally. As their SaaS footprint grew, their operations team struggled with fragmented visibility, noisy alerts, and slow incident response that often started only after customers reported issues. Avahi, a Premier Tier AWS Partner, deployed New Relic across Priority’s multi-region AWS environment using a phased rollout that improved infrastructure visibility, alert quality, and centralized troubleshooting. The result was faster detection and resolution, fewer false alarms, improved uptime, and measurable cost and efficiency gains.

About the Customer

Priority Software runs a cloud-based ERP platform for business customers worldwide, supporting a large-scale SaaS operation with multiple environments (development, staging, and production) and a broad AWS footprint spanning multiple regions.

The Problem

As Priority’s platform scaled, their operational model became increasingly reactive. The team often learned about issues from customer support tickets, which meant incidents were already impacting end users before engineering could respond.

They also lacked a unified view across development, staging, and production. Critical signals were spread across systems, making it difficult to quickly understand what was happening, where it was happening, and whether it was isolated or systemic.

Alerting was another major blocker. CloudWatch alarms were noisy and frequently ignored, creating alert fatigue that increased the risk of missing real incidents. Capacity planning was similarly reactive, which led to emergency scaling events and unnecessary stress on the team.

Finally, post-incident analysis was slow and inconsistent because logs were scattered. Without centralized logging and correlation between metrics and logs, the team spent too much time reconstructing timelines and pinpointing root causes. At Priority’s scale (127 EC2 instances, 18 RDS databases, and 12 Application Load Balancers across three AWS regions), these gaps compounded quickly and threatened reliability, customer trust, and operational efficiency.

Why AWS

Priority’s ERP platform ran on AWS to support a global customer base with a multi-region architecture and a mix of managed and infrastructure services, including Amazon EC2, Amazon RDS, Application Load Balancers, AWS Lambda, Amazon ElastiCache, and Amazon CloudFront. AWS provided the flexibility to scale resources as demand changed, while maintaining consistent deployment patterns across environments.

AWS also enabled deep operational telemetry through native service metrics and logs, which made it possible to implement a comprehensive observability strategy that could span compute, databases, networking, and delivery layers across development, staging, and production.

Why Priority Software Chose Avahi

Priority engaged Avahi in Q3 2025, when Avahi began managing its AWS infrastructure, seeking a partner that could improve operational maturity without disrupting production systems. As a Premier Tier AWS Partner, Avahi brought the cloud operations expertise required to standardize monitoring and incident response across a large, distributed AWS footprint.

Avahi was well suited to lead this effort because the challenge was not simply “adding more monitoring”; it required designing an end-to-end operational approach. That included establishing baselines, implementing tiered alerting to reduce noise, centralizing logs for faster root cause analysis, and integrating alerts directly into Priority’s existing workflows through Jira Service Management and Slack.

Solution

Avahi deployed New Relic across Priority’s AWS footprint using a phased approach to drive adoption quickly, reduce risk, and steadily increase value.

In the first two weeks, Avahi focused on infrastructure visibility. Monitoring agents were installed on all EC2 instances to capture host-level health and performance signals. In parallel, Avahi configured AWS API integration so New Relic could ingest telemetry from managed services across the environment. With this foundation in place, Avahi created initial dashboards tailored for the operations team, providing a single pane of glass across development, staging, and production.

In the following two weeks, Avahi implemented intelligent alerting. Rather than relying on overly sensitive thresholds, Avahi established baseline performance metrics and then configured tiered alerting (Critical, High, Medium) aligned to operational severity. Alerts were integrated with Jira Service Management and Slack, enabling automated ticket creation and real-time notifications so the team could respond quickly and consistently.
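The idea of deriving tiers from a baseline, rather than from fixed thresholds, can be sketched in a few lines. This is a minimal illustration only, not the alert configuration Avahi built in New Relic: the function names, the use of mean and standard deviation as the baseline, and the deviation multipliers for each tier are all assumptions chosen for the example.

```python
from statistics import mean, stdev

# Hypothetical tier thresholds, expressed as standard deviations above the baseline.
# Ordered from most to least severe so the first match wins.
TIERS = [("Critical", 4.0), ("High", 3.0), ("Medium", 2.0)]


def build_baseline(samples):
    """Summarize historical metric samples as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def classify(value, baseline):
    """Return the most severe tier the value reaches, or None if it is within normal range."""
    avg, sd = baseline
    if sd == 0:
        # A perfectly flat history gives no scale for deviation; this sketch raises no alert.
        return None
    deviation = (value - avg) / sd
    for tier, threshold in TIERS:
        if deviation >= threshold:
            return tier
    return None


# Example: CPU utilization samples (percent) for one host. Values are made up.
cpu_history = [40, 42, 38, 41, 39, 40, 43, 37]
baseline = build_baseline(cpu_history)
for reading in (41, 45, 47, 50):
    print(reading, classify(reading, baseline))
```

Because the thresholds scale with each metric's own history, a normally busy host does not page anyone for a reading that would be alarming on a quiet one, which is the mechanism behind the noise reduction described above.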

Avahi also addressed troubleshooting speed by centralizing logs. Fluent Bit was deployed for log aggregation, and CloudWatch logs were connected into New Relic. This enabled correlation between metrics and logs, letting engineers pivot directly from an alert to the related logs and events for faster diagnosis.
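Conceptually, pivoting from an alert to its related logs is a join on host and time. The sketch below shows that idea in plain Python over in-memory records; it does not use New Relic's query interface, and the record fields (`host`, `time`, `message`), the function name, and the five-minute window are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def logs_around_alert(alert, logs, window=timedelta(minutes=5)):
    """Return log records from the alert's host within +/- window of the alert, oldest first."""
    start = alert["time"] - window
    end = alert["time"] + window
    matches = [
        record
        for record in logs
        if record["host"] == alert["host"] and start <= record["time"] <= end
    ]
    return sorted(matches, key=lambda record: record["time"])


# Hypothetical alert and log records.
alert = {"host": "app-1", "time": datetime(2025, 1, 1, 12, 0)}
logs = [
    {"host": "app-1", "time": datetime(2025, 1, 1, 12, 3), "message": "request timeout"},
    {"host": "app-1", "time": datetime(2025, 1, 1, 11, 58), "message": "connection pool exhausted"},
    {"host": "app-2", "time": datetime(2025, 1, 1, 12, 1), "message": "health check ok"},
    {"host": "app-1", "time": datetime(2025, 1, 1, 12, 30), "message": "service restarted"},
]

for record in logs_around_alert(alert, logs):
    print(record["time"].isoformat(), record["message"])
```

With logs scattered across systems, this join has to be reconstructed by hand for every incident; centralizing them makes it a single lookup.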

In the final phase, Avahi focused on operational optimization and continuous improvement. Dashboards were expanded to serve multiple stakeholders (executives, operations, and developers), alerts were tuned based on real-world patterns to further reduce noise, and cost optimization opportunities were identified through data-driven usage visibility. The monitoring strategy also reinforced better operational practices, including the importance of tagging and incremental expansion of coverage to avoid overwhelming teams early in adoption.

Primary AWS services in scope included Amazon EC2, Amazon RDS (PostgreSQL and MySQL), Elastic Load Balancing (Application Load Balancers), Amazon ElastiCache, AWS Lambda, Amazon CloudFront, and Amazon CloudWatch (including CloudWatch Logs), with monitoring coverage extending into storage and performance signals such as Amazon EBS and Amazon S3 metrics where applicable.

Key Deliverables

  • New Relic monitoring agents deployed across all EC2 instances
  • AWS API integrations configured for managed service telemetry ingestion
  • Unified dashboards for development, staging, and production visibility
  • Baseline performance metrics established for alert threshold accuracy
  • Tiered alerting policies implemented (Critical, High, Medium)
  • Jira Service Management integration for automated incident ticket creation
  • Slack integration for real-time incident notifications
  • Fluent Bit deployed for centralized log forwarding
  • CloudWatch Logs connected to New Relic for log consolidation
  • Metrics-to-logs correlation enabled for faster root cause analysis
  • Stakeholder-specific dashboards delivered (executives, operations, developers)
  • Ongoing alert tuning process and operational optimization recommendations
  • Cost optimization findings and rightsizing recommendations delivered
  • Three enablement and training sessions for Priority’s team

Project Impact

With unified observability in place, Priority shifted from reactive firefighting to proactive operations. Issues were detected within minutes, most incidents were identified before customers noticed, and investigation time dropped sharply due to centralized logs and correlation between telemetry sources. The improvements directly strengthened platform reliability while also delivering tangible efficiency and cost savings for the business.

Metrics

  • Mean Time to Detect improved from 15-30 minutes to 2-3 minutes (85% faster)
  • Mean Time to Resolve reduced from 2-4 hours to 30-45 minutes (70% faster)
  • 90% of incidents were caught before customers reported them
  • False alerts decreased from 35% to 8%
  • Monthly downtime reduced from 3.6 hours to 0.57 hours (84% reduction)
  • Uptime improved from 99.5% to 99.92% (+0.42 percentage points)
  • Customer-reported issues decreased from 12-15 per month to 6-7 per month (50% reduction)
  • Identified $6,000 per month in rightsizing opportunities
  • Detected 23 over-provisioned instances
  • Eliminated 15 hours per week of manual monitoring tasks
  • Reduced time investigating false alarms by 60%
  • NPS improved from 42 to 58
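The uptime and downtime figures above are two views of the same measurement, and the relationship can be checked with a short calculation. The sketch below assumes a 30-day month (720 hours), which is an assumption rather than something the case study states; the function names are illustrative.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month, 720 hours


def monthly_downtime_hours(uptime_percent):
    """Convert an uptime percentage into hours of downtime per month."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_MONTH


def percent_reduction(before, after):
    """Relative reduction from before to after, as a percentage."""
    return (before - after) / before * 100.0


before = monthly_downtime_hours(99.5)
after = monthly_downtime_hours(99.92)
print(f"Downtime before: {before:.3f} h, after: {after:.3f} h")
print(f"Reduction: {percent_reduction(before, after):.0f}%")
```

Under that assumption, 99.5% uptime corresponds to about 3.6 hours of downtime per month and 99.92% to about 0.576 hours, a reduction of about 84%, which is consistent with the reported figures.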
