Professional Project: Software Engineer · LTM · 2022–2024

Cloud-Native Premium Calculation Engine

Race Condition Elimination, Parallel Data Fetching, and Redis Caching at 1M+ User Scale

1M+ users · 3,000+ concurrent sessions

Stack

Java · Spring Boot · Redis · CompletableFuture · AWS ECS Fargate · OAuth2 / JWT

Domain

Backend Engineering · Concurrent Systems · Performance Engineering · Insurance Platform

1M+

Users served

3,000+

Concurrent sessions

0

Race conditions after refactor

Cloud-Native Premium Calculation Engine: system architecture

Overview

The insurance platform at LTM ran premium calculation as one of its most critical user-facing operations. Every quote, every policy renewal, every mid-term coverage change triggers a premium recalculation, and on a platform serving over one million users, that means thousands of these calculations executing concurrently at peak load.

The engine I inherited had two categories of problems. The first was a correctness problem: under concurrent load, it produced intermittently incorrect results. The second was a performance problem: latency under high concurrency was bad enough for users to notice. The two problems had different root causes and required different fixes, but they shared a common theme: the engine's original design had not been built with concurrency in mind.

I diagnosed both problems through profiling, fixed the correctness issue by eliminating shared mutable state structurally rather than patching it with synchronization, fixed the performance issue through parallelized data fetching and Redis caching with programmatic invalidation, and validated the result under 3,000-concurrent-session JMeter load testing.

The Problem

What the Engine Does

The premium calculation engine computes the insurance premium for a given policy based on a set of inputs: the policyholder's risk profile, the coverage type and level selected, applicable actuarial rate tables, regulatory adjustments by jurisdiction, any discounts or surcharges on the account, and the policy's effective dates. Each of these inputs comes from a different data source (the policy management service, the actuarial data store, the account service, the jurisdiction rules database) and the calculation logic applies them in a defined sequence to produce a final premium amount.

Correctness is the engine's non-negotiable requirement. Premium calculation is not a recommendation or an estimate. It's a financial commitment. The number the engine produces is the number on the policyholder's contract. A race condition that produces an incorrect premium under concurrent load is not a performance problem or a UX problem. It's a billing error on an insurance contract, with downstream consequences that extend well beyond the technical system.

The Correctness Problem: Shared Mutable State

The original engine held intermediate calculation state in shared objects that were not properly synchronized across concurrent requests. Specifically: certain intermediate values (rate table lookups, discount calculations, jurisdiction adjustment factors) were stored as instance variables on Spring-managed singleton service beans rather than as local variables scoped to each request.

In single-threaded or low-concurrency conditions, this appeared to work. The bug was always present, but the calculations ran fast enough that two requests rarely interleaved through the same shared state. As concurrent load increased toward production levels, the interleaving became frequent: Request A would write a jurisdiction adjustment factor to the shared state, Request B would overwrite it with a different value before Request A had finished using it, and Request A would complete its calculation using Request B's factor, producing a premium amount that was incorrect for Request A's policyholder.

The race condition: two concurrent requests write the same instance variable on a singleton bean, and one overwrites the other mid-calculation
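The interleaving described above can be reproduced deterministically in a few lines. This is an illustrative sketch, not the production code: SharedStateCalculator and RaceConditionDemo are hypothetical names, the rates are made-up values, and two latches force the ordering that production load produced only by chance.

```java
import java.math.BigDecimal;
import java.util.concurrent.CountDownLatch;

// Hypothetical stand-in for a singleton service bean that keeps request state in a field.
class SharedStateCalculator {
    private BigDecimal jurisdictionFactor; // one copy, shared by every request

    // The hook runs between the write and the read. It stands in for the time a
    // real request spends on other work before it uses the stored factor.
    BigDecimal calculate(BigDecimal baseRate, BigDecimal factor, Runnable betweenWriteAndRead) {
        this.jurisdictionFactor = factor;
        betweenWriteAndRead.run();
        return baseRate.multiply(this.jurisdictionFactor);
    }
}

class RaceConditionDemo {
    // Forces the bad ordering: A writes, B overwrites, A reads B's value.
    static BigDecimal premiumSeenByRequestA() {
        SharedStateCalculator singleton = new SharedStateCalculator();
        CountDownLatch aHasWritten = new CountDownLatch(1);
        CountDownLatch bHasWritten = new CountDownLatch(1);
        BigDecimal[] resultA = new BigDecimal[1];
        BigDecimal baseRate = new BigDecimal("100");

        Thread requestA = new Thread(() -> resultA[0] = singleton.calculate(
                baseRate, new BigDecimal("1.10"),
                () -> { aHasWritten.countDown(); awaitQuietly(bHasWritten); }));
        Thread requestB = new Thread(() -> {
            awaitQuietly(aHasWritten);
            singleton.calculate(baseRate, new BigDecimal("1.50"), bHasWritten::countDown);
        });

        requestA.start();
        requestB.start();
        try {
            requestA.join();
            requestB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return resultA[0];
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Request A asked for factor 1.10 but is priced with Request B's 1.50.
        System.out.println("Request A premium: " + premiumSeenByRequestA());
    }
}
```

Request A supplies a factor of 1.10 and still gets a premium computed with 1.50. No exception is thrown and nothing is logged, which is why this class of bug surfaced through billing anomalies rather than error rates.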

The failures were intermittent and non-deterministic, which made them difficult to reproduce in test environments running low concurrency. They were discovered through anomaly detection on production billing data: premium amounts that didn't match the expected calculation for a given policy configuration. Tracing those anomalies back to a concurrency bug in the calculation service took significant investigation.

The Performance Problem: Sequential Data Fetching

The engine's data fetching was entirely sequential. The request handler would fetch the risk profile, then fetch the rate tables, then fetch the account discount data, then fetch the jurisdiction rules, each call waiting for the previous one to complete before initiating. In a blocking I/O model, every one of those network calls parked the thread for its duration.

At 3,000 concurrent sessions, the thread pool was saturated with threads blocked on I/O. New requests queued. Latency climbed. The engine was not CPU-bound (the actual arithmetic was trivial); it was I/O-bound on sequential data fetching that could have been parallelized.

The Investigation

Before writing any fix, the first step was profiling to confirm that the hypotheses about the bottleneck were actually correct. This step mattered more than it might seem.

The initial hypothesis about the performance problem was that the calculation logic itself was the bottleneck: the actuarial formulas involved multiplications across large rate tables, and the assumption was that computation was the slow part. A week of profiling under simulated load disproved this. The computation completed in single-digit milliseconds. The rate table lookups, the discount queries, and the jurisdiction fetches (the I/O calls) were each taking 50–150ms and executing sequentially. Total wall time for a calculation request under load was 400–600ms, almost entirely I/O wait. The computation was not the bottleneck. The data fetching was.

This was an important correction. Optimizing the computation (the original plan) would have produced negligible improvement because it was not the slow part. Profiling before optimizing is not a best-practice platitude in this context. It was the difference between a week of work that accomplished nothing and a week of work that fixed the actual problem.

The correctness investigation followed a different path: comparing production premium outputs against manually computed expected values for policies where anomalies had been flagged, then systematically adding thread-local logging to trace which service instances and shared objects were involved in the erroneous calculations. The shared mutable state in the singleton service bean became visible once the logging showed multiple concurrent requests touching the same instance variables within the same calculation window.

Technical Architecture

Fix 1: Eliminating Race Conditions Through Stateless Request Design

The fix for the shared mutable state problem was structural, not additive. The instinct when encountering a concurrency bug is often to add synchronization: put a lock around the shared state, wrap it in an AtomicReference, or reach for a thread-safe collection. Atomic wrappers and thread-safe collections would not have helped here, because the bug was not a single non-atomic write but a multi-step sequence (write, compute, read) that another request could interrupt. Locking the whole calculation would have been correct, but at a cost: it serializes requests through the lock, and in a calculation engine that needs to scale to thousands of concurrent sessions, that contention reintroduces the throughput problem through a different mechanism.

The better fix was to eliminate the shared state entirely. The redesign introduced a PremiumCalculationContext value object: an immutable data container that holds all inputs, intermediate values, and results for a single calculation request. Every field that had previously lived as an instance variable on a shared service bean now lives as a field on this per-request context object.

Stateless request design: each request carries its own immutable context object through the calculation chain, sharing no mutable state

The calculation service methods were refactored to accept a PremiumCalculationContext as a parameter and return an updated context: functional style, no instance variable reads or writes anywhere in the calculation pipeline. Each request instantiates its own context on entry, passes it through the calculation chain, and the final context carries the result out. No two requests ever share any mutable state. There is nothing to synchronize because there is nothing shared.

// Before — shared mutable state on singleton service
@Service
public class PremiumCalculationService {
    private BigDecimal baseRate;          // shared instance variable
    private BigDecimal jurisdictionFactor; // shared instance variable

    public BigDecimal calculate(PolicyRequest request) {
        this.baseRate = rateTableService.lookup(request.getCoverageType());
        this.jurisdictionFactor = jurisdictionService.getFactor(request.getJurisdiction());
        // ... concurrent requests clobber each other here
        return this.baseRate.multiply(this.jurisdictionFactor);
    }
}

// After — isolated value object per request
@Service
public class PremiumCalculationService {
    public PremiumCalculationContext calculate(PolicyRequest request) {
        PremiumCalculationContext ctx = PremiumCalculationContext.forRequest(request);
        ctx = applyBaseRate(ctx);
        ctx = applyJurisdictionFactor(ctx);
        // ... each request operates on its own isolated context
        return ctx;
    }
}

The race conditions were structurally impossible after this refactor. The same inputs, passed to the calculation engine from concurrent requests, always produce the same output: guaranteed by the absence of any shared mutable state, not by synchronization logic that could be incorrectly applied.
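For readers who want to see what such a context object might look like, here is a minimal sketch. The field set, the forRequest(String) factory, the with* methods, and StatelessPremiumCalculator are all assumptions made for illustration; the production class carried more inputs and used a different factory signature.

```java
import java.math.BigDecimal;

// Sketch of an immutable per-request context. All fields are final, and every
// "update" returns a new instance instead of mutating the existing one.
final class PremiumCalculationContext {
    private final String policyId;
    private final BigDecimal baseRate;
    private final BigDecimal jurisdictionFactor;

    private PremiumCalculationContext(String policyId, BigDecimal baseRate, BigDecimal jurisdictionFactor) {
        this.policyId = policyId;
        this.baseRate = baseRate;
        this.jurisdictionFactor = jurisdictionFactor;
    }

    // Neutral starting values (assumed): zero base rate, factor of one.
    static PremiumCalculationContext forRequest(String policyId) {
        return new PremiumCalculationContext(policyId, BigDecimal.ZERO, BigDecimal.ONE);
    }

    PremiumCalculationContext withBaseRate(BigDecimal newBaseRate) {
        return new PremiumCalculationContext(policyId, newBaseRate, jurisdictionFactor);
    }

    PremiumCalculationContext withJurisdictionFactor(BigDecimal newFactor) {
        return new PremiumCalculationContext(policyId, baseRate, newFactor);
    }

    String policyId() { return policyId; }

    // Derived from this context's own fields only.
    BigDecimal premium() { return baseRate.multiply(jurisdictionFactor); }
}

// The calculator has no instance fields, so one shared instance is safe.
class StatelessPremiumCalculator {
    PremiumCalculationContext calculate(String policyId, BigDecimal baseRate, BigDecimal factor) {
        return PremiumCalculationContext.forRequest(policyId)
                .withBaseRate(baseRate)
                .withJurisdictionFactor(factor);
    }
}
```

Because each with* call allocates a fresh object, a reference held by one request can never change underneath it, regardless of what other requests do with their own contexts.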

Fix 2: Parallelizing Independent Data Fetches with CompletableFuture

With the calculation logic now operating on a per-request context object, the data fetching that populates that context became the clear performance target. The four primary data sources (risk profile, rate tables, account discounts, jurisdiction rules) are independent of each other. There is no data dependency between them: fetching the rate tables does not require the risk profile to have been fetched first, and so on. Sequential fetching was imposing artificial wait time that served no purpose.

The fix parallelized all four fetches using Java's CompletableFuture framework:

public PremiumCalculationContext fetchInputsParallel(PolicyRequest request) {
    CompletableFuture<RiskProfile> riskFuture =
        CompletableFuture.supplyAsync(() -> riskProfileService.fetch(request.getPolicyId()), executor);

    CompletableFuture<RateTable> rateFuture =
        CompletableFuture.supplyAsync(() -> rateTableService.lookup(request.getCoverageType()), executor);

    CompletableFuture<DiscountSummary> discountFuture =
        CompletableFuture.supplyAsync(() -> accountService.getDiscounts(request.getAccountId()), executor);

    CompletableFuture<JurisdictionFactor> jurisdictionFuture =
        CompletableFuture.supplyAsync(() -> jurisdictionService.getFactor(request.getJurisdiction()), executor);

    CompletableFuture.allOf(riskFuture, rateFuture, discountFuture, jurisdictionFuture).join();

    return PremiumCalculationContext.builder()
        .riskProfile(riskFuture.join())
        .rateTable(rateFuture.join())
        .discounts(discountFuture.join())
        .jurisdictionFactor(jurisdictionFuture.join())
        .build();
}

Sequential vs parallel fetching: four independent fetches run sequentially sum their durations; run in parallel with CompletableFuture they take only the longest

All four fetches initiate simultaneously and the engine waits for all of them to complete with allOf(...).join() before proceeding to the calculation phase. Wall time for the data fetching phase drops from the sum of the four sequential fetch times to approximately the maximum of the four: the slowest individual fetch now determines the total data-fetch latency rather than the sum of all four.
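The sum-versus-maximum effect is easy to observe with simulated fetches. In the sketch below, FetchTimingDemo, the source names, and the four latencies are invented for illustration (chosen to fall within the 50–150ms range seen in profiling), and Thread.sleep stands in for a blocking network call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class FetchTimingDemo {
    // Illustrative latencies in milliseconds, one per simulated data source.
    static final long[] LATENCIES_MS = {60, 90, 120, 150};

    // Stand-in for a blocking remote call.
    static String simulatedFetch(String name, long latencyMs) {
        try {
            Thread.sleep(latencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return name;
    }

    // One call after another: elapsed time is roughly the sum of the latencies.
    static long sequentialMillis() {
        long start = System.nanoTime();
        for (int i = 0; i < LATENCIES_MS.length; i++) {
            simulatedFetch("source-" + i, LATENCIES_MS[i]);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // All calls at once: elapsed time is roughly the slowest latency.
    static long parallelMillis() {
        ExecutorService executor = Executors.newFixedThreadPool(LATENCIES_MS.length);
        try {
            long start = System.nanoTime();
            CompletableFuture<?>[] futures = new CompletableFuture<?>[LATENCIES_MS.length];
            for (int i = 0; i < LATENCIES_MS.length; i++) {
                final int index = i;
                futures[i] = CompletableFuture.supplyAsync(
                        () -> simulatedFetch("source-" + index, LATENCIES_MS[index]), executor);
            }
            CompletableFuture.allOf(futures).join();
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sequential ms: " + sequentialMillis());
        System.out.println("parallel ms:   " + parallelMillis());
    }
}
```

With these values the sequential path should take a little over 420ms (60 + 90 + 120 + 150) and the parallel path a little over 150ms, plus scheduling overhead. Exact numbers vary by machine.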

The CompletableFuture calls run on a dedicated bounded executor thread pool rather than the common ForkJoinPool (which supplyAsync uses by default when no executor is passed), so the parallelized fetches don't compete for threads with other application work. Pool size was tuned based on the profiling data: large enough to allow sufficient parallelism at peak concurrency, small enough to avoid excessive thread contention.
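A bounded executor of this kind can be built directly on ThreadPoolExecutor. The sketch below is one possible configuration, not the production one: FetchExecutorFactory, the thread-name prefix, the sizes passed in, and the choice of CallerRunsPolicy as the saturation behavior are all assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FetchExecutorFactory {
    // Fixed-size pool with a bounded queue, so neither threads nor queued
    // tasks can grow without limit under load.
    static ThreadPoolExecutor newBoundedFetchExecutor(int poolSize, int queueCapacity) {
        AtomicInteger threadNumber = new AtomicInteger(1);
        // Named threads make the pool easy to spot in thread dumps and profilers.
        ThreadFactory namedThreads = runnable -> {
            Thread thread = new Thread(runnable, "premium-fetch-" + threadNumber.getAndIncrement());
            thread.setDaemon(true);
            return thread;
        };
        return new ThreadPoolExecutor(
                poolSize, poolSize,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                namedThreads,
                // When pool and queue are both full, the submitting thread runs the
                // task itself, which slows producers down instead of dropping work.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```

CallerRunsPolicy is only one of several reasonable choices. Rejecting with an exception and failing the request fast is an equally defensible policy when upstream services are already struggling.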

Fix 3: Redis Caching with Programmatic Invalidation

Several of the data sources feeding the engine change infrequently relative to request volume. Rate tables are updated on actuarial review cycles: typically monthly or quarterly, not continuously. Jurisdiction rules change when regulations change: infrequently and on a known schedule. Fetching these from their source services on every calculation request was redundant work that added latency for no benefit.

Redis caching was added for the stable data sources (rate tables and jurisdiction factors) using Spring's @Cacheable abstraction backed by a Redis instance. Cache keys are structured to scope cached values correctly:

rate-table:{coverageType}:{effectiveDate}
jurisdiction-factor:{jurisdictionCode}:{effectiveDate}

The effectiveDate component in the cache key is critical: it scopes cached values to a specific effective date, so a rate table update for a new effective date automatically results in a cache miss rather than serving stale data. This is a structural staleness prevention mechanism that doesn't depend on any application code correctly invalidating the right entries.
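As a small illustration of the key scheme, the helper below builds both key shapes. CacheKeys is a hypothetical class, the coverage type and jurisdiction values in the usage are invented, and rendering the date in ISO-8601 form is an assumption. In a Spring application the same string could be produced by the key expression on @Cacheable instead.

```java
import java.time.LocalDate;

class CacheKeys {
    // LocalDate.toString() yields ISO-8601 (yyyy-MM-dd), which keeps keys readable.
    static String rateTable(String coverageType, LocalDate effectiveDate) {
        return "rate-table:" + coverageType + ":" + effectiveDate;
    }

    static String jurisdictionFactor(String jurisdictionCode, LocalDate effectiveDate) {
        return "jurisdiction-factor:" + jurisdictionCode + ":" + effectiveDate;
    }

    public static void main(String[] args) {
        // Two effective dates for the same coverage type map to two distinct keys,
        // so a table published for a new date never collides with the old entry.
        System.out.println(rateTable("AUTO", LocalDate.of(2024, 1, 1)));
        System.out.println(rateTable("AUTO", LocalDate.of(2024, 4, 1)));
    }
}
```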

TTL-only cache invalidation was not sufficient for this use case. If a rate table update is applied mid-month and the TTL is set to 24 hours, the engine could serve stale rates for up to 24 hours after the update: incorrect premiums for that entire window. The fix was programmatic invalidation: the rate table service publishes an event whenever an update is committed, and a cache invalidation listener clears the affected cache entries immediately on receipt of that event. The TTL remains as a backstop against missed events, but correct data is available as soon as the update is committed, not after the TTL expires.
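The interplay between event-driven eviction and the TTL backstop can be sketched without Redis. ReferenceDataCache below is a hypothetical in-memory stand-in that shows only the invalidation logic: the class name, the onRateTableUpdated method, and the injected clock are assumptions, and the real system used Redis with Spring's cache abstraction.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// In-memory stand-in for the Redis cache, to illustrate the invalidation rules only.
class ReferenceDataCache {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clockMillis; // injected so expiry can be tested

    ReferenceDataCache(Duration ttl, LongSupplier clockMillis) {
        this.ttlMillis = ttl.toMillis();
        this.clockMillis = clockMillis;
    }

    void put(String key, String value) {
        entries.put(key, new Entry(value, clockMillis.getAsLong() + ttlMillis));
    }

    // TTL backstop: an entry past its expiry is treated as a miss.
    Optional<String> get(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return Optional.empty();
        }
        if (clockMillis.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key, entry);
            return Optional.empty();
        }
        return Optional.of(entry.value);
    }

    // Event-driven path: called by the listener as soon as an update is committed.
    // Evicts every effective-date variant for the affected coverage type.
    void onRateTableUpdated(String coverageType) {
        String prefix = "rate-table:" + coverageType + ":";
        entries.keySet().removeIf(key -> key.startsWith(prefix));
    }
}
```

The event path gives immediate correctness in the normal case, and the TTL bounds how long a stale entry can survive if an event is lost.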

Risk profile and account discount data (which change more frequently, on policy updates and account modifications) are not cached at this layer. For frequently changing data, the staleness risk that caching introduces outweighs the latency it saves, so those sources continue to fetch from origin on every request.

Infrastructure: AWS ECS Fargate with Stateless Horizontal Scaling

The stateless request design that fixed the race conditions also made the engine horizontally scalable in a clean way. Because no calculation state lives in the service process (every request carries its own isolated context, and Redis holds the cached reference data externally), any number of ECS Fargate task instances can run the calculation service and handle any request without coordination between instances.

Fargate task count scales on CPU utilization metrics published to CloudWatch. At peak load (3,000+ concurrent sessions), the auto-scaling policy adds task instances to distribute the load. Because every instance is stateless and reads from the same Redis cache, scaling in or out is transparent to the calculation logic. No sticky sessions, no session affinity, no shared in-process state to worry about across instances.

OAuth2 / JWT authentication is enforced at the API gateway layer before any request reaches the calculation service. The service itself verifies the JWT signature and extracts the caller's identity and scope claims, but authentication is not a responsibility the service owns end-to-end. It's a concern handled at the boundary.

Key Technical Challenges

Challenge 1: Eliminating Shared State Without Regressing Correctness

Refactoring a calculation engine that's in production and serving real premium requests requires demonstrating that the refactored version produces identical results to the original for every valid input, not just the inputs you tested. A calculation engine that's faster but occasionally computes a different premium is worse than one that's slow but correct.

The validation approach was to run both the original engine and the refactored engine against the same input sets (using historical production calculation requests logged during the investigation phase) and compare outputs. Every input that had previously produced a correct result in the original engine needed to produce the same result in the refactored version. Every input that had previously produced an incorrect result under concurrency needed to now produce the correct result consistently.

This dual-execution comparison was run against thousands of historical requests covering the full range of coverage types, jurisdiction codes, policy configurations, and account discount combinations. No regressions were found. The refactored engine matched the original on all inputs where the original was correct, and produced the correct result on all inputs where the original had been wrong under concurrent load.
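The core of a dual-execution comparison is small. The sketch below is a generic version under stated assumptions: DualExecutionComparator is a hypothetical name, each engine is modeled as a plain function from input to premium, and amounts are compared with compareTo so that 110.0 and 110.00 count as equal. A stricter harness might also require identical scale.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class DualExecutionComparator {
    // Runs both engines on every input and returns the inputs where they disagree.
    static <I> List<I> findMismatches(List<I> inputs,
                                      Function<I, BigDecimal> originalEngine,
                                      Function<I, BigDecimal> refactoredEngine) {
        List<I> mismatches = new ArrayList<>();
        for (I input : inputs) {
            BigDecimal expected = originalEngine.apply(input);
            BigDecimal actual = refactoredEngine.apply(input);
            // compareTo ignores scale differences; equals would not.
            if (expected.compareTo(actual) != 0) {
                mismatches.add(input);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Toy engines: the "refactored" one is deliberately wrong for input 3.
        Function<Integer, BigDecimal> original = i -> BigDecimal.valueOf(i * 2L);
        Function<Integer, BigDecimal> refactored =
                i -> i == 3 ? BigDecimal.ZERO : BigDecimal.valueOf(i * 2L);
        System.out.println("mismatches: " + findMismatches(Arrays.asList(1, 2, 3), original, refactored));
    }
}
```

Returning the mismatching inputs, rather than a pass/fail flag, matters in practice: each one is a concrete policy configuration that can be recomputed by hand to decide which engine is right.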

Challenge 2: Cache Invalidation Timing in a Financial Context

Cache invalidation is famously one of the hard problems in computer science, but the difficulty is sharpened in a financial context. A stale cache entry that causes the engine to compute a premium based on last month's rate tables is not a minor inaccuracy. It's a contractual pricing error. The invalidation mechanism has to be fast and reliable.

The event-driven invalidation approach requires the rate table service to reliably publish update events, and the cache invalidation listener to reliably process them. Both halves of that chain can fail. The mitigation was defense in depth: the TTL backstop ensures staleness is bounded even if an event is missed, the cache keys include the effective date so future-dated updates don't pollute current-period entries, and the invalidation listener logs all events processed and publishes a CloudWatch metric on processing failures so missed invalidations are detectable rather than silent.

Redis caching with programmatic invalidation: effective-date keys, event-driven invalidation on rate-table updates, and a TTL backstop

The remaining staleness window (the time between a rate table update being committed and the invalidation event being processed) was measured in milliseconds under normal conditions. This was acceptable for the platform's operational requirements.

Challenge 3: Validating Concurrency Correctness, Not Just Performance

Performance improvements are straightforward to measure: before and after latency under load, error rates, throughput. Concurrency correctness is harder to validate because the failure mode (incorrect calculation results produced by thread interleaving) doesn't show up in standard performance metrics. A load test that measures only latency and error rate will pass even if the engine is still producing intermittently wrong premium amounts.

The JMeter load tests were designed to validate correctness under load, not just throughput. For a subset of the concurrent sessions, the test injected policy configurations with known expected premium outputs: configurations where the correct answer was precomputed and stored. Under 3,000-concurrent-session load, the test asserted that every response for these validation requests matched the expected value. Any deviation (any instance of the engine producing the wrong premium for a known input) would fail the test.

Across all load test runs, zero deviations were recorded. The correctness guarantee held under full concurrent load.
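The known-answer idea does not depend on JMeter. The sketch below expresses the same check in plain Java as an illustration only: KnownAnswerLoadCheck is a hypothetical name, the engine is modeled as a function, and the repetition and thread counts are parameters rather than the values used in the real tests.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

class KnownAnswerLoadCheck {
    // Fires every known-answer input at the engine concurrently, many times over,
    // and counts the responses that differ from the precomputed expected value.
    static <I> long countDeviations(Function<I, BigDecimal> engine,
                                    Map<I, BigDecimal> knownAnswers,
                                    int repetitions,
                                    int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Boolean>> checks = new ArrayList<>();
            for (int r = 0; r < repetitions; r++) {
                for (Map.Entry<I, BigDecimal> known : knownAnswers.entrySet()) {
                    // Each check yields true when the engine's answer deviates.
                    checks.add(CompletableFuture.supplyAsync(
                            () -> engine.apply(known.getKey()).compareTo(known.getValue()) != 0,
                            pool));
                }
            }
            return checks.stream().filter(CompletableFuture::join).count();
        } finally {
            pool.shutdown();
        }
    }
}
```

A run passes only when the count is zero. Latency and error-rate metrics would be collected separately, since an engine can be fast, return HTTP 200 on every request, and still be wrong.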

Outcome and What It Demonstrates

The refactored engine eliminated the race conditions that had been producing incorrect premium calculations in production, reduced data-fetching latency through parallelization, and added Redis caching that reduced load on upstream data services at peak traffic. The JMeter validation confirmed correctness under 3,000-concurrent-session load: the failure mode that had existed before the refactor was demonstrably absent after it.

From an engineering standpoint, the project demonstrates:

Concurrent Systems Reasoning. Diagnosing a race condition in a production system, tracing it to shared mutable state on singleton service beans, and fixing it structurally through stateless request design rather than through synchronization primitives reflects a genuine understanding of concurrency. Adding locks is the instinct; eliminating shared state is the engineering.

Profiling Before Optimizing. Discovering that the performance bottleneck was sequential I/O, not computation (after a week of investigating the wrong hypothesis), and pivoting to the correct fix demonstrates data-driven engineering. The profiling step was the decision point that determined whether the optimization work would actually improve anything.

Cache Design for a Financial Domain. The cache key design (including effective date), the combination of event-driven invalidation with a TTL backstop, and the decision not to cache frequently changing data reflect an understanding of cache correctness requirements in a domain where stale data has contractual consequences.

Correctness Validation Under Load. Designing load tests that validate output correctness rather than just performance metrics (by injecting known-answer requests into the concurrent test load and asserting exact matches) is the rigorous approach to validating a concurrency fix. Measuring latency alone would not have proven the race conditions were eliminated.

Tech Stack Summary

Layer	Technology
Language	Java
Framework	Spring Boot
Concurrency	Java `CompletableFuture` · Bounded executor thread pool
Caching	Redis · Spring Cache abstraction · Programmatic invalidation
Infrastructure	AWS ECS Fargate (stateless horizontal scaling)
Authentication	OAuth2 · JWT
Load Testing	Apache JMeter (3,000+ concurrent sessions)
Observability	AWS CloudWatch (scaling metrics, invalidation failure alerts)
Platform Scale	1M+ users · 3,000+ peak concurrent sessions