
By Keith Williams

Dec 31, 2025

How We Cut Our CI Wall Time from 30 Minutes to 5 Minutes


Slow CI is one of those things that grinds an engineering team's gears. Engineers wait longer for feedback, context-switching increases, and the temptation to batch changes grows, which only makes things worse. We spent a few weeks optimizing our GitHub Actions workflows and cut our PR checks from 30 minutes to 5 minutes. That's an 83% reduction in wait time, achieved without sacrificing any test coverage or quality gates.

There wasn’t a single magic fix. We looked at where time was actually going and applied three principles: smart job dependencies, aggressive caching, and test sharding. Here’s what we learned.

Cal.com event settings showing the 'Limits' tab with options to set buffer time before and after events, allowing users to block time before and after appointments.

The Foundation: Migrating to Blacksmith

We'd actually tried migrating to Blacksmith before but ran into issues. With a main branch as active as ours, we didn't want to deal with CI instability. This time around, the migration was straightforward (#26247): replace buildjet-*vcpu-ubuntu-2204 runners with blacksmith-*vcpu-ubuntu-2404 across 28 workflow files, swap out the cache and setup-node actions, and adjust vCPU allocations. The only real issue we hit was the corrupted cache problem with .next files (more on that in the caching section).

Once we upgraded, we noticed an immediate 2x speed improvement when pulling caches. With our yarn install cache weighing in at 1.2GB, that saves around 22 seconds on every job that needs to run yarn install, which is most of them. We're also leveraging Blacksmith's container caching for services used across many jobs (Postgres, Redis, Mailhog). The Initialize containers job is now down over 50%, from 20 seconds to 9 seconds.

jobs:
  build:
    name: Build Web App
    runs-on: blacksmith-4vcpu-ubuntu-2404
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/yarn-install
      # ...remaining build and cache steps elided
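The container caching mentioned above applies to ordinary GitHub Actions service containers. A minimal sketch of how such services are declared (the image tags and ports here are illustrative, not necessarily what our jobs use):

jobs:
  integration:
    runs-on: blacksmith-4vcpu-ubuntu-2404
    services:
      # Images are pulled once, then served from Blacksmith's container cache
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: postgres
        ports:
          - 5432:5432
      redis:
        image: redis:7
        ports:
          - 6379:6379
      mailhog:
        image: mailhog/mailhog:latest
        ports:
          - 1025:1025
          - 8025:8025
    steps:
      - uses: actions/checkout@v4
      # ...run the suite against the services above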

What we had to fix

Blacksmith runners expose the host machine's CPU count rather than the allocated vCPUs (typically 4). Playwright, being helpful, would spawn workers to match, causing resource contention and flaky test failures. We simply needed to cap workers explicitly with --workers=4 to match our actual allocation.
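In workflow terms the cap is just a flag on the test command; a sketch (the step name and yarn invocation are illustrative):

- name: Run E2E tests
  # Pin workers to the allocated vCPUs instead of the host CPU count
  # that the runner exposes
  run: yarn playwright test --workers=4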

We also discovered that higher parallelism exposed race conditions in our test cleanup logic. Integration tests that worked fine sequentially would collide when running concurrently, producing idempotencyKey errors. The solution required rethinking how tests clean up after themselves: querying bookings by eventTypeId instead of tracking individual booking IDs, since ID tracking missed bookings created indirectly by the code under test (#26269, #26278, #26283).

Even our screenshot tests needed adjustment (#26247). Different runners have subtle font rendering differences, so we added deviceScaleFactor: 1 for consistent rendering and slightly increased our diff threshold from 0.05 to 0.07 to account for antialiasing variations without masking real regressions.

Moving faster often exposes hidden coupling in your system. The coupling was always there; you just couldn't see it at lower speeds. Fixing these issues is what enables sustainable speed improvements.

Principle 1: Shorten the Critical Path with Dependency Design

For us, a big problem was that lint was gating jobs it had no business gating. When your E2E tests take 7-10 minutes, blocking them on 2-3 minutes of lint and type checks just makes the whole workflow longer. Lint will finish long before E2E does, so why not run them in parallel?

Decouple unrelated test suites

Our integration tests and main E2E suite don't need the API v2 build, so we removed that dependency (#26170). Conversely, the API v2 E2E tests don't need the main web build. Each test suite now depends only on what it actually requires, allowing maximum parallelism.
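In the workflow this is nothing more exotic than trimming the needs: lists so each suite waits only on its own build. A sketch with illustrative job names:

jobs:
  e2e:
    # Needs only the web build; lint and type checks run in parallel
    needs: [build]
    runs-on: blacksmith-4vcpu-ubuntu-2404
    steps:
      - uses: actions/checkout@v4
      # ...restore caches and run the main E2E suite
  api-v2-e2e:
    # Needs only the API v2 build, not the main web build
    needs: [build-api-v2]
    runs-on: blacksmith-4vcpu-ubuntu-2404
    steps:
      - uses: actions/checkout@v4
      # ...restore caches and run the API v2 E2E suite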

Consolidate preflight jobs

We had separate jobs for file change detection (changes) and E2E label checking (check-label). Each job has startup overhead: spinning up a runner, checking out code, restoring context. By merging these into a single prepare job, we eliminated redundant overhead (#26101). We went further and moved our dependency installation steps into the prepare job as well, saving another 20 seconds per workflow by eliminating yet another job boundary (#26320).
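A sketch of what the combined job can look like, with change detection and the label check living side by side (the filter names, label, and outputs are illustrative, and dorny/paths-filter stands in for whatever change-detection step is actually used):

prepare:
  runs-on: blacksmith-2vcpu-ubuntu-2404
  outputs:
    has-web-changes: ${{ steps.filter.outputs.web }}
    run-e2e: ${{ steps.label.outputs.run-e2e }}
  steps:
    - uses: actions/checkout@v4
    - name: Detect changed paths
      id: filter
      uses: dorny/paths-filter@v3
      with:
        filters: |
          web:
            - 'apps/web/**'
            - 'packages/**'
    - name: Check for the E2E label
      id: label
      run: echo "run-e2e=${{ contains(github.event.pull_request.labels.*.name, 'ready-for-e2e') }}" >> "$GITHUB_OUTPUT"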

Make reporting non-blocking

E2E test reports are valuable, but they shouldn't sit on the critical path. We moved the merge-reports, publish-report, and cleanup-report jobs into a separate workflow triggered by workflow_run (#26157). Now engineers can re-run failed jobs immediately without waiting for report generation to complete.

name: E2E Report

on:
  workflow_run:
    workflows: ["PR Update"]
    types: [completed]

The rule is simple: err on the side of letting E2E jobs start sooner rather than later. Don't block long-running jobs on short-running prerequisites unless there's a genuine dependency. Your CI's wall-clock time is determined by the critical path, and every unnecessary dependency extends that path.

Principle 2: Caching Is King (But Only If It Hits)

Effective caching requires two things: keys that invalidate when they should and stay stable when they should, and a cache store that remains healthy over time.

Make use of lookup-only

Our most dramatic caching improvement came from a simple observation: we were downloading approximately 1.2GB of cached data even when we didn't need to install anything. The cache existed, the hit was successful, but we were still paying the download cost before discovering we could skip the install step.

The fix was to use lookup-only cache checks (#26314). We have dedicated steps that run before all jobs to ensure dependencies are installed, so by the time individual jobs run, we know the cache exists. Before attempting to restore the full cache, we first check if it exists without downloading. If all caches hit, we skip the entire restore-and-install flow. No downloads, no installs, just proceed to the actual work. This alone saved significant time on every cache-hit scenario, which, with proper cache key design, should be most runs.

- name: Check yarn cache (lookup-only)
  if: ${{ inputs.skip-install-if-cache-hit == 'true' }}
  uses: actions/cache/restore@v4
  id: yarn-download-cache-check
  with:
    path: ${{ steps.yarn-config.outputs.CACHE_FOLDER }}
    key: yarn-download-cache-${{ hashFiles('yarn.lock') }}
    lookup-only: true

We applied the same logic to Playwright browser installation (#26060). Since we have a dedicated step that ensures Playwright browsers are cached before test jobs run, we can use the same lookup-only approach. If the Playwright cache hits, skip the install entirely. Now we check first, then decide whether to download at all.
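A sketch of the same pattern applied to the browser install, using the stock cache actions (the path, key format, and yarn invocation are illustrative; in practice the cache action may be Blacksmith's drop-in replacement):

- name: Check Playwright browsers cache (lookup-only)
  id: playwright-cache
  uses: actions/cache/restore@v4
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ hashFiles('yarn.lock') }}
    lookup-only: true
- name: Install Playwright browsers
  if: steps.playwright-cache.outputs.cache-hit != 'true'
  run: yarn playwright install --with-deps
- name: Save Playwright browsers cache
  if: steps.playwright-cache.outputs.cache-hit != 'true'
  uses: actions/cache/save@v4
  with:
    path: ~/.cache/ms-playwright
    key: playwright-${{ hashFiles('yarn.lock') }}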

We extended this pattern to our database cache as well (#26344). Using lookup-only checks, we can skip rebuilding the DB cache when it already exists. We also simplified the cache key by removing the PR number and SHA, making it much more reusable across runs.
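The key simplification is worth spelling out: run-specific segments guarantee misses, while input-derived segments make the entry reusable across runs. A hypothetical before/after (the key names and file path are illustrative):

# Before: tied to one PR and one commit, so almost never reused
key: db-cache-${{ github.event.pull_request.number }}-${{ github.sha }}

# After: tied only to the inputs that actually shape the database
key: db-cache-${{ hashFiles('packages/prisma/schema.prisma') }}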

Cache correctness

Cache correctness matters as much as cache speed. We discovered that our **/node_modules/ glob pattern was capturing apps/web/.next/node_modules/, which is Next.js build output containing symlinks and generated files that change during builds. This led to corrupted cache archives with tar extraction errors (#26268). The fix was a simple exclusion pattern, but finding it required understanding exactly what was being cached and why.
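The exclusion itself is a one-line change to the cache path globs; a sketch (the key shown is illustrative):

- name: Save yarn install cache
  uses: actions/cache/save@v4
  with:
    # The negative pattern keeps Next.js build output (symlinked, generated
    # files under apps/web/.next/node_modules/) out of the tar archive
    path: |
      **/node_modules
      !**/.next/**
    key: yarn-install-${{ hashFiles('yarn.lock') }}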

We also implemented automatic cache cleanup (#26312). When a PR is merged or closed, we now delete its associated build caches. This keeps our cache store lean and prevents accumulation of stale entries that would never be used again. We simplified our cache key format in the process, removing unnecessary segments like runner.os and node_version that added complexity without improving hit rates.

delete-cache-build-on-pr-close:
  if: github.event_name == 'pull_request' && github.event.action == 'closed'
  runs-on: blacksmith-2vcpu-ubuntu-2404
  env:
    CACHE_NAME: prod-build
  steps:
    - name: Delete cache-build cache
      uses: useblacksmith/cache-delete@v1
      with:
        key: ${{ env.CACHE_NAME }}-${{ github.event.pull_request.head.ref }}
        prefix: "true"

Turbo Remote Caching

We upgraded to Next.js 16 (#26093), which reduced our builds from 6+ minutes to ~1 minute (shoutout to the Next.js team for huge improvements). With Turbo Remote Caching enabled, subsequent builds take just 7 seconds to pull from cache compared to the ~1 minute for a full build.
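Wiring up the remote cache in CI is mostly a matter of exposing credentials to the build step; a sketch assuming a Vercel-hosted Turborepo remote cache (the secret, variable, and filter names are illustrative):

- name: Build web app
  run: yarn turbo run build --filter=@calcom/web
  env:
    # Turborepo reads these to pull from and push to the remote cache
    TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
    TURBO_TEAM: ${{ vars.TURBO_TEAM }}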

We also enabled Turbo Remote Caching for our API v2 E2E tests (#26331). Previously, each E2E shard would rebuild the platform packages from scratch. Now each shard benefits from Turbo's remote cache, and we optimized Jest's TypeScript compilation by enabling isolatedModules and disabling diagnostics in CI, which significantly speeds up test startup time.

Principle 3: Scale Tests with Sharding

When you have 1,100+ test cases across 82 E2E test files, running them sequentially is leaving performance on the table. Test sharding, which splits your test suite across multiple parallel runners, is the most direct way to reduce wall-clock time for large test suites. We already had sharding in place for our main E2E suite, but we increased it from 4 to 8 shards (#26342), which reduced the total E2E suite time by another minute. Our API v2 E2E tests were still running as a single job.

We sharded our API v2 E2E tests into 4 parallel jobs using Jest's built-in --shard option (#26183). Each shard runs independently with its own Postgres and Redis services, and artifacts are named uniquely per shard. What previously took 10+ minutes as a single job now completes significantly faster with the work distributed across four runners.

- name: Run Tests
  working-directory: apps/api/v2
  run: |
    # Disable errexit so a failing shard still reaches the echo below,
    # then re-exit with the captured code
    set +e
    yarn test:e2e:ci --shard=${{ matrix.shard }}/${{ strategy.job-total }}
    EXIT_CODE=$?
    echo "yarn test:e2e:ci --shard=${{ matrix.shard }}/${{ strategy.job-total }} command exit code: $EXIT_CODE"
    exit $EXIT_CODE
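For reference, the matrix feeding matrix.shard and strategy.job-total above looks something like this sketch:

strategy:
  # Let the other shards finish even if one fails, so a single flaky
  # shard doesn't hide the rest of the results
  fail-fast: false
  matrix:
    shard: [1, 2, 3, 4]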

Sharding at scale also surfaces infrastructure issues you might not notice with sequential execution. When multiple jobs try to populate the same database cache simultaneously, you get race conditions. We solved this by creating a dedicated setup-db job that runs before all E2E and integration test jobs (#26171). This single job populates the database cache once, and all downstream jobs restore from it. No races, no duplicate work.
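Structurally it's just one more node at the front of the dependency graph; a sketch with illustrative job and action names:

jobs:
  setup-db:
    runs-on: blacksmith-2vcpu-ubuntu-2404
    steps:
      - uses: actions/checkout@v4
      # Hypothetical composite action that builds and saves the seeded
      # database cache exactly once per workflow run
      - uses: ./.github/actions/cache-db
  e2e:
    # Every test job restores the cache setup-db produced, so no two
    # jobs race to build it
    needs: [setup-db]
    runs-on: blacksmith-4vcpu-ubuntu-2404
    steps:
      - uses: actions/checkout@v4
      # ...restore the DB cache and run the suite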

We also created a dedicated workflow for API v2 unit tests, separating them from the main test suite (#26189). This allows them to run in parallel with other checks rather than competing for resources in a monolithic test job.

The Compound Effect

None of these changes in isolation would have cut our CI time by 83%. The gains come from their combination:

Smart dependencies let jobs start earlier. Effective caching means those jobs spend less time on setup. Sharding means the actual test execution happens in parallel. Together, they compress the critical path from every direction.

The investment compounds over time. Every PR now gets faster feedback. Engineers stay in flow longer. The temptation to batch changes decreases, which means smaller, more reviewable diffs. Code quality improves because the feedback loop is tighter.

We'll keep investing in CI performance because the returns are real and measurable. The goal isn't speed for its own sake; it's enabling engineers to ship quality software faster. When CI is fast, moving quickly and maintaining high standards work together rather than against each other.

Referenced Pull Requests

Foundation: Buildjet to Blacksmith Migration

  • #26247 - Migrate GitHub workflows from Buildjet to Blacksmith

Test Stability

  • #26269 - Fix flaky e2e tests with isolated user sessions

  • #26278 - Stabilize e2e tests with scoped locators and deterministic schedules

  • #26283 - Improve test isolation for managed event type e2e tests

Turbo Remote Caching

  • #26093 - Upgrade to Next 16 (builds reduced from 6+ minutes to ~1 minute)

  • #26331 - Enable Turbo Remote Caching (subsequent builds take 7s)

Job Dependencies and Workflow Optimization

  • #26101 - Consolidate changes and check-label jobs into prepare

  • #26170 - Decouple non-API v2 tests from API v2 build

  • #26157 - Make E2E report jobs non-blocking by moving to separate workflow

  • #26320 - Move deps job to prepare job step to save ~20s per workflow

Make use of lookup-only

  • #26314 - Use lookup-only cache check to skip deps job downloads

  • #26060 - Skip Playwright install on cache hit

  • #26344 - Use lookup-only for DB cache and simplify cache key

Cache correctness

  • #26268 - Exclude .next/node_modules from yarn cache to prevent corruption

  • #26312 - Delete cache-build cache entries on PR close

Test Sharding and Infrastructure

  • #26342 - Increase main E2E shards from 4 to 8 (saves 1 minute)

  • #26183 - Shard API v2 E2E tests into 4 parallel jobs

  • #26171 - Add dedicated setup-db job to eliminate cache race condition

  • #26189 - Create new workflow for API v2 unit tests
