Slow CI is one of those things that grinds an engineering team's gears. Engineers wait longer for feedback, context-switching increases, and the temptation to batch changes grows, which only makes things worse. We spent a few weeks optimizing our GitHub Actions workflows and cut our PR checks from 30 minutes to 5 minutes. That’s an 83% reduction in wait time, achieved without sacrificing any test coverage or quality gates.
There wasn’t a single magic fix. We looked at where time was actually going and applied three principles: smart job dependencies, aggressive caching, and test sharding. Here’s what we learned.
The Foundation: Migrating to Blacksmith
We'd actually tried migrating to Blacksmith before but ran into issues. With a main branch as active as ours, we didn't want to deal with CI instability. This time around, the migration was straightforward (#26247): replace buildjet-*vcpu-ubuntu-2204 runners with blacksmith-*vcpu-ubuntu-2404 across 28 workflow files, swap out the cache and setup-node actions, and adjust vCPU allocations. The only real issue we hit was the corrupted cache problem with .next files (more on that in the caching section).
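The shape of the change, sketched on a single job (the runner size and action versions here are illustrative, not our exact workflow):

```yaml
jobs:
  e2e:
    # Before: runs-on: buildjet-8vcpu-ubuntu-2204
    runs-on: blacksmith-8vcpu-ubuntu-2404
    steps:
      - uses: actions/checkout@v4
      # The setup-node and cache actions were swapped for Blacksmith's
      # drop-in equivalents in the same pass, and vCPU sizes were adjusted
      # per job.
```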
Once we upgraded, we noticed an immediate 2x speed improvement for pulling cache. With our yarn dependency cache weighing in at 1.2GB, that saves around 22 seconds on every job that needs to run yarn install, which is most of them. We're also leveraging Blacksmith's container caching for services used across many jobs (Postgres, Redis, Mailhog). The Initialize containers job is now down over 50%, from 20 seconds to 9 seconds.
What we had to fix
Blacksmith runners expose the host machine's CPU count rather than the allocated vCPUs (typically 4). Playwright, being helpful, would spawn workers to match, causing resource contention and flaky test failures. We simply needed to cap workers explicitly with --workers=4 to match our actual allocation.
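In workflow terms it's a one-line change to the test invocation (a sketch; the same cap can equally live in playwright.config.ts):

```yaml
      - name: Run E2E tests
        # Match the runner's actual vCPU allocation instead of the host's
        # reported CPU count
        run: yarn playwright test --workers=4
```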
We also discovered that higher parallelism exposed race conditions in our test cleanup logic. Integration tests that worked fine sequentially would collide when running concurrently, producing idempotencyKey errors. The solution required rethinking how tests clean up after themselves: querying bookings by eventTypeId instead of tracking individual booking IDs, since ID tracking missed bookings created indirectly by the code under test (#26269, #26278, #26283).
Even our screenshot tests needed adjustment (#26247). Different runners have subtle font rendering differences, so we added deviceScaleFactor: 1 for consistent rendering and slightly increased our diff threshold from 0.05 to 0.07 to account for antialiasing variations without masking real regressions.
Moving faster often exposes hidden coupling in your system. The coupling was always there; you just couldn't see it at lower speeds. Fixing these issues is what enables sustainable speed improvements.
Principle 1: Shorten the Critical Path with Dependency Design
For us, a big problem was that lint was gating things it shouldn't gate. When your E2E tests take 7-10 minutes, blocking them on 2-3 minutes of lint and type checks just makes the whole workflow longer. Lint will finish long before the E2E suite does, so why not run them in parallel?
Decouple unrelated test suites
Our integration tests and main E2E suite don't need the API v2 build, so we removed that dependency (#26170). Conversely, the API v2 E2E tests don't need the main web build. Each test suite now depends only on what it actually requires, allowing maximum parallelism.
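Sketched as a dependency graph, the idea looks like this (job names are illustrative):

```yaml
jobs:
  build:          # main web build
    # ...
  build-api-v2:   # API v2 build
    # ...
  e2e:
    needs: [build]           # no longer waits on build-api-v2
    # ...
  integration:
    needs: [build]           # same: only the main web build
    # ...
  e2e-api-v2:
    needs: [build-api-v2]    # no longer waits on the main web build
    # ...
```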
Consolidate preflight jobs
We had separate jobs for file change detection (changes) and E2E label checking (check-label). Each job has startup overhead: spinning up a runner, checking out code, restoring context. By merging these into a single prepare job, we eliminated redundant overhead (#26101). We went further and moved our dependency installation steps into the prepare job as well, saving another 20 seconds per workflow by eliminating yet another job boundary (#26320).
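Roughly what the consolidated job looks like (a sketch: the filter paths, the label name, and the use of dorny/paths-filter for change detection are illustrative):

```yaml
jobs:
  prepare:
    runs-on: blacksmith-4vcpu-ubuntu-2404
    outputs:
      web-changed: ${{ steps.changes.outputs.web }}
      run-e2e: ${{ steps.label.outputs.run-e2e }}
    steps:
      - uses: actions/checkout@v4
      # 1) File change detection (formerly its own "changes" job)
      - id: changes
        uses: dorny/paths-filter@v3
        with:
          filters: |
            web:
              - 'apps/web/**'
              - 'packages/**'
      # 2) E2E label check (formerly its own "check-label" job)
      - id: label
        run: echo "run-e2e=${{ contains(github.event.pull_request.labels.*.name, 'ready-for-e2e') }}" >> "$GITHUB_OUTPUT"
      # 3) Dependency install also lives here now (#26320), so downstream
      #    jobs only restore a warm cache
```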
Make reporting non-blocking
E2E test reports are valuable, but they shouldn't sit on the critical path. We moved the merge-reports, publish-report, and cleanup-report jobs into a separate workflow triggered by workflow_run (#26157). Now engineers can re-run failed jobs immediately without waiting for report generation to complete.
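A sketch of the separate reporting workflow (the workflow name and artifact pattern are illustrative):

```yaml
name: E2E Reports
on:
  workflow_run:
    workflows: ["E2E"]    # the main E2E workflow
    types: [completed]

jobs:
  merge-reports:
    runs-on: blacksmith-4vcpu-ubuntu-2404
    steps:
      - uses: actions/checkout@v4
      # Pull the per-shard report artifacts from the run that triggered us
      - uses: actions/download-artifact@v4
        with:
          run-id: ${{ github.event.workflow_run.id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
          pattern: blob-report-*
          merge-multiple: true
      # merge, publish, and cleanup steps follow; none of it blocks the PR
```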
The rule is simple: let long-running E2E jobs start as early as possible, and don't block them on short-running prerequisites unless there's a genuine dependency. Your CI's wall-clock time is determined by the critical path, and every unnecessary dependency extends that path.
Principle 2: Caching Is King (But Only If It Hits)
Effective caching requires keys that invalidate when they should, keys that stay stable when they should, and a cache store that remains healthy over time.
Make use of lookup-only
Our most dramatic caching improvement came from a simple observation: we were downloading approximately 1.2GB of cached data even when we didn't need to install anything. The cache existed, the hit was successful, but we were still paying the download cost before discovering we could skip the install step.
The fix was to use lookup-only cache checks (#26314). We have dedicated steps that run before all jobs to ensure dependencies are installed, so by the time individual jobs run, we know the cache exists. Before attempting to restore the full cache, we first check if it exists without downloading. If all caches hit, we skip the entire restore-and-install flow. No downloads, no installs, just proceed to the actual work. This alone saved significant time on every cache-hit scenario, which, with proper cache key design, should be most runs.
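Here's the shape of it inside the deps step, sketched with the standard actions/cache sub-actions (the same idea applies with Blacksmith's drop-in cache action; the key format is simplified):

```yaml
      # In the deps/prepare job: its only purpose is to make sure a warm
      # yarn cache exists for downstream jobs
      - id: cache-check
        uses: actions/cache/restore@v4
        with:
          path: '**/node_modules'
          key: yarn-${{ hashFiles('yarn.lock') }}
          lookup-only: true    # check existence only, never download

      # Only on a miss do we actually install and save the cache
      - if: steps.cache-check.outputs.cache-hit != 'true'
        run: yarn install

      - if: steps.cache-check.outputs.cache-hit != 'true'
        uses: actions/cache/save@v4
        with:
          path: '**/node_modules'
          key: yarn-${{ hashFiles('yarn.lock') }}
```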
We applied the same logic to Playwright browser installation (#26060). Since we have a dedicated step that ensures Playwright browsers are cached before test jobs run, we can use the same lookup-only approach. If the Playwright cache hits, skip the install entirely. Now we check first, then decide whether to download at all.
We extended this pattern to our database cache as well (#26344). Using lookup-only checks, we can skip rebuilding the DB cache when it already exists. We also simplified the cache key by removing the PR number and SHA, making it much more reusable across runs.
Cache correctness
Cache correctness matters as much as cache speed. We discovered that our **/node_modules/ glob pattern was capturing apps/web/.next/node_modules/, which is Next.js build output containing symlinks and generated files that change during builds. This led to corrupted cache archives with tar extraction errors (#26268). The fix was a simple exclusion pattern, but finding it required understanding exactly what was being cached and why.
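The corrected path list, as a sketch (key format simplified):

```yaml
      # Exclude Next.js build output from the yarn cache so its symlinks and
      # generated files can't corrupt the archive
      - uses: actions/cache/save@v4
        with:
          path: |
            **/node_modules
            !apps/web/.next/**
          key: yarn-${{ hashFiles('yarn.lock') }}
```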
We also implemented automatic cache cleanup (#26312). When a PR is merged or closed, we now delete its associated build caches. This keeps our cache store lean and prevents accumulation of stale entries that would never be used again. We simplified our cache key format in the process, removing unnecessary segments like runner.os and node_version that added complexity without improving hit rates.
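The cleanup itself is a small workflow on pull_request close; here's a sketch using the GitHub REST API via gh (the actual implementation details may differ):

```yaml
name: Cleanup PR caches
on:
  pull_request:
    types: [closed]

permissions:
  actions: write    # needed to delete caches

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - name: Delete caches created on this PR's merge ref
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          REF: refs/pull/${{ github.event.pull_request.number }}/merge
        run: |
          gh api --paginate \
            "repos/${{ github.repository }}/actions/caches?ref=$REF" \
            --jq '.actions_caches[].id' |
          while read -r id; do
            gh api -X DELETE "repos/${{ github.repository }}/actions/caches/$id"
          done
```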
Turbo Remote Caching
We upgraded to Next.js 16 (#26093), which reduced our builds from 6+ minutes to ~1 minute (shoutout to the Next.js team for huge improvements). With Turbo Remote Caching enabled, subsequent builds take just 7 seconds to pull from cache compared to the ~1 minute for a full build.
We also enabled Turbo Remote Caching for our API v2 E2E tests (#26331). Previously, each E2E shard would rebuild the platform packages from scratch. Now each shard benefits from Turbo's remote cache, and we optimized Jest's TypeScript compilation by enabling isolatedModules and disabling diagnostics in CI, which significantly speeds up test startup time.
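Enabling the remote cache in CI is mostly a matter of exposing the credentials Turbo looks for (secret names here are illustrative):

```yaml
      - name: Build platform packages
        env:
          # Turborepo reads these to authenticate against the remote cache
          TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }}
          TURBO_TEAM: ${{ secrets.TURBO_TEAM }}
        run: yarn turbo run build
```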
Principle 3: Scale Tests with Sharding
When you have 1,100+ test cases across 82 E2E test files, running them sequentially is leaving performance on the table. Test sharding, which splits your test suite across multiple parallel runners, is the most direct way to reduce wall-clock time for large test suites. We already had sharding in place for our main E2E suite, but we increased it from 4 to 8 shards (#26342), which reduced the total E2E suite time by another minute. Our API v2 E2E tests were still running as a single job.
We sharded our API v2 E2E tests into 4 parallel jobs using Jest's built-in --shard option (#26183). Each shard runs independently with its own Postgres and Redis services, and artifacts are named uniquely per shard. What previously took 10+ minutes as a single job now completes significantly faster with the work distributed across four runners.
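Mechanically, both suites shard the same way: a matrix expands into N jobs and each one runs its slice. A sketch for the API v2 side (the main Playwright suite uses the same shape with --shard=${{ matrix.shard }}/8; service definitions and paths are illustrative):

```yaml
jobs:
  e2e-api-v2:
    runs-on: blacksmith-8vcpu-ubuntu-2404
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    # each shard also declares its own Postgres and Redis services (omitted)
    steps:
      # ...checkout, restore caches, build via the Turbo remote cache...
      - name: Run API v2 E2E shard
        run: yarn jest --shard=${{ matrix.shard }}/4
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: api-v2-e2e-results-${{ matrix.shard }}   # unique name per shard
          path: test-results
```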
Sharding at scale also surfaces infrastructure issues you might not notice with sequential execution. When multiple jobs try to populate the same database cache simultaneously, you get race conditions. We solved this by creating a dedicated setup-db job that runs before all E2E and integration test jobs (#26171). This single job populates the database cache once, and all downstream jobs restore from it. No races, no duplicate work.
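In workflow terms (a sketch):

```yaml
jobs:
  setup-db:
    runs-on: blacksmith-4vcpu-ubuntu-2404
    steps:
      # Populate the database cache exactly once per run (skipped entirely
      # on a lookup-only hit, per #26344)
      - uses: actions/checkout@v4
      # ...restore deps, run migrations and seeds, save the DB cache...

  e2e:
    needs: [setup-db]    # every test shard restores the cache setup-db saved
    strategy:
      matrix:
        shard: [1, 2, 3, 4, 5, 6, 7, 8]
    # ...

  integration:
    needs: [setup-db]
    # ...
```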
We also created a dedicated workflow for API v2 unit tests, separating them from the main test suite (#26189). This allows them to run in parallel with other checks rather than competing for resources in a monolithic test job.
The Compound Effect
None of these changes in isolation would have cut our CI time by 83%. The gains come from their combination:
Smart dependencies let jobs start earlier. Effective caching means those jobs spend less time on setup. Sharding means the actual test execution happens in parallel. Together, they compress the critical path from every direction.
The investment compounds over time. Every PR now gets faster feedback. Engineers stay in flow longer. The temptation to batch changes decreases, which means smaller, more reviewable diffs. Code quality improves because the feedback loop is tighter.
We'll keep investing in CI performance because the returns are real and measurable. The goal isn't speed for its own sake; it's enabling engineers to ship quality software faster. When CI is fast, moving quickly and maintaining high standards work together rather than against each other.
Referenced Pull Requests
Foundation: Buildjet to Blacksmith Migration
#26247 - Migrate GitHub workflows from Buildjet to Blacksmith
Test Stability
#26269 - Fix flaky e2e tests with isolated user sessions
#26278 - Stabilize e2e tests with scoped locators and deterministic schedules
#26283 - Improve test isolation for managed event type e2e tests
Turbo Remote Caching
#26093 - Upgrade to Next 16 (builds reduced from 6+ minutes to ~1 minute)
#26331 - Enable Turbo Remote Caching (subsequent builds take 7s)
Job Dependencies and Workflow Optimization
#26101 - Consolidate changes and check-label jobs into prepare
#26170 - Decouple non-API v2 tests from API v2 build
#26157 - Make E2E report jobs non-blocking by moving to separate workflow
#26320 - Move deps job to prepare job step to save ~20s per workflow
Make use of lookup-only
#26314 - Use lookup-only cache check to skip deps job downloads
#26060 - Skip Playwright install on cache hit
#26344 - Use lookup-only for DB cache and simplify cache key
Cache correctness
#26268 - Exclude .next/node_modules from yarn cache to prevent corruption
#26312 - Delete cache-build cache entries on PR close
Test Sharding and Infrastructure
#26342 - Increase main E2E suite sharding from 4 to 8 shards
#26183 - Shard API v2 E2E tests into 4 parallel jobs
#26171 - Add dedicated setup-db job to populate the database cache once
#26189 - Move API v2 unit tests into a dedicated workflow

