Case Study - SSH Certificate Enforcement

Migrated 2,100 users from static SSH keys to certificate-based authentication across 5 enforcement waves. Nobody got locked out.

5,000-person SaaS company

Application Integration

Problem

Five thousand employees, years of growth, and SSH keys scattered across thousands of hosts with no inventory. When someone left, their keys stayed. Nobody revoked them because there was no central place to do it — keys live on individual hosts, and nobody had built a system to track which keys belonged to whom.

Ask "who can SSH into this server right now?" and the honest answer was: nobody knows.

The company already had a certificate authority wired to their identity provider. The tooling for short-lived certificates existed. The hard part was not the technology — it was migrating 2,100 active users off static keys without breaking anyone's workflow.

Approach

We ran five enforcement waves. Wave 1 covered 99 users — small enough that if something broke, it broke quietly. Each wave expanded scope, finishing with 865 users in a single pass.

Users migrated
2,100
Enforcement waves
5
Downtime
0

Each wave followed the same pattern: identify users, confirm certificate auth works for each one, kill static key auth, monitor.

Wave 2 is where it got interesting. A CI/CD pipeline was authenticating with a static key that nobody knew existed. It broke silently — the team only noticed because a nightly deploy failed. After that, we added a pre-wave step: scan for active key usage in the 72 hours before enforcement, not just key presence on disk. That step caught three more undocumented integrations in later waves.

We built custom automation to find and remove deprecated static keys across thousands of host configs. Two cleanup passes after the final wave. When it was done, the only static keys left belonged to 14 service accounts with documented exceptions and scheduled rotation dates.

Outcome

Every SSH user authenticates with short-lived certificates now. When someone leaves and their IDP account gets deprovisioned, SSH access dies with it. No orphaned key sitting on a host for six months waiting for someone to notice.

The things that broke — the CI/CD pipeline, a monitoring agent, a legacy cron job on a bastion host — all surfaced in waves 1 and 2 when the blast radius was small enough to fix before anyone noticed.

Before this project, the security team had no audit trail for SSH. Now every certificate issuance is logged: user identity, target host, time-bounded expiry. "Who had access to this host last Tuesday?" is a query, not a shrug.

Handoff

The infrastructure team runs the certificate authority and enforcement configuration. We left them a runbook for certificate issuance, authentication troubleshooting, and service account exceptions.

The wave-based approach is documented as a repeatable playbook. Next acquisition or infrastructure expansion, same process, no re-invention.

Similar challenges in your stack? Start with a one-week assessment.