Valentin Averin
Cloud HSMs, Hybrid Environments and Key Management Risks

In the previous posts of our Key Management Series, we reviewed the history of key management and the evolution of the most critical key management principles, explained why split knowledge and dual control are not the only principles required to manage cryptographic keys securely, and looked at the top key management failures we see in PCI assessments. We also noted that the industry is changing, becoming more complex and moving its most sensitive payment infrastructure, including payment HSMs and key management services, into the cloud. Now it is time to take a final look at the key management risks and challenges the payment industry faces while making this move, and at why PCI SSC is planning to release a new Standard for Key Management for the payment industry.

 

What HSM Operational Model Actually Means

In traditional payment environments, key management was tied to physical control and ownership of the Secure Cryptographic Devices (SCDs), such as Hardware Security Modules (HSMs) and Key Loading Devices (KLDs). Payment service providers that built their own key management systems and prepared for PCI assessments typically:

  • Owned the HSMs
  • Performed hardening of the HSMs and other key management SCDs
  • Defined and organised controlled access to the datacentres and secure rooms with HSMs and SCDs
  • Performed key ceremonies in controlled environments on their own
  • Managed the lifecycle of keys within clearly defined and controlled procedures.

There was a direct relationship between who owned the key management infrastructure and who controlled the keys and the related security risks. The introduction of the HSM-as-a-service model has changed this. Payment providers are no longer fully in control of their payment processing infrastructure and key management processes, and now rely on HSMs and other SCDs controlled by a third party. Let's break down the HSM operational models, the compliance and security risks, and how responsibility for critical PCI key management controls is shared among the industry players.

HSM Operational Models and Who Actually Holds the Keys

When payment service providers transition to cloud key management, they are essentially choosing how much trust to place in their HSM-as-a-service Cloud Service Provider (CSP). Along with that trust, they hand over a portion of their PCI compliance responsibility. Compliance with standards such as PCI P2PE or PCI PIN, where cryptographic key management plays a critical role, is no longer the payment provider’s responsibility alone but depends heavily on the CSP. The operational and compliance models generally fall into categories defined by hardware architecture and key management responsibility. Let’s review those operational models and analyse the key management compliance challenges behind them.

 

On-Prem HSM Owned by the Payment Service Provider

This is the traditional model, where the HSMs and all related key management operations are under the full control and visibility of the payment service provider. It provides full isolation and maximum environmental and physical ownership. The payment provider fully controls the HSM mode of operation, the hardware and firmware versions, and the configuration and hardening of the devices, and decides on the high-availability scenarios and key management hierarchy, together with the physical security and environmental controls.

Key management and key ownership. In this scenario, key management has always been the payment service provider’s responsibility, and all compliance and security risks are managed by the payment provider’s internal team. The key hierarchy is built internally, and all key types (e.g. master keys, data encryption keys and key encryption keys) are managed by the payment service provider.

 

Single-Tenant Dedicated HSM managed by CSP

These services offer a payment provider a dedicated HSM, or set of HSMs, for its exclusive use. The CSP is responsible for offering specific HSM models for the payment service provider to choose from for its payment operations, and for taking care of the physical security and environmental controls for those devices. Exclusive access to the HSMs belongs to the payment provider, but the CSP can be engaged for hands-on configuration and management on the payment provider's behalf.

Key management and key ownership. This approach keeps the tenant's cryptographic keys separate from those of other tenants, offering strong assurance, but requires the tenant to manage high availability, clustering, and disaster recovery by requesting the appropriate architecture and services from the HSM-as-a-service CSP. The payment provider can also generate a key in its own on-premises HSM and import it into the CSP's HSMs (Bring Your Own Key, BYOK). While the payment provider controls the cryptographic key's origin, once the key is uploaded it resides in the CSP's infrastructure, and the CSP technically has access to use it or can affect its security.

 

Multi-Tenant HSM managed by CSP

Multi-Tenant HSMs serve multiple payment service providers (tenants) on the same physical HSM hardware or HSM clusters, separated by logical boundaries. The CSP is responsible for delivering an agreed payment processing service to the payment service provider, usually without revealing to the tenant the exact hardware and firmware versions of the HSMs or the specific hardening applied to the devices, although the tenant can choose the geographical spread and high-availability architecture of the setup. Exclusive access to the HSMs belongs to the CSP, and the tenant is only assured that the service is managed in line with agreed SLAs and specific compliance requirements. While cost-effective, this model introduces "noisy neighbor" risks, where one tenant's heavy cryptographic load can degrade other tenants' performance, and it can expand the attack surface for side-channel attacks.

Key management and key ownership. As in the previous scenario, this approach keeps each tenant's cryptographic keys separate from those of other tenants, offering strong assurance, but payment providers are usually not allowed to unilaterally change the key management principles or HSM settings provided by the CSP. In addition, a separate layer of isolation is in place between the tenant keys and the keys the CSP maintains to secure tenant operations.
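
To make the comparison between the three operational models easier to reason about, the split of control can be written down as a simple lookup table. This is only an illustrative reading of the descriptions above: the control names (`physical_security`, `hsm_access`, `hsm_configuration`) are hypothetical labels rather than terms from any PCI Standard, and in the single-tenant model configuration may in practice be delegated to the CSP on request.

```python
from enum import Enum


class Party(Enum):
    PROVIDER = "payment service provider"
    CSP = "cloud service provider"


# Illustrative reading of the three models described in this post.
# Control names are hypothetical labels, not PCI terminology.
RESPONSIBILITY = {
    "on_prem": {
        "physical_security": Party.PROVIDER,
        "hsm_access": Party.PROVIDER,
        "hsm_configuration": Party.PROVIDER,
    },
    "single_tenant": {
        "physical_security": Party.CSP,
        "hsm_access": Party.PROVIDER,
        # May be delegated to the CSP on the provider's request.
        "hsm_configuration": Party.PROVIDER,
    },
    "multi_tenant": {
        "physical_security": Party.CSP,
        "hsm_access": Party.CSP,
        "hsm_configuration": Party.CSP,
    },
}


def csp_dependent_controls(model: str) -> list:
    """Return the controls that the CSP, not the payment provider, operates."""
    return sorted(
        control
        for control, party in RESPONSIBILITY[model].items()
        if party is Party.CSP
    )
```

A table like this can help when scoping an assessment: every control returned by `csp_dependent_controls` is one where the payment provider needs evidence from the CSP rather than from its own team.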

 

Key Management Challenges and the Way to Address Them

As we can see, the shift to the HSM-as-a-service model, or Cloud HSMs, is fundamentally reshaping cryptographic architectures for payment service providers. Although it eliminates the significant challenges of maintaining physical hardware and securing data centres, it introduces more sophisticated risks related to logical key management processes, operations, and multi-tenant environments. To manage this new risk structure, the upcoming PCI Key Management Operations (KMO) Standard will include specific requirements designed to strictly regulate HSM-as-a-Service environments and to define how tenants should be protected and assured that their keys have the same level of security as in on-prem HSMs. Let’s review the core challenges introduced by HSM-as-a-service, the underlying risks, and how the future PCI KMO Standard can address them.

 

The Multi-Tenancy Logical Isolation Risk

In a multi-tenant HSM managed by the CSP, multiple tenants often share the same physical HSM infrastructure or clusters of HSMs. This creates a risk of boundary failure: a vulnerability in the HSM hypervisor, API, or logical partitioning could permit one tenant’s cryptographic workload, malware, or compromised keys to access, interfere with, or overwrite another tenant’s keys.

How To Address the Risk

  • Strict Hierarchical Separation. The HSM-as-a-Service must never share cryptographic keys between the key hierarchies of different HSM tenants.
  • Isolated Execution. The HSM processing element must ensure that cleartext secret and private keys are processed in isolated execution paths and memory areas, separated entirely from other tenants.
  • Isolated Access & Configuration. Key loading and access must be fully isolated per tenant. Furthermore, any configuration, setting, or firmware update made by an individual HSM tenant must be entirely isolated from affecting other tenants.
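
The per-tenant access rule behind these controls can be sketched in a few lines. This is a minimal model under stated assumptions, not how any real HSM enforces isolation (that happens in hardware and firmware): `HsmService`, `TenantPartition` and `TenantIsolationError` are hypothetical names, the caller identity is assumed to be already authenticated, and HMAC-SHA256 stands in for whatever cryptographic operation the tenant requests.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Dict


class TenantIsolationError(Exception):
    """Raised when a caller tries to reach another tenant's partition."""


@dataclass
class TenantPartition:
    # Hypothetical model of one tenant's logical partition and key hierarchy.
    tenant_id: str
    keys: Dict[str, bytes] = field(default_factory=dict)


class HsmService:
    def __init__(self) -> None:
        self._partitions: Dict[str, TenantPartition] = {}

    def create_partition(self, tenant_id: str) -> None:
        self._partitions[tenant_id] = TenantPartition(tenant_id)

    def _partition_for(self, caller: str, tenant_id: str) -> TenantPartition:
        # Every operation is resolved through this single check, so a caller
        # can only ever reach the partition that carries its own identity.
        if caller != tenant_id:
            raise TenantIsolationError(f"{caller} may not access {tenant_id}")
        return self._partitions[tenant_id]

    def load_key(self, caller: str, tenant_id: str, key_id: str, material: bytes) -> None:
        self._partition_for(caller, tenant_id).keys[key_id] = material

    def mac(self, caller: str, tenant_id: str, key_id: str, data: bytes) -> bytes:
        # The key is used inside the service and never returned to the caller.
        key = self._partition_for(caller, tenant_id).keys[key_id]
        return hmac.new(key, data, hashlib.sha256).digest()
```

Keys live only inside their own partition, and there is deliberately no code path that looks a key up across partitions, which mirrors the requirement that hierarchies are never shared between tenants.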

 

The HSM-as-a-Service Insider Threats and Unauthorized Access Risk

When you don't own the hardware, you don't fully own the perimeter. HSM-as-a-Service providers have highly privileged administrators who manage the HSM virtualization systems and processing elements. There is a risk that, if a provider credential is compromised, an insider acts maliciously, or a government access order is issued, tenant keys could be improperly accessed or misused without authorization.

How To Address the Risk

  • Cryptographically Authenticated Operations. The HSM-as-a-Service CSP must not be able to use a tenant's keys arbitrarily. Operations performed with an HSM tenant’s keys should be required to be cryptographically authenticated. Furthermore, it must be impossible to import or export a tenant's keys without cryptographically verifiable approval directly from the HSM tenant.

  • Hardened Remote Administration. Providers must not be able to casually SSH into the HSM management plane. There should be a control that requires remote administrative access to use secure multi-factor authentication, with at least one authentication factor physically bound to an SCD or Hardware Management Device (HMD).

  • Mandatory Erasure Upon Decommissioning. Before an HSM processing element is removed from service, the HSM-as-a-Service CSP must securely erase all HSM tenant keys from that specific element.
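
The idea of a cryptographically verifiable tenant approval can be illustrated with a short sketch. It assumes a hypothetical request format and uses HMAC-SHA256 from the Python standard library purely as a stand-in: with a shared secret the verifier could forge approvals itself, so a real design would more likely rely on asymmetric signatures produced inside the tenant's own SCD. The function names and request fields are assumptions made for illustration.

```python
import hashlib
import hmac
import json


def sign_approval(tenant_secret: bytes, request: dict) -> str:
    """Tenant side: authenticate a canonical encoding of the request."""
    payload = json.dumps(request, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(tenant_secret, payload, hashlib.sha256).hexdigest()


def is_approved(tenant_secret: bytes, request: dict, approval: str) -> bool:
    """Service side: accept the operation only if the approval matches
    this exact request (tenant, operation, key and nonce)."""
    expected = sign_approval(tenant_secret, request)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, approval)


# Hypothetical key-export request approved by the tenant.
request = {
    "tenant": "tenant-a",
    "operation": "export_key",
    "key_id": "zmk-01",
    "nonce": "2f9c1e",
}
secret = b"tenant-a-approval-secret"
approval = sign_approval(secret, request)
```

Because the approval covers the whole request, changing any field (for example turning an export into an import) invalidates it, and the nonce gives the service something to track in order to reject replays.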

 

Loss of Transparency and Control Risk

In the on-prem HSM scenario, payment providers have direct access to system logs, configuration states, and hardware versions for quick incident response or assessment purposes. Cloud HSMs often obscure this data, returning only generic API responses. If a breach occurs, the tenant may struggle to determine exactly which hardware processed their data or what firmware version was running at the time.

How To Address the Risk

  • Granular, Transparent Auditing. There should be a control that forces the HSM-as-a-Service CSP to maintain detailed logs of operations that are accessible to the tenant. Crucially, these logs must specify exactly which HSM virtualization system, and which physical HSM processing element executed the action.

  • Tenant Control Over Changes. HSM CSPs must not silently push updates that degrade security. Any change to the HSM service that negatively affects a tenant's key security or compliance should be explicitly communicated to, and accepted by, the tenant before the change is made.

  • Emergency Suspension. If a tenant suspects a compromise, the HSM-as-a-Service CSP must support the tenant's ability to immediately disable or suspend access to their cryptographic keys without waiting for provider intervention and without impacting other tenants.
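
A rough sketch of how tenant-visible audit records and emergency suspension could fit together is shown below. All names (`AuditRecord`, `TenantKeyService`, the field names and identifiers) are hypothetical and chosen for illustration; the point is only that each record names both the virtualization system and the physical processing element, and that a suspension takes effect for one tenant without touching the others.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Set


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str
    tenant_id: str
    key_id: str
    operation: str
    outcome: str  # "allowed" or "denied"
    virtualization_system_id: str
    processing_element_id: str


class TenantKeyService:
    def __init__(self) -> None:
        self._suspended: Set[str] = set()
        self._log: List[AuditRecord] = []

    def suspend(self, tenant_id: str) -> None:
        # Tenant-initiated: takes effect immediately, for this tenant only.
        self._suspended.add(tenant_id)

    def use_key(self, tenant_id: str, key_id: str, operation: str,
                virtualization_system_id: str, processing_element_id: str) -> bool:
        allowed = tenant_id not in self._suspended
        # Denied attempts are logged too, so the tenant can see them.
        self._log.append(AuditRecord(
            timestamp=datetime.now(timezone.utc).isoformat(),
            tenant_id=tenant_id,
            key_id=key_id,
            operation=operation,
            outcome="allowed" if allowed else "denied",
            virtualization_system_id=virtualization_system_id,
            processing_element_id=processing_element_id,
        ))
        return allowed

    def records_for(self, tenant_id: str) -> List[AuditRecord]:
        # A tenant only ever sees its own records.
        return [r for r in self._log if r.tenant_id == tenant_id]
```

In an incident, `records_for` would let the tenant answer the question raised above: which virtualization system and which processing element handled a given operation.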

The Virtualization Boundary Risk

Modern Cloud HSMs rely on an "HSM virtualization system" to create and manage tenants’ partitions, abstract API commands, and route tasks to physical HSM processing elements. If this virtualization layer is compromised, the entire HSM service is essentially breached, regardless of how secure the underlying HSM is.

How To Address the Risk

  • Securing the Virtualization Layer. An HSM virtualization system must either exist entirely within the tamper-responsive boundary of the physical HSM itself, or be installed in a highly secure, restricted physical environment.

 

Conclusion

In the end, managing a Cloud HSM environment means giving up direct control over physical hardware in exchange for handling more complex access controls and strict oversight of the cloud service provider. To manage this risk effectively, the industry is moving towards a new compliance framework and looking for a set of controls not specified by the existing PCI Standards. When pushing secure cryptographic devices and key management systems into the shared responsibility model, payment service providers must have assurance that strict logical isolation between Cloud HSM tenants is enforced, that Cloud HSM operational logging exists, and that the payment service provider explicitly approves any use of its keys.

 

How We Help

Our PCI Assessor team provides the technical guidance needed to secure your environment while minimizing your compliance overhead.

Learn more about our PCI Crypto Practice and get support from the most trusted team.

Valentin Averin

Head of PCI PIN and PCI 3DS Practice at Foregenix. He is an experienced information security and payment security professional with a track record of more than 17 years in financial services security and compliance. His expertise spans:

  • Payment & data security for both traditional and emerging payment instruments (mobile wallets, real-time payments, digital assets, and tokenized payment solutions)
  • PCI Security Standards family: PCI DSS, P2PE, PIN Security, PCI 3DS, PCI TSP
  • ISO 27001 & Cybersecurity Governance in banking and payment ecosystems
  • Risk & Compliance Management across regulated financial environments
