Infrastructure Provider Audit

This page helps you do your due diligence and choose an Infrastructure Provider that provides a solid foundation for Compliant Kubernetes and your application. Elastisys regularly uses this template to validate cloud partners, as required for ISO 27001 certification.

Rationale

Compliant Kubernetes is designed to build upon the security and compliance of the underlying Infrastructure Provider. If you cannot trust the underlying provider with controls such as physical security of the servers, safe disposal of hard drives, and access control to the infrastructure control plane, then no technical measure will help you achieve your security and compliance goals. Trying to take preventive measures in Compliant Kubernetes, i.e., at the platform level, is inefficient at best and downright dangerous at worst. Failing to do your due diligence will end up in security theatre, putting your reputation at risk.

Overview

The remainder of this page contains open questions that you should ask your Infrastructure Provider. Notice the following:

  • Make sure you ask open questions and note down the answers. The burden of proof lies with the provider to show that they do an excellent job of protecting data.
  • Ask all questions, then evaluate the provider's suitability. It is unlikely that you'll find the perfect provider, but you'll likely find one that is sufficient for your present and future needs.
  • The less expected the answer, the more "digging" is needed.
  • "You" represents the Infrastructure Provider and "I" represents the Compliant Kubernetes administrator.

Technical Capability Questionnaire

  1. Availability Zones:
    1. Where are your data centers located?
    2. How are they presented, i.e., single API vs. multiple independent APIs?
  2. Services:
    1. What services do you offer? (e.g., VMs, object storage)
    2. Are all your services available in all zones?
  3. Identity and Access Management (IAM):
    1. Do you offer IAM?
    2. How can I create roles? Via an API? Via a UI? Via a Terraform provider?
    3. What services can I configure role-based access control for?
    4. Can IAM be configured via API? Can IAM be configured via Terraform?
    5. Can one single user be given access to multiple projects?
  4. Infrastructure-aaS:

    1. Which IaaS engine do you use? (e.g., OpenStack, VMware, proprietary)
    2. Do you have a Terraform provider for your API?
    3. Do you have pre-uploaded Ubuntu images? Which?
      1. Do these images have AutomaticSecurityUpdates by default?
      2. Do these images have NTP enabled by default?
    4. Do you have a Kubernetes integration for your IaaS?
      1. Can I use a cloud-controller for automatic discovery of Nodes and labeling Nodes with the right Zone?
    5. Can you handle large diurnal capacity changes, a.k.a., auto-scaling? E.g., 40 VMs from 6.00 to 10.00, but only 10 VMs from 10.00-6.00.
      1. Can I reserve VMs? How do you bill for reserved but unused VMs?
      2. What technical implementation do you recommend? E.g., pause/unpause VMs, stop/start VMs, terminate/recreate VMs.
    6. Do you support anti-affinity?
      1. If not, how can we ensure that VMs don't end up on the same physical servers?
  5. Storage capabilities:

    1. Do you offer Object Storage as a Service (OSaaS)?
      1. Can I use the object storage via an S3-compatible API?
      2. Can I create buckets via API?
      3. Can I create bucket credentials via API?
      4. Do you have a Terraform provider for your API?
      5. In which zones?
      6. Do you have immutable storage or object lock?
      7. Is OSaaS stretched across zones?
      8. Is object storage replicated across zones?
    2. Do you offer Block Storage as a Service (BSaaS)?
      1. Which API (OpenStack, VMware)?
      2. In which zones?
      3. Can I use a Container Storage Interface (CSI) driver for automatic creation of PersistentVolumes?
      4. [For NFS] How did you configure User ID Mapping, specifically root_squash, no_root_squash, all_squash, anonuid and anongid? Mapping the root UID to values typically used by containers, e.g., 1000, will lead to permission denied errors. For example, OpenSearch's init containers run chown 1000, which fails with root_squash and anonuid=1000.
      5. Is BSaaS stretched across zones?
      6. Is block storage replicated across zones?
      7. Does the CSI driver support the Snapshot feature? This is needed for more consistent Velero backups.
    3. Do you offer encryption-at-rest?
      1. Encrypted object storage: Do you offer this by default?
      2. Encrypted block storage: Do you offer this by default?
      3. Encrypted boot discs: Do you offer this by default?
      4. If not, how do you dispose of media potentially containing personal data (e.g., hard drives, backup tapes)?
  6. Networking capabilities:

    1. Can the VMs be set up on a private network? Do you have a Terraform provider for your API?
      1. Is your private network stretched across zones?
      2. Do you trust the network between your data centers?
      3. Does the private network overlap:
        1. The default Docker network (172.17.0.0/16)?
        2. The default Kubernetes Service network (10.233.0.0/18)?
        3. The default Kubernetes Pod network (10.233.64.0/18)?
    2. Firewall-aaS
      1. Is Firewall-aaS available?
      2. What API? (e.g., OpenStack, VMware)
      3. Do you have a Terraform provider for your API?
    3. Do you offer Load Balancer-aaS (LBaaS)?
      1. Can I create a LB via API?
      2. Do you have a Terraform provider for your API?
      3. Can I use a cloud-controller for automatic creation of external LoadBalancers?
      4. Can I set up a LB across zones? Via API?
      5. Can VMs see themselves via the LB's IP? (If not, then VMs need a minor fix.)
      6. Do your LBs preserve source IPs? Usually, this involves clever DNAT or PROXY protocol support.
    4. Do you offer IPv6 support? By default?
    5. Do you offer DNS as a Service? Which API?
  7. Network security:

    1. Do you allow NTP (UDP port 123) for clock synchronization to the Internet?
      1. If not, do you have a private NTP server?
    2. Do you allow ACME (TCP port 80) for automated certificate provisioning via Let's Encrypt?
      1. If not, how will you provision certificates?
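
The private network overlap question in the networking section above can be checked mechanically once the provider tells you which CIDR they assign. The following is a minimal sketch using Python's standard `ipaddress` module. The three reserved ranges are the defaults quoted in the questionnaire; the function name `find_overlaps` and the candidate CIDRs are hypothetical and chosen only for illustration.

```python
import ipaddress

# Default ranges quoted in the questionnaire above. Adjust them if your
# cluster is configured with different Service or Pod networks.
RESERVED_NETWORKS = {
    "Docker default network": "172.17.0.0/16",
    "Kubernetes Service network": "10.233.0.0/18",
    "Kubernetes Pod network": "10.233.64.0/18",
}


def find_overlaps(provider_cidr: str) -> list[str]:
    """Return the names of the reserved networks that overlap provider_cidr."""
    provider_net = ipaddress.ip_network(provider_cidr)
    return [
        name
        for name, cidr in RESERVED_NETWORKS.items()
        if provider_net.overlaps(ipaddress.ip_network(cidr))
    ]


if __name__ == "__main__":
    # Hypothetical private network ranges a provider might hand out.
    for candidate in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"):
        overlaps = find_overlaps(candidate)
        print(candidate, "->", overlaps if overlaps else "no overlap")
```

An empty result means the provider's range does not collide with any of the listed defaults. A non-empty result suggests that either the private network or the cluster's network configuration needs to change before deployment.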

Organizational capabilities

  1. What regulations are your existing customers subject to? (e.g., GDPR, public sector regulations, some ISO 27001 profile)
  2. Can you show us your ISO-27001 certification?
    1. Which profile?
    2. Which organization made the audit?
    3. Can we get a copy of the Statement of Applicability (SoA)?
  3. Who has overall responsibility for compliance in your organization?
  4. How do you implement regulatory and contractual requirements?
  5. How is a new requirement discovered?
    1. What is the journey that a requirement takes from discovery, to updating policies, to training employees, to implementation, to evidence of implementation?
  6. Do your data centers fulfill physical security "skyddsklass 3" according to SSF 130 and SSF 200?
    1. If not, how do you comply with Directive (EU) 2022/2557 Resilience of critical entities Art. 13 p. 1(b)?
    2. If not, how is physical security handled?
  7. How do you handle incidents and deviations?
    1. What response times / time to resolution do you offer?
    2. What are your actual response times / time to resolution?
  8. What is your change management process?
  9. How do you handle technical vulnerabilities?
  10. How do you handle capacity management?
  11. In case of a breach, how long until you notify your customers?
  12. What SLA do you offer?
    1. What uptime do you offer?
    2. What is your measured uptime?
    3. Do you have a public status page?
  13. How do you handle access control?
  14. Does your operations team have individual accounts? How do you handle team member onboarding / offboarding?
  15. How do you communicate credentials to your customers?
  16. Do you have audit logs?
    1. How long do you store audit logs? Who has access to them? How are they protected against disclosure and tampering?
  17. How do you handle business continuity?
    1. How often do you test fail-over? How did the last test go?
  18. How do you handle disaster recovery?
    1. How often do you test disaster recovery? How did the last test go?
  19. What is your policy on the use of cryptography?
  20. How do you deal with DDoS attacks?
  21. Who are your colocation providers? Are they subprocessors? See Guidance from the Danish Data Protection Authority.
    1. Does your colocation provider have access to personal data, e.g., can they open the server cabinet and access the information that is processed on the servers or transferred via switches?
    2. Can your colocation provider replace hard drives, memory, etc.?
    3. Can your colocation provider move, restart or otherwise handle the servers?
    4. Does your colocation provider provide additional services beyond physical facilities as well as electricity and Internet?
  22. When did you perform the last penetration test?
    1. Can you share anything about the major findings and how you resolved them?

For Elastisys Self-Managed Customers

Feel free to skip the questions below. They are designed for our Managed Service and might not be relevant for you. We share them here for the sake of full transparency.

  1. Do you fully operate under EU jurisdiction?
  2. Is your ownership fully under EU jurisdiction?
  3. Are your suppliers fully under EU jurisdiction?
    1. Even the web fonts and analytics code on your front-page?
  4. Do you have a DPO?
    1. Is this an internal employee or outsourced?
  5. Can you show us your Data Processing Agreement (DPA)?
  6. [HIPAA only] Are you familiar with Business Associate Agreements?
    1. Are you ready to sign one with us?

Collaboration

  1. How can we collaborate with your on-call team?
    1. What collaboration channels do you offer? (e.g., Slack, Teams, phone, service desk)
    2. What response times can we expect?
    3. Is your on-call team available 24/7?
  2. Are you open to having quarterly operations (engineering) retrospectives? Our engineering team wants to keep a close loop with vendors and regularly discuss what went well, what can be improved, and devise a concrete action plan.
  3. Are you open to having quarterly roadmap discussions?

Environment Management

  1. What environmental policies and certifications do you have?
  2. What energy sources are your data centers using?
  3. How do you work to become more energy efficient?
  4. How do you recycle used/old equipment?
  5. Do you do any form of environmental compensation activities?

Evidence

The audit should conclude with gathering the following documents in an "evidence package":

  1. Filled questionnaire
  2. All relevant certificates, e.g., ISO 14001, ISO 27001, “green cloud”
  3. Latest version of the Terms of Service and Data Processing Agreement
  4. All relevant certificates from data center providers
  5. Signed and transparent ownership structure