Standard Template for on-prem Environment¶

This document contains instructions on how to set up a new Welkin on-prem environment.

Prerequisites¶

Important

Make decisions on the following items before deploying Welkin.

Overall architecture, i.e., VM sizes, load-balancer configuration, storage configuration, etc.
Identity Provider (IdP) choice and configuration. See this page.
On-call Management Tool (OMT) choice and configuration.

Make sure you install all prerequisites on your computer.
Prepare Ubuntu-based VMs: If you are using public clouds, you can create VMs using the scripts included in Kubespray:
- For Azure, use AzureRM scripts.
- For other clouds, use their respective Terraform scripts.

Create a git working folder to store Welkin configurations in a version-controlled manner. Run the following commands from the root of the configuration repository.

export CK8S_CONFIG_PATH=~/.ck8s/my-cluster-path
export CK8S_CLOUD_PROVIDER=  # fill in; run 'compliantkubernetes-apps/bin/ck8s providers' to list available providers
export CK8S_ENVIRONMENT_NAME=my-environment-name
export CK8S_FLAVOR=  # fill in; run 'compliantkubernetes-apps/bin/ck8s flavors' to list available flavors
export CK8S_K8S_INSTALLER=  # fill in; run 'compliantkubernetes-apps/bin/ck8s k8s-installers' to list available k8s-installers
export CK8S_PGP_FP=<your GPG key fingerprint>  # retrieve with gpg --list-secret-keys

export CLUSTERS=( "sc" "wc" )
export DOMAIN=example.com # your domain
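Several of the variables above have to be filled in by hand, so a quick check before continuing can save a failed run later. The following is a minimal bash sketch, not part of Welkin; the function name `check_ck8s_env` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which of the variables exported above are unset or empty.
# check_ck8s_env is a hypothetical helper, not a Welkin command.
check_ck8s_env() {
    local name missing=0
    for name in CK8S_CONFIG_PATH CK8S_CLOUD_PROVIDER CK8S_ENVIRONMENT_NAME \
                CK8S_FLAVOR CK8S_K8S_INSTALLER CK8S_PGP_FP DOMAIN; do
        # ${!name} is bash indirect expansion: the value of the variable named by $name.
        if [ -z "${!name:-}" ]; then
            echo "not set: ${name}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

Running `check_ck8s_env && echo "environment looks complete"` prints the confirmation only when every variable has a value.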

Add the Welkin Kubespray repository as a git submodule to the configuration repository and install prerequisites as follows:

Note

Remember to switch to the desired version of compliantkubernetes-kubespray.

git submodule add https://github.com/elastisys/compliantkubernetes-kubespray.git
git submodule update --init --recursive
cd compliantkubernetes-kubespray
git switch -d $(git tag --sort=committerdate | tail -1) # this will switch to the latest release tag
pip3 install -r kubespray/requirements.txt  # this will install Ansible
ansible-playbook -e 'ansible_python_interpreter=/usr/bin/python3' --ask-become-pass --connection local --inventory 127.0.0.1, get-requirements.yaml
cd ..  # return to the root of the configuration repository

Add the Welkin Apps repository as a git submodule to the configuration repository and install prerequisites as follows:

Note

Remember to switch to the desired version of compliantkubernetes-apps.

git submodule add https://github.com/elastisys/compliantkubernetes-apps.git
cd compliantkubernetes-apps
git switch -d $(git tag --sort=committerdate | tail -1) # this will switch to the latest release tag
./bin/ck8s install-requirements
cd ..  # return to the root of the configuration repository

Create the domain name. You need to create a domain name to access the different services in your environment. You will need to set up the following DNS entries.
- Point these domains to the Workload Cluster Ingress Controller (this step is done during Welkin Apps installation):
  - *.$DOMAIN
- Point these domains to the Management Cluster Ingress Controller (this step is done during Welkin Apps installation):
  - *.ops.$DOMAIN
  - dex.$DOMAIN
  - grafana.$DOMAIN
  - harbor.$DOMAIN
  - opensearch.$DOMAIN
If both Management and Workload Clusters are in the same subnet

If both the Management and Workload Clusters are in the same subnet, we recommend pointing the following domain names to the private IP addresses of the Management Cluster's worker Nodes.
- *.thanos.ops.$DOMAIN
- *.opensearch.ops.$DOMAIN
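Once the records exist, it can help to confirm that each name resolves. The following is a minimal sketch, assuming a Linux workstation where `getent` is available; the function names and the `probe` label standing in for the wildcard are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the DNS names from this step and show what they resolve to.
# "probe" stands in for the wildcard label and is a hypothetical name.
welkin_dns_names() {
    local domain="$1"
    printf '%s\n' \
        "probe.${domain}" \
        "probe.ops.${domain}" \
        "dex.${domain}" \
        "grafana.${domain}" \
        "harbor.${domain}" \
        "opensearch.${domain}"
}

check_welkin_dns() {
    local name
    for name in $(welkin_dns_names "$1"); do
        # getent prints "ADDRESS NAME" when resolution succeeds.
        if getent hosts "${name}" >/dev/null; then
            echo "resolves: ${name} -> $(getent hosts "${name}" | awk '{print $1; exit}')"
        else
            echo "unresolved: ${name}"
        fi
    done
}
```

Calling `check_welkin_dns "$DOMAIN"` prints one line per name. The entries will stay unresolved until the Ingress Controllers have addresses to point at.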
Create S3 credentials and add them to .state/s3cfg.ini.
Set up load balancer

You need to set up two load balancers, one for the Workload Cluster and one for the Management Cluster.
Make sure you have all necessary tools.

Deploying Welkin using Kubespray¶

How to change Default Kubernetes Subnet Address

If the default IP block ranges used for Docker and Kubernetes conflict with the internal IP ranges used in your company, you can change the values as follows. You can use any valid private IP address range; the values below are only examples.

For Kubernetes

* For Management Cluster: Add `kube_service_addresses: 10.178.0.0/18` and `kube_pods_subnet: 10.178.120.0/18` in `${CK8S_CONFIG_PATH}/sc-config/group_vars/k8s_cluster/ck8s-k8s-cluster.yaml` file.
* For Workload Cluster: Add `kube_service_addresses: 10.178.0.0/18` and `kube_pods_subnet: 10.178.120.0/18` in `${CK8S_CONFIG_PATH}/wc-config/group_vars/k8s_cluster/ck8s-k8s-cluster.yaml` file.

For Docker

* For Management Cluster: Add `docker_options: "--default-address-pool base=10.179.0.0/24,size=24"` in `${CK8S_CONFIG_PATH}/sc-config/group_vars/all/docker.yml` file.
* For Workload Cluster: Add `docker_options: "--default-address-pool base=10.179.4.0/24,size=24"` in `${CK8S_CONFIG_PATH}/wc-config/group_vars/all/docker.yml` file.

Init Kubespray configuration in your configuration path¶

for CLUSTER in "${CLUSTERS[@]}"; do
    compliantkubernetes-kubespray/bin/ck8s-kubespray init $CLUSTER $CK8S_CLOUD_PROVIDER $CK8S_PGP_FP
done

Configure OIDC¶

To configure OpenID access for Kubernetes API and other services, Dex should be configured with your identity provider (IdP). Check what Dex needs from your identity provider.

Configure OIDC endpoint¶

We recommend configuring the Management Cluster with an external OIDC endpoint provided by the IdP of your choice. This can be configured in ${CK8S_CONFIG_PATH}/sc-config/group_vars/k8s_cluster/ck8s-k8s-cluster.yaml by setting the following variables:

kube_oidc_auth should be set to true; this enables OIDC authentication for the api-server
kube_oidc_url should be set to an OIDC endpoint from your IdP (e.g. for Google this would be https://accounts.google.com)
kube_oidc_client_id should be retrieved from your IdP
kube_oidc_client_secret should be retrieved from your IdP

To configure the Workload Cluster to use Dex running in the Management Cluster for authentication you will also need to configure the following in ${CK8S_CONFIG_PATH}/wc-config/group_vars/k8s_cluster/ck8s-k8s-cluster.yaml:

kube_oidc_auth should be set to true; this enables OIDC authentication for the api-server
kube_oidc_url should be set to https://dex.$DOMAIN
kube_oidc_client_id should be set to kubelogin
kube_oidc_client_secret should be set to a Dex client secret generated with the apps configuration. It can be found in ${CK8S_CONFIG_PATH}/secrets.yaml under the key dex.kubeloginClientSecret after running ck8s init (see instructions on deploying apps).
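One way to read that single value without opening the whole file is SOPS's `--extract` option. This is a minimal sketch, assuming `secrets.yaml` is SOPS-encrypted and your PGP key can decrypt it; the function name `dex_kubelogin_secret` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print only dex.kubeloginClientSecret from the encrypted secrets file.
# dex_kubelogin_secret is a hypothetical helper name.
dex_kubelogin_secret() {
    sops --decrypt --extract '["dex"]["kubeloginClientSecret"]' \
        "${CK8S_CONFIG_PATH}/secrets.yaml"
}
```

The printed value is what goes into kube_oidc_client_secret for the Workload Cluster.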

To generate kubeconfigs that use OIDC for authentication, set the following variables in the configuration files for both Clusters (the two variables cannot both be true):

create_oidc_kubeconfig: true
kubeconfig_localhost: false

For more information on managing OIDC kubeconfigs and RBAC, or on running without OIDC, see the Welkin Kubespray documentation.

Copy the VMs information to the inventory files¶

Add the host name, user and IP address of each VM that you prepared above to ${CK8S_CONFIG_PATH}/sc-config/inventory.ini for the Management Cluster and to ${CK8S_CONFIG_PATH}/wc-config/inventory.ini for the Workload Cluster. You also need to add the host names of the control plane Nodes under [kube_control_plane], etcd Nodes under [etcd] and worker Nodes under [kube_node].

Note

Make sure that the user has SSH access to the VMs.
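For orientation, a filled-in inventory might look roughly like the output of the sketch here. It writes an example to a scratch file rather than touching the real inventory. All host names, IP addresses, and the `ubuntu` user are hypothetical, and the layout follows the usual Kubespray INI inventory format, so the generated inventory.ini in your configuration path remains the authoritative template.

```shell
#!/usr/bin/env bash
# Sketch: write an example inventory to a scratch file for comparison.
# All host names, IP addresses, and the user name are hypothetical.
write_example_inventory() {
    cat > "$1" <<'EOF'
[all]
sc-control-0 ansible_host=10.0.10.11 ansible_user=ubuntu
sc-worker-0  ansible_host=10.0.10.21 ansible_user=ubuntu
sc-worker-1  ansible_host=10.0.10.22 ansible_user=ubuntu

[kube_control_plane]
sc-control-0

[etcd]
sc-control-0

[kube_node]
sc-worker-0
sc-worker-1

[k8s_cluster:children]
kube_control_plane
kube_node
EOF
}
```

For example, `write_example_inventory /tmp/inventory-example.ini` produces a file to compare against. Once the real inventory is filled in, `ansible --inventory "${CK8S_CONFIG_PATH}/sc-config/inventory.ini" all --module-name ping` is one way to confirm SSH access to every VM.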

Run Kubespray to deploy the Kubernetes Clusters¶

for CLUSTER in "${CLUSTERS[@]}"; do
    compliantkubernetes-kubespray/bin/ck8s-kubespray apply $CLUSTER --flush-cache
done

Note

The kubeconfig for the Workload Cluster (.state/kube_config_wc.yaml) will not be usable until you have installed Dex in the Management Cluster (by deploying apps).

Rook Block Storage¶

Normally, we want to use block storage solutions provided by the infra provider. However, this is not always available, especially for on-prem environments. In such cases we can partition separate volumes on Nodes in the Cluster for Rook-Ceph and use that as a block storage solution.

Deploy Rook¶

To deploy Rook, go to the welkin-rook repository and follow the instructions here for each Cluster.

Note

If the kubeconfig files for the Clusters are encrypted with SOPS, you need to decrypt them before using them:

sops --decrypt ${CK8S_CONFIG_PATH}/.state/kube_config_$CLUSTER.yaml > $CLUSTER.yaml
export KUBECONFIG=$CLUSTER.yaml

Please restart the operator Pod, rook-ceph-operator*, if some Pods stall in the initialization state as shown below:

rook-ceph     rook-ceph-crashcollector-minion-0-b75b9fc64-tv2vg    0/1     Init:0/2   0          24m
rook-ceph     rook-ceph-crashcollector-minion-1-5cfb88b66f-mggrh   0/1     Init:0/2   0          36m
rook-ceph     rook-ceph-crashcollector-minion-2-5c74ffffb6-jwk55   0/1     Init:0/2   0          14m
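One way to restart the operator is a rollout restart of its Deployment. This is a minimal sketch, assuming the operator runs as a Deployment named rook-ceph-operator in the rook-ceph namespace (the upstream Rook default; verify this against your welkin-rook installation). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: restart the Rook operator for the cluster behind the given kubeconfig.
# Assumes the upstream default Deployment name and namespace.
restart_rook_operator() {
    local kubeconfig="$1"
    kubectl --kubeconfig "${kubeconfig}" --namespace rook-ceph \
        rollout restart deployment/rook-ceph-operator
}
```

For example, `restart_rook_operator sc.yaml` uses the decrypted Management Cluster kubeconfig from the note above.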

Warning

Pods in pending state usually indicate resource shortage. In such cases you need to use bigger instances.

Test Rook¶

Note

If the Workload Cluster kubeconfig is configured with authentication to Dex running in the Management Cluster, part of apps needs to be deployed before it is possible to run the commands below for wc.

To test Rook, proceed as follows:

for CLUSTER in sc wc; do
    kubectl --kubeconfig ${CK8S_CONFIG_PATH}/.state/kube_config_${CLUSTER}.yaml -n default apply -f https://raw.githubusercontent.com/rook/rook/v1.11.9/deploy/examples/csi/rbd/pvc.yaml
    kubectl --kubeconfig ${CK8S_CONFIG_PATH}/.state/kube_config_${CLUSTER}.yaml -n default apply -f https://raw.githubusercontent.com/rook/rook/v1.11.9/deploy/examples/csi/rbd/pod.yaml
done

for CLUSTER in sc wc; do
    kubectl --kubeconfig ${CK8S_CONFIG_PATH}/.state/kube_config_${CLUSTER}.yaml -n default get pvc rbd-pvc
    kubectl --kubeconfig ${CK8S_CONFIG_PATH}/.state/kube_config_${CLUSTER}.yaml -n default get pod csirbd-demo-pod
done

You should see the PVCs in Bound state, and the Pods that mount the volumes should be running.
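Instead of polling by hand, `kubectl wait` can block until both conditions hold. This is a minimal sketch, assuming kubectl 1.23 or newer (needed for the `--for=jsonpath` form); the function name and the default timeout are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: wait until the demo PVC is Bound and the demo Pod is Ready.
# The 180s default timeout is an arbitrary choice.
wait_for_rook_demo() {
    local kubeconfig="$1" timeout="${2:-180s}"
    kubectl --kubeconfig "${kubeconfig}" --namespace default \
        wait --for=jsonpath='{.status.phase}'=Bound pvc/rbd-pvc --timeout="${timeout}" &&
    kubectl --kubeconfig "${kubeconfig}" --namespace default \
        wait --for=condition=Ready pod/csirbd-demo-pod --timeout="${timeout}"
}
```

For example, `wait_for_rook_demo "${CK8S_CONFIG_PATH}/.state/kube_config_sc.yaml"` returns a non-zero status if either wait times out.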

Important

If you have taints on certain Nodes that should support running Pods that mount rook-ceph PVCs, you need to ensure these taints are tolerated by the rook-ceph DaemonSet csi-rbdplugin. Otherwise, Pods on these Nodes will not be able to attach or mount the volumes.

If you want to clean the previously created PVCs:

for CLUSTER in sc wc; do
    kubectl --kubeconfig ${CK8S_CONFIG_PATH}/.state/kube_config_${CLUSTER}.yaml -n default delete pvc rbd-pvc
    kubectl --kubeconfig ${CK8S_CONFIG_PATH}/.state/kube_config_${CLUSTER}.yaml -n default delete pod csirbd-demo-pod
done

Deploying Welkin Apps¶

How to change local DNS IP if you change the default Kubernetes subnet address

If you changed the default IP block used for Kubernetes Services above, you also need to change the CoreDNS IP address in the common-config.yaml file. To get the CoreDNS IP address, run the following command.

compliantkubernetes-apps/bin/ck8s ops kubectl sc get svc -n kube-system coredns

Once you have the IP address, edit the ${CK8S_CONFIG_PATH}/common-config.yaml file and set it as the value of the global.clusterDns field.
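To print only the address rather than the whole Service row, a jsonpath output can be used. This is a minimal sketch, assuming `ck8s ops kubectl` passes the remaining arguments through to kubectl unchanged; `CK8S_BIN` and the function name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the ClusterIP of the coredns Service for one cluster (default: sc).
# CK8S_BIN is a hypothetical variable pointing at the ck8s wrapper.
CK8S_BIN="${CK8S_BIN:-compliantkubernetes-apps/bin/ck8s}"

coredns_cluster_ip() {
    "${CK8S_BIN}" ops kubectl "${1:-sc}" get svc --namespace kube-system coredns \
        --output jsonpath='{.spec.clusterIP}'
}
```

The printed address is the candidate value for global.clusterDns.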

Configure the load balancer IP on the loopback interface for each worker Node

The Kubernetes data plane Nodes (i.e., worker Nodes) cannot connect to themselves with the IP address of the load balancer that fronts them. The easiest fix is to configure the load balancer's IP address on the loopback interface of each Node. Create the file /etc/netplan/20-eip-fix.yaml on each worker Node with the following content, replacing ${loadbalancer_ip_address} with the IP address of the load balancer for that Node's Cluster.

network:
  version: 2
  ethernets:
    lo0:
      match:
        name: lo
      dhcp4: false
      addresses:
      - ${loadbalancer_ip_address}/32

After adding the above content, run the following command in each worker Node:

sudo netplan apply
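When there are many worker Nodes, generating the file from a small helper avoids copy-paste mistakes in the address. This is a minimal sketch that prints the netplan content for a given load balancer IP; the function name, the example address, and the SSH target are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the netplan snippet for the given load balancer IP address.
# render_eip_fix is a hypothetical helper name.
render_eip_fix() {
    local lb_ip="$1"
    cat <<EOF
network:
  version: 2
  ethernets:
    lo0:
      match:
        name: lo
      dhcp4: false
      addresses:
      - ${lb_ip}/32
EOF
}
```

One possible use is `render_eip_fix 192.0.2.10 | ssh ubuntu@worker-0 'sudo tee /etc/netplan/20-eip-fix.yaml >/dev/null && sudo netplan apply'`, repeated for each worker Node.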

Initialize the apps configuration¶

compliantkubernetes-apps/bin/ck8s init both

This will initialise the configuration in the ${CK8S_CONFIG_PATH} directory, generating the configuration files sc-config.yaml and wc-config.yaml, as well as secrets with randomly generated passwords in secrets.yaml. It will also generate read-only default configuration under the directory defaults/, which can be used as a guide for available and suggested options.

ls -l $CK8S_CONFIG_PATH

Configure the apps¶

Edit the configuration files ${CK8S_CONFIG_PATH}/sc-config.yaml, ${CK8S_CONFIG_PATH}/wc-config.yaml, ${CK8S_CONFIG_PATH}/common-config.yaml and ${CK8S_CONFIG_PATH}/secrets.yaml and set the appropriate values for the configuration fields. Note that secrets.yaml is encrypted.

vim ${CK8S_CONFIG_PATH}/sc-config.yaml

vim ${CK8S_CONFIG_PATH}/wc-config.yaml

vim ${CK8S_CONFIG_PATH}/common-config.yaml

Edit the secrets.yaml file and add the credentials for:

S3: used for backup storage.
Dex connectors: check your identity provider.
On-call management tool configuration: check supported on-call management tools.

sops ${CK8S_CONFIG_PATH}/secrets.yaml

The default configurations for the Management Cluster and Workload Cluster are available in the directory ${CK8S_CONFIG_PATH}/defaults/ and can be used as a reference for available options.

Warning

Do not modify the read-only default configuration files found in the directory ${CK8S_CONFIG_PATH}/defaults/. Instead, configure the Cluster by modifying the regular files ${CK8S_CONFIG_PATH}/sc-config.yaml and ${CK8S_CONFIG_PATH}/wc-config.yaml, as they will override the default options.

Create S3 buckets¶

You can use the following script to create required S3 buckets. The script uses s3cmd in the background and gets configuration and credentials for your S3 provider from ${HOME}/.s3cfg file.

# Use your default s3cmd config file: ${HOME}/.s3cfg
scripts/S3/entry.sh create

Warning

You should not use your own credentials for S3. Rather create a new set of credentials with write-only access, when supported by the object storage provider.

Install Welkin Apps¶

Start with the Management Cluster:

compliantkubernetes-apps/bin/ck8s apply sc

Then the Workload Cluster:

compliantkubernetes-apps/bin/ck8s apply wc

Settling¶

Important

Leave sufficient time for the system to settle, e.g., to request TLS certificates from Let's Encrypt. This can take as much as 20 minutes.

Check if all Helm charts succeeded.

compliantkubernetes-apps/bin/ck8s ops helm wc list -A --all

You can check if the system settled as follows:

for CLUSTER in sc wc; do
    compliantkubernetes-apps/bin/ck8s ops kubectl ${CLUSTER} get --all-namespaces pods
done

Check the output of the command above. All Pods need to be Running or Completed.
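On a busy Cluster the full Pod list is long, so a small filter makes stragglers easier to spot. This is a minimal sketch, assuming the default column layout of `kubectl get --all-namespaces pods` (STATUS in the fourth column); the function name is hypothetical. It only looks at the STATUS column, so a Running Pod whose containers are not all ready is not flagged.

```shell
#!/usr/bin/env bash
# Sketch: read `kubectl get --all-namespaces pods` output on stdin and print
# only the rows whose STATUS is neither Running nor Completed.
not_ready_pods() {
    # NR > 1 skips the header line; $4 is the STATUS column.
    awk 'NR > 1 && $4 != "Running" && $4 != "Completed"'
}
```

For example, `compliantkubernetes-apps/bin/ck8s ops kubectl sc get --all-namespaces pods | not_ready_pods` prints nothing once the Management Cluster has settled.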

for CLUSTER in sc wc; do
    compliantkubernetes-apps/bin/ck8s ops kubectl ${CLUSTER} get --all-namespaces issuers,clusterissuers,certificates
done

Check the output of the command above. All resources need to have the Ready column True.

Testing¶

After completing the installation step you can test if the apps are properly installed and ready using the commands below:

for CLUSTER in sc wc; do
  compliantkubernetes-apps/bin/ck8s test ${CLUSTER}
done

Done. Navigate to the endpoints, for example grafana.$DOMAIN, opensearch.$DOMAIN, harbor.$DOMAIN, etc., to discover Welkin's features.

Operate¶

The following endpoints can be probed to ensure Welkin services are up and running:

curl --head https://dex.$DOMAIN/healthz
curl --head https://harbor.$DOMAIN/healthz
curl --head https://grafana.$DOMAIN/healthz
curl --head https://grafana.ops.$DOMAIN/healthz
curl --head https://app.$DOMAIN/healthz  # Pokes the WC Ingress Controller
curl --head https://app.ops.$DOMAIN/healthz  # Pokes the SC Ingress Controller
# All commands above should return 'HTTP/2 200'

curl --head -k https://kube-apiserver.$DOMAIN
curl --head https://thanos-receiver.ops.$DOMAIN
curl --head https://opensearch.ops.$DOMAIN
curl --head https://opensearch.$DOMAIN/api/status
# The commands above should return 'HTTP/2 401'
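For repeated checks, the probes can be wrapped so that each one compares the returned status code with the expected one. This is a minimal sketch; the function name `probe` and its output format are hypothetical, and it uses only curl's `--write-out '%{http_code}'` to read the status.

```shell
#!/usr/bin/env bash
# Sketch: compare the HTTP status of a HEAD request with an expected code.
# Usage: probe EXPECTED_CODE URL
probe() {
    local expected="$1" url="$2" code
    # curl reports 000 as the code when no HTTP response was received.
    code="$(curl --silent --head --output /dev/null --write-out '%{http_code}' "${url}")"
    if [ "${code}" = "${expected}" ]; then
        echo "ok ${code} ${url}"
    else
        echo "FAIL ${code} (expected ${expected}) ${url}"
        return 1
    fi
}
```

For example, `probe 200 "https://dex.$DOMAIN/healthz"` and `probe 401 "https://opensearch.ops.$DOMAIN"` mirror two of the checks above.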

Note

Some of these subdomains can be overridden in the configuration (see example here).