Standard Template for on-prem Environment¶
This document contains instructions on how to set up a new Welkin on-prem environment.
Prerequisites¶
Important
Decisions regarding the following items should be made before deploying Welkin.
- Overall architecture, i.e., VM sizes, load-balancer configuration, storage configuration, etc.
- Identity Provider (IdP) choice and configuration. See this page.
- On-call Management Tool (OMT) choice and configuration.
- Make sure you install all prerequisites on your computer.
- Prepare Ubuntu-based VMs. If you are using public clouds, you can create VMs using the scripts included in Kubespray:
  - For Azure, use AzureRM scripts.
  - For other clouds, use their respective Terraform scripts.
- Create a git working folder to store Welkin configurations in a version-controlled manner. Run the following commands from the root of the config repo, filling in the empty values.
export CK8S_CONFIG_PATH=~/.ck8s/my-cluster-path
export CK8S_CLOUD_PROVIDER= # run 'compliantkubernetes-apps/bin/ck8s providers' to list available providers
export CK8S_ENVIRONMENT_NAME=my-environment-name
export CK8S_FLAVOR= # run 'compliantkubernetes-apps/bin/ck8s flavors' to list available flavors
export CK8S_K8S_INSTALLER= # run 'compliantkubernetes-apps/bin/ck8s k8s-installers' to list available k8s-installers
export CK8S_PGP_FP=<your GPG key fingerprint> # retrieve with gpg --list-secret-keys
export CLUSTERS=( "sc" "wc" )
export DOMAIN=example.com # your domain
- Add the Welkin Kubespray repo as a git submodule to the configuration repo and install pre-requisites as follows:
Note
Remember to switch to the desired version of compliantkubernetes-kubespray.
git submodule add https://github.com/elastisys/compliantkubernetes-kubespray.git
git submodule update --init --recursive
cd compliantkubernetes-kubespray
git switch -d $(git tag --sort=committerdate | tail -1) # this will switch to the latest release tag
pip3 install -r kubespray/requirements.txt # this will install Ansible
ansible-playbook -e 'ansible_python_interpreter=/usr/bin/python3' --ask-become-pass --connection local --inventory 127.0.0.1, get-requirements.yaml
cd .. # return to the root of the config repo
- Add the Welkin Apps repo as a git submodule to the configuration repo and install pre-requisites as follows:
Note
Remember to switch to the desired version of compliantkubernetes-apps.
git submodule add https://github.com/elastisys/compliantkubernetes-apps.git
cd compliantkubernetes-apps
git switch -d $(git tag --sort=committerdate | tail -1) # this will switch to the latest release tag
./bin/ck8s install-requirements
cd .. # return to the root of the config repo
- Create the domain name. You need to create a domain name to access the different services in your environment. You will need to set up the following DNS entries.
  - Point these domains to the Workload Cluster Ingress Controller (this step is done during Welkin Apps installation):
    *.$DOMAIN
  - Point these domains to the Management Cluster Ingress Controller (this step is done during Welkin Apps installation):
    *.ops.$DOMAIN
    dex.$DOMAIN
    grafana.$DOMAIN
    harbor.$DOMAIN
    opensearch.$DOMAIN
If both Management and Workload Clusters are in the same subnet
If both the Management and Workload Clusters are in the same subnet, we recommend pointing the following domain names to the private IP addresses of the Management Cluster's worker nodes.
*.thanos.ops.$DOMAIN
*.opensearch.ops.$DOMAIN
- Create S3 credentials and add them to .state/s3cfg.ini.
- Set up the load balancers. You need two load balancers: one for the Workload Cluster and one for the Management Cluster.
- Make sure you have all necessary tools.
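Before moving on, it can help to confirm that the variables exported in the prerequisites are actually filled in, since several of them are placeholders. The following is a minimal sketch, not part of Welkin: the function name `check_ck8s_env` is hypothetical, and it only checks that each variable is non-empty, not that its value is valid. The `CLUSTERS` array is left out because the check is written for plain POSIX shell.

```shell
# Report which of the variables used in this guide are still unset or empty.
# Prints one "missing: NAME" line per problem and returns non-zero if any is missing.
check_ck8s_env() {
  missing=0
  for name in CK8S_CONFIG_PATH CK8S_CLOUD_PROVIDER CK8S_ENVIRONMENT_NAME \
              CK8S_FLAVOR CK8S_K8S_INSTALLER CK8S_PGP_FP DOMAIN; do
    # Indirect lookup: copy the value of the variable called $name into $value.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_ck8s_env && echo "environment looks complete"` from the shell where you exported the variables prints the names that still need a value, if any.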
Deploying Welkin using Kubespray¶
How to change Default Kubernetes Subnet Address
If the default IP block ranges used for Docker and Kubernetes are the same as the internal IP ranges used in the company, you can change the values to resolve the conflict as follows. You can use any valid private IP address range; the values below are only an example.
* For Management Cluster: Add `kube_service_addresses: 10.178.0.0/18` and `kube_pods_subnet: 10.178.120.0/18` to the `${CK8S_CONFIG_PATH}/sc-config/group_vars/k8s_cluster/ck8s-k8s-cluster.yaml` file.
* For Workload Cluster: Add `kube_service_addresses: 10.178.0.0/18` and `kube_pods_subnet: 10.178.120.0/18` to the `${CK8S_CONFIG_PATH}/wc-config/group_vars/k8s_cluster/ck8s-k8s-cluster.yaml` file.
* For Management Cluster: Add `docker_options: "--default-address-pool base=10.179.0.0/24,size=24"` to the `${CK8S_CONFIG_PATH}/sc-config/group_vars/all/docker.yml` file.
* For Workload Cluster: Add `docker_options: "--default-address-pool base=10.179.4.0/24,size=24"` to the `${CK8S_CONFIG_PATH}/wc-config/group_vars/all/docker.yml` file.
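When picking replacement ranges, the main thing to verify is that they do not overlap each other or your internal networks. The sketch below does that arithmetic in plain shell. The function names are hypothetical helpers, it handles IPv4 only, and it does no input validation. Note that a block such as 10.178.120.0/18 is not aligned on a /18 boundary, so it denotes the range starting at 10.178.64.0; the helper clears the host bits before comparing.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print "first last" (as integers) for the address range covered by a CIDR block.
cidr_range() {
  addr=${1%/*}
  prefix=${1#*/}
  ip=$(ip_to_int "$addr")
  size=$(( 1 << (32 - prefix) ))
  first=$(( ip / size * size ))   # clear the host bits
  echo "$first $(( first + size - 1 ))"
}

# Succeed (exit status 0) if the two CIDR blocks share at least one address.
cidr_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For example, `cidr_overlap 10.178.0.0/18 192.168.0.0/16 && echo "conflict"` prints nothing, because the two ranges are disjoint.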
Init Kubespray config in your config path¶
for CLUSTER in "${CLUSTERS[@]}"; do
  compliantkubernetes-kubespray/bin/ck8s-kubespray init "$CLUSTER" "$CK8S_CLOUD_PROVIDER" "$CK8S_PGP_FP"
done
Configure OIDC¶
To configure OpenID access for Kubernetes API and other services, Dex should be configured with your identity provider (IdP). Check what Dex needs from your identity provider.
Configure OIDC endpoint¶
The Management Cluster is recommended to be configured with an external OIDC endpoint provided by the IdP of your choice. This can be configured in ${CK8S_CONFIG_PATH}/sc-config/group_vars/k8s_cluster/ck8s-k8s-cluster.yaml by setting the following variables:
- kube_oidc_auth should be set to true; this enables OIDC authentication for the api-server.
- kube_oidc_url should be set to an OIDC endpoint from your IdP (e.g., for Google this would be https://accounts.google.com).
- kube_oidc_client_id should be retrieved from your IdP.
- kube_oidc_client_secret should be retrieved from your IdP.
To configure the Workload Cluster to use Dex running in the Management Cluster for authentication, you will also need to configure the following in ${CK8S_CONFIG_PATH}/wc-config/group_vars/k8s_cluster/ck8s-k8s-cluster.yaml:
- kube_oidc_auth should be set to true; this enables OIDC authentication for the api-server.
- kube_oidc_url should be set to https://dex.$DOMAIN.
- kube_oidc_client_id should be set to kubelogin.
- kube_oidc_client_secret should be set to a Dex client secret generated with the apps config. It can be found in ${CK8S_CONFIG_PATH}/secrets.yaml under the key dex.kubeloginClientSecret after running ck8s init (see instructions on deploying apps).
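To make the Workload Cluster settings above concrete, the following sketch prints them as a YAML fragment. It assumes the variables are plain top-level keys in ck8s-k8s-cluster.yaml, as the wording above suggests. The function name `wc_oidc_vars` and the sample arguments are hypothetical.

```shell
# Print the Workload Cluster OIDC variables as a YAML fragment.
# $1: your domain, $2: the Dex client secret for kubelogin.
wc_oidc_vars() {
  cat <<EOF
kube_oidc_auth: true
kube_oidc_url: https://dex.$1
kube_oidc_client_id: kubelogin
kube_oidc_client_secret: $2
EOF
}
```

Calling `wc_oidc_vars "$DOMAIN" "<secret from secrets.yaml>"` prints the four lines for review. It is probably safer to merge them into the file by hand than to append them blindly, since the file may already define some of these keys.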
To generate kubeconfigs that use OIDC for authentication, set the following variables in the config files for both clusters (the two variables cannot both be true):
create_oidc_kubeconfig: true
kubeconfig_localhost: false
For more information on managing OIDC kubeconfigs and RBAC, or on running without OIDC, see the Welkin Kubespray documentation.
Copy the VMs information to the inventory files¶
Add the host name, user and IP address of each VM that you prepared above to ${CK8S_CONFIG_PATH}/sc-config/inventory.ini for the Management Cluster and to ${CK8S_CONFIG_PATH}/wc-config/inventory.ini for the Workload Cluster. You also need to add the host names of the control plane nodes under [kube_control_plane], etcd nodes under [etcd] and worker nodes under [kube_node].
Note
Make sure that the user has SSH access to the VMs.
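As an illustration of the inventory layout described above, the sketch below prints a small inventory with one control plane node (which also hosts etcd) and any number of workers. The host names, IP addresses and user are hypothetical, and so is the function `make_inventory`. `ansible_host` and `ansible_user` are standard Ansible inventory variables, but the inventory.ini generated for you may contain additional groups or fields, so treat this as a shape to compare against rather than a file to copy.

```shell
# Print a minimal inventory.
# Arguments: SSH user, then "name=ip" pairs; the first pair becomes the
# control plane and etcd node, the remaining pairs become worker nodes.
make_inventory() {
  user=$1
  shift
  first=${1%%=*}
  echo "[all]"
  for pair in "$@"; do
    echo "${pair%%=*} ansible_host=${pair#*=} ansible_user=$user"
  done
  printf '\n[kube_control_plane]\n%s\n' "$first"
  printf '\n[etcd]\n%s\n' "$first"
  printf '\n[kube_node]\n'
  shift                       # drop the control plane node, keep the workers
  for pair in "$@"; do
    echo "${pair%%=*}"
  done
}
```

For example, `make_inventory ubuntu cp0=10.0.0.10 w0=10.0.0.20 w1=10.0.0.21` lists all three hosts under `[all]`, `cp0` under `[kube_control_plane]` and `[etcd]`, and `w0` and `w1` under `[kube_node]`.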
Run Kubespray to deploy the Kubernetes clusters¶
for CLUSTER in "${CLUSTERS[@]}"; do
  compliantkubernetes-kubespray/bin/ck8s-kubespray apply $CLUSTER --flush-cache
done
Note
The kubeconfig for wc, .state/kube_config_wc.yaml, will not be usable until you have installed Dex in the Management Cluster (by deploying apps).
Rook Block Storage¶
Normally, we want to use block storage solutions provided by the infra provider. However, this is not always available, especially for on-prem environments. In such cases we can partition separate volumes on Nodes in the cluster for Rook-Ceph and use that as a block storage solution.
Deploy Rook¶
To deploy Rook, go to the compliantkubernetes-kubespray repo, change directory to rook and follow the instructions there for each cluster.
Note
If the kubeconfig files for the clusters are encrypted with SOPS, you need to decrypt them before using them:
sops --decrypt ${CK8S_CONFIG_PATH}/.state/kube_config_$CLUSTER.yaml > $CLUSTER.yaml
export KUBECONFIG=$CLUSTER.yaml
If some Pods stall in the initialization state, as shown below, restart the operator Pod, rook-ceph-operator*:
rook-ceph rook-ceph-crashcollector-minion-0-b75b9fc64-tv2vg 0/1 Init:0/2 0 24m
rook-ceph rook-ceph-crashcollector-minion-1-5cfb88b66f-mggrh 0/1 Init:0/2 0 36m
rook-ceph rook-ceph-crashcollector-minion-2-5c74ffffb6-jwk55 0/1 Init:0/2 0 14m
Warning
Pods in pending state usually indicate resource shortage. In such cases you need to use bigger instances.
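One way to restart the operator Pod mentioned above is to trigger a rollout of its Deployment. This sketch assumes the operator runs as a Deployment named rook-ceph-operator in the rook-ceph namespace (the Rook default; verify with `kubectl -n rook-ceph get deployments`) and that a decrypted kubeconfig exists at `$CLUSTER.yaml`, as in the SOPS note above. The `KUBECTL` override and the function name are hypothetical conveniences, for instance for a dry run with `KUBECTL=echo`.

```shell
# Command used to talk to the cluster; override it for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Restart the Rook operator by rolling its Deployment.
# $1: cluster name (sc or wc); expects a decrypted kubeconfig at $1.yaml.
restart_rook_operator() {
  $KUBECTL --kubeconfig "$1.yaml" --namespace rook-ceph \
    rollout restart deployment/rook-ceph-operator
}
```

After `restart_rook_operator sc`, watching `kubectl -n rook-ceph get pods` should show a new operator Pod, and the stalled Pods usually proceed once it has reconciled.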
Test Rook¶
Note
If the Workload Cluster kubeconfig is configured with authentication to Dex running in the Management Cluster, part of apps needs to be deployed before it is possible to run the commands below for wc.
To test Rook, proceed as follows:
for CLUSTER in sc wc; do
  kubectl --kubeconfig ${CK8S_CONFIG_PATH}/.state/kube_config_${CLUSTER}.yaml -n default apply -f https://raw.githubusercontent.com/rook/rook/v1.11.9/deploy/examples/csi/rbd/pvc.yaml
  kubectl --kubeconfig ${CK8S_CONFIG_PATH}/.state/kube_config_${CLUSTER}.yaml -n default apply -f https://raw.githubusercontent.com/rook/rook/v1.11.9/deploy/examples/csi/rbd/pod.yaml
done
for CLUSTER in sc wc; do
  kubectl --kubeconfig ${CK8S_CONFIG_PATH}/.state/kube_config_${CLUSTER}.yaml -n default get pvc rbd-pvc
  kubectl --kubeconfig ${CK8S_CONFIG_PATH}/.state/kube_config_${CLUSTER}.yaml -n default get pod csirbd-demo-pod
done
You should see the PVCs in Bound state, and the Pods that mount the volumes should be running.
Important
If you have taints on certain Nodes that should support running Pods that mount rook-ceph PVCs, you need to ensure that these taints are tolerated by the rook-ceph DaemonSet csi-rbdplugin; otherwise, Pods on these Nodes will not be able to attach or mount the volumes.
If you want to clean up the previously created Pods and PVCs:
for CLUSTER in sc wc; do
  # Delete the Pod first: a PVC cannot be removed while a Pod is still using it.
  kubectl --kubeconfig ${CK8S_CONFIG_PATH}/.state/kube_config_${CLUSTER}.yaml -n default delete pod csirbd-demo-pod
  kubectl --kubeconfig ${CK8S_CONFIG_PATH}/.state/kube_config_${CLUSTER}.yaml -n default delete pvc rbd-pvc
done
Deploying Welkin Apps¶
How to change local DNS IP if you change the default Kubernetes subnet address
If you changed the default IP block used for Kubernetes services above, you also need to change the default CoreDNS IP address in the common-config.yaml file. To get the CoreDNS IP address, run the following command.
${CK8S_CONFIG_PATH}/compliantkubernetes-apps/bin/ck8s ops kubectl sc get svc -n kube-system coredns
Then open the ${CK8S_CONFIG_PATH}/common-config.yaml file and set the global.clusterDns field to that value.
Configure the load balancer IP on the loopback interface for each worker node
The Kubernetes data plane Nodes (i.e., worker Nodes) cannot connect to themselves with the IP address of the load balancer that fronts them. The easiest fix is to configure the load balancer's IP address on the loopback interface of each Node. Create the /etc/netplan/20-eip-fix.yaml file and add the following to it. ${loadbalancer_ip_address} should be replaced with the IP address of the load balancer for each cluster.
network:
  version: 2
  ethernets:
    lo0:
      match:
        name: lo
      dhcp4: false
      addresses:
        - ${loadbalancer_ip_address}/32
Then apply the configuration:
sudo netplan apply
Initialize the apps configuration¶
compliantkubernetes-apps/bin/ck8s init both
This will initialise the configuration in the ${CK8S_CONFIG_PATH} directory, generating the configuration files sc-config.yaml and wc-config.yaml, as well as secrets with randomly generated passwords in secrets.yaml. It will also generate read-only default configuration under the defaults/ directory, which can be used as a guide for available and suggested options.
ls -l $CK8S_CONFIG_PATH
Configure the apps¶
Edit the configuration files ${CK8S_CONFIG_PATH}/sc-config.yaml, ${CK8S_CONFIG_PATH}/wc-config.yaml and ${CK8S_CONFIG_PATH}/secrets.yaml and set the appropriate values for the configuration fields that apply to your environment.
Note that the latter is encrypted.
vim ${CK8S_CONFIG_PATH}/sc-config.yaml
vim ${CK8S_CONFIG_PATH}/wc-config.yaml
vim ${CK8S_CONFIG_PATH}/common-config.yaml
Edit the secrets.yaml file and add the credentials for:
- S3: used for backup storage.
- Dex connectors: check your identity provider.
- On-call management tool configurations: check the supported on-call management tools.
sops ${CK8S_CONFIG_PATH}/secrets.yaml
The default configurations for the Management Cluster and Workload Cluster are available in the directory ${CK8S_CONFIG_PATH}/defaults/ and can be used as a reference for available options.
Warning
Do not modify the read-only default configuration files found in the directory ${CK8S_CONFIG_PATH}/defaults/. Instead, configure the cluster by modifying the regular files ${CK8S_CONFIG_PATH}/sc-config.yaml and ${CK8S_CONFIG_PATH}/wc-config.yaml, as they will override the default options.
Create S3 buckets¶
You can use the following script to create the required S3 buckets.
The script uses s3cmd in the background and gets configuration and credentials for your S3 provider from the ${HOME}/.s3cfg file.
# Use your default s3cmd config file: ${HOME}/.s3cfg
scripts/S3/entry.sh create
Warning
You should not use your own credentials for S3. Instead, create a new set of credentials with write-only access, if the object storage provider supports it.
Install Welkin Apps¶
Start with the Management Cluster:
compliantkubernetes-apps/bin/ck8s apply sc
Then the Workload Cluster:
compliantkubernetes-apps/bin/ck8s apply wc
Settling¶
Important
Leave sufficient time for the system to settle (e.g., to request TLS certificates from Let's Encrypt), perhaps as much as 20 minutes.
Check if all Helm charts succeeded.
compliantkubernetes-apps/bin/ck8s ops helm wc list -A --all
You can check if the system settled as follows:
for CLUSTER in sc wc; do
  compliantkubernetes-apps/bin/ck8s ops kubectl ${CLUSTER} get --all-namespaces pods
done
Check the output of the command above. All Pods need to be Running or Completed.
for CLUSTER in sc wc; do
  compliantkubernetes-apps/bin/ck8s ops kubectl ${CLUSTER} get --all-namespaces issuers,clusterissuers,certificates
done
Check the output of the command above. All resources need to have the Ready column True.
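With many Pods, scanning the output by eye gets tedious. The sketch below filters the Pod listing down to the entries that still need attention. It assumes the usual column layout of `kubectl get pods --all-namespaces --no-headers` (namespace, name, ready, status, ...), and the function name `unsettled_pods` is hypothetical.

```shell
# Read a Pod listing (no header line) on stdin and print only the Pods whose
# STATUS column is neither Running nor Completed, as "namespace/name: status".
unsettled_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4 }'
}
```

For example, `compliantkubernetes-apps/bin/ck8s ops kubectl sc get --all-namespaces pods --no-headers | unsettled_pods` prints nothing once the Management Cluster has settled.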
Testing¶
After completing the installation step you can test if the apps are properly installed and ready using the commands below:
for CLUSTER in sc wc; do
  compliantkubernetes-apps/bin/ck8s test ${CLUSTER}
done
Done.
Navigate to the endpoints, for example grafana.$DOMAIN, opensearch.$DOMAIN, harbor.$DOMAIN, etc. to discover Welkin's features.
Operate¶
The following endpoints can be probed to ensure Welkin services are up and running:
curl --head https://dex.$DOMAIN/healthz
curl --head https://harbor.$DOMAIN/healthz
curl --head https://grafana.$DOMAIN/healthz
curl --head https://grafana.ops.$DOMAIN/healthz
curl --head app.$DOMAIN/healthz # Pokes the WC Ingress Controller
curl --head app.ops.$DOMAIN/healthz # Pokes the SC Ingress Controller
# All commands above should return 'HTTP/2 200'
curl --head -k https://kube-apiserver.$DOMAIN
curl --head https://thanos-receiver.ops.$DOMAIN
curl --head https://opensearch.ops.$DOMAIN
curl --head https://opensearch.$DOMAIN/api/status
# The commands above should return 'HTTP/2 401'
Note
Some of these subdomains can be overwritten in config (see example here)
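The probes above can also be scripted so that each endpoint is compared against its expected status code. This is a minimal sketch: the function name `probe` and the `CURL` override are hypothetical, and it checks only the numeric status code, not the protocol version.

```shell
# Command used for HTTP requests; override it for testing.
CURL="${CURL:-curl}"

# Send a HEAD request to $1 and compare the HTTP status code with $2.
# Prints "ok" or "FAIL" and returns non-zero on mismatch.
probe() {
  code=$($CURL --silent --head --output /dev/null --write-out '%{http_code}' "$1") || code=000
  if [ "$code" = "$2" ]; then
    echo "ok   $1 ($code)"
  else
    echo "FAIL $1 (got $code, expected $2)"
    return 1
  fi
}
```

For example, `probe "https://dex.$DOMAIN/healthz" 200` and `probe "https://opensearch.ops.$DOMAIN" 401` mirror two of the checks listed above. For the API server probe, which uses a certificate that curl may not trust, you would still need to pass `-k` yourself.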