Long-term log retention¶
Welkin by default sets a log retention of 30 days. Many regulations, including those governing Swedish healthcare, require a minimum log retention of 5 years.
This is not provided at the platform level by Welkin as it runs the risk of GDPR non-compliance. Logs may include sensitive information like personal data, which requires that the retention scheme is designed with application-specific knowledge to ensure compliance. Specifically, the retention scheme must ensure that erased personal data cannot be accidentally restored, as per Art. 17 GDPR Right to erasure (‘right to be forgotten’).
Using application-specific knowledge also makes it possible to reduce the amount of logs stored, by filtering so that only the required logs are kept. This minimises the data kept, storage costs, and storage management effort.
Exporting logs for long-term storage¶
To enable long-term log retention we instead recommend using Elasticdump. This tool can export logs from OpenSearch within Welkin on a per-document basis in either CSV or JSON format, allowing other tools to process the logs and ship them somewhere else. It can also perform transformations, compress using Gzip, and write the logs to a file or send them to S3 object storage.
Using this tool along with the REST API of OpenSearch, it is possible to create scripts that export logs for long-term storage using a Kubernetes CronJob. Below are examples of how to discover indices to export from OpenSearch, some commands to use with Elasticdump, and an example Dockerfile with accompanying Kubernetes manifests.
Accessing OpenSearch¶
Info
To access OpenSearch, contact your Welkin administrator and ask them to create a user with the suitable permissions listed below. For Elastisys Managed Services customers this can be done by filing a service ticket.
These are the permissions required for the OpenSearch API snippets and by Elasticdump:
cluster_permissions:
  - cluster_monitor
  - indices:data/read/scroll
  - indices:data/read/scroll/clear
index_permissions:
  - index_patterns:
      - "*"
    allowed_actions:
      - indices:admin/aliases/get # Can be omitted when aliases are not used
      - indices:monitor/*
  - index_patterns:
      - kubernetes*
    allowed_actions:
      - indices:admin/get
      - indices:admin/mappings/get
      - indices_monitor
      - read
      - search
With ${DOMAIN} set to the domain of your environment, the variables needed to connect become:
export OS_PROTOCOL="https"
export OS_ENDPOINT="opensearch.ops.${DOMAIN}"
export OS_USERNAME="<provided-by-admin>"
export OS_PASSWORD="<provided-by-admin>"
# The index pattern we want to export, normally "kubernetes*"
export OS_PATTERN="kubernetes*"
Important
These variables will be used later on in the example snippets.
Discovering aliases and indices¶
In OpenSearch, logs are stored in indices. These indices are limited both in time and size to keep them manageable. Each index typically represents a day's worth of logs, but if the size of an index exceeds a set threshold, a new one is created to cap its maximum size.
The indices are all grouped within index aliases, a sort of virtual index that behind the scenes links to other indices. This allows one to read from all indices and write to one designated write index, all using the same name.
Since only the write index can change, one method to select indices for export into long-term storage is to export only the read indices. This way there is no need to check whether indices have changed since the previous export run, simplifying the export logic.
Example: List all indices using a pattern
# call as: get_indices <pattern>
get_indices() {
  pattern="$1"
  res="$(curl -u "${OS_USERNAME}:${OS_PASSWORD}" -XGET "${OS_PROTOCOL}://${OS_ENDPOINT}/_cat/indices?h=index")"
  if echo "${res}" | grep "error" >&2; then
    exit 1
  elif echo "${res}" | grep "fail" >&2; then
    exit 1
  else
    echo "${res}" | sed -n "/${pattern}/p" | sort
  fi
}
This will generate a list of all indices within OpenSearch for the specified pattern. The pattern accepts the regex syntax used by sed and should in most instances be kubernetes* to only include the application log indices.
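As a local illustration (no OpenSearch connection needed), this is how the sed filter treats a sample index list, mirroring what get_indices does with the `_cat/indices` output; the index names below are made up:

```shell
# Hypothetical index names piped through the same filter that
# get_indices applies to the _cat/indices output.
printf '%s\n' \
  "kubernetes-2024.03.02" \
  "other-index" \
  "kubernetes-2024.03.01" \
  | sed -n "/kubernetes*/p" | sort
# Prints the two kubernetes-* indices, sorted; "other-index" is dropped.
```

Note that the pattern is interpreted as a sed regex, not a glob: `kubernetes*` matches any line containing "kubernete", which is close enough for this naming scheme.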
Example: List all write indices using a pattern
# call as: get_write_index <pattern>
get_write_index() {
  pattern="$1"
  res="$(curl -u "${OS_USERNAME}:${OS_PASSWORD}" -XGET "${OS_PROTOCOL}://${OS_ENDPOINT}/_cat/aliases?h=alias,index,is_write_index")"
  if echo "${res}" | grep "error" >&2; then
    exit 1
  elif echo "${res}" | grep "fail" >&2; then
    exit 1
  else
    echo "${res}" | grep "true" | sed -n "/${pattern}/p" | awk '{print $2}' || true
  fi
}
This will generate a list of all write indices within OpenSearch for the specified pattern. The pattern accepts the regex syntax used by sed and should in most instances be kubernetes* to only include the application logs alias.
Since the pattern should match only a single alias, and each alias can have only a single write index, the output should be validated to contain at most one index.
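That validation can be captured in a small guard; this is a minimal sketch, and the validate_single_index helper is our own invention, not part of OpenSearch or Elasticdump:

```shell
# Hypothetical guard: succeed only when the given list contains
# at most one non-empty line.
validate_single_index() {
  count="$(printf '%s\n' "$1" | sed '/^$/d' | wc -l)"
  if [ "${count}" -gt 1 ]; then
    echo "error: expected at most one write index, got ${count}" >&2
    return 1
  fi
}

# A single (made-up) index passes; an empty result also passes.
validate_single_index "kubernetes-000001" && echo "ok"
```

In the export script this could wrap the output of get_write_index before the loop below runs.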
Using these two example functions we can fetch the indices for a pattern and find the write index for any matching alias. This means we can iterate over the indices, skip the write index, and perform the export:
Example: Iterating over indices, filtering out the write index, and perform export action
indices=$(get_indices "${OS_PATTERN}")
write_index=$(get_write_index "${OS_PATTERN}")
for index in ${indices}; do
  if [ "${index}" = "${write_index}" ]; then
    continue # skipping as it is the write index
  fi
  perform_export "${index}"
done
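To see the control flow without touching a cluster, the loop can be dry-run with stubbed versions of the helpers; all function bodies and index names below are made up for illustration:

```shell
# Stubs standing in for the real helpers; the third index is the write index.
get_indices() { printf '%s\n' "kubernetes-000001" "kubernetes-000002" "kubernetes-000003"; }
get_write_index() { echo "kubernetes-000003"; }
perform_export() { echo "exporting $1"; }

indices=$(get_indices "kubernetes*")
write_index=$(get_write_index "kubernetes*")
for index in ${indices}; do
  if [ "${index}" = "${write_index}" ]; then
    continue # skipping as it is the write index
  fi
  perform_export "${index}"
done
# Prints "exporting kubernetes-000001" and "exporting kubernetes-000002";
# the write index is skipped.
```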
For more complex tasks, check out the OpenSearch REST API reference.
Exporting indices¶
With Elasticdump it is possible to export logs to the console, to a file, or to S3. In these example snippets we will export them to S3. For some management functions we will also use s3cmd, most importantly to check for existing exports.
Using S3 both for Elasticdump and s3cmd will require the following variables:
export S3_BUCKET="<bucket>"
export S3_REGION="<region>"
export S3_ENDPOINT="<region-endpoint>"
export S3_FORCE_PATH_STYLE="<false|true>" # Generally "false" for AWS and Exoscale, else "true"
export AWS_ACCESS_KEY_ID="<access-key>"
export AWS_SECRET_ACCESS_KEY="<secret-key>"
if [ "$S3_FORCE_PATH_STYLE" = "true" ]; then
  export S3_BUCKET_ENDPOINT="${S3_ENDPOINT}"
else
  export S3_BUCKET_ENDPOINT="%(bucket)s.${S3_ENDPOINT}"
fi
Important
These variables will be used later on in the example snippets.
Example: Export entire index to S3
# With ${index} set to the index to export.
elasticdump \
  --input "${OS_PROTOCOL}://${OS_USERNAME}:${OS_PASSWORD}@${OS_ENDPOINT}/${index}" \
  --output "s3://${S3_BUCKET}/${index}.json.gz" \
  --s3Region "${S3_REGION}" \
  --s3Endpoint "${S3_ENDPOINT}" \
  --s3ForcePathStyle "${S3_FORCE_PATH_STYLE}" \
  --s3Compress \
  --concurrency 40 \
  --concurrencyInterval 1000 \
  --intervalCap 20 \
  --limit 1000
This process will take a while depending on the size of the index. By default Elasticdump will not try to delete, replace, or update any resources; this must be enabled using the appropriate flags, or managed by other means such as s3cmd.
Caution
If this process is aborted it will leave multipart uploads behind that should be cleared, otherwise they will continue to consume storage on the S3 service!
These can be listed and then removed with s3cmd:
s3cmd \
  --host="${S3_ENDPOINT}" \
  --host-bucket="${S3_BUCKET_ENDPOINT}" \
  multipart "s3://${S3_BUCKET}"
s3cmd \
  --host="${S3_ENDPOINT}" \
  --host-bucket="${S3_BUCKET_ENDPOINT}" \
  abortmp "s3://${S3_BUCKET}/<multipart-upload-path>" "<multipart-upload-id>"
The example above can be modified to export only certain logs, by adding a query using the OpenSearch Query DSL with the --searchBody '<query>' flag. This makes it possible to filter on certain labels to only export logs for a particular namespace or deployment, or even on identifiers within structured logs.
An example for a specific namespace would be:
{
  "query": {
    "term": {
      "kubernetes.namespace_name": "production"
    }
  }
}
Since the Query DSL is in JSON format, it must be properly quoted or escaped to keep its structure; preferably, whitespace should be stripped before sending it as an argument to Elasticdump.
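One way to strip that whitespace is to round-trip the query through a JSON serializer. This sketch assumes python3 is available in the image; the compact_query helper name is our own:

```shell
# Hypothetical helper: parse the JSON and re-serialize it without whitespace.
# This also fails loudly if the query is not valid JSON.
compact_query() {
  python3 -c 'import json, sys; print(json.dumps(json.load(sys.stdin), separators=(",", ":")))'
}

query='{
  "query": {
    "term": { "kubernetes.namespace_name": "production" }
  }
}'
printf '%s' "${query}" | compact_query
# Prints: {"query":{"term":{"kubernetes.namespace_name":"production"}}}
```

The compacted output can then be passed directly as the value of --searchBody.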
Example: Putting it all together
# call as: perform_export <index>
perform_export() {
  index="$1"
  check="$(s3cmd "--host=${S3_ENDPOINT}" "--host-bucket=${S3_BUCKET_ENDPOINT}" ls "s3://${S3_BUCKET}/${index}.json.gz")"
  if [ -n "${check}" ]; then
    return # skipping as it is already exported
  fi
  # Just as an example
  query='{"query": {"term": {"kubernetes.namespace_name": "production"}}}'
  elasticdump \
    --input "${OS_PROTOCOL}://${OS_USERNAME}:${OS_PASSWORD}@${OS_ENDPOINT}/${index}" \
    --output "s3://${S3_BUCKET}/${index}.json.gz" \
    --s3Region "${S3_REGION}" \
    --s3Endpoint "${S3_ENDPOINT}" \
    --s3ForcePathStyle "${S3_FORCE_PATH_STYLE}" \
    --s3Compress \
    --concurrency 40 \
    --concurrencyInterval 1000 \
    --intervalCap 20 \
    --limit 1000 \
    --searchBody "${query}"
}
Deploying CronJobs¶
The simplest way to prepare this for deployment is to build a container image including Bash, Elasticdump and s3cmd, and set up a CronJob to run this on a preferred schedule.
Here are some examples of how to build and deploy them:
Example: Containerfile / Dockerfile
FROM docker.io/library/ubuntu:jammy
ARG DEBIAN_FRONTEND=noninteractive
ARG TZ=Etc/UTC
RUN apt update && \
    apt install -y --no-install-recommends ca-certificates curl npm s3cmd && \
    apt clean -y && \
    rm -rf /var/lib/apt
RUN npm install elasticdump@v6.88.0 -g
CMD ["elasticdump"]
Example: Kubernetes resources
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: export-script
data:
  script.sh: |-
    <bash-script>
---
apiVersion: v1
kind: Secret
metadata:
  name: export-secret
type: Opaque
stringData:
  # OpenSearch
  OS_PROTOCOL: "${OS_PROTOCOL}"
  OS_ENDPOINT: "${OS_ENDPOINT}"
  OS_USERNAME: "${OS_USERNAME}"
  OS_PASSWORD: "${OS_PASSWORD}"
  OS_PATTERN: "${OS_PATTERN}"
  # S3
  S3_BUCKET: "${S3_BUCKET}"
  S3_REGION: "${S3_REGION}"
  S3_ENDPOINT: "${S3_ENDPOINT}"
  S3_BUCKET_ENDPOINT: "${S3_BUCKET_ENDPOINT}"
  S3_FORCE_PATH_STYLE: "${S3_FORCE_PATH_STYLE}"
  AWS_ACCESS_KEY_ID: "${AWS_ACCESS_KEY_ID}"
  AWS_SECRET_ACCESS_KEY: "${AWS_SECRET_ACCESS_KEY}"
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: export
spec:
  schedule: <schedule> # example "@daily"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: log-export
        spec:
          automountServiceAccountToken: false
          restartPolicy: Never
          containers:
            - name: export
              image: <image>
              command:
                - /scripts/script.sh
              envFrom:
                - secretRef:
                    name: export-secret
              resources:
                requests:
                  cpu: 500m
                  memory: 500Mi
                limits:
                  cpu: 1000m
                  memory: 750Mi
              securityContext:
                runAsNonRoot: true
                capabilities:
                  drop:
                    - ALL
              volumeMounts:
                - name: script
                  mountPath: /scripts
                  readOnly: true
          securityContext:
            runAsNonRoot: true
            runAsGroup: 65534
            runAsUser: 65534
            fsGroup: 65534
          volumes:
            - name: script
              configMap:
                name: export-script
                defaultMode: 0777
For more information about CronJobs, check out the Kubernetes documentation and API reference on the subject.