How can I monitor performance and resource usage on my Kubernetes (LKE) cluster?
Hello, how can I monitor my cluster's performance using LKE? Also, is your team thinking of adding the metrics-server service to LKE, or will that be up to us to deploy it into a cluster? I ask because the kubectl top command is quite helpful for me and I just want to know if I should wait until it is added or if I should look into adding it to my cluster.
There are a few ways you can gain insight into your cluster's performance with LKE.
For starters, you can run these two commands right out of the box with LKE. They provide valuable insight into the status of your cluster.
$ kubectl describe pods
$ kubectl describe nodes
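The describe output is fairly long, so it can help to trim it down. As a rough sketch, the helper below (allocated_summary is a hypothetical name, not a kubectl feature) keeps only each node's Name line and its "Allocated resources" section. It assumes the describe output starts that section with "Allocated resources:" and follows it with an "Events:" line.

```shell
# allocated_summary: hypothetical helper, not part of kubectl.
# Reads `kubectl describe nodes` output on stdin and prints, for each node,
# the Name line plus the "Allocated resources" section.
# Assumption: the section ends where the "Events:" line begins.
allocated_summary() {
  awk '
    /^Name:/                { print; next }  # keep the node name
    /^Allocated resources:/ { show = 1 }     # start of the section
    /^Events:/              { show = 0 }     # first line after the section
    show                    { print }        # print lines inside the section
  '
}
```

You could then run it like this:

$ kubectl describe nodes | allocated_summary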
However, it seems that you may be looking for more real-time insight, so I've included some more options below.
I can't say that metrics-server will come pre-installed with LKE. However, you can certainly install it on your own and use kubectl top to view resource metrics from your cluster. Here are some resources that may be helpful in setting this up:
If you follow the second guide, you'll need to upgrade the metrics-server release afterward, since the metrics.k8s.io API isn't available in the cluster yet. After installation, the following command should get you on your way:
$ helm upgrade my-release bitnami/metrics-server \
    --set apiService.create=true
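Once metrics-server is running, kubectl top nodes prints a table with NAME, CPU(cores), CPU%, MEMORY(bytes), and MEMORY% columns. As a minimal sketch of what you can do with it, the helper below (high_cpu_nodes is a hypothetical name, not part of kubectl) reads that table on stdin and prints the nodes whose CPU% is at or above a threshold. The default of 80 is an arbitrary assumption.

```shell
# high_cpu_nodes: hypothetical helper, not part of kubectl.
# Reads `kubectl top nodes` output on stdin and prints the names of nodes
# whose CPU% column is at or above the given threshold (default: 80).
high_cpu_nodes() {
  threshold="${1:-80}"
  awk -v limit="$threshold" '
    NR == 1 { next }        # skip the header row
    {
      pct = $3              # third column is CPU%, e.g. "85%"
      sub(/%/, "", pct)     # strip the percent sign
      if (pct + 0 >= limit + 0) print $1
    }
  '
}
```

For example, this would list any node at or above 75% CPU:

$ kubectl top nodes | high_cpu_nodes 75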
Another option you may be interested in exploring is Prometheus. This tool collects time series data, then allows you to visualize metrics and configure alerts. This guide has instructions that you may follow to get this set up: GitHub: Prometheus
The following command should get you started:
$ helm install [name] stable/prometheus
Here's another similar tool for monitoring your cluster. Grafana offers data visualization which allows you to generate graphs and maps to understand your data. You may also set up alerts to be notified when certain conditions are met. Here's the page for GitHub: Grafana. To kick things off, you can run the following command to start the installation process:
$ helm install [name] stable/grafana
Hope this helps!
Grafana requires RWX storage, though, which the Linode CSI driver currently doesn't support, AFAICT?
That's correct @catalina. At the moment, the Linode CSI driver doesn't directly support ReadWriteMany (RWX) access for persistent volumes.
However, Grafana only requires RWX access for scaling. On a single pod, it should work fine.
Otherwise, if you do wish to scale your Grafana deployment, you can create an RWX-accessible Block Storage Volume using the NFS Server Provisioner Helm chart. For example, this command would create a 100GiB Volume for the StorageClass nfs that could support RWX-accessible PVCs:
helm install nfs-server stable/nfs-server-provisioner \
    --set persistence.enabled=true,persistence.storageClass=linode-block-storage-retain,persistence.size=100Gi
You could then set up a PVC with RWX with the following manifest:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs
Setting claimName: nfs-pvc in the persistentVolumeClaim section of your deployment's volume spec would then enable you to scale your deployment while using the same persistent data.
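To make the claimName step more concrete, here is a minimal sketch that writes out a Deployment manifest mounting nfs-pvc. Everything except the claim name is an assumption for illustration: the function name write_rwx_deployment, the example-app name, the busybox image, the /data mount path, and the replica count are placeholders, not values taken from Grafana or Linode.

```shell
# write_rwx_deployment: hypothetical helper that writes a Deployment manifest
# to the given file (default: deployment.yaml).
write_rwx_deployment() {
  out_file="${1:-deployment.yaml}"
  cat > "$out_file" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app              # placeholder name
spec:
  replicas: 2                    # several pods can mount the same RWX claim
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: busybox         # placeholder image
          command: ["sleep", "3600"]
          volumeMounts:
            - name: shared-data
              mountPath: /data   # placeholder mount path
      volumes:
        - name: shared-data
          persistentVolumeClaim:
            claimName: nfs-pvc   # the RWX claim from the PVC manifest
EOF
}
```

You would then generate the file and apply it:

$ write_rwx_deployment deployment.yaml
$ kubectl apply -f deployment.yaml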
Quick follow up! Here's another post that details installing metrics server on LKE: