Add a complete tutorial on how to ship logs from AWS EKS. (#2338)

* Add a complete tutorial on how to ship logs from AWS EKS.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>

* Update docs/clients/aws/eks/promtail-eks.md

* Update docs/clients/aws/eks/promtail-eks.md

* Final touches.

Signed-off-by: Cyril Tovena <cyril.tovena@gmail.com>
parent ce76ad099a
commit 36594e21ae
Changed files:
  docs/clients/aws/eks/eventrouter.yaml      (+69)
  docs/clients/aws/eks/namespace-grafana.png (BIN)
  docs/clients/aws/eks/promtail-eks.md       (+265)
  docs/clients/aws/eks/values.yaml           (+219)

docs/clients/aws/eks/eventrouter.yaml
@@ -0,0 +1,69 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: eventrouter
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: eventrouter
rules:
- apiGroups: [""]
  resources: ["events"]
  verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: eventrouter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: eventrouter
subjects:
- kind: ServiceAccount
  name: eventrouter
  namespace: kube-system
---
apiVersion: v1
data:
  config.json: |-
    {
      "sink": "stdout"
    }
kind: ConfigMap
metadata:
  name: eventrouter-cm
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: eventrouter
  namespace: kube-system
  labels:
    app: eventrouter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: eventrouter
  template:
    metadata:
      labels:
        app: eventrouter
        tier: control-plane-addons
    spec:
      containers:
      - name: kube-eventrouter
        image: gcr.io/heptio-images/eventrouter:latest
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: config-volume
          mountPath: /etc/eventrouter
      serviceAccount: eventrouter
      volumes:
      - name: config-volume
        configMap:
          name: eventrouter-cm

docs/clients/aws/eks/namespace-grafana.png: binary image (518 KiB), not shown.

docs/clients/aws/eks/promtail-eks.md
@@ -0,0 +1,265 @@
# Sending logs from EKS with Promtail
In this tutorial we'll see how to set up Promtail on [EKS][eks]. Amazon Elastic Kubernetes Service (Amazon [EKS][eks]) is a fully managed Kubernetes service; using Promtail we'll get full visibility into our cluster logs. We'll start by forwarding pod logs, then node services, and finally Kubernetes events.

After this tutorial you will be able to query all your logs in one place using Grafana.
<!-- TOC -->
- [Sending logs from EKS with Promtail](#sending-logs-from-eks-with-promtail)
  - [Requirements](#requirements)
  - [Setting up the cluster](#setting-up-the-cluster)
  - [Adding Promtail DaemonSet](#adding-promtail-daemonset)
  - [Fetching kubelet logs with systemd](#fetching-kubelet-logs-with-systemd)
  - [Adding Kubernetes events](#adding-kubernetes-events)
  - [Conclusion](#conclusion)
<!-- /TOC -->
## Requirements
Before we start you'll need:
- The [AWS CLI][aws cli] configured (run `aws configure`).
- [kubectl][kubectl] and [eksctl][eksctl] installed.
- A Grafana instance with a Loki data source already configured; you can use a [GrafanaCloud][GrafanaCloud] free trial.

For the sake of simplicity we'll use [GrafanaCloud][GrafanaCloud] Loki and Grafana instances; you can get a free account for this tutorial on our [website][GrafanaCloud]. All the steps are the same if you're running your own open-source Loki and Grafana instances.
## Setting up the cluster
In this tutorial we'll use [eksctl][eksctl], a simple command line utility for creating and managing Kubernetes clusters on Amazon EKS. AWS requires creating many resources such as IAM roles, security groups and networks; using `eksctl`, all of this is simplified.

> We're not going to use a Fargate cluster. Do note that DaemonSets are not allowed on Fargate; the only way to ship logs with EKS Fargate is to run fluentd, fluent-bit or Promtail as a sidecar and tee your logs into a file. For more information on how to do so, you can read this [blog post][blog ship log with fargate].
```bash
eksctl create cluster --name loki-promtail --managed
```
You have time for a coffee ☕; this usually takes 15 minutes. When it's finished, your `kubectl` context should be configured to communicate with your newly created cluster. Let's verify everything is fine:
```bash
kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.5", GitCommit:"e6503f8d8f769ace2f338794c914a96fc335df0f", GitTreeState:"clean", BuildDate:"2020-07-04T15:01:15Z", GoVersion:"go1.14.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"16+", GitVersion:"v1.16.8-eks-fd1ea7", GitCommit:"fd1ea7c64d0e3ccbf04b124431c659f65330562a", GitTreeState:"clean", BuildDate:"2020-05-28T19:06:00Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
```
## Adding Promtail DaemonSet
To ship all your pods' logs we're going to set up [Promtail][Promtail] as a DaemonSet in our cluster. This means it will run on each node of the cluster; we will then configure it to find the logs of your containers on the host.

What's nice about Promtail is that it uses the same [service discovery as Prometheus][prometheus conf]; you should make sure the `scrape_configs` of Promtail matches the Prometheus one. Not only is this simpler to configure, but it also means metrics and logs will have the same metadata (labels) attached by the Prometheus service discovery. When querying Grafana you will be able to correlate metrics and logs very quickly; you can read more about this in our [blog post][correlate].
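To make this concrete, here's a minimal sketch of what a Kubernetes service-discovery scrape config looks like. You don't need to write this yourself (the chart ships a complete version); the job and label names here are just illustrative:

```yaml
scrape_configs:
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod  # discover every pod, exactly like Prometheus does
  relabel_configs:
  # Promote discovery metadata to Loki labels so logs and metrics share them.
  - source_labels: ['__meta_kubernetes_namespace']
    target_label: 'namespace'
  - source_labels: ['__meta_kubernetes_pod_name']
    target_label: 'pod'
```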
We'll use Helm 3, but Helm 2 is also fine; just make sure you have properly [installed Tiller][tiller install].
Let's add the Loki repository and list all available charts.
```bash
helm repo add loki https://grafana.github.io/loki/charts
"loki" has been added to your repositories
helm search repo
NAME CHART VERSION APP VERSION DESCRIPTION
loki/fluent-bit 0.1.4 v1.5.0 Uses fluent-bit Loki go plugin for gathering lo...
loki/loki 0.30.1 v1.5.0 Loki: like Prometheus, but for logs.
loki/loki-stack 0.38.1 v1.5.0 Loki: like Prometheus, but for logs.
loki/promtail 0.23.2 v1.5.0 Responsible for gathering logs and sending them...
```
If you want to install Loki, Grafana, Prometheus and Promtail all together, you can use the `loki-stack` chart; for now we'll focus on Promtail. Let's create a new Helm values file; we'll fetch the [default][default value file] one and work from there:
```bash
curl https://raw.githubusercontent.com/grafana/loki/master/production/helm/promtail/values.yaml > values.yaml
```
First we're going to tell Promtail to send logs to our Loki instance. The example below shows how to send logs to [GrafanaCloud][GrafanaCloud]; replace the user and password with your own credentials. If you're instead running Loki from the `loki` chart in the same cluster, the default values already point to it.
```yaml
loki:
  serviceName: "logs-prod-us-central1.grafana.net"
  servicePort: 443
  serviceScheme: https
  user: <userid>
  password: <grafanacloud apikey>
```
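If you'd like to sanity-check those credentials before deploying anything, you can query Loki's label API directly. This is a hypothetical check assuming a GrafanaCloud endpoint and Loki's v1 HTTP API; substitute your own user id and API key:

```bash
# Expect a JSON response with "status": "success" if the credentials are valid.
curl -s -u "<userid>:<grafanacloud apikey>" \
  https://logs-prod-us-central1.grafana.net/loki/api/v1/labels
```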
Once you're ready, let's create a new namespace called `monitoring` and add Promtail to it:
```bash
kubectl create namespace monitoring
namespace/monitoring created
helm install promtail --namespace monitoring loki/promtail -f values.yaml
NAME: promtail
LAST DEPLOYED: Fri Jul 10 14:41:37 2020
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Verify the application is working by running these commands:
kubectl --namespace monitoring port-forward daemonset/promtail 3101
curl http://127.0.0.1:3101/metrics
```
Verify that the Promtail pods are running. You should see only two, since we're running a two-node cluster.
```bash
kubectl get -n monitoring pods
NAME READY STATUS RESTARTS AGE
promtail-87t62 1/1 Running 0 35s
promtail-8c2r4 1/1 Running 0 35s
```
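If the pods are stuck, or if logs never show up in Grafana later, Promtail's own output is the first place to look; authentication errors against Loki will appear there. A quick check, assuming the `monitoring` namespace used above:

```bash
# kubectl picks one pod of the DaemonSet; look for errors sending batches to Loki.
kubectl logs -n monitoring daemonset/promtail --tail=20
```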
You can now reach your Grafana instance and start exploring your logs. For example, if you want to see all logs in the `monitoring` namespace, use `{namespace="monitoring"}`; you can also expand a single log line to discover all labels available from the Kubernetes service discovery.
![grafana logs namespace][grafana logs namespace]
## Fetching kubelet logs with systemd
So far we've been scraping logs from containers, but if you want more visibility you can also scrape [systemd][systemd] logs from each of your machines. This also gives you access to `kubelet` logs.

Let's edit our values file again and add the [systemd][systemd] job to `extraScrapeConfigs`:
```yaml
extraScrapeConfigs:
- job_name: journal
  journal:
    path: /var/log/journal
    max_age: 12h
    labels:
      job: systemd-journal
  relabel_configs:
  - source_labels: ['__journal__systemd_unit']
    target_label: 'unit'
  - source_labels: ['__journal__hostname']
    target_label: 'hostname'
```
> Feel free to change the [relabel_configs][relabel_configs] to match what you would use in your own environment.

Now we need to add a volume for accessing systemd logs:
```yaml
extraVolumes:
- name: journal
  hostPath:
    path: /var/log/journal
```
And add a new volume mount in Promtail:
```yaml
extraVolumeMounts:
- name: journal
  mountPath: /var/log/journal
  readOnly: true
```
Now that we're ready, we can update the Promtail deployment:
```bash
helm upgrade promtail loki/promtail -n monitoring -f values.yaml
```
Let's go back to Grafana and type in the query below to fetch all kubelet logs that mention `Volume`:
```logql
{unit="kubelet.service"} |= "Volume"
```
[Filter][Filters] expressions are powerful in [LogQL][LogQL]: they help you scan through your logs; in this case they filter out all [kubelet][kubelet] logs that do not contain the word `Volume`.

The workflow is simple: you always select a set of label matchers first, which reduces the data you're planning to scan (such as an application, a namespace or even a cluster). Then you can apply a set of [filters][Filters] to find the logs you want, as in the example below.
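For instance, here is an illustrative query (the label value and filter words are placeholders for whatever is in your cluster) chaining an include filter with an exclude filter:

```logql
{namespace="kube-system"} |= "error" != "probe"
```

This selects `kube-system` logs, keeps only lines containing `error`, then drops those that also mention `probe`.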
> Promtail also supports [syslog][syslog].
## Adding Kubernetes events
Kubernetes events (`kubectl get events -n monitoring`) are a great way to debug and troubleshoot your Kubernetes cluster. Events contain information such as node reboots, OOM kills and pod failures.

We'll deploy the `eventrouter` application created by [Heptio][eventrouter], which logs those events to `stdout`.

But first we need to configure Promtail: we want to parse each event's namespace out of the log content and add it as a label, so we can quickly access events by namespace.
Let's update our `pipelineStages` to parse logs from the `eventrouter`:
```yaml
pipelineStages:
- docker: {}
- match:
    selector: '{app="eventrouter"}'
    stages:
    - json:
        expressions:
          namespace: event.metadata.namespace
    - labels:
        namespace: ""
```
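To see why the `json` stage reads `event.metadata.namespace`, here is roughly what an `eventrouter` log line looks like (abridged and illustrative; the actual fields vary by event):

```json
{
  "verb": "ADDED",
  "event": {
    "metadata": {
      "name": "promtail-87t62.161f8cbb55bb5571",
      "namespace": "monitoring"
    },
    "reason": "Started",
    "message": "Started container promtail"
  }
}
```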
> Pipeline stages are a great way to parse log content and create labels (which are [indexed][labels post]); if you want to configure more of them, check out the [documentation][pipeline].
Now update Promtail again:
```bash
helm upgrade promtail loki/promtail -n monitoring -f values.yaml
```
And deploy the `eventrouter` using:
```bash
kubectl create -f https://raw.githubusercontent.com/grafana/loki/master/docs/clients/aws/eks/eventrouter.yaml
serviceaccount/eventrouter created
clusterrole.rbac.authorization.k8s.io/eventrouter created
clusterrolebinding.rbac.authorization.k8s.io/eventrouter created
configmap/eventrouter-cm created
deployment.apps/eventrouter created
```
Let's go to Grafana [Explore][explore] and query events for our new `monitoring` namespace using `{app="eventrouter",namespace="monitoring"}`.

For more information about the `eventrouter`, make sure to read our [blog post][blog events] from Goutham.
## Conclusion
That's it! You can download the final and complete [`values.yaml`][final config] if you need it.

Your EKS cluster is now ready: all your current and future application logs will be shipped to Loki with Promtail. You will also be able to [explore][explore] [kubelet][kubelet] logs and Kubernetes events. Since we've used a DaemonSet, you'll automatically collect logs from new nodes as you add them.

If you want to push this further, check out [Joe's blog post][blog annotations] on how to automatically create Grafana dashboard annotations with Loki when you deploy new Kubernetes applications.

> If you need to delete the cluster, simply run `eksctl delete cluster --name loki-promtail`.
[eks]: https://aws.amazon.com/eks/
[aws cli]: https://aws.amazon.com/cli/
[GrafanaCloud]: https://grafana.com/signup/
[blog ship log with fargate]: https://aws.amazon.com/blogs/containers/how-to-capture-application-logs-when-using-amazon-eks-on-aws-fargate/
[correlate]: https://grafana.com/blog/2020/03/31/how-to-successfully-correlate-metrics-logs-and-traces-in-grafana/
[tiller install]: https://v2.helm.sh/docs/using_helm/
[default value file]: https://github.com/grafana/loki/blob/master/production/helm/promtail/values.yaml
[systemd]: https://github.com/grafana/loki/tree/master/production/helm/promtail#run-promtail-with-systemd-journal-support
[grafana logs namespace]: namespace-grafana.png
[relabel_configs]: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
[syslog]: https://github.com/grafana/loki/tree/master/production/helm/promtail#run-promtail-with-syslog-support
[Filters]: https://github.com/grafana/loki/blob/master/docs/logql.md#filter-expression
[kubelet]: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#:~:text=The%20kubelet%20works%20in%20terms,PodSpecs%20are%20running%20and%20healthy.
[LogQL]: https://github.com/grafana/loki/blob/master/docs/logql.md
[blog events]: https://grafana.com/blog/2019/08/21/how-grafana-labs-effectively-pairs-loki-and-kubernetes-events/
[labels post]: https://grafana.com/blog/2020/04/21/how-labels-in-loki-can-make-log-queries-faster-and-easier/
[pipeline]: https://github.com/grafana/loki/blob/master/docs/clients/promtail/pipelines.md
[final config]: values.yaml
[blog annotations]: https://grafana.com/blog/2019/12/09/how-to-do-automatic-annotations-with-grafana-and-loki/
[kubectl]: https://kubernetes.io/docs/tasks/tools/install-kubectl/
[eksctl]: https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html
[Promtail]: ./../../promtail/README.md
[prometheus conf]: https://prometheus.io/docs/prometheus/latest/configuration/configuration/
[eventrouter]: https://github.com/heptiolabs/eventrouter
[explore]: https://grafana.com/docs/grafana/latest/features/explore/

docs/clients/aws/eks/values.yaml
@@ -0,0 +1,219 @@
## Affinity for pod assignment
## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
affinity: {}

annotations: {}

# The update strategy to apply to the DaemonSet
##
deploymentStrategy: {}
#  rollingUpdate:
#    maxUnavailable: 1
#  type: RollingUpdate

initContainer:
  enabled: false
  fsInotifyMaxUserInstances: 128

image:
  repository: grafana/promtail
  tag: 1.5.0
  pullPolicy: IfNotPresent

## Optionally specify an array of imagePullSecrets.
## Secrets must be manually created in the namespace.
## ref: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/
##
# pullSecrets:
#   - myRegistryKeySecretName

livenessProbe: {}

loki:
  serviceName: "logs-prod-us-central1.grafana.net"
  servicePort: 443
  serviceScheme: https
  user: <grafana cloud user id>
  password: <grafana cloud api key>

nameOverride: promtail

## Node labels for pod assignment
## ref: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
nodeSelector: {}

pipelineStages:
- docker: {}
- match:
    selector: '{app="eventrouter"}'
    stages:
    - json:
        expressions:
          namespace: event.metadata.namespace
    - labels:
        namespace: ""

## Pod Labels
podLabels: {}

podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "http-metrics"

## Assign a PriorityClassName to pods if set
# priorityClassName:

rbac:
  create: true
  pspEnabled: true

readinessProbe:
  failureThreshold: 5
  httpGet:
    path: /ready
    port: http-metrics
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1

resources: {}
#  limits:
#    cpu: 200m
#    memory: 128Mi
#  requests:
#    cpu: 100m
#    memory: 128Mi

# Custom scrape_configs to override the default ones in the configmap
scrapeConfigs: []

# Custom scrape_configs together with the default ones in the configmap
extraScrapeConfigs:
- job_name: journal
  journal:
    path: /var/log/journal
    max_age: 12h
    labels:
      job: systemd-journal
  relabel_configs:
  - source_labels: ['__journal__systemd_unit']
    target_label: 'unit'
  - source_labels: ['__journal__hostname']
    target_label: 'hostname'

securityContext:
  readOnlyRootFilesystem: true
  runAsGroup: 0
  runAsUser: 0

serviceAccount:
  create: true
  name:

## Tolerations for pod assignment
## ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
tolerations:
- key: node-role.kubernetes.io/master
  operator: Exists
  effect: NoSchedule

# Extra volumes to scrape logs from
volumes:
- name: docker
  hostPath:
    path: /var/lib/docker/containers
- name: pods
  hostPath:
    path: /var/log/pods

# Custom volumes together with the default ones
extraVolumes:
- name: journal
  hostPath:
    path: /var/log/journal

volumeMounts:
- name: docker
  mountPath: /var/lib/docker/containers
  readOnly: true
- name: pods
  mountPath: /var/log/pods
  readOnly: true

# Custom volumeMounts together with the default ones
extraVolumeMounts:
- name: journal
  mountPath: /var/log/journal
  readOnly: true

# Add extra command line args while starting up promtail.
# more info: https://github.com/grafana/loki/pull/1530
extraCommandlineArgs: []
# example:
# extraCommandlineArgs:
#   - -client.external-labels=hostname=$(HOSTNAME)

config:
  client:
    # Maximum wait period before sending batch
    batchwait: 1s
    # Maximum batch size to accrue before sending, unit is byte
    batchsize: 102400
    # Maximum time to wait for server to respond to a request
    timeout: 10s
    backoff_config:
      # Initial backoff time between retries
      min_period: 100ms
      # Maximum backoff time between retries
      max_period: 5s
      # Maximum number of retries when sending batches, 0 means infinite retries
      max_retries: 20
    # The labels to add to any time series or alerts when communicating with loki
    external_labels: {}
  server:
    http_listen_port: 3101
  positions:
    filename: /run/promtail/positions.yaml
  target_config:
    # Period to resync directories being watched and files being tailed
    sync_period: 10s

serviceMonitor:
  enabled: false
  interval: ""
  additionalLabels: {}
  annotations: {}
  # scrapeTimeout: 10s

# Extra env variables to pass to the promtail container
env: []

# enable and configure if using the syslog scrape config
syslogService:
  enabled: false
  type: ClusterIP
  port: 1514
  ## Specify the nodePort value for the LoadBalancer and NodePort service types.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
  ##
  # nodePort:
  ## Provide any additional annotations which may be required. This can be used to
  ## set the LoadBalancer service type to internal only.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
  ##
  annotations: {}
  labels: {}
  ## Use loadBalancerIP to request a specific static IP,
  ## otherwise leave blank
  ##
  loadBalancerIP:
  # loadBalancerSourceRanges: []
  ## Set the externalTrafficPolicy in the Service to either Cluster or Local
  # externalTrafficPolicy: Cluster