fix query-frontend request load balancing when using k8s service (#7966)

**What this PR does / why we need it**:
We noticed an imbalance in the requests sent to the Query Frontend pods
through the `query-frontend` k8s service. This appears to be caused by
`query-frontend` being a headless service: it resolves to the IP of each
QF pod and leaves it up to the client to load balance the requests, as
described in the
[Kubernetes documentation on headless services](https://kubernetes.io/docs/concepts/services-networking/service/#headless-services).

This PR fixes the issue by creating two separate services for pod IP
discovery and load balancing of queries:
* `query-frontend` to be used for load balancing incoming Loki queries.
* `query-frontend-headless` to be used for discovering QF pod IPs from
queriers to connect as workers.
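
As a rough illustration of the split, the two Services might render to something like the sketch below. It is written in plain Jsonnet and is not the output of the ksonnet library: the `loki` namespace, the `name: 'query-frontend'` selector label, the port names, and the HTTP port 3100 are assumptions made for the example. Only the two service names, the gRPC port 9095, and `publishNotReadyAddresses` are taken from this change.

```jsonnet
// Hypothetical values; in a real deployment these come from the ksonnet config.
local namespace = 'loki';
local selector = { name: 'query-frontend' };

// Shape shared by both Services (helper name is illustrative only).
local baseService(name) = {
  apiVersion: 'v1',
  kind: 'Service',
  metadata: { name: name, namespace: namespace },
  spec: {
    selector: selector,
    ports: [
      { name: 'http', port: 3100, targetPort: 3100 },  // assumed HTTP port
      { name: 'grpc', port: 9095, targetPort: 9095 },
    ],
  },
};

{
  // Regular ClusterIP service: clients talk to one virtual IP and
  // Kubernetes spreads their connections across the frontend pods.
  query_frontend_service: baseService('query-frontend'),

  // Headless service: DNS returns every pod IP, so each querier worker
  // can open a connection to every query-frontend pod.
  query_frontend_headless_service: baseService('query-frontend-headless') + {
    spec+: {
      clusterIP: 'None',
      // A frontend only becomes ready once a querier connects, so its
      // address has to be published before it reports ready.
      publishNotReadyAddresses: true,
    },
  },
}
```

The only structural difference between the two objects is the `clusterIP: 'None'` and `publishNotReadyAddresses` overlay, which is why a single deployment can sit behind both services at once.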

**Checklist**
- [x] `CHANGELOG.md` updated
- [x] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
pull/7969/head
Sandeep Sukhani 2 years ago committed by GitHub
parent cc4ab7487a
commit 9e7ac3d853
Changed files:
  1. CHANGELOG.md (1 line changed)
  2. docs/sources/upgrading/_index.md (23 lines changed)
  3. production/ksonnet/loki/query-frontend.libsonnet (11 lines changed)
  4. production/ksonnet/loki/query-scheduler.libsonnet (2 lines changed)

@ -25,6 +25,7 @@
* [7880](https://github.com/grafana/loki/pull/7880) **sandeepsukhani**: consider range and offset in queries while looking for schema config for query sharding.
* [7937](https://github.com/grafana/loki/pull/7937) **ssncferreira**: Deprecate CLI flag `-ruler.wal-cleaer.period` and replace it with `-ruler.wal-cleaner.period`.
* [7906](https://github.com/grafana/loki/pull/7906) **kavirajk**: Add API endpoint that formats LogQL expressions and support new `fmt` subcommand in `logcli` to format LogQL query.
* [7966](https://github.com/grafana/loki/pull/7966) **sandeepsukhani**: Fix query-frontend request load balancing when using k8s service.
##### Changes

@ -57,6 +57,29 @@ ruler:
period: 5s
```
### Querier
#### query-frontend k8s headless service changed to load balanced service
*Note:* This is relevant only if you are using [jsonnet for deploying Loki in Kubernetes](https://grafana.com/docs/loki/latest/installation/tanka/).
The `query-frontend` k8s service was previously headless and served two purposes:
* Distributing Loki query requests among all the available Query Frontend pods.
* Discovering the IPs of Query Frontend pods from Queriers so that they can connect as workers.
The problem is that a headless service does not load balance requests; it leaves that to the client.
Conversely, a load-balanced service does not let us discover the IPs of the underlying pods.
To meet both requirements, we have made the following changes:
* Changed the existing `query-frontend` k8s service from headless to load-balanced so that load is distributed fairly across all the Query Frontend instances.
* Added a `query-frontend-headless` service that Queriers use to discover QF pod IPs and connect as workers.
If you are deploying Loki with Query Scheduler by setting the [query_scheduler_enabled](https://github.com/grafana/loki/blob/cc4ab7487ab3cd3b07c63601b074101b0324083b/production/ksonnet/loki/config.libsonnet#L18) config to `true`, no action is required for this change.
If you are not using Query Scheduler, we recommend the following steps to avoid issues on the read path until the rollout finishes:
* Create just the `query-frontend-headless` service without applying any changes to the `query-frontend` service.
* Roll out the changes to `queriers`.
* Roll out the rest of the changes.
## 2.7.0
### Loki

@ -42,7 +42,11 @@ local k = import 'ksonnet-util/kausal.libsonnet';
local service = k.core.v1.service,
-query_frontend_service:
+// A headless service for discovering IPs of each query-frontend pod.
+// It leaves it up to the client to do any load-balancing of requests,
+// so if the intention is to use the k8s service for load balancing,
+// it is advised to use the below `query-frontend` service instead.
+query_frontend_headless_service:
$.util.grpclbServiceFor($.query_frontend_deployment) +
// Make sure that query frontend worker, running in the querier, do resolve
// each query-frontend pod IP and NOT the service IP. To make it, we do NOT
@ -52,6 +56,9 @@ local k = import 'ksonnet-util/kausal.libsonnet';
// Query frontend will not become ready until at least one querier connects
// which creates a chicken and egg scenario if we don't publish the
// query-frontend address before it's ready.
-service.mixin.spec.withPublishNotReadyAddresses(true),
+service.mixin.spec.withPublishNotReadyAddresses(true) +
+service.mixin.metadata.withName('query-frontend-headless'),
+query_frontend_service:
+k.util.serviceFor($.query_frontend_deployment, $._config.service_ignored_labels),
}

@ -20,7 +20,7 @@ local k = import 'ksonnet-util/kausal.libsonnet';
max_outstanding_per_tenant: max_outstanding,
},
frontend_worker+: {
-frontend_address: 'query-frontend.%s.svc.cluster.local.:9095' % $._config.namespace,
+frontend_address: 'query-frontend-headless.%s.svc.cluster.local.:9095' % $._config.namespace,
},
},
},
