Hardcode ring.WriteNoExtend for distributor push operations (#7517)

With zone awareness enabled we allow restarting multiple ingesters
within a single zone at the same time, which means that for many more
write requests only 2 of the 3 replicas for a stream may be up at the
time the write request is received. This led to a large increase in
500s during rollouts.

My suspicion is that the 500s on rollouts were always related to the
following: the replica set ring code has a mechanism referred to as
"extending the replica set". This is the addition of ingesters that are
in the ring but not in an ACTIVE state (for example, LEAVING) to the set
of replicas considered valid for an operation. Whether instances in a
state other than ACTIVE get added can be controlled by using a different
operation type, which Mimir was already doing but we were not. Here are
the relevant functions (a simplified sketch of the extension follows the
links):
[ring.Get](8d6d914ef6/ring/ring.go (L335-L403))
(particularly these lines
[1](8d6d914ef6/ring/ring.go (L381-L389))
and
[2](8d6d914ef6/ring/ring.go (L394-L397)))
and
[replicationStrategy.Filter](789ec0ca4a/ring/replication_strategy.go (L22-L71))
(particularly these lines
[1](789ec0ca4a/ring/replication_strategy.go (L32-L36))
and
[2](789ec0ca4a/ring/replication_strategy.go (L54-L67)))
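
To make the extension mechanism concrete, here's a minimal sketch of what it does to the replica set size. This is my own simplification, not the actual dskit code (the real `ring.Get` also handles zone awareness and filters by health); `pick` and its parameters are placeholders:

```go
package main

import "fmt"

type InstanceState int

const (
	ACTIVE InstanceState = iota
	LEAVING
)

// pick mimics the core loop of ring.Get: walk instances in token order and
// stop once we have replicationFactor hosts. When shouldExtend returns true
// for an instance's state (as ring.Write does for anything non-ACTIVE), the
// instance is still added to the set AND the target size is bumped by one,
// so the set grows past the configured replication factor.
func pick(ringStates []InstanceState, replicationFactor int, shouldExtend func(InstanceState) bool) []InstanceState {
	var hosts []InstanceState
	for _, s := range ringStates {
		if len(hosts) >= replicationFactor {
			break
		}
		if shouldExtend(s) {
			replicationFactor++
		}
		hosts = append(hosts, s)
	}
	return hosts
}

func main() {
	ringStates := []InstanceState{ACTIVE, LEAVING, LEAVING, ACTIVE, ACTIVE}

	// ring.Write semantics: extend on any non-ACTIVE state.
	write := pick(ringStates, 3, func(s InstanceState) bool { return s != ACTIVE })
	// ring.WriteNoExtend semantics: never extend.
	noExtend := pick(ringStates, 3, func(InstanceState) bool { return false })

	fmt.Println(len(write), len(noExtend)) // prints: 5 3
}
```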

The tl;dr of the above code snippets is that we seem to have always
required a minSuccess of replicationFactor/2 + 1, except that
replicationFactor is overwritten to be the number of instances in the
replica set, which, because of replica set extension, is likely always
going to be more than just 3 ingester replicas. At least for zone aware
rollouts this seems to be the case, since many ingesters will be in the
LEAVING state. We can avoid this by using `WriteNoExtend` operations
instead of `Write` operations
([here](789ec0ca4a/ring/ring.go (L83-L93))).
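
Here's the quorum arithmetic that falls out of this, as a simplified sketch of the `Filter` calculation (the real code also drops unhealthy instances before the final check; `quorum` is a placeholder name):

```go
package main

import "fmt"

// quorum mirrors the minSuccess calculation in replicationStrategy.Filter:
// if the replica set was extended past the configured replication factor,
// the factor is overwritten with the set size before computing the quorum.
func quorum(replicationFactor, instances int) (minSuccess, maxFailures int) {
	if instances > replicationFactor {
		replicationFactor = instances
	}
	minSuccess = replicationFactor/2 + 1
	return minSuccess, instances - minSuccess
}

func main() {
	// ring.Write during a rollout: two LEAVING ingesters extend a 3-replica
	// set to 5, so 3 successful writes are required instead of 2.
	fmt.Println(quorum(3, 5)) // prints: 3 2
	// ring.WriteNoExtend: the set stays at the configured factor.
	fmt.Println(quorum(3, 3)) // prints: 2 1
}
```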

In this PR I've hardcoded our call to the ring in `distributor.Push` to
always use `ring.WriteNoExtend`. IMO this is better than making
`ring.WriteNoExtend` optional and defaulting to `ring.Write`: extending
the replication set on writes feels like a footgun for write outages,
and there's even some feeling internally that allowing that extension is
a bug.
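
For reference, the two operations differ only in whether an extension callback is passed to `NewOp`. This is paraphrased from memory of the dskit lines linked above, so treat the exact wording as approximate rather than a quote of the source:

```go
// Write extends the replica set whenever an instance is in any state other
// than ACTIVE (e.g. LEAVING during a rollout).
var Write = NewOp([]InstanceState{ACTIVE}, func(s InstanceState) bool {
	return s != ACTIVE
})

// WriteNoExtend passes no extension callback, so the replica set always
// stays at the configured replication factor.
var WriteNoExtend = NewOp([]InstanceState{ACTIVE}, nil)
```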

For more background information see:
https://github.com/grafana/mimir/issues/1854

Signed-off-by: Callum Styan <callumstyan@gmail.com>
`pkg/distributor/distributor.go`:

```diff
@@ -369,7 +369,7 @@ func (d *Distributor) Push(ctx context.Context, req *logproto.PushRequest) (*log
 	streamsByIngester := map[string][]*streamTracker{}
 	ingesterDescs := map[string]ring.InstanceDesc{}
 	for i, key := range keys {
-		replicationSet, err := d.ingestersRing.Get(key, ring.Write, descs[:0], nil, nil)
+		replicationSet, err := d.ingestersRing.Get(key, ring.WriteNoExtend, descs[:0], nil, nil)
 		if err != nil {
 			return nil, err
 		}
```
