mirror of https://github.com/grafana/loki
Hardcode ring.WriteNoExtend for distributor push operations (#7517)
With zone awareness enabled we allow restarting multiple ingesters within a single zone at the same time, this means that for many more write requests there's potentially only 2 of the 3 replicas for a stream up at the time the write request is received. This lead to a large increase in 500s during rollouts. My suspicion is that the 500s on rollouts was always related to the following; the replica set ring code has something referred to as "extending the replica set". This essentially is the adding of ingesters in the ring that are not in an ACTIVE state (for example, are in LEAVING) to the set of replicas considered valid for an operation. Adding of the instance in a state other than ACTIVE can be controlled by using a different operation type, which Mimir was using but we were not. Here are the relevant functions, [ring.Get](k1218d6d914ef6/ring/ring.go (L335-L403)
) (particularly these lines [1](8d6d914ef6/ring/ring.go (L381-L389)
) and [2](8d6d914ef6/ring/ring.go (L394-L397)
)) and [replicationStrategy.Filter](789ec0ca4a/ring/replication_strategy.go (L22-L71)
) (particularly these lines [1](789ec0ca4a/ring/replication_strategy.go (L32-L36)
) and [2](789ec0ca4a/ring/replication_strategy.go (L54-L67)
)) The tl; dr of the above code snippets is that we seem to always have required a minSuccess from replicationFactor/2 + 1 except we overwrite replicationFactor to be the # of instances which is likely always going to be more than just 3 ingester replicas because of the replica set extension. At least for zone aware rollouts this seems to be the case. This is because many ingesters will be in the LEAVING state. We can avoid this by just using `WriteNoExtend` operations instead of `Write` operations ([here](789ec0ca4a/ring/ring.go (L83-L93)
)) In this PR I've hardcoded our call to the ring in `distributor.Push` to always use `ring.WriteNoExtend`. IMO this is the better option to allowing `ring.WriteNoExtend` to be used optionally and defaulting to `ring.Write`. Extending the replication set on writes feels like a footgun for write outages and there's even some feelings internally that allowing that extension is a bug. For more background information see: https://github.com/grafana/mimir/issues/1854 Signed-off-by: Callum Styan <callumstyan@gmail.com> Signed-off-by: Callum Styan <callumstyan@gmail.com>
parent
b16ad8cb38
commit
46b40a5582
Loading…
Reference in new issue