mirror of https://github.com/watcha-fr/synapse
Add developer documentation to explain room DAG concepts like `outliers` and `state_groups` (#10464)
parent
a6ea32a798
commit
2bae2c632f
@@ -0,0 +1 @@
Add some developer docs to explain room DAG concepts like `outliers`, `state_groups`, `depth`, etc.
@@ -0,0 +1,79 @@
# Room DAG concepts

## Edges

The word "edge" comes from graph theory lingo. An edge is just a connection
between two events. In Synapse, we connect events by specifying their
`prev_events`. A subsequent event points back at a previous event.

```
A (oldest) <---- B <---- C (most recent)
```

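The `A <---- B <---- C` chain above can be modelled in a few lines of Python. This is only an illustrative sketch: the `Event` class and the `walk_back` helper are hypothetical names made up for this example and are not Synapse's own data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """Hypothetical, minimal stand-in for a room event."""

    event_id: str
    # IDs of the events this event points back at (its edges).
    prev_events: List[str] = field(default_factory=list)


# A (oldest) <---- B <---- C (most recent)
a = Event("A")
b = Event("B", prev_events=["A"])
c = Event("C", prev_events=["B"])

events: Dict[str, Event] = {e.event_id: e for e in (a, b, c)}


def walk_back(event_id: str) -> List[str]:
    """Follow the first prev_event of each event until there are none left."""
    path = [event_id]
    while events[path[-1]].prev_events:
        path.append(events[path[-1]].prev_events[0])
    return path


print(walk_back("C"))
```

Note that the edges only point backwards: `C` knows about `B`, but `A` holds no reference to `B`.
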
## Depth and stream ordering

Events are normally sorted by `(topological_ordering, stream_ordering)` where
`topological_ordering` is just `depth`. In other words, we first sort by `depth`
and then tie-break based on `stream_ordering`. `depth` is incremented as new
messages are added to the DAG. Normally, `stream_ordering` is an
auto-incrementing integer, but backfilled events start with
`stream_ordering=-1` and decrement.

---

- `/sync` returns things in the order they arrive at the server (`stream_ordering`).
- `/messages` (and `/backfill` in the federation API) return them in the order determined by the event graph `(topological_ordering, stream_ordering)`.

The general idea is that, if you're following a room in real-time (i.e.
`/sync`), you probably want to see the messages as they arrive at your server,
rather than skipping any that arrived late; whereas if you're looking at a
historical section of timeline (i.e. `/messages`), you want to see the best
representation of the state of the room as others were seeing it at the time.

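To make the difference between the two orderings concrete, here is a toy sketch. The `Row` tuple, the event IDs, and the `sync_order` / `messages_order` helpers are all invented for illustration; they only mimic the sort keys described above and are not how Synapse queries its database.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    """Hypothetical row holding just the fields used for ordering."""

    event_id: str
    depth: int
    stream_ordering: int


rows = [
    Row("$live1", depth=10, stream_ordering=1),
    Row("$live2", depth=11, stream_ordering=2),
    # Older in the DAG, but it reached our server after $live2.
    Row("$late", depth=9, stream_ordering=3),
    # Same depth as $live2, so stream_ordering breaks the tie.
    Row("$live3", depth=11, stream_ordering=4),
    # Backfilled events get negative, decrementing stream orderings.
    Row("$backfilled", depth=8, stream_ordering=-1),
]


def sync_order(rows: List[Row]) -> List[str]:
    """Arrival order: sort by stream_ordering only."""
    return [r.event_id for r in sorted(rows, key=lambda r: r.stream_ordering)]


def messages_order(rows: List[Row]) -> List[str]:
    """Graph order: sort by depth, then tie-break on stream_ordering."""
    return [
        r.event_id
        for r in sorted(rows, key=lambda r: (r.depth, r.stream_ordering))
    ]


print(sync_order(rows))
print(messages_order(rows))
```

In arrival order `$late` comes after `$live2`, while in graph order it is slotted in before `$live1`, where it belongs in the DAG.
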
## Forward extremity

Forward extremities are the most-recent-in-time events in the DAG which are not
yet referenced by any other event's `prev_events`.

The forward extremities of a room are used as the `prev_events` when the next event is sent.

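As a rough sketch of that definition, the forward extremities can be computed from a plain mapping of event ID to `prev_events`. The `forward_extremities` function and the sample DAG are hypothetical and assume the whole DAG fits in one dictionary.

```python
from typing import Dict, List, Set


def forward_extremities(prev_events_by_id: Dict[str, List[str]]) -> Set[str]:
    """Return the events that no other known event lists in its prev_events."""
    referenced = {
        prev for prevs in prev_events_by_id.values() for prev in prevs
    }
    return set(prev_events_by_id) - referenced


# A <---- B <---- C
#          ^
#          +----- D   (C and D were both sent on top of B)
dag = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
print(forward_extremities(dag))

# The next event uses the current forward extremities as its prev_events,
# which merges the fork and leaves a single forward extremity.
dag["E"] = sorted(forward_extremities(dag))
print(forward_extremities(dag))
```
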
## Backwards extremity

A backwards extremity is the current marker of where we have backfilled up to.
Backwards extremities will generally be the oldest-in-time events we know of in
the DAG.

A backwards extremity is an event for which we haven't fetched all of the
`prev_events`.

Once we have fetched all of its `prev_events`, it's unmarked as a backwards
extremity (although we may have formed new backwards extremities from the prev
events during the backfilling process).

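The way backwards extremities move as we backfill can be sketched as follows. This follows the definition given in this section (an event with unfetched `prev_events`); the `backwards_extremities` function and the in-memory `known` dictionary are assumptions made for the example, not Synapse's storage layer.

```python
from typing import Dict, List, Set


def backwards_extremities(known: Dict[str, List[str]]) -> Set[str]:
    """Return known events that have at least one prev_event we don't have."""
    return {
        event_id
        for event_id, prevs in known.items()
        if any(prev not in known for prev in prevs)
    }


# We joined late and only know about C and D; C points at B, which we lack.
known = {"C": ["B"], "D": ["C"]}
print(backwards_extremities(known))

# Backfilling B clears C, but B's own prev_event A is still missing,
# so B becomes the new backwards extremity.
known["B"] = ["A"]
print(backwards_extremities(known))

# A is the first event in the room and has no prev_events.
known["A"] = []
print(backwards_extremities(known))
```
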
## Outliers

We mark an event as an `outlier` when we haven't figured out the state for the
room at that point in the DAG yet.

We won't *necessarily* have the `prev_events` of an `outlier` in the database,
but it's entirely possible that we *might*. The status of whether we have all of
the `prev_events` is marked as a [backwards extremity](#backwards-extremity).

For example, when we fetch the event auth chain or state for a given event, we
mark all of those claimed auth events as outliers because we haven't done the
state calculation ourselves.

## State groups

For every non-outlier event we need to know the state at that event. Instead of
storing the full state for each event in the DB (i.e. an `event_id -> state`
mapping), which is *very* space inefficient when state doesn't change, we
instead assign each different set of state a "state group" and then have
mappings of `event_id -> state_group` and `state_group -> state`.

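A minimal in-memory sketch of the two mappings might look like the following. The `StateStore` class, its method names, and the choice to deduplicate by comparing full state contents are all assumptions for illustration; Synapse's actual tables and group allocation may work differently. State is modelled here as a mapping of `(event type, state key)` to event ID.

```python
from typing import Dict, FrozenSet, Tuple

StateKey = Tuple[str, str]        # (event type, state key)
StateMap = Dict[StateKey, str]    # -> event ID of the state event


class StateStore:
    """Hypothetical store: event_id -> state_group and state_group -> state."""

    def __init__(self) -> None:
        self.group_by_event: Dict[str, int] = {}
        self.state_by_group: Dict[int, StateMap] = {}
        # Reverse index so identical state reuses the same group.
        self._group_by_state: Dict[FrozenSet, int] = {}

    def store(self, event_id: str, state: StateMap) -> int:
        """Record the state at an event, reusing a group if the state is unchanged."""
        key = frozenset(state.items())
        group = self._group_by_state.get(key)
        if group is None:
            group = len(self.state_by_group) + 1
            self._group_by_state[key] = group
            self.state_by_group[group] = dict(state)
        self.group_by_event[event_id] = group
        return group

    def state_at(self, event_id: str) -> StateMap:
        """Look up the state at an event via its state group."""
        return self.state_by_group[self.group_by_event[event_id]]


store = StateStore()
state = {("m.room.create", ""): "$create", ("m.room.name", ""): "$name1"}

# Three messages in a row do not change the state: one shared group.
for event_id in ("$msg1", "$msg2", "$msg3"):
    store.store(event_id, state)

# A rename changes the state, so a second group is allocated.
renamed = {**state, ("m.room.name", ""): "$name2"}
store.store("$msg4", renamed)

print(store.group_by_event)
print(len(store.state_by_group))
```

Four events end up sharing two stored copies of the state, rather than four.
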
### State group edges

TODO: `state_group_edges` is a further optimization...
      notes from @Azrenbeth, https://pastebin.com/seUGVGeT