How Does Google Docs Sync Multiple People Editing the Same Document?
I tried to understand how Google Docs lets multiple people type into the same document at the same time.
No merge dialog. No overwrite warning. No "please refresh, this file changed."
That shouldn't work.
And yet it does.
Two people can type into the same sentence. Both changes show up. The document stays in sync. Nobody loses their text.
If you think about it for more than a minute, it starts feeling wrong. Two users changed the same thing. One of them should win. The other should get erased.
The simple version does exactly that.
Start with the obvious idea
A document is just text, so the simplest sync model is: whenever someone edits, send the whole updated text to the server.
Start here:
hello
Now two users open the document at the same time.
User A changes it to:
hello world
User B changes it to:
hello there
If both clients send their full state, the server receives two perfectly valid versions:
A -> "hello world"
B -> "hello there"
Now the server has to choose.
The simple rule is "last write wins."
This is how a lot of simple sync setups work.
And it's exactly why they break under concurrent edits.
If A arrives last:
hello world
If B arrives last:
hello there
Either way, one person loses their edit.
[Interactive demo: Simple Sync. Flip the arrival order and one valid edit always gets dropped.]
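Here's a minimal sketch of that failure, assuming a hypothetical server that just stores whatever full text arrived last. The class and function names are mine, purely for illustration.

```python
from typing import List


class LastWriteWinsServer:
    """Hypothetical server: keeps only the most recent full-text snapshot."""

    def __init__(self, text: str) -> None:
        self.text = text

    def receive(self, full_text: str) -> None:
        # No merging: the incoming snapshot replaces the stored document.
        self.text = full_text


def simulate(arrival_order: List[str]) -> str:
    """Feed snapshots to a fresh server in the given order and return the result."""
    server = LastWriteWinsServer("hello")
    for snapshot in arrival_order:
        server.receive(snapshot)
    return server.text


a_snapshot = "hello world"  # User A's full document
b_snapshot = "hello there"  # User B's full document

print(simulate([a_snapshot, b_snapshot]))  # hello there (A's edit is gone)
print(simulate([b_snapshot, a_snapshot]))  # hello world (B's edit is gone)
```

Whichever order you pick, the server never even sees that there were two edits. It only sees two documents.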
Why this is the actual problem
What's tricky here is that neither user did anything wrong.
Both users:
- started from the same state
- made a valid edit
- expected their change to survive
And honestly, a human looking at those two edits would say the same thing: keep both.
So the problem is not "two people typed at once."
The problem is that we're syncing the wrong thing.
State is not enough
The full text tells you what the document looks like after an edit.
It does not tell you what the user meant.
Start with:
hello |
Two users type at the same position:
User A types: "world"
User B types: "there"
Now the system has two valid insertions at the same position.
There is no single correct ordering.
The only requirement is:
- both edits are preserved
- all users see the same result
The system is not trying to produce the sentence you expected.
It is trying to preserve the edits that actually happened.
Once you collapse an edit into full state, that intent is gone.
Send operations instead
Instead of sending the whole document, send the edit itself.
User A didn't mean "replace the document with hello world." They meant:
Insert " world" at position 5
User B meant:
Insert " there" at position 5
That's a much better representation.
Now we're not syncing snapshots. We're syncing operations.
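To make "operation" concrete, here is one way it could look in code. This is a sketch under my own assumptions: the `Insert` type and `apply` function are illustrative names, not anything a real editor exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int   # index in the document the author was looking at
    text: str  # characters to insert at that index


def apply(doc: str, op: Insert) -> str:
    """Return a new document with op applied."""
    return doc[:op.pos] + op.text + doc[op.pos:]


op_a = Insert(pos=5, text=" world")  # what User A actually did
op_b = Insert(pos=5, text=" there")  # what User B actually did

print(apply("hello", op_a))  # hello world
print(apply("hello", op_b))  # hello there
```

The message on the wire is now tiny, and it says what the user did rather than what the document ended up looking like.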
This breaks too
Those operations are better, but they still conflict.
Both users want to insert at the same position:
Insert " world" at 5
Insert " there" at 5
Now order matters.
If one machine applies A first and then B as-is, it gets:
hello there world
If another machine applies B first and then A as-is, it gets:
hello world there
Both preserved the edits, but now the document diverged. That's just a different failure mode.
The deeper issue is that position 5 only made sense in the original document:
hello
After one insert lands, the other operation is now talking about an outdated version of reality.
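The divergence is easy to reproduce. The sketch below uses a hypothetical `Insert` type and `apply` helper (my names, for illustration) and replays the two operations unchanged in both orders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


op_a = Insert(5, " world")
op_b = Insert(5, " there")

# Machine 1 sees A first, then applies B without adjusting it.
machine_1 = apply(apply("hello", op_a), op_b)
# Machine 2 sees B first, then applies A without adjusting it.
machine_2 = apply(apply("hello", op_b), op_a)

print(machine_1)  # hello there world
print(machine_2)  # hello world there
```

Nothing was lost on either machine, but the two copies no longer agree.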
The key realization
State is not enough.
Raw operations are not enough either.
What you need is a way to adjust an operation when the document has changed underneath it.
That adjustment is the whole trick.
Everything else is implementation details.
Transform the operation
Say the system decides A's operation is applied first:
A: Insert " world" at 5
B: Insert " there" at 5
After A runs, the document becomes:
hello world
But B's operation was created against the older document. To preserve B's original intent, the system rewrites B relative to A's change:
Insert " there" at 11
Now apply the transformed version:
hello
-> hello world
-> hello world there
Both edits survive.
Notice what just happened.
The operation didn't change its meaning.
Only its position changed.
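That rewrite can be sketched in a few lines. This is a simplified illustration that only handles insert against insert, and the names `Insert`, `apply`, and `transform_insert` are my own assumptions, not a real library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform_insert(op: Insert, applied: Insert) -> Insert:
    """Rewrite op so it can run after applied, a concurrent insert that ran first."""
    if applied.pos <= op.pos:
        # applied added text at or before op's position, so op moves right.
        return Insert(op.pos + len(applied.text), op.text)
    # applied landed after op's position; op is unaffected.
    return op


op_a = Insert(5, " world")
op_b = Insert(5, " there")

b_transformed = transform_insert(op_b, op_a)
print(b_transformed)  # Insert(pos=11, text=' there')

doc = apply(apply("hello", op_a), b_transformed)
print(doc)  # hello world there
```

The text of B never changes. Only the position does, by exactly the length of what A inserted.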
[Interactive demo: OT Transform. The second operation gets rewritten to preserve intent.]
B shifts right because A already inserted 6 characters at the same position.
If the system had chosen B first, it would transform A instead and converge on:
hello there world
Either result is acceptable. What matters is that every participant follows the same rule and ends up with the same final document.
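Here is a small sketch of what "the same rule" could mean when two inserts tie on position. I'm assuming a boolean `op_wins_tie` flag that says which operation keeps its place. That flag and the function names are illustrative; real systems settle ties in their own ways, for example by server order or client id.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert, op_wins_tie: bool) -> Insert:
    """Rewrite op to run after the concurrent insert other."""
    if other.pos < op.pos or (other.pos == op.pos and not op_wins_tie):
        return Insert(op.pos + len(other.text), op.text)
    return op


op_a = Insert(5, " world")
op_b = Insert(5, " there")


def converge(a_first: bool) -> Tuple[str, str]:
    # Replica 1 already applied A locally, then receives B.
    replica_1 = apply(apply("hello", op_a),
                      transform(op_b, op_a, op_wins_tie=not a_first))
    # Replica 2 already applied B locally, then receives A.
    replica_2 = apply(apply("hello", op_b),
                      transform(op_a, op_b, op_wins_tie=a_first))
    return replica_1, replica_2


print(converge(a_first=True))   # ('hello world there', 'hello world there')
print(converge(a_first=False))  # ('hello there world', 'hello there world')
```

Each replica applied the operations in a different order, yet both land on the same text, because both used the same tie-break.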
The name for this
This is Operational Transformation, or OT.
All of this happens underneath a real-time communication layer.
In practice, collaborative editors like Google Docs keep a persistent connection open between each client and the server (WebSockets are one common way to do this) and send operations over it as they happen.
The transport makes it feel instant.
But the transport is not what makes it correct.
Even if you deliver updates perfectly in real time, without transforming operations, concurrent edits would still clobber each other or leave clients looking at different documents.
Real-time delivery solves speed.
Operational Transformation solves correctness.
The name sounds more intimidating than the idea.
OT says:
When one operation arrives after another concurrent operation, transform it so it still means what its author originally intended.
Not "pick the newest state."
Not "retry from scratch."
Transform the edit.
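One way a central server could apply that rule is to keep an ordered log of committed operations and have each client say which revision its edit was based on. To be clear, this is a simplified sketch under my own assumptions (insert-only operations, a hypothetical `Server` class, a `base_revision` parameter), not a description of how Google Docs is actually built.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, committed: Insert) -> Insert:
    # committed ran first, so on a tie it keeps its place and op shifts right.
    if committed.pos <= op.pos:
        return Insert(op.pos + len(committed.text), op.text)
    return op


class Server:
    """Hypothetical server holding the document and an ordered operation log."""

    def __init__(self, doc: str) -> None:
        self.doc = doc
        self.log: List[Insert] = []  # committed operations, oldest first

    def receive(self, op: Insert, base_revision: int) -> Insert:
        # Everything committed after the client's base revision is concurrent
        # with op, so op is transformed against each of those in order.
        for committed in self.log[base_revision:]:
            op = transform(op, committed)
        self.doc = apply(self.doc, op)
        self.log.append(op)
        return op  # the transformed op is what other clients would be sent


server = Server("hello")
server.receive(Insert(5, " world"), base_revision=0)  # A, based on "hello"
server.receive(Insert(5, " there"), base_revision=0)  # B, also based on "hello"
print(server.doc)  # hello world there
```

Neither client had to retry and neither edit was dropped. The server just rewrote the late arrival against what it had missed.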
Why operations are the right level
Once you think this way, a text editor becomes a stream of tiny actions:
- insert text at a position
- delete text from a position
That sounds almost too small to matter. But this is exactly the right level of abstraction.
If I insert text before your cursor, your operation may need to move right.
If I delete text before your cursor, your operation may need to move left.
If we both delete overlapping text, one delete may need to shrink or disappear.
OT handles these cases by rewriting operations against each other so the document stays consistent and each user's intent is preserved as much as possible.
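Those three cases can be sketched as position arithmetic. The code below is a simplified illustration with my own hypothetical names (`Insert`, `Delete`, `shift_position`, `transform_delete_against_delete`). It skips harder cases such as an insert landing inside a range that someone else is deleting.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass(frozen=True)
class Delete:
    pos: int     # index of the first character removed
    length: int  # number of characters removed


def shift_position(pos: int, applied: Union[Insert, Delete]) -> int:
    """Where does index pos end up once applied has already run?"""
    if isinstance(applied, Insert):
        # Text inserted at or before pos pushes it right.
        return pos + len(applied.text) if applied.pos <= pos else pos
    if pos <= applied.pos:
        return pos
    # Text deleted before pos pulls it left; an index inside the deleted
    # range collapses to the start of that range.
    return max(applied.pos, pos - applied.length)


def transform_delete_against_delete(op: Delete, applied: Delete) -> Optional[Delete]:
    """Shrink op to the characters that applied has not already removed."""
    start = shift_position(op.pos, applied)
    end = shift_position(op.pos + op.length, applied)
    if start == end:
        return None  # everything op wanted to delete is already gone
    return Delete(start, end - start)


print(shift_position(5, Insert(0, ">> ")))  # 8 (moved right)
print(shift_position(5, Delete(0, 2)))      # 3 (moved left)

# Document "hello world": op removes "lo wo", applied removed " world".
print(transform_delete_against_delete(Delete(3, 5), Delete(5, 6)))  # Delete(pos=3, length=2)
# op removes "wor", which applied already removed entirely.
print(transform_delete_against_delete(Delete(6, 3), Delete(5, 6)))  # None
```

In the overlapping case the delete shrinks from five characters to two, because the other three were already removed by someone else.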
The real insight
The system is not syncing state.
It's syncing intent.
State tells you what the world looks like.
Intent tells you how it got there.
And only one of those survives concurrency.
A collaborative editor works because it doesn't ask "what does the document look like right now?" It asks "what did each user try to do, and how do we preserve that as other edits arrive?"
Once you frame it that way, real-time collaboration stops looking like magic and starts looking like a careful bookkeeping problem.
Where this connects to diff
This reminded me a lot of the diff rabbit hole.
Diff asks:
Given two states, what changed?
OT asks:
While changes are happening, how do we keep them from breaking each other?
Diff reconstructs the past.
OT protects the present.
In both cases, the hard part isn't the data.
It's the transformation.
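To make the diff side concrete, here is a small sketch that recovers operations from two states using Python's standard `difflib`. The `diff_to_ops` helper and its tuple format are my own illustration, and the positions it reports are indexes into the "before" text.

```python
import difflib
from typing import List, Tuple


def diff_to_ops(before: str, after: str) -> List[Tuple[str, int, str]]:
    """Recover (kind, position, text) edits that turn before into after."""
    ops = []
    matcher = difflib.SequenceMatcher(None, before, after)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            ops.append(("delete", i1, before[i1:i2]))
        if tag in ("insert", "replace"):
            ops.append(("insert", i1, after[j1:j2]))
    return ops


print(diff_to_ops("hello", "hello world"))  # [('insert', 5, ' world')]
```

Diff had to work backwards from two snapshots to find that insert. A collaborative editor gets the same information for free by sending the operation in the first place.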
Written by Harshit Sharma. If you want to know when new posts are out, follow me on Twitter.