TL;DR
- I had an incident with SSE that caused real client pain; I detail it below
- Server-Sent Events (SSE) are a long-established and recommended method of pushing data from the server to the client
- Many articles proclaim their benefits, like automatic reconnection (including catching up on missed messages) and being easier and more reliable than sockets
- Buried down in the spec, in the fine print, is something that renders them completely unreliable in uncontrolled environments (apps and public websites that must be generally available)
- Don't expect your events to get delivered any time soon on some corporate or older networks
Fire! Fire!
I hate feeling like an idiot. You know the score: I released a version of our software, automated and manual tests said it was fine, scalability tests said it was fine. Then 10 days later a client account manager says "XYZ Corp" are complaining that the software is "slow" to log in.
OK, you think, slow to log in, let's take a look. Nope, nothing slow, no undue load, all the servers operating well. Hmmmm.
Client reports it's still "very slow" to log in. OK, eventually I think to ask "how slow?" - 20 minutes - wooooo - 20 minutes isn't "slow" to log in, 20 minutes is basically utterly f**ked.
We look everywhere, everything is fine. Their account is fine. It must be a network thing and sure enough it is.
Since launch we've used Server Sent Events for notifications to the client, but recently we moved to using them for more: basically we send requests to the server, which return immediately to say they are enqueued, and then the results pitch up later via a server event. It was a major clean-up of our process, much faster and massively scalable. Except our events were never ever getting delivered for a small number of clients.
"Our events were never getting delivered for some clients" - oh crap, architectural failure, a reason for sporadic bugs has suddenly escalated into a priority 1, my underwear is on fire, cock up.
What's happening is this - something between our servers and the client's computer is screwing up the events, holding them forever. The reason it ever works is that every few minutes it reconnects and the "reliability" of SSE means we get the messages that were swallowed.
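To make that "catch up on reconnect" behaviour concrete, here is a minimal sketch of the server side. It assumes a hypothetical in-memory event log and integer event ids; our real implementation isn't shown here. The only standard parts are the `id:` / `data:` fields of the text/event-stream format and the `Last-Event-ID` request header that the browser sends when it reconnects.

```python
from typing import List, Optional, Tuple


def format_sse_event(event_id: int, data: str) -> str:
    """Serialise one event in the text/event-stream wire format."""
    # Each line of the payload gets its own "data:" field; a blank line ends the event.
    data_lines = "".join(f"data: {line}\n" for line in data.split("\n"))
    return f"id: {event_id}\n{data_lines}\n"


def events_to_replay(
    log: List[Tuple[int, str]], last_event_id: Optional[str]
) -> List[Tuple[int, str]]:
    """Return the events a reconnecting client has not seen yet.

    `log` is a hypothetical list of (id, data) pairs in ascending id order.
    `last_event_id` is the value of the Last-Event-ID request header, if any.
    """
    if last_event_id is None:
        return list(log)
    try:
        seen = int(last_event_id)
    except ValueError:
        return list(log)  # unrecognised id: safest to resend everything
    return [(eid, data) for eid, data in log if eid > seen]
```

So when something in the middle swallows events 2 and 3, the client reconnects with `Last-Event-ID: 1` and the server sends them again. That replay is the only reason our affected clients ever got logged in at all.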
Cue a bunch of devs and devops scouring the internet for what is actually happening, what we forgot and what we need to set to make this work. The answer is bad news: there is nothing we can do!
The problem with SSE
The issue is this. SSE opens a stream to the client with no Content-Length and sends events down it as they become available. Here's the rub though: it uses chunked Transfer-Encoding, which is hop-by-hop, so it only guarantees the method of delivery to the next node in the chain.
Any old proxy in the middle of the connection between your server and the client's device can legally just store all those packets up and wait for the stream to close before forwarding them. It will do that because it sees no Content-Length and thinks - my client wants to know how big it is. Maybe the code predates text/event-stream, which needs no Content-Length, who knows, but they are out there and they're gonna steal your lunch money.
Yep, this is all within spec and legal, and there is no header you can send to disable it. You say "I wanna send it all", the next node in the chain just overrides that and says "I think I'll buffer this until it's done".
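Here is a toy simulation of the difference, not real proxy code. The `streaming_hop` and `buffering_hop` functions and the timeline are made up for illustration; `chunk` shows the HTTP/1.1 chunked framing (hex size, CRLF, data, CRLF) that each hop is free to undo and redo.

```python
from typing import Iterable, Iterator, List, Tuple


def chunk(payload: bytes) -> bytes:
    """Frame one payload as an HTTP/1.1 chunk: hex size, CRLF, data, CRLF."""
    return f"{len(payload):x}".encode("ascii") + b"\r\n" + payload + b"\r\n"


def streaming_hop(
    upstream: Iterable[Tuple[float, bytes]]
) -> Iterator[Tuple[float, bytes]]:
    """A well-behaved intermediary: forwards each payload as soon as it arrives."""
    for arrived_at, payload in upstream:
        yield arrived_at, payload


def buffering_hop(
    upstream: Iterable[Tuple[float, bytes]], closed_at: float
) -> Iterator[Tuple[float, bytes]]:
    """A buffering intermediary: holds everything until the upstream closes."""
    held: List[bytes] = [payload for _, payload in upstream]
    for payload in held:
        yield closed_at, payload  # the client sees every event at close time


# Hypothetical timeline: (seconds since connect, event payload).
events = [(1.0, b"data: login-ok\n\n"), (2.5, b"data: profile\n\n")]

print(chunk(events[0][1]))
print(list(streaming_hop(events)))
print(list(buffering_hop(events, closed_at=300.0)))
```

With the streaming hop the client sees the events at 1.0 and 2.5 seconds. With the buffering hop it sees both at 300 seconds, when the connection finally drops, which is roughly what "slow to log in" looked like from the client's side.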
Sure, you can disable it on NGINX (one hop from your server), but who knows what else out there just broke your app. Bottom line: if you don't control the network infrastructure, you can't rely on SSE.
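For the one hop you do control, this is a sketch of the response headers an SSE endpoint might send. The function name is made up, and it assumes NGINX sits directly in front of your app. `X-Accel-Buffering: no` is an NGINX-specific response header that turns off its proxy buffering for that response (the config equivalent is `proxy_buffering off;`). Nothing further down the line is obliged to care.

```python
from typing import Dict


def sse_response_headers() -> Dict[str, str]:
    """Headers for an SSE response behind NGINX (hypothetical helper)."""
    return {
        # Tells the browser's EventSource this is an event stream.
        "Content-Type": "text/event-stream",
        # Discourage caches from storing or replaying the stream.
        "Cache-Control": "no-cache",
        # NGINX-specific: disable proxy buffering for this response only.
        # Other proxies between NGINX and the client will ignore it.
        "X-Accel-Buffering": "no",
        # Deliberately no Content-Length: the stream has no known end.
    }
```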
Bummer.
Pushing to a client
OK, so there are basically 4 ways of getting "uninitiated" messages from a server to a client:
Method | Description | Comments |
---|---|---|
Websockets | Bidirectional communication between client and server - an open "socket" on each end to receive information. | Sockets are great when they work, but getting them to work stably is difficult. Libraries like socket.io can help; they fall back on a number of techniques (like Long Polling) when a socket isn't available. |
Server Sent Events | One way push communication from the server to the client with an always open connection. | |
Long Polling | Client opens a connection to the server, which waits until it has messages and sends them. Client immediately opens a new connection. Feels like always available to push from server. | |
Polling | Naive method, client requests events on a regular basis. | |
Here is one of the articles we used when initially deciding on using SSE
Conclusion
We've now rewritten our layer to use Long Polling so we can have consistent performance in the normal environments our software operates in (the internet and some very old corporate and industrial networks). It works. I wish I'd known the limitations of SSE before - but I only found the one paragraph in the spec very, very late in the day.
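For the curious, the core of a long-polling handler can be sketched in a few lines. This is not our production code: `long_poll`, the per-client queue and the timeout are assumptions for illustration. The point is that every response is a complete, ordinary HTTP response, so a proxy that waits for the end of the body before forwarding only adds a little latency instead of sitting on events for minutes.

```python
import queue
from typing import List


def long_poll(pending: "queue.Queue[str]", timeout_seconds: float) -> List[str]:
    """Block until at least one message is available, or the timeout expires.

    Returns every message queued at that moment, or an empty list on timeout.
    Either way the response ends, and the client immediately asks again.
    """
    try:
        first = pending.get(timeout=timeout_seconds)
    except queue.Empty:
        return []  # nothing happened: the client just reconnects and waits again
    messages = [first]
    while True:
        try:
            # Drain anything else that is already waiting, without blocking.
            messages.append(pending.get_nowait())
        except queue.Empty:
            return messages
```

A request handler would call `long_poll` with that client's queue, serialise the returned list as the response body and close the response; the client then opens its next request straight away.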