# Troubleshooting
This page lists common symptoms with targeted diagnosis steps. Start from your symptom and follow the checks in order.
## Clients cannot connect at all

Symptoms: browser WebSocket error, `net::ERR_CONNECTION_REFUSED`, or an immediate close before any Phoenix messages.

Checks:

- Is Mist listening? Confirm your HTTP server started without error. `mist.serve` or `mist.serve_ssl` returns a `Result`; make sure you handle `Error`.
- Path mismatch. The Phoenix JS client appends `/websocket` to the socket path you pass:

  ```js
  new Socket("/socket", ...) // → connects to /socket/websocket
  ```

  Your transport config must match:

  ```gleam
  mist_transport.default_config("/socket/websocket")
  ```

  Raw WebSocket clients (non-Phoenix) connect directly to the path with no suffix.
- `on_connect` rejection. If you configured `with_on_connect`, returning `Error(Nil)` sends an HTTP 403 before the upgrade. Check your auth logic and the incoming headers.
- Reverse proxy not forwarding upgrade headers. See "Reverse proxy / nginx" below.
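The first check can be made concrete: inspect the `Result` from `mist.serve` rather than discarding it. This is a sketch only; `build_handler` is a hypothetical stand-in for however you construct your Mist handler, and the exact `mist.serve` arguments depend on your setup:

```gleam
import gleam/io

pub fn main() {
  // Pattern match on the Result so a bind failure (port already in
  // use, bad certificate path, ...) is visible at startup instead of
  // being silently discarded.
  case mist.serve(build_handler()) {
    Ok(_) -> io.println("listening")
    Error(_) -> panic as "HTTP server failed to start"
  }
}
```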
## Client connects but joins are never acknowledged

Symptoms: the Phoenix JS client hangs in the "connecting" or "joining" state; no `phx_reply` is received.

Checks:

- Is the channel registered? `beryl.register` must be called before any client connects. Confirm the pattern matches the topic the client is joining:

  ```gleam
  // Pattern "room:*" matches "room:lobby", "room:42", etc.
  // Pattern "room:lobby" matches ONLY "room:lobby"
  beryl.register(channels, "room:*", my_channel.new())
  ```

- Is `beryl.Channels` passed to the transport? The `mist_transport.upgrade` call must receive the same `channels` value that was registered against:

  ```gleam
  use <- mist_transport.upgrade(req, channels.coordinator, config)
  ```

- Join callback panics or crashes. A panic in `join` terminates the coordinator actor. Under unsupervised `beryl.start`, the coordinator dies and no more joins are processed. Use `beryl/supervisor.start` so the coordinator restarts, then fix the panic.
- Topic segment mismatch. `"document:*:ops"` uses segment wildcards: each `*` matches exactly one colon-delimited segment. `"document:tenant-a:sub:ops"` would not match because there is an extra segment. Verify with `topic.parse_pattern` and `topic.matches`.
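The segment-wildcard semantics can be checked directly with the `topic` module mentioned above. A sketch; the exact return types are assumptions, but the match results follow from the rules stated in this guide:

```gleam
// Each "*" matches exactly one colon-delimited segment.
let assert Ok(pattern) = topic.parse_pattern("document:*:ops")

topic.matches(pattern, "document:tenant-a:ops")
// → True: "*" consumes the single segment "tenant-a"

topic.matches(pattern, "document:tenant-a:sub:ops")
// → False: four segments cannot match a three-segment pattern
```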
## Messages sent from the client are not received

Symptoms: `handle_in` is never called; no reply or push is received.

Checks:

- Did the client successfully join? `handle_in` is only called after a successful `phx_join`. If the join was rejected, no further messages are delivered.
- Rate limits dropping messages. If `with_message_rate` or `with_channel_rate` is configured and the client is sending faster than the limit, excess messages are silently dropped. Check your rate limit values or add server-side logging in `handle_in`.
- Event name mismatch. `handle_in` receives the raw event string. Verify the client sends the exact event name your handler expects.
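For all three checks, temporary logging in `handle_in` makes the failure mode visible: an event that never reaches the log was dropped before delivery. The callback signature below is an assumption based on this guide, and `existing_handler` is a placeholder for your real logic:

```gleam
import gleam/io

fn handle_in(event: String, payload, socket) {
  // If the client sends "new_msg" but this line never prints, the
  // message was dropped upstream: the join failed, a rate limit
  // fired, or the event name differs from what the client sends.
  io.println("handle_in received: " <> event)
  existing_handler(event, payload, socket)
}
```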
## Broadcasts are not received by clients

Symptoms: `beryl.broadcast` is called server-side but connected clients do not receive the event.

Checks:

- Topic string must match exactly. `beryl.broadcast("room:lobby", ...)` delivers only to sockets subscribed to the exact topic `"room:lobby"`. Wildcard patterns are for routing incoming messages, not for targeting broadcasts.
- Client has not joined the topic. A socket must have successfully completed `phx_join` for the topic before it receives broadcasts on that topic.
- Single-node vs. multi-node. Without PubSub, broadcasts are local to the node. If your deployment runs multiple BEAM nodes, configure PubSub:

  ```gleam
  let assert Ok(ps) = pubsub.start(pubsub.default_config())
  let config = beryl.default_config() |> beryl.with_pubsub(ps)
  ```

- `broadcast_from` excluding the wrong socket. `beryl.broadcast_from` excludes the socket whose ID you pass. Verify that the socket ID matches the sender.
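The exact-topic rule and the `broadcast_from` exclusion can be summarized in one sketch. Only the function names come from this guide; the argument order and the `payload` value are assumptions:

```gleam
// Broadcast targets are literal topic strings, never wildcard patterns:
// this reaches every socket joined to exactly "room:lobby".
beryl.broadcast("room:lobby", "new_msg", payload)

// This would only reach a socket joined to the literal topic "room:*".
beryl.broadcast("room:*", "new_msg", payload)

// broadcast_from excludes exactly one socket: the one whose ID is
// passed. Pass the sender's ID so the sender does not echo itself.
beryl.broadcast_from(socket.id(sender), "room:lobby", "new_msg", payload)
```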
## Presence is stale or incorrect

Symptoms: `presence.list` returns entries for users who have disconnected; joins and leaves are not reflected.

Checks:

- `untrack_all` in `terminate`. Call `presence.untrack_all(p, socket.id(socket))` from your `terminate` callback:

  ```gleam
  fn terminate(_reason, socket) -> Nil {
    presence.untrack_all(presence, socket.id(socket))
  }
  ```

  Without this, presence entries for the disconnected socket remain in the CRDT indefinitely.
- Cross-node sync. If running multiple nodes, each node must be configured with the same PubSub instance and each presence actor needs a unique replica ID. The CRDT merges state over PubSub; without PubSub, nodes have independent state.
- `on_diff` not broadcasting. If clients rely on receiving `presence_diff` events, confirm `on_diff` is configured and calls `beryl.broadcast_presence_diff`. See the Presence guide.
- CRDT compaction. The CRDT can accumulate causal history. Call `presence.compact` (on the state layer) if memory usage grows unexpectedly over a long uptime.
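For the `on_diff` check, the wiring might look like the sketch below. `presence.default_config` and `presence.with_on_diff` are assumed builder names; only `on_diff` and `beryl.broadcast_presence_diff` are quoted in this guide:

```gleam
// Forward every presence diff to subscribers as a "presence_diff"
// event; without this, clients tracking presence never see changes.
let p_config =
  presence.default_config()
  |> presence.with_on_diff(fn(topic, diff) {
    beryl.broadcast_presence_diff(topic, diff)
  })
```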
## Authentication failures

Symptoms: all clients get 403 on connect, or all joins are rejected.

Checks:

- `on_connect` bug. Add logging to your `on_connect` callback to confirm tokens are being extracted correctly from headers or query parameters.
- Token validation error. Check that your token validation logic handles expired or malformed tokens gracefully and returns `Error(Nil)` rather than panicking.
- Join handler returning `JoinError` for everyone. Log the `payload` argument in `join` to confirm the client is sending the expected shape. `payload` arrives as `gleam/json.Json` (already decoded from the raw frame).
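A defensive `on_connect` covers the first two checks: log what was extracted, and return `Error(Nil)` instead of panicking. A sketch; `verify_token` is a hypothetical validator and the request type is an assumption:

```gleam
import gleam/http/request
import gleam/io
import gleam/list

fn on_connect(req: request.Request(body)) -> Result(Nil, Nil) {
  // Log what actually arrived, so "all clients get 403" is debuggable.
  case list.key_find(req.headers, "authorization") {
    Ok(token) ->
      case verify_token(token) {
        Ok(_claims) -> Ok(Nil)
        Error(_) -> {
          io.println("on_connect: token rejected")
          Error(Nil) // expired/malformed token -> 403, never a panic
        }
      }
    Error(Nil) -> {
      io.println("on_connect: no authorization header")
      Error(Nil)
    }
  }
}
```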
## Heartbeat disconnects

Symptoms: clients are disconnected after a period of inactivity; `terminate` is called with `HeartbeatTimeout`.

Checks:

- Client heartbeat interval vs. server timeout. The Phoenix JS client sends heartbeats every 30 s by default. The beryl default server timeout is 60 s, which gives a safe margin. If you've lowered `heartbeat_timeout_ms`, ensure the client interval is at most half the server timeout, so a single delayed heartbeat does not disconnect the client.
- Load balancer idle timeout. Some load balancers (AWS ALB, nginx) have their own WebSocket idle timeouts. Set the load balancer timeout to be longer than the client heartbeat interval, or configure load-balancer-level keepalives.
- Network interruption. Mobile clients behind NAT may lose the WebSocket connection without a TCP close. The Phoenix JS client detects missed heartbeat replies and reconnects automatically.
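A rule of thumb for the first check: keep the server timeout at least twice the client heartbeat interval. The `with_heartbeat_timeout_ms` builder name below is an assumption; only the `heartbeat_timeout_ms` setting is quoted in this guide:

```gleam
// Phoenix JS heartbeats every 30_000 ms by default, so a 60_000 ms
// server timeout tolerates one delayed heartbeat before disconnecting.
let config =
  beryl.default_config()
  |> beryl.with_heartbeat_timeout_ms(60_000)
```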
## PubSub cluster issues

Symptoms: broadcasts do not propagate across Erlang nodes; presence state diverges.

Checks:

- Nodes are clustered. beryl PubSub uses Erlang `pg`, which requires Erlang distribution. Confirm nodes can reach each other: `nodes()` in an Erlang shell (or `Node.list()` in an Elixir shell) should return the other connected nodes.
- Same `pg` scope. All nodes must use the same `pg` scope name. `pubsub.default_config()` uses the default scope. If you customized it, make sure all nodes use the same value.
- `broadcast_from` exclusion is per-coordinator. `beryl.broadcast_from` excludes the socket on the originating coordinator. On remote nodes, all sockets subscribed to the topic receive the message, including (if any) a socket with the same ID on a different node. This is expected behavior.
## Rate limiting is unexpectedly aggressive

Symptoms: clients see partial message delivery; high-frequency operations are silently dropped.

Checks:

- Check burst values. The `burst` parameter sets the token bucket capacity. If burst is too small, a legitimate burst of messages (e.g., on reconnect) exceeds the limit.
- `message_rate` vs. `channel_rate`. `message_rate` is a per-socket total; `channel_rate` is per-socket-per-topic. If a client joins many topics, `message_rate` limits across all of them while `channel_rate` limits each topic independently.
- No error is sent to the client. Rate-limited messages are dropped silently. If you need clients to know they were limited, implement application-level feedback in `handle_in`.
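How the two limits compose, as a sketch. Only the builder names come from this guide; the `(rate, burst)` argument order and the values are assumptions:

```gleam
let config =
  beryl.default_config()
  // Per-socket budget across ALL topics the socket has joined.
  |> beryl.with_message_rate(100, 20) // ~100 msg/s total, burst capacity 20
  // Separate per-topic budget for each joined topic.
  |> beryl.with_channel_rate(25, 10) // ~25 msg/s per topic, burst capacity 10
```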
## Reverse proxy / nginx

WebSocket upgrades require forwarding the `Upgrade` and `Connection` headers. A minimal nginx configuration:

```nginx
location /socket/websocket {
    proxy_pass http://localhost:4000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_read_timeout 86400s;  # Long timeout for persistent connections
}
```

Without `proxy_http_version 1.1` and the upgrade headers, nginx downgrades to HTTP/1.0 and the WebSocket handshake fails. `proxy_read_timeout` should exceed your client heartbeat interval to avoid proxy-side idle disconnects.
## Coordinator crash / no messages processed

Symptoms: all WebSocket operations stop working; the coordinator is unresponsive.

Checks:

- Are you using `beryl.start` (unsupervised)? A panic in any callback kills the coordinator actor. Switch to `beryl/supervisor.start` so the coordinator is automatically restarted.
- Panic in a callback. Gleam's `assert` expressions panic on mismatch. Audit your `join`, `handle_in`, and `terminate` callbacks for `let assert` expressions that may fail on unexpected input.
- After a restart, clients must rejoin. A restarted coordinator has no socket state. Connected clients will see their WebSocket close (or stop receiving replies), and the Phoenix JS client will reconnect and rejoin automatically.
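Moving to the supervised entry point is typically a small change. The sketch below assumes `supervisor.start` accepts the same configuration as `beryl.start`:

```gleam
import beryl
import beryl/supervisor

pub fn main() {
  // Supervised: a panic in join/handle_in/terminate restarts the
  // coordinator instead of killing it for good. Clients reconnect
  // and rejoin automatically via the Phoenix JS client.
  let assert Ok(_channels) = supervisor.start(beryl.default_config())
}
```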