let's talk webrtc

a detailed deep dive into webrtc based on a real project.

in this blog we are going to discuss how webrtc works, using a recent project i made as the example.

you can find the project here : https://github.com/shivamhwp/omegle-webrtc

all the steps to set up the project locally are in the repo.

the project has two dirs: backend and frontend. backend contains the server logic for the application, whereas frontend contains the client-side logic and the ui code.

wait, what's the architecture we use here?

this is an omegle-type app, which means only two people are connected to each other at a time. as users join, we add them to a queue of users waiting to be matched. two people are picked at random from the queue array and paired: the server relays signalling messages between them over their websocket connections so that they can set up a webrtc connection with each other. both are also removed from the queue so that they don't get matched with each other again.
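the matching step can be sketched roughly like this. it is a minimal sketch, not the repo's code: MatchQueue, join, leave and waitingCount are hypothetical names, and users are represented only by their socket id.

```typescript
// Hypothetical sketch of the matching queue; names are illustrative, not from the repo.
type SocketId = string;

class MatchQueue {
  private queue: SocketId[] = [];

  // Add a newly joined user, then try to form a pair.
  // Returns the pair if one could be formed, otherwise null.
  join(id: SocketId): [SocketId, SocketId] | null {
    this.queue.push(id);
    if (this.queue.length < 2) return null;
    // Both users are removed from the queue as they are picked.
    const first = this.takeRandom();
    const second = this.takeRandom();
    return [first, second];
  }

  // Drop a user who disconnected before being matched.
  leave(id: SocketId): void {
    this.queue = this.queue.filter((waiting) => waiting !== id);
  }

  // Number of users still waiting for a partner.
  waitingCount(): number {
    return this.queue.length;
  }

  // Remove and return one randomly chosen user from the queue.
  private takeRandom(): SocketId {
    const index = Math.floor(Math.random() * this.queue.length);
    return this.queue.splice(index, 1)[0];
  }
}
```

the pair returned by join is what the server would hand to the room logic described below.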

here's a general overview of the architecture. let's discuss it in detail.

backend

let's explore the backend. as you can see in the repo, there is a managers directory containing the UserManager.ts and RoomManager.ts files. UserManager handles adding users, removing users, initialising the handlers and clearing the queue. RoomManager handles the room logic, i.e. creating a room and connecting its users.

UserManager.ts

this file is quite simple to understand. we define a UserManager class with four methods: addUser, removeUser, clearQueue and initHandlers.

  1. addUser(name, socket) : adds the user to the users array and their socket id to the queue array, emits a "lobby" event, clears the queue and initialises the handlers.

  2. removeUser(name, socket) : removes the user from the users array when the user gets disconnected.

  3. clearQueue() : as the name suggests, it clears the queue array, so the same two people are never matched with each other again.

  4. initHandlers() : registers listeners for three socket events: offer, answer and add-ice-candidate. each listener calls the corresponding method on the RoomManager class.
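here is a rough sketch of what initHandlers does. it is written as a standalone function for brevity, and it assumes a socket.io-style socket with on() and id. SocketLike, RoomSignals and the payload shapes ({ sdp, roomId } and { candidate, roomId, type }) are assumptions made for this sketch, so check the repo for the real ones.

```typescript
// Minimal stand-in for a socket.io-style socket (an assumption for this sketch).
interface SocketLike {
  id: string;
  on(event: string, handler: (payload: any) => void): void;
}

// The subset of RoomManager that the handlers delegate to.
interface RoomSignals {
  onOffer(roomId: string, sdp: string, senderSocketId: string): void;
  onAnswer(roomId: string, sdp: string, senderSocketId: string): void;
  onIceCandidates(
    roomId: string,
    senderSocketId: string,
    candidate: unknown,
    type: "sender" | "receiver"
  ): void;
}

// Register one listener per signalling event and forward it to the room logic,
// tagging each message with the id of the socket it came from.
function initHandlers(socket: SocketLike, rooms: RoomSignals): void {
  socket.on("offer", ({ sdp, roomId }) => {
    rooms.onOffer(roomId, sdp, socket.id);
  });
  socket.on("answer", ({ sdp, roomId }) => {
    rooms.onAnswer(roomId, sdp, socket.id);
  });
  socket.on("add-ice-candidate", ({ candidate, roomId, type }) => {
    rooms.onIceCandidates(roomId, socket.id, candidate, type);
  });
}
```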

let's go to RoomManager.ts to see what they are.

RoomManager.ts

to establish a connection between the users, some information has to be exchanged first. that exchange is done through these methods.

  1. createRoom(user1, user2) : creates a room with the user1 and user2 objects and emits a "send-offer" event to both users.

sdp stands for Session Description Protocol. an sdp message describes the media streams to be exchanged (codecs etc.) along with attributes such as certificate fingerprints used for security.

  2. onOffer(roomId, sdp, senderSocketId) : identifies the receiving user and emits an offer event to them with the sdp and roomId.

  3. onAnswer(roomId, sdp, senderSocketId) : identifies the receiving user and emits an answer event to them with the sdp and roomId.

  4. onIceCandidates(roomId, senderSocketId, candidate, type: "sender" | "receiver") : this is an interesting and crucial one. to establish a connection between two users, each side has to tell the other where it can be reached. an ICE candidate describes one such possibility (an address, port and transport), and each peer sends the candidates it gathers to the other peer, which runs connectivity checks on them to find a path that works. this exchange happens alongside the offer/answer exchange, and this method relays each candidate to the other user in the room.
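the relay logic in the methods above can be sketched as follows. this is a minimal sketch under assumptions: the User shape, the numeric room ids, the private relay helper and the exact payloads are made up for illustration, and the socket is reduced to just id and emit().

```typescript
// Hypothetical shapes for this sketch; the repo's types may differ.
interface User {
  name: string;
  socket: { id: string; emit(event: string, payload: unknown): void };
}

class RoomManager {
  private rooms = new Map<string, { user1: User; user2: User }>();
  private nextRoomId = 1;

  // Store the pair under a new room id and ask both users to start sending offers.
  createRoom(user1: User, user2: User): string {
    const roomId = String(this.nextRoomId++);
    this.rooms.set(roomId, { user1, user2 });
    user1.socket.emit("send-offer", { roomId });
    user2.socket.emit("send-offer", { roomId });
    return roomId;
  }

  onOffer(roomId: string, sdp: string, senderSocketId: string): void {
    this.relay(roomId, senderSocketId, "offer", { sdp, roomId });
  }

  onAnswer(roomId: string, sdp: string, senderSocketId: string): void {
    this.relay(roomId, senderSocketId, "answer", { sdp, roomId });
  }

  onIceCandidates(
    roomId: string,
    senderSocketId: string,
    candidate: unknown,
    type: "sender" | "receiver"
  ): void {
    this.relay(roomId, senderSocketId, "add-ice-candidate", { candidate, type });
  }

  // Find the user in the room who is NOT the sender and forward the message to them.
  private relay(roomId: string, senderSocketId: string, event: string, payload: unknown): void {
    const room = this.rooms.get(roomId);
    if (!room) return; // unknown room: nothing to forward
    const receiver = room.user1.socket.id === senderSocketId ? room.user2 : room.user1;
    receiver.socket.emit(event, payload);
  }
}
```

note that the server never looks inside the sdp or the candidates. it only forwards them, which is why this part is usually called the signalling server.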

now, before going to the frontend code: you may have noticed in the repo that when we create a room, user1 and user2 both receive the same "send-offer" event, which means two peer connections are set up between the users. why is that? it is to avoid a problem called glare.

glare happens when both users send an offer over the same connection at the same time: each side is still waiting for an answer to its own offer when the other side's offer arrives, and the negotiation collides. so each user gets two connections, one for sending their audio/video and one for receiving the other user's, and each connection has exactly one side making the offer.

now let's go to the frontend code, where it will all make sense.

frontend

in the components folder there are two files named Landing.tsx and Room.tsx

  1. Landing.tsx : in the Landing component there is a getCam() function, which requests your microphone and camera permissions using the Media Capture and Streams web API. the audio and video tracks are extracted from the stream and the video is attached to the video tag. the component passes localAudioTrack, localVideoTrack and name to the Room component as props.

  2. Room.tsx : this again is an interesting one. whenever the Room component is rendered, useEffect fires. when the "send-offer" event is received, it creates a new RTCPeerConnection (pc) and adds localVideoTrack and localAudioTrack to it. "add-ice-candidate": whenever pc gathers an ICE candidate, the client emits this event to send the candidate, marked as coming from the sender, to the other peer.

    onnegotiationneeded : this event fires on pc when a negotiation has to start. the handler creates an offer, sets it as the localDescription and sends it to the other peer through the server.

    offer : when the socket receives an offer event from the server, i.e. from the other peer, it sets the offer as the remoteDescription, creates an answer, sets the answer as the localDescription and sends it back.

    in the setTimeout we get the tracks from the pc object and, based on each track's kind, treat them as the remote audio or video track. finally we attach them to the video tag.

    to get more context about the Room.tsx part, go through the code here: https://github.com/shivamhwp/omegle-webrtc/blob/main/frontend/src/components/Room.tsx
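the offer/answer steps from Room.tsx boil down to something like the sketch below. to keep it runnable outside a browser it is written against two small interfaces: PeerLike is modelled on the four RTCPeerConnection methods it uses, and SignalLike stands in for the socket. sendOffer, answerOffer and the payload shape are hypothetical names chosen for this sketch, not the repo's.

```typescript
// A session description as passed around during signalling.
interface Description {
  type: string;
  sdp?: string;
}

// Modelled on the RTCPeerConnection methods used here (same names as the browser API).
interface PeerLike {
  createOffer(): Promise<Description>;
  createAnswer(): Promise<Description>;
  setLocalDescription(description: Description): Promise<void>;
  setRemoteDescription(description: Description): Promise<void>;
}

// Stand-in for the socket used to reach the signalling server.
interface SignalLike {
  emit(event: string, payload: unknown): void;
}

// Sending side: what the onnegotiationneeded handler does.
async function sendOffer(pc: PeerLike, signal: SignalLike, roomId: string): Promise<void> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer); // our own description
  signal.emit("offer", { sdp: offer, roomId });
}

// Receiving side: what runs when an "offer" event arrives from the server.
async function answerOffer(
  pc: PeerLike,
  signal: SignalLike,
  roomId: string,
  remoteOffer: Description
): Promise<void> {
  await pc.setRemoteDescription(remoteOffer); // the other peer's description
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer);
  signal.emit("answer", { sdp: answer, roomId });
}
```

the order matters: the remote offer has to be set before an answer can be created, because the answer is generated in response to it.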

limitation of this architecture :

  1. why p2p is not good at scale

    for example, if you are on a call with 10 other people, you have to receive the video/audio from all 10 of them and also send your own audio/video to each of them separately, which is very inefficient on the network. there are a lot of other factors too, such as performance degradation, security risks etc.

    that's why servers are introduced to scale this architecture. they provide centralised management, reduced network congestion etc.
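to put rough numbers on this, here is a small worked example. it assumes a full mesh (every participant connected directly to every other one) and compares it with a server that simply forwards each participant's stream to the others. the forwarding-server model is my assumption about what "server" means here, and meshLoad / forwardingServerLoad are hypothetical helper names.

```typescript
interface Load {
  uploadsPerPeer: number;   // streams each participant has to send
  downloadsPerPeer: number; // streams each participant has to receive
  totalConnections: number; // connections in the whole call
}

// Full mesh: everyone sends to and receives from everyone else directly.
function meshLoad(participants: number): Load {
  return {
    uploadsPerPeer: participants - 1,
    downloadsPerPeer: participants - 1,
    totalConnections: (participants * (participants - 1)) / 2,
  };
}

// Forwarding server: everyone uploads once, and the server fans the stream out.
function forwardingServerLoad(participants: number): Load {
  return {
    uploadsPerPeer: 1,
    downloadsPerPeer: participants - 1,
    totalConnections: participants, // one connection per participant, to the server
  };
}

// You plus 10 other people = 11 participants.
console.log(meshLoad(11));             // 10 uploads each, 55 connections in total
console.log(forwardingServerLoad(11)); // 1 upload each, 11 connections in total
```

the downloads stay the same in both cases. the saving is on the upload side, which is usually the scarcer resource on a home connection.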

interesting:

DePINs : stands for Decentralised Physical Infrastructure Networks.

as you may know, in DeFi the nodes that validate transactions are run by groups of individuals, who are rewarded in cryptocurrency. in the same way, rather than using centralised servers (like google's or microsoft's), dRTC (decentralised real-time communication) gravitates towards individuals acting as the server/node operators that process the data, which makes it decentralised and is meant to make it more secure. it surely is an interesting concept to think about.

thanks for reading.

more about me : shivam.ing