Mechanism behind QR code scanning of WhatsApp web/desktop app

  • I could not find any answers explaining how the QR code scanning used by WhatsApp Web actually works.
  • How does authentication happen when the phone (any smartphone running WhatsApp) scans the QR code shown in the browser?
  • I am not asking about the technology stack, such as WhatsApp using a modified version of XMPP, Erlang, or web technologies like socket.io and AJAX for the web version.
  • The question might be broad, but I am eager to know about the implementation behind it.

Solution 1:

It works like this:

1- You open the following URL in your browser: https://web.whatsapp.com/

2- The browser loads the page with all of its JS and CSS, but it also opens a WebSocket (wss://w4.web.whatsapp.com/ws).

2.1- Every 20000 milliseconds (20 seconds) you see traffic on the WebSocket that refreshes the QR code on your screen. The server sends it to the browser through the WebSocket (called WS from now on).

2.2- On each QR code refresh received on the WS, your browser makes a GET request for the new QR code, which comes back Base64-encoded.

2.3- Notice that the specific WS open between the server and the browser is associated with that unique QR code. So, knowing the QR code, the server knows which WS it belongs to.
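To make steps 2.1 to 2.3 concrete, here is a minimal sketch of how a server could tie each QR code to one open WS and rotate it every 20 seconds. This is not WhatsApp's actual code: `PairingRegistry`, `issue_token`, `lookup` and the connection ids are hypothetical names, and the token format is an assumption.

```python
import secrets
import time

QR_TTL_SECONDS = 20  # assumption: mirrors the ~20 s refresh seen on the WS


class PairingRegistry:
    """Hypothetical server-side map from a QR token to the WS it was issued for."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._by_token = {}  # token -> (connection_id, expires_at)
        self._by_conn = {}   # connection_id -> token currently shown as QR

    def issue_token(self, connection_id):
        """Create a fresh token for this WS and invalidate the previous one."""
        old = self._by_conn.pop(connection_id, None)
        if old is not None:
            self._by_token.pop(old, None)
        token = secrets.token_urlsafe(32)  # unguessable, so a QR cannot be forged
        self._by_token[token] = (connection_id, self._clock() + QR_TTL_SECONDS)
        self._by_conn[connection_id] = token
        return token

    def lookup(self, token):
        """Return the connection id for a live token, or None if unknown or expired."""
        entry = self._by_token.get(token)
        if entry is None:
            return None
        connection_id, expires_at = entry
        if self._clock() >= expires_at:
            return None
        return connection_id
```

The server would call `issue_token` for a given WS each time it pushes a QR refresh, and `lookup` later when a phone presents the token it scanned.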

At this stage your browser is ready to do the WhatsApp app's work, but it does not know your ID (your WhatsApp identifier, which is your mobile number), because it can't get your phone number out of thin air.

It also does not ask you to type it, because the server couldn't be sure that the number really belongs to you.

So, to let the servers know that the WS session belongs to a specific phone, you need to scan the QR code with that phone.

3- You grab your phone, which is already authenticated (otherwise you wouldn't have access to the QR scanning section), and scan the QR code.

4- When your phone reads the QR code, it contacts the WhatsApp servers and tells them: my number is XXXX, my auth credentials are YYYYY, and the WS associated with this QR code can now receive my data.

5- The server now knows that it can direct traffic to the specific WS that belongs to that QR code, and does so.

6- On the browser's WS you can see the server sending data about the user, about the conversations you are having, and about which photo thumbnails to fetch.

7- The browser gets this data from the WebSocket and makes the corresponding GET requests for the thumbnails and the other resources it needs, such as an MP3 for notifications.

7.1- The WS listener in the browser also calls functions in the JavaScript files received at step 1 to redraw the page DOM with the new interface.

8- The interface is now redrawn to look like the WhatsApp app. You continue to receive data on the WS, send data when needed, and the interface is updated as data arrives on the WS.

That is it.
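Steps 3 to 5 can be sketched roughly as follows. This is only an illustration of the flow described in this answer, not WhatsApp's real protocol: `PairingServer`, its methods, the message shapes and the idea of a per-phone shared secret are all assumptions, and a plain list stands in for the browser's WS.

```python
import hmac


class PairingServer:
    """Hypothetical server that binds a scanned QR token to an authenticated phone."""

    def __init__(self, phone_credentials):
        # phone number -> secret set up when the phone registered (assumption)
        self._phone_credentials = dict(phone_credentials)
        self._pending = {}  # QR token -> outbox of the browser waiting to be paired
        self._bound = {}    # phone number -> outbox of the paired browser

    def browser_connected(self, token):
        """Register a browser that is currently showing `token` as its QR code."""
        outbox = []  # stands in for the WS the server writes to
        self._pending[token] = outbox
        return outbox

    def phone_claims(self, phone_number, credential, token):
        """Step 4: the phone says who it is, proves it, and names the QR token."""
        expected = self._phone_credentials.get(phone_number)
        if expected is None or not hmac.compare_digest(
            expected.encode(), credential.encode()
        ):
            return False  # unauthenticated phones cannot claim a browser
        outbox = self._pending.pop(token, None)  # a token can be claimed only once
        if outbox is None:
            return False
        self._bound[phone_number] = outbox
        outbox.append({"type": "paired", "account": phone_number})
        return True

    def deliver(self, phone_number, message):
        """Step 5: route this account's data to the WS that was paired with it."""
        outbox = self._bound.get(phone_number)
        if outbox is None:
            return False
        outbox.append(message)
        return True
```

A short run of the sketch, with made-up values:

```python
server = PairingServer({"+15550100": "s3cret"})
ws = server.browser_connected("tok-abc")
server.phone_claims("+15550100", "s3cret", "tok-abc")
server.deliver("+15550100", {"type": "chat", "text": "hi"})
```

The point of the design is that the browser never proves anything itself: it only holds a token, and the already authenticated phone vouches for it.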

Using Chrome and its Developer Tools, you can watch all of this happen live. You can also see the WS communication (most of it; for the binary frames you would need another tool) and follow what happens at every step.

Also:

  • Check a complete tutorial on this: HERE

  • Source code for the tutorial: Android Client

  • Source code for the tutorial: Java Play Server

Solution 2:

It works roughly as follows.

  1. The user opens the WhatsApp Web application in a web browser.
  2. The server creates a unique token (a number) and embeds it in the QR code.
  3. The WhatsApp phone application reads the QR code and decodes the token.
  4. The WhatsApp phone application sends information about its current user, together with the newly read token, to the WhatsApp server.
  5. The WhatsApp server matches the token (plus the phone app's user information) with the web browser.
  6. The server then automatically authenticates the user, and the browser opens a new page with his/her information on it.