You type a URL into your browser and you see images, videos and text pop up on your screen. Sweet! But how much do you know about what is needed for the page to reach your screen? Let me give you a high level tour into the network events that are happening in the background.
In my first post I mentioned that the internet could technically be implemented using carrier pigeons. For easier visualization, let us continue with that idea. Instead of bits over a wire, our data will be transferred by a pigeon.
In this post we will go through with an example of a HTTP page being requested. In truth, most of the sites in the internet no longer provide a plain HTTP version, only a HTTPS version. With HTTPS, the HTTP content is secured in a way that an eavesdropper cannot see the content. But I will leave that to another post, and this time focus on HTTP.
So, let us begin! Someone recommended a magnificent new blog to us, and gave us the link: https://nyget.in/. Nice. Browser, get me the page!
The browser is happy to fulfill our request, but unfortunately it has no clue where to get the page from. The browser now knows the domain name (nyget.in), but it needs to find out the actual address where the domain is located. Imagine you want to send a letter to someone and you know their name but not their address.
In the internet, these addresses are called IP addresses, and they are presented by a bunch of numbers. The information about which domain name resides behind which IP address is stored in a global system called the Domain Name System, or DNS. As you can imagine, this information can alternate quickly, as new domains can pop up any time, and existing domains can move to a new address. There is no one place which stores all information about all addresses. Instead, the information is spread across the internet in many different name servers.
Now, the browser needs to find out what IP address the domain nyget.in is located at. Luckily it has a friend which lives on the same machine that can help. The friend is called the DNS client.
The DNS client is responsible for finding out what IP address a domain lives at. The DNS client does not know the address for nyget.in yet, but it knows who to ask. It sends a pigeon to the root server to ask where nyget.in lives.
The root server does not know either what the address of nyget.in is, but it does know the address of the name server where the addresses of all domains ending with “.in” is stored. That is great! Our pigeon returns back to the DNS client with this information, and the DNS client again sends the pigeon away, this time to ask the “.in” name server about the whereabouts of nyget.in.
The name server for “.in” is now able to give our pigeon the address where nyget.in lives at. Our pigeon returns victorious and reports the IP address to the DNS client, which in return gives it to the browser.
The browser now knows the IP address of the server where nyget.in is located at. But it cannot just request the information out of the blue – as with any new acquaintance, it will first need to perform a handshake. The browser sends the pigeon away to the server to let it know that the browser would like to form a new connection with it.
The pigeon delivers the handshake request to the server, which happily acknowledges the request and sends a reply with the pigeon to let the browser know that it is pleased to make its acquaintance.
Now we are almost there! The browser sends the pigeon away with two messages, the first one saying that the handshake has been concluded, and the second one asking for the contents of http://nyget.in.
The pigeon finally delivers the contents of the page from the server to the browser. The content is written in a language called HTML, but we do not need to worry about that – the browser knows what to do with it, and transforms it into a nicely formatted page with pretty pictures.
And there we have it!
As promised, this was a big simplification of the process. What I have described here is an iterative DNS query, as the pigeon keeps going back and forth between the DNS client and the DNS servers to get the correct IP address. The pigeon could also have waited while the first DNS server fetches the correct IP address from the other name servers – that would have been a recursive DNS query. Or if someone else had recently asked for nyget.in from the first DNS server, it could have had the IP address still in its cache and it would have been able to give the correct IP address straight away for the pigeon. In addition, the final connection between the browser and the server was a TCP connection, and the handshake performed between the endpoints was a three-way handshake, required for a TCP connection to begin.
Hopefully you got something out of this post, be it a new found understanding of the referenced technologies, or a looming desire to implement a DNS server which works only with carrier pigeons. Thank you for reading, and see you again soon.