Ivan Milic - Networks expert Ivan Milic CEO Ivan Milic

NAT Traversal fundamentals

NAT history

The internet pioneers of the early days (before the mid-1990s) designed the network architecture and protocols that are still in use today. One of their primary goals was that the network should provide end-to-end connectivity between any two hosts connected to the internet.

The drastic expansion of the network turned public IP addresses into a valuable resource. End-to-end connectivity also had a security issue, because it directly exposes your computer or device to potential abuse. For these reasons, NAT (network address translation) devices became popular in the mid-1990s. Today they are common in our homes and offices.

NAT devices (routers) enable multiple devices on the local network to share a single IP address, making internet access easy to distribute to any computer or device on a home or office network.

NAT also provides a basic shield for the local network against possible outside attacks. This is because it does not let any traffic from outside reach a computer or device on the local network unless that device initiates the connection by sending a request to a remote server. The NAT keeps a record of outgoing requests and lets data back in from a remote service only if the local client initiated the transfer. Routers commonly also have a built-in firewall that further increases the protection provided by the router.
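The record keeping described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how any particular router is implemented: the class name `SimpleNat`, the starting port 40000 and the strict per-destination filtering are all made up for illustration.

```python
class SimpleNat:
    """Toy model of a NAT mapping table (illustrative only, not a real router)."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port   # assumption: external ports are handed out sequentially
        self.out_map = {}             # (internal_ip, internal_port) -> external_port
        self.in_map = {}              # external_port -> (internal_ip, internal_port)
        self.contacted = {}           # external_port -> set of remote (ip, port) already contacted

    def outbound(self, src, dst):
        """Local host `src` sends a packet to remote `dst`; returns the rewritten source."""
        if src not in self.out_map:
            self.out_map[src] = self.next_port
            self.in_map[self.next_port] = src
            self.contacted[self.next_port] = set()
            self.next_port += 1
        ext_port = self.out_map[src]
        self.contacted[ext_port].add(dst)   # remember that the local client initiated this
        return (self.public_ip, ext_port)

    def inbound(self, remote, ext_port):
        """Packet from `remote` arrives at `ext_port`; returns the local host or None."""
        if ext_port in self.in_map and remote in self.contacted[ext_port]:
            return self.in_map[ext_port]
        return None                         # unsolicited traffic is dropped
```

A reply from a server the client has contacted is delivered; a packet from any other host, or to a port with no mapping, is dropped.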

This is great, but what about end-to-end connectivity? The original concept had flaws that made it impossible to use without consequences, but there is no doubt that we need it, and it is irreplaceable for some kinds of communication.

NAT device controllable port mapping by clients

Utilities to enable end-to-end connectivity were added to NAT (router) devices: protocols that let client applications request a port mapping from the router, so that all data targeting the mapped port reaches the client that requested the mapping. These protocols are:

  UPnP - the most common; port mappings are created using the Universal Plug and Play protocol (based on XML messages)

  NAT-PMP (NAT Port Mapping Protocol) - an older protocol that you will rarely find in your router. It saw intensive use in some large AIRWAY companies.

  PCP (Port Control Protocol) - proposed by Apple, the latest of these protocols to be adopted as an RFC standard. It is compatible with NAT-PMP. You can commonly find it in Apple's newer NAT devices.
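To give a feeling for how small such a request is, here is a minimal sketch of a NAT-PMP mapping request and response parser, following the packet layout in RFC 6886. The function names are made up for illustration, and the sketch only builds and parses bytes; actually sending the request to your gateway on UDP port 5351 (and finding the gateway address) is left out.

```python
import struct

NAT_PMP_PORT = 5351   # UDP port a NAT-PMP gateway listens on (RFC 6886)
OP_MAP_UDP = 1        # opcode: map a UDP port
OP_MAP_TCP = 2        # opcode: map a TCP port


def build_map_request(internal_port, external_port, lifetime_s, opcode=OP_MAP_UDP):
    """Build the 12-byte mapping request.

    Layout: version (1 byte, always 0), opcode (1), reserved (2),
    internal port (2), suggested external port (2), lifetime in seconds (4).
    """
    return struct.pack("!BBHHHI", 0, opcode, 0, internal_port, external_port, lifetime_s)


def parse_map_response(data):
    """Parse the 16-byte mapping response; returns a dict, or None if malformed."""
    if len(data) != 16:
        return None
    version, opcode, result, _epoch, internal, external, lifetime = struct.unpack(
        "!BBHIHHI", data
    )
    if version != 0 or opcode < 128:   # responses carry opcode 128 + request opcode
        return None
    return {
        "result": result,              # 0 means success
        "internal_port": internal,
        "external_port": external,     # the port the router actually mapped
        "lifetime_s": lifetime,
    }
```

The router may grant a different external port than the one suggested, so the client should always read `external_port` from the response.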

Presently (2013) you will most likely find UPnP in your router settings but, unfortunately, disabled by default. This is because, when enabled, it can be abused by viruses and trojans that are already present on your computer. You are expected to enable one of these protocols only when you need it, for example when you want to play an online multiplayer game.

Also, these solutions make sense only if your router has a public IP. That is becoming rare these days, because internet providers tend to share one public IP between several users. It is also not rare for them to use devices and routing software that can fool your WAN device into believing it is on a public IP.

So, if you design software that relies only on these methods to create direct connections, in the best case maybe 10% of your users will be able to use it, and only if you explicitly tell them to enable the particular protocol. This holds for devices accessing the internet in general. If your users are people behind home routers who play multiplayer online games on a computer, they are more likely to have UPnP enabled, because at least some friend will have helped them configure it, so in that case you might reach 20% usability.

Traversal using Intended NAT Table manipulation

So we see that port mapping protocols can be used only in a small number of cases. What can we do now? We can trick our NAT device into creating a mapping by sending a packet to the remote client, and then instruct the remote client to send us a packet that looks like a response (with matching source address and port). If all goes well and the packet matches the record in the NAT mapping table (which means we are really lucky), the packet from the remote client will reach us. This is the oldest known technique of NAT traversal by intended NAT table manipulation, referred to as "UDP hole punching". Earlier, this technique was really usable and had a great success rate. A TCP connection could even be created afterwards using the same ports, and even TCP hole punching was fairly successful.
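The basic exchange can be sketched in a few lines of Python. This is a minimal illustration, not a production routine: it assumes both peers already know each other's external address and port (learned through some third party), that both run the same function at roughly the same time, and the name `punch`, the probe payload and the retry values are arbitrary choices.

```python
import socket


def punch(sock, peer_addr, attempts=5, timeout_s=0.5, probe=b"punch"):
    """Send probes to peer_addr until a datagram from that exact address arrives.

    Returns True once a packet from the peer got through, False otherwise.
    """
    sock.settimeout(timeout_s)
    for _ in range(attempts):
        # The outbound probe creates (or refreshes) the mapping in our own NAT.
        sock.sendto(probe, peer_addr)
        try:
            _data, addr = sock.recvfrom(2048)
        except socket.timeout:
            continue                  # nothing got through yet, try again
        if addr == peer_addr:
            return True               # the peer's packet matched our NAT table record
    return False
```

Each peer would call `punch(sock, other_peers_external_address)` on a UDP socket bound to the same local port it used when its external address was discovered; after it returns True, the same socket can be used for the actual data exchange.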

UDP hole punching became even more usable when the STUN technique was invented (Cornell University). STUN is used to learn whether the computer is behind a NAT, how the NAT behaves, and which external ports the router with the public IP has mapped. (Every NAT device can change a packet's source port; the change is recorded in its table so that it knows how to rewrite the destination port of response packets.) Its authors recognized four observable classes of NAT behavior:

  1. A full cone NAT is one where all requests from the same internal IP address and port are
    mapped to the same external IP address and port. Furthermore, any external host can send
    a packet to the internal host, by sending a packet to the mapped external address.
  2. A restricted cone NAT is one where all requests from the same internal IP address and
    port are mapped to the same external IP address and port. Unlike a full cone NAT, an external
    host (with IP address X) can send a packet to the internal host only if the internal host
    had previously sent a packet to IP address X.
  3. A port restricted cone NAT is like a restricted cone NAT, but the restriction
    includes port numbers. Specifically, an external host can send a packet, with source IP
    address X and source port P, to the internal host only if the internal host had previously
    sent a packet to IP address X and port P.
  4. A symmetric NAT is one where all requests from the same internal IP address and port
    to a specific destination IP address and port, are mapped to the same external IP address and
    port. If the same host sends a packet with the same source address and port, but to
    a different destination, a different mapping is used. Furthermore, only the external host that
    receives a packet can send a UDP packet back to the internal host.
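The four definitions above differ in two things: which key selects the mapping, and which incoming packets are let through. A small sketch makes that explicit. The function names and the string labels for the NAT types are illustrative only; real devices do not necessarily fit any of these four models cleanly.

```python
CONE_TYPES = ("full_cone", "restricted_cone", "port_restricted_cone")


def mapping_key(nat_type, internal, destination):
    """Return the key that selects the external mapping for an outgoing packet.

    internal and destination are (ip, port) tuples.
    """
    if nat_type in CONE_TYPES:
        return internal                  # one mapping per internal address and port
    if nat_type == "symmetric":
        return (internal, destination)   # a separate mapping per destination
    raise ValueError("unknown NAT type: %r" % (nat_type,))


def allows_inbound(nat_type, sent_to, source):
    """Decide whether a packet from `source` may pass through an existing mapping.

    sent_to is the set of (ip, port) destinations the internal host has
    already sent packets to through this mapping.
    """
    if nat_type == "full_cone":
        return True                                          # any external host may send
    if nat_type == "restricted_cone":
        return any(ip == source[0] for ip, _port in sent_to)  # IP must match
    if nat_type in ("port_restricted_cone", "symmetric"):
        return source in sent_to                             # IP and port must match
    raise ValueError("unknown NAT type: %r" % (nat_type,))
```

For example, with `sent_to = {("198.51.100.1", 3478)}` a packet from `("198.51.100.1", 9999)` passes a restricted cone NAT but not a port restricted cone NAT.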

Since router (NAT) design is not standardized in these terms, it quickly became clear that this classification is not reliable and can lead us the wrong way. The classification was probably "more valid" at the time it was invented, because it was based on empirical conclusions, but it eventually became outdated due to new NAT designs. It is common to run a STUN testing client on a computer behind a NAT and get four different results for the NAT classification of the same router, so this simply cannot be taken as valid information. What you can use is the fact that if you get any of the above four results, you can be sure a NAT is present. The mapped ports you get from the responses are also usable, because they tell you the most probable range of values the next mapping will take. (Note that even when there is a NAT, a STUN query may, very rarely, tell you that you are on the open internet with a public IP.)
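Reading the mapped address out of a STUN response takes little code. The sketch below uses the newer STUN message format from RFC 5389 (magic cookie and XOR-MAPPED-ADDRESS attribute), which is an assumption on my part, since the classification above comes from the original STUN specification. It handles IPv4 only, the function names are illustrative, and sending the request to a STUN server over UDP is left out.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request():
    """Return (20-byte request, transaction id) for a STUN Binding request."""
    txn_id = os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + txn_id, txn_id


def parse_xor_mapped_address(message):
    """Return (ip, port) as seen by the STUN server, or None if not present."""
    if len(message) < 20:
        return None
    msg_type, msg_len, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    pos, end = 20, min(len(message), 20 + msg_len)
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[pos:pos + 4])
        value = message[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            # Port and address are XOR-ed with the magic cookie on the wire.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)   # attributes are padded to 4 bytes
    return None
```

If the returned address differs from the local address of the socket, a NAT is present; the returned port is the external port the router mapped for this socket.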

Why did these techniques become outdated? Unfortunately, security administrators and we, the IT engineers developing NAT traversal software, are in a constant struggle. Basically, both sides are right and wrong. They claim NAT traversal is used only by crackers and pirates. We claim NAT traversal is sometimes needed and that it does not degrade the security level. This is because NAT traversal picks a random port for communication and carries application-specific data. Even if an abuser manages to guess one of the 65536 ports, his data will end up in an application process that will throw an exception because of the invalid data, or simply break. So, as far as security is concerned, transferring data over direct channels is far more secure than using an intermediate server. Communicating with servers is less secure than communicating with a host directly, because servers are well known and are a target for crackers' attacks. Client-server communication is also commonly based on well-known protocols, which makes it suitable for injecting arbitrary code. Besides all that, you can never be sure that someone at the cloud hosting company does not peek at your data.

To return to our story: later NAT devices and networks are not so friendly to NAT traversal, because some engineers design routers that treat it as a security threat. TCP traversal is almost impossible unless you can use raw sockets, which is impractical because most operating systems enforce strict security rules for their use or do not support them at all. TCP uses a three-step handshake involving packet number, session number, packet type and so on, and in most cases you need to match all of them to trick your NAT device, not to mention the possibility that an ICMP "Destination Port Unreachable" packet resets your attempt. Basic UDP hole punching will work in a small number of cases. If you have a router from a quality company like Cisco (Linksys) or NETGEAR, the chances of success are greater, because their engineers recognize the need and design their devices accordingly. For example, STUN and basic UDP hole punching are enough to traverse the NAT of a Linksys router. Linksys will preserve the original source port if possible, or will pick some nearby value, so NAT traversal on such a quality router is fairly easy.

A modern-day method of NAT traversal by intended NAT table manipulation should involve prediction of the next external ports, precise packet TTL manipulation (a key factor in the cases of symmetric and port restricted cone NAT), multiple retries, and side swapping. As we already said, NAT behavior is not standardized, so designing a good "piercing" method involves a lot of testing on different NAT combinations between peers, so that a good routine can be designed based on empirical conclusions.
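As one possible illustration of next-port prediction, the sketch below guesses the next external ports from the ports observed in a few consecutive STUN queries, assuming the NAT allocates ports with a roughly constant step. That assumption, the function name and the default of three candidates are mine; a real routine would be tuned on measurements from many different routers.

```python
from collections import Counter


def predict_next_ports(observed, count=3):
    """Guess the next external ports from consecutively observed mapped ports.

    observed: external ports from consecutive STUN queries, oldest first.
    Returns a list of candidate ports, most likely first.
    """
    if len(observed) < 2:
        return list(observed)          # nothing to extrapolate from
    deltas = [b - a for a, b in zip(observed, observed[1:])]
    step = Counter(deltas).most_common(1)[0][0]   # most frequent increment
    last = observed[-1]
    if step == 0:
        return [last]                  # the NAT keeps reusing the same port
    candidates = [last + step * i for i in range(1, count + 1)]
    return [p for p in candidates if 1 <= p <= 65535]
```

The peer would then send its punching packets to all the candidate ports rather than to a single one.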

Final NAT traversal solution

An industrial-standard NAT traversal solution should be able to apply all the methods mentioned above. It should inspect the network environment of both peers and decide which method of NAT traversal to apply. If one method fails, it should be able to try other methods or to repeat the attempt with the peer sides swapped. If none of the methods of direct tunnel creation succeeds, the relay is the last resort, and the one we know for sure must work, because it is based on the standard client-server model. The relay is the most expensive resource of a peer-to-peer system. Better success with direct tunnel methods makes the system more flexible and cheaper to maintain. A simple calculation demonstrates this:
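The fallback order described above (try each direct method, retry with the sides swapped, relay last) can be sketched as a small driver function. The function name `open_tunnel` and the callback signatures are hypothetical; the actual methods (port mapping, hole punching and so on) would be supplied by the application.

```python
def open_tunnel(methods, relay, peer_a, peer_b):
    """Try direct methods in order, each in both directions, then fall back to relay.

    methods: list of (name, attempt) where attempt(initiator, responder)
             returns a tunnel object, or None on failure.
    relay:   callable(peer_a, peer_b) that always returns a tunnel.
    Returns (name_of_method_used, tunnel).
    """
    for name, attempt in methods:
        # Second pass of the inner loop is the "side swap".
        for initiator, responder in ((peer_a, peer_b), (peer_b, peer_a)):
            tunnel = attempt(initiator, responder)
            if tunnel is not None:
                return name, tunnel
    return "relay", relay(peer_a, peer_b)   # last resort, most expensive
```

Ordering `methods` from cheapest to most elaborate keeps the common case fast, and the relay is touched only when every direct attempt has failed.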

Let's say we have one server with a connection bandwidth of 100 Mb/s. We want to support our Video-Over-IP application, which requires, let's say, 500 KB/s per peer pair = 4000 Kb/s = 4 Mb/s.

We want to offer a quality service that guarantees 500 KB/s for each peer-to-peer session under any conditions.

If we don't use NAT traversal, we will be able to support 100/4 = 25 sessions at once per server.

If we use NAT traversal and the percentage of all tunnels made by the relay is 5%, we will be able to support (100/4) + 95%/5% * (100/4) = 500 sessions at once per server.

Also, if the number of users exceeds the proposed limit, the quality of our service will degrade at a 20 times slower rate on the system equipped with NAT traversal.

So the cost of a system equipped with NAT traversal would be about 20 times less than that of a pure relay system.
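The same calculation can be written as a small function, so that other bandwidth figures and relay percentages are easy to try. The function and parameter names are illustrative; integer arithmetic is used to keep the results exact. With the example figures, 100/4 gives 25 relayed sessions per server, so a 5% relay share corresponds to 500 sessions in total.

```python
def max_sessions(server_mbps, session_mbps, relay_percent):
    """Sessions one relay server can support when only relay_percent of them are relayed.

    server_mbps:   bandwidth of the relay server in Mb/s
    session_mbps:  bandwidth one session needs in Mb/s
    relay_percent: share of all sessions that must go through the relay (1..100)
    """
    relayed_sessions = server_mbps // session_mbps   # what the server itself can carry
    return relayed_sessions * 100 // relay_percent   # total sessions, direct ones included


pure_relay = max_sessions(100, 4, 100)       # every session goes through the server
with_traversal = max_sessions(100, 4, 5)     # only 5% of sessions need the relay
print(pure_relay, with_traversal, with_traversal // pure_relay)
```

The ratio between the two results depends only on the relay percentage: at 5% it is 20, whatever the server bandwidth.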

This is just one advantage of NAT traversal equipped peer-to-peer systems. 

Direct Peer-To-Peer VS Cloud

Peer-to-peer and cloud systems (virtualization) are by nature totally different things, yet in quite a few cases you may use either to achieve the same result. In these cases, you may wonder which to choose for your implementation. So, in the following text we will talk about the key differences, advantages, and disadvantages.

Virtualization (the most common use of cloud computing) abstracts the physical infrastructure, which is the most rigid component, and makes it available as a soft component that is easy to use and manage. With regard to peer-to-peer systems, we will focus only on the uses of the cloud that are suitable for comparison.

The most common everyday thing you can do over both peer-to-peer and cloud is file transfer. Quite a few services offer this, like Dropbox, Google Drive...

You transfer files by first uploading them from client A to an intermediate cloud server S. These files are then available for download by clients B, C, D... until you explicitly delete them from S. The key advantage of this implementation is that clients A, B, C, D... are not required to be active at the same time. Cloud storage practically serves as a shared network disk. You can download those files wherever and whenever you want, as long as you have an internet connection.

Unfortunately, there are many downsides. The most important are the security and privacy of your data. Anyone can access your data at any time if he has your username and password. Also, that no employee of the cloud hosting company will peek at your data cannot be guaranteed in any way other than by the company's promise, which says little about particular employees. We could also mention organized government surveillance programs, which could bother you a lot if you are another government protecting the sovereignty of your country. Good information is the mightiest weapon these days. Imagine, too, that you keep software source code worth $10,000,000 on such a server, or that you are a public figure storing media material there that could compromise you if exposed. That would certainly not be recommended.

Also, the intermediate servers storing this cloud data are known places that are exposed to attacks. If someone wants to observe your data, he will know the right place to look for it, because it is concentrated in one single place. In the end, you might be left with nothing but apologies from the cloud hosting company.

Cloud file storage systems often limit your free storage space. After all, they need physical disk space to store your data on their servers, and that costs money. You overcome this by subscribing to a paid plan that helps them cover storage expenses.

One thing in which cloud systems could never compare to peer-to-peer systems is real-time communication. Since the cloud is far easier to implement than peer-to-peer systems, there were some attempts to implement cloud streaming. Such systems result in poor performance and enormous cost. The intermediate server simply becomes a hot spot that all clients communicate with, so the total bandwidth is shared between all clients. Peer-to-peer systems overcome this by skipping intermediate hot spots: peers communicate directly, so their traffic has no impact on servers.

An encrypted peer-to-peer communication tunnel (a direct tunnel created using NAT traversal) is the most secure and private way of transferring data between two hosts. These are some of the facts that earn it that status:

- The tunnel is stealthy to monitoring/observation/surveillance systems, because it runs on one (destination) port out of 65535 that is chosen randomly during the traversal operation, so its existence is very hard to detect. Monitoring/observation/surveillance systems usually track well-known ports that you use every day for common client-server communication, like 80, 443, 25, 22, 23, 995..., where they also expect a certain data transfer protocol based on the port value.

- Secure encryption keys generated within a short period are totally secret to third parties. With the cloud, your keys may be half-exposed, because you cannot be sure that an attacker is not monitoring the server and is not aware of one part of the key.

In the above text, we focused on the most important differences. They are important to note if you are designing a system that is required to provide high standards of data security and privacy, or quality real-time communication between a large number of peers. Cloud storage, on the other hand, may be a handy and fast choice for everyday small-scale solutions that serve a small number of people.

In some cases, you may even combine cloud virtualization with a peer-to-peer system to get the best result.

+ The most important thing cloud virtualization gives you is always-accessible data

+ The most important thing a peer-to-peer system gives you is secure and totally private real-time communication


NAT Traversal/Peer-To-Peer system

In this text we will focus on the key elements that a general-purpose peer-to-peer system must have.
The main thing in the peer-to-peer world is certainly the communication channel between two hosts (peers). But to get to that point, there must be some way for those two to find each other.

In some cases, peer X may be of interest to peer Y for connection establishment only if it can offer something relevant to peer Y, so there must also be an option to publish some meta-data about a peer that others can look up.

Peer X may want to refuse a connection to peer Y for some reason, so there must also be some sort of negotiation before the tunnel is made.

Also, peer X and peer Y may want to communicate in a totally secure and private manner, so data encryption and secure key exchange may come in very handy in those cases.
So we need to provide these abstract functionalities:

- NAT traversal for direct tunnel creation. In a small number of cases (~2 - 3%) NAT traversal may fail, so there must be a relay service to handle such situations. The relay is the most expensive part of the system, so designing a good NAT traversal routine is the key factor of a quality peer-to-peer system.

- Instant messaging for negotiation and other control or short data messages

- Peer lookup by unique identification and/or published metadata

- Peer status notification for all other peers related to it

Add-on functionalities most apps will find useful:

- Secure key exchange and data encryption

- Virtual user/networks/membership service that is closely aware of the states of peers

  This system should give the ability to permanently store meta-data about users, networks, and other needed abstract objects. This meta-data should be searchable and editable.
So let us now think about what services we would need to provide in order to support all these features.

- We need a service to which peers report their presence and status. This service must be able to provide a peer with all the information it needs. It must be equipped with an instant messaging system that can instantly notify a peer about changes on the network that are relevant to it, and carry instant messages from one peer to another. Since this is the place where information about active peers is available, it should also provide peer lookup by unique identifier or by searchable meta-data. We will call this service "CHECKPOINT" in the rest of the text.
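The presence and lookup part of such a service can be sketched as an in-memory registry. This is only an illustration of the idea: the class name, the method names and the 60 second timeout are assumptions, and the messaging and network layers are left out entirely.

```python
class PresenceRegistry:
    """Toy in-memory model of the presence/lookup role of a CHECKPOINT-like service."""

    def __init__(self, timeout_s=60.0):
        self.timeout_s = timeout_s    # assumption: a peer is offline after this much silence
        self.peers = {}               # peer_id -> {"addr", "meta", "last_seen"}

    def report(self, peer_id, addr, meta, now):
        """Called whenever a peer reports its presence (or sends a keep-alive)."""
        self.peers[peer_id] = {"addr": addr, "meta": dict(meta), "last_seen": now}

    def is_online(self, peer_id, now):
        entry = self.peers.get(peer_id)
        return entry is not None and now - entry["last_seen"] <= self.timeout_s

    def lookup(self, peer_id, now):
        """Lookup by unique identifier; returns the peer's address or None."""
        return self.peers[peer_id]["addr"] if self.is_online(peer_id, now) else None

    def search(self, now, **criteria):
        """Lookup by published meta-data; returns ids of matching online peers."""
        return sorted(
            pid for pid, entry in self.peers.items()
            if self.is_online(pid, now)
            and all(entry["meta"].get(k) == v for k, v in criteria.items())
        )
```

Time is passed in explicitly (`now`) so the sketch stays easy to test; a real service would use the clock of the server and the source address of the received packet.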
Technically, communication between a peer and CHECKPOINT should be UDP based, and here is why:

- CHECKPOINT is expected to receive and send a massive number of short messages to/from peers

- Each peer must always be reachable and able to receive notifications, and with UDP this is easily achieved. When a peer first sends a packet to CHECKPOINT through its NAT device, CHECKPOINT gets permission from the NAT device, lasting around X seconds, to send messages back. If both the peer and CHECKPOINT stayed inactive for more than X seconds, the NAT would close the gate, and notification messages arriving from CHECKPOINT would be dropped as unsolicited. So the peer should send a keep-alive every ~X/2 seconds to maintain the NAT table entry, so that CHECKPOINT can send a message to the peer at any time. Achieving this with TCP would be much less convenient and would cause unnecessary data transfer. Also, servers usually have a limit on the number of concurrent TCP connections, which would lead to much faster performance degradation.
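A keep-alive sender following the X/2 rule could look like the sketch below. The value of X depends on the NAT device and is deliberately left as a parameter; the function names, the one-byte payload and the `rounds` and `sleep` parameters (there to keep the loop finite and testable) are illustrative choices.

```python
import time


def keepalive_interval(nat_timeout_s):
    """Send twice per NAT timeout window, so the mapping never expires."""
    return nat_timeout_s / 2


def send_keepalives(sock, checkpoint_addr, nat_timeout_s, rounds, sleep=time.sleep):
    """Send `rounds` keep-alive datagrams, pausing X/2 seconds after each one.

    sock:            a UDP socket (anything with a sendto(data, addr) method)
    checkpoint_addr: (ip, port) of the service that must be able to reach us
    nat_timeout_s:   X, how long the NAT keeps an idle mapping open
    """
    interval = keepalive_interval(nat_timeout_s)
    for _ in range(rounds):
        sock.sendto(b"\x00", checkpoint_addr)   # tiny packet, only refreshes the NAT entry
        sleep(interval)
```

In a real client this would run in a background thread or an event loop for as long as the peer is online, on the same socket that receives notifications.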

- When two peers decide to establish a direct communication channel, the NAT traversal operation should be invoked. NAT traversal is a complex operation involving many steps, where each new step depends on the result of the previous ones. So we introduce a new service that we will call "HANDSHAKE" (HS) in the rest of the text. Its purpose is to synchronize the steps of the NAT traversal operation between two peers. When they want to create a tunnel, CHECKPOINT directs them to one of the available HANDSHAKE services to manage the NAT traversal operation. The HANDSHAKE procedure uses STUN technology to decide which methods are best applied to open the tunnel, so a pair of STUN services needs to be available to HS. In some rare situations (2%), especially if one peer is behind a tight-security corporate network, opening the tunnel using NAT traversal will fail; in such cases we can turn to relaying, which will always work. The relay is the most expensive resource in the system, so it is crucial that HS does its procedure well, so that we minimize relay usage. To be able to guarantee 100% connectivity, a relay service must be part of the system so that it can handle the 2% of tunnels that NAT traversal was unable to create. The common standardized technology intended for relaying is TURN; you will find it in most of the readings on the internet and as an RFC standard. Systems like quickP2P do not use TURN; they use a raw relay instead. This is because the resulting object of the operation is an ordinary socket that you use like a socket in any other case. Also, some networks deliberately detect and refuse TURN packets, and TURN brings a lot of overhead, because the packet info data is sometimes larger than the actual "real" data. For all these reasons, quickP2P engineers picked a raw relay to handle the 2% of tunnels that NAT traversal was unable to create.

So far, we have described all the components needed for a basic peer-to-peer NAT traversal system: the CHECKPOINT, HANDSHAKE, STUN and RELAY services. Commonly, every modern application requires storage of some permanent data, like user profiles, user groups with their meta-data, device data, etc. So if you wanted to develop an application that has user accounts and does some sort of data exchange between users, you would certainly need this. You could provide web services of your own. On the other hand, imagine you could store meta-data permanently in a system that is more closely aware of the availability of peers. That would certainly be more convenient, so we introduce the INDEX service. Its purpose is to perform storage operations for permanent meta-data that will be available whether the peer is online or not.

Having all this, we would be ready to create an out-of-the-box peer-to-peer application with no need for additional web services.



Direct Peer-To-Peer vs WebRTC

Lately a technology called WebRTC has become popular, mostly because it can be used from JavaScript and is easy to implement.

WebRTC works like this:

- A client/host opens and maintains a session with a website equipped with a WebRTC service. If it wants to communicate with another peer, that other peer also needs to have an active session with the same site, so that the WebRTC service can create a data bridge between them. It is also possible for multiple servers to work together. In that case, peers can connect to any site that is part of the WebRTC network. Then, if two peers want to communicate and they are not served by the same server, the servers create a server-to-server data channel to carry this peer-to-peer communication. Basically, it could be compared to a direct peer-to-peer system that is set to always use only the relay technique, and relaying is the one thing direct peer-to-peer systems avoid at all costs, because the relay is the most expensive system resource.

WebRTC is easy to implement, and it is available and friendly to people who are involved only in web development (the majority of developers). Because of that, WebRTC became fairly popular. It offers a simple way of getting peer-to-peer capabilities in a web page using just JavaScript AJAX.

WebRTC is a set of techniques that have been around for a long time, packed for use by web developers. WebRTC does not bring much technical advantage. Data passes through servers, which is insecure with regard to privacy. The number of active users is limited by the total throughput of the servers. A system powered by technology like this needs enormous investment, since all data must pass through servers. Also, it is TCP based, which automatically limits things to 100 socket connections (peers) per network adapter on a server before degradation begins. It is usually good for things like web page chat or small file transfers. But if you try to build something more serious that requires a more intense data channel, you will find yourself trapped.

So, the conclusion is that WebRTC is just a simple technology intended for easy use by web developers working on projects intended for small-scale use.