NAT history
In the early days of the internet (before the mid-1990s), internet pioneers designed network architecture and protocols that are still used today. One of the primary uses they had in mind was end-to-end connectivity: any host connected to the internet should be able to reach any other host directly.
The rapid expansion of the network made public IP addresses a valuable resource. End-to-end connectivity also had a security downside, because it directly exposes your computer or device to potential abuse. For these reasons, NAT (network address translation) devices became popular in the mid-1990s. Today they are common in our homes and offices.
NAT devices (routers) let multiple devices on the local network share a single public IP address, which makes it easy to distribute internet access to any computer or device on a home or office network.
NAT also provides a basic shield for the local network against attacks from outside. It does not let any traffic from outside reach a computer or device on the local network unless that device first initiates the connection by sending a request to a remote server. The NAT keeps a record of outgoing requests and lets data from the remote service back in only if the local client has initiated the transfer. Routers also commonly have a built-in firewall that further increases the protection the router provides.
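The record-keeping described above can be illustrated with a toy translation table. This is a minimal sketch, not how any particular router works: the class name `ToyNat`, the starting external port 40000, sequential port allocation, and the port-restricted filtering rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

Endpoint = tuple[str, int]  # (IP address, port)


@dataclass
class ToyNat:
    """Hypothetical model of a port-restricted NAT (illustration only)."""
    public_ip: str
    next_port: int = 40000  # assumed start of the external port pool
    # internal endpoint -> external port
    out_map: dict = field(default_factory=dict)
    # external port -> (internal endpoint, set of remote endpoints it contacted)
    in_map: dict = field(default_factory=dict)

    def outbound(self, internal: Endpoint, remote: Endpoint) -> Endpoint:
        """Translate an outgoing packet's source and remember whom it was sent to."""
        if internal not in self.out_map:
            self.out_map[internal] = self.next_port
            self.in_map[self.next_port] = (internal, set())
            self.next_port += 1
        ext_port = self.out_map[internal]
        self.in_map[ext_port][1].add(remote)
        return (self.public_ip, ext_port)

    def inbound(self, remote: Endpoint, ext_port: int) -> Optional[Endpoint]:
        """Return the internal destination of an incoming packet, or None to drop it."""
        entry = self.in_map.get(ext_port)
        if entry is None:
            return None  # no mapping: nobody inside asked for this traffic
        internal, contacted = entry
        if remote not in contacted:
            return None  # unsolicited sender: filtered out
        return internal
```

A reply from the server the local host contacted is translated back, while a packet from any other host, or to an unmapped port, is dropped.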
This is great, but what about end-to-end connectivity? The original concept had flaws that prevented using it without consequences, but there is no doubt that we need it, and it is irreplaceable for some kinds of communication.
Port mapping on NAT devices controlled by clients
Utilities to enable end-to-end connectivity were added to NAT devices (routers): protocols that let client applications request a port mapping from the router, so that all data targeting the mapped port reaches the client that requested the mapping. These protocols are:
UPnP - the most common; port mappings are created using the Universal Plug and Play protocol (based on XML messages).
NAT-PMP (NAT Port Mapping Protocol, RFC 6886) - an older protocol that you will rarely find in your router. It was introduced by Apple and saw heavy use in Apple's AirPort base stations.
PCP (Port Control Protocol, RFC 6887) - the latest of these protocols to be adopted as an RFC standard, with Apple among its proposers. It is designed to be compatible with NAT-PMP. You can commonly find it in newer Apple NAT devices.
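To give a feel for how small these protocols can be, here is a sketch that builds a NAT-PMP mapping request and parses the reply, following the packet layouts in RFC 6886. The function names are hypothetical, and discovering the gateway's address (needed to actually send the request to UDP port 5351) is left out.

```python
import struct

NATPMP_PORT = 5351  # UDP port the gateway listens on (RFC 6886)


def build_map_request(internal_port: int, suggested_external: int,
                      lifetime: int = 3600, tcp: bool = False) -> bytes:
    """Build a 12-byte NAT-PMP port mapping request."""
    opcode = 2 if tcp else 1  # 1 = map UDP, 2 = map TCP
    # version 0, opcode, 16 reserved bits, internal port,
    # suggested external port, requested lifetime in seconds
    return struct.pack("!BBHHHI", 0, opcode, 0,
                       internal_port, suggested_external, lifetime)


def parse_map_response(data: bytes) -> dict:
    """Parse a 16-byte NAT-PMP mapping response."""
    version, opcode, result, _epoch, internal, external, lifetime = \
        struct.unpack("!BBHIHHI", data[:16])
    if version != 0 or opcode not in (129, 130):  # 128 + request opcode
        raise ValueError("not a NAT-PMP mapping response")
    return {"result": result, "internal_port": internal,
            "external_port": external, "lifetime": lifetime}
```

Sending the request from a UDP socket to the gateway's port 5351 and passing the reply to `parse_map_response` would yield the external port the router assigned; a `result` of 0 means success.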
Presently (2013) you will most likely find UPnP in your router settings, unfortunately disabled by default. This is because, when enabled, it can be abused by viruses and trojans that are already present on your computer. You are expected to enable one of these protocols only if you need it, for example when you want to play an online multiplayer game.
Also, these solutions make sense only if your router has a public IP address. This is becoming rare, because internet providers tend to share a public IP address among several users. It is also not unusual for them to use devices and routing software that can fool your WAN device into believing it has a public IP address.
So, if you design software that relies only on these methods to create direct connections, in the best case perhaps 10% of your users will be able to use it, and only if you explicitly tell them to enable the particular protocol. This holds for internet-connected devices in general. If your users are people behind home routers who play multiplayer online games on their computers, they are more likely to have UPnP enabled, because at least some friend will have helped them configure it, so in this case you may reach around 20% usability.
Traversal using intentional NAT table manipulation
So we see that port mapping protocols can be used only in a small number of cases. What can we do now? We can trick our NAT device into creating a mapping by sending a packet to the remote client, and then instruct the remote client to send us a packet that looks like a response (with matching source address and port). If everything goes well and the record in the NAT mapping table matches (that is, if we are really lucky), the packet from the remote client will reach us. This is the oldest known technique of NAT traversal by intentional NAT table manipulation, referred to as "UDP hole punching". In earlier days this technique was very usable and had a high success rate. A TCP connection could even be created afterwards using the same ports, and even TCP hole punching was fairly successful.
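The hole punching idea can be sketched with plain UDP sockets. This sketch assumes both peers already know each other's public endpoint (for example, exchanged through a rendezvous server, which is not shown); the helper name `punch`, the retry count, and the timeout are illustrative choices. Each outgoing probe creates or refreshes the local NAT mapping, so the peer's probes look like responses.

```python
import socket
import threading


def punch(sock: socket.socket, peer: tuple, message: bytes,
          attempts: int = 5, timeout: float = 0.5):
    """Probe the peer's public endpoint until one of its packets arrives.

    Returns the first payload received from `peer`, or None if every
    attempt timed out.
    """
    sock.settimeout(timeout)
    for _ in range(attempts):
        sock.sendto(message, peer)  # opens/refreshes our own NAT mapping
        try:
            data, addr = sock.recvfrom(1024)
        except socket.timeout:
            continue  # peer's packet not through yet: probe again
        if addr == peer:
            sock.sendto(message, peer)  # make sure the peer hears from us too
            return data
    return None
```

Without real NATs in the way, the mechanics can be tried on localhost by running two peers in parallel threads, each calling `punch` with the other's address; both should receive the other's message.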
UDP hole punching became even more usable when the STUN technique was invented (Cornell University). STUN is used to learn whether a computer is behind a NAT, how that NAT behaves, and which external ports the router with the public IP address has mapped. (Every NAT device may change a packet's source port; the change is recorded in its table so that it knows how to rewrite the destination port of the response packets.) The STUN authors recognized four observable classes of NAT behavior:
- A full cone NAT is one where all requests from the same internal IP address and port are mapped to the same external IP address and port. Furthermore, any external host can send a packet to the internal host by sending a packet to the mapped external address.
- A restricted cone NAT is one where all requests from the same internal IP address and port are mapped to the same external IP address and port. Unlike a full cone NAT, an external host (with IP address X) can send a packet to the internal host only if the internal host had previously sent a packet to IP address X.
- A port restricted cone NAT is like a restricted cone NAT, but the restriction includes port numbers. Specifically, an external host can send a packet, with source IP address X and source port P, to the internal host only if the internal host had previously sent a packet to IP address X and port P.
- A symmetric NAT is one where all requests from the same internal IP address and port to a specific destination IP address and port are mapped to the same external IP address and port. If the same host sends a packet with the same source address and port, but to a different destination, a different mapping is used. Furthermore, only the external host that receives a packet can send a UDP packet back to the internal host.
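The mapped address STUN reports can be obtained with a single Binding Request. The sketch below uses the newer message format from RFC 5389 (magic cookie and XOR-MAPPED-ADDRESS attribute) rather than the original RFC 3489 format, handles IPv4 only, and the function names are hypothetical.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txn_id=None):
    """Return (message, transaction_id) for an attribute-less Binding Request."""
    txn_id = txn_id or os.urandom(12)
    if len(txn_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # type, length of attributes (none), magic cookie, transaction ID
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txn_id, txn_id


def parse_xor_mapped_address(msg: bytes, txn_id: bytes):
    """Return the (ip, port) the server saw us as, from a Binding Success Response."""
    msg_type, length, cookie = struct.unpack("!HHI", msg[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or msg[8:20] != txn_id:
        raise ValueError("unexpected STUN response")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", msg[pos:pos + 4])
        value = msg[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # IPv4 family
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            ip_int = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", ip_int)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute")
```

Sending the message over UDP to a STUN server (3478 is the default STUN port) and parsing the reply gives the public endpoint; if it differs from the socket's local address, a NAT is present.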
Since router (NAT) design is not standardized in these terms, it quickly became clear that this classification is not reliable, because it can lead us the wrong way. The classification was probably "more valid" at the time it was introduced, since it is based on empirical observations, but newer NAT designs eventually made it outdated. If you repeatedly run a STUN testing client on a computer behind NAT, you can commonly get all four different classification results for the same router. So this simply cannot be taken as reliable information. What you can use is the fact that if you get any of the four results above, you can be sure a NAT is present. The mapped ports you get from the responses are also useful, because they tell you the most probable range of values the next mapping will take. (Note that, very rarely, a STUN query may report that you are on the open internet, with a public IP address, even though a NAT is present.)
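Using mapped ports as a hint for the next allocation can be sketched as a simple heuristic. This is an assumption-laden illustration, not a proven prediction method: the hypothetical `predict_ports` takes external ports observed in consecutive STUN responses, picks the most common increment between them, and extrapolates a few candidate ports.

```python
def predict_ports(observed, count=5):
    """Guess the next external ports a NAT will allocate (heuristic).

    `observed` holds external ports reported by consecutive STUN queries.
    A step of 0 suggests the NAT reuses or preserves the port.
    """
    if len(observed) < 2:
        return list(observed)  # nothing to extrapolate from
    deltas = [b - a for a, b in zip(observed, observed[1:])]
    step = max(set(deltas), key=deltas.count)  # most common increment
    last = observed[-1]
    if step == 0:
        return [last]  # same mapping expected again
    candidates = [last + step * i for i in range(1, count + 1)]
    return [p for p in candidates if 0 < p < 65536]  # keep valid port numbers
```

For a NAT that allocates sequentially, the candidates would be targeted by the peer's probes; real NATs with random allocation defeat this heuristic.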
Why did these techniques become outdated? Unfortunately, security administrators and we, the IT engineers developing NAT traversal software, are in constant struggle. We are both right and wrong. They claim NAT traversal is used only by crackers and pirates. We claim NAT traversal is sometimes needed and does not degrade the security level. This is because NAT traversal finds some random port for communication and performs application-specific data transfer. Even if an abuser manages to guess one of the 65536 ports, his data will get into some application process, which will throw an exception because of the invalid data or simply break. So, speaking of security, transferring data over direct channels is far more secure than going through an intermediate server. Communicating with servers is less secure than communicating with a host directly, because servers are well known and are targets of crackers' attacks. Also, client-server communication is commonly based on well-known protocols, which also makes it suitable for injecting arbitrary code. Besides all that, you can never be sure that someone at the cloud hosting company does not peek at your data.
To return to our story: later NAT devices and networks are not so friendly to NAT traversal, because some engineers designed routers that treat it as a security threat. TCP traversal is almost impossible unless you can use raw sockets, which is impractical because most operating systems enforce strict security rules for their use or do not support them at all. TCP uses a three-way handshake involving sequence numbers, acknowledgment numbers, and flags (SYN, ACK), and in most cases you need to match all of them to trick your NAT device, not to mention that an ICMP "Destination Port Unreachable" packet may abort your attempt. Basic UDP hole punching works in a small number of cases. If you have a router from a quality vendor such as Cisco (Linksys) or NETGEAR, the chances of success are greater, because their engineers recognize the need and design their devices properly. For example, STUN and basic UDP hole punching are enough to traverse a Linksys router's NAT: Linksys preserves the original source port if possible, or otherwise takes a nearby value, so NAT traversal through such a router is fairly easy.
A modern method of NAT traversal by intentional NAT table manipulation should involve prediction of the next external ports, packet TTL manipulation (a key factor with symmetric and port restricted cone NATs), multiple retries, and side swapping. As we already said, NAT behavior is not standardized, so designing a good "piercing" method involves a lot of testing on different NAT combinations between peers, so that a good routine can be designed based on empirical conclusions.
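One way to apply TTL manipulation is to send an opening packet with a deliberately small TTL, so that it creates the mapping in our own NAT but expires before reaching the remote peer's NAT, which might otherwise drop it or react to it as unsolicited traffic. The sketch below only shows the socket mechanics using the standard `IP_TTL` option; the helper name `send_priming_packet`, the payload, and the default TTL of 3 are assumptions, and a suitable value depends on the number of hops to each NAT.

```python
import socket


def send_priming_packet(sock: socket.socket, peer: tuple,
                        ttl: int = 3, payload: bytes = b"prime") -> None:
    """Send one low-TTL UDP packet, then restore the socket's previous TTL."""
    original = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TTL)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    try:
        sock.sendto(payload, peer)  # opens our mapping, should die in transit
    finally:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, original)
```

After the priming packet, the peer sends toward our mapping, and subsequent packets from our side use the normal TTL.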
Final NAT traversal solution
An industrial-standard NAT traversal solution should be able to apply all the methods mentioned above. It should inspect the network environment of both peers and decide which NAT traversal method to apply. If one method fails, it should try other methods or repeat with the peer sides swapped. If no method of direct tunnel creation succeeds, the relay is the last resort, which we know for sure must work because it is based on the standard client-server model. The relay is the most expensive resource of a peer-to-peer network. Higher success with direct tunnel methods makes the system more flexible and cheaper to maintain. A simple calculation demonstrates this:
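The fallback order described above can be sketched as a loop. Here the traversal methods and the relay are hypothetical callables that return a tunnel handle (a string in this toy example) or `None` on failure; choosing and ordering the methods from an inspection of both peers' network environment is assumed to have happened beforehand.

```python
from typing import Callable, Optional, Sequence

# (initiator, responder) -> tunnel handle, or None on failure
Method = Callable[[str, str], Optional[str]]


def connect(peer_a: str, peer_b: str, methods: Sequence[Method],
            relay: Callable[[str, str], str]) -> str:
    """Try each direct method from both sides, then fall back to the relay."""
    for method in methods:
        # side swap: let each peer act as the initiator in turn
        for initiator, responder in ((peer_a, peer_b), (peer_b, peer_a)):
            tunnel = method(initiator, responder)
            if tunnel is not None:
                return tunnel
    # client-server relay always works, but consumes server bandwidth
    return relay(peer_a, peer_b)
```

Putting cheap, likely-to-succeed methods first keeps connection setup fast, while the relay stays the guaranteed last step.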
Let's say we have one server with a connection bandwidth of 100 Mb/s. We want to support our video-over-IP application, which requires, let's say, 500 KB/s per peer pair = 4000 Kb/s = 4 Mb/s.
We want to provide a quality service guaranteeing 500 KB/s for each peer-to-peer session under any conditions.
If we don't use NAT traversal, we will be able to support 100/4 = 25 sessions at once per server.
If we use NAT traversal and 5% of all tunnels are made through the relay, we will be able to support (100/4) + 95%/5% * (100/4) = 25 + 19 * 25 = 500 sessions at once per server.
Also, if the number of users exceeds this limit, the quality of our service will degrade 20 times more slowly on the NAT traversal equipped system.
So the cost of the system equipped with NAT traversal would be about 20 times lower than that of a pure relay system.
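The capacity arithmetic can be checked with a short function. As in the text, it assumes direct tunnels consume no server bandwidth, so the server's capacity is divided only among relayed sessions; the function name is hypothetical and `Fraction` keeps the numbers exact.

```python
from fractions import Fraction


def sessions_per_server(server_mbps, session_mbps, relayed_share=Fraction(1)):
    """Concurrent sessions one server supports when only `relayed_share` is relayed."""
    # sessions whose traffic fits through the server's bandwidth
    relayed_capacity = Fraction(server_mbps) / Fraction(session_mbps)
    # direct tunnels are assumed to cost the server no bandwidth
    return relayed_capacity / relayed_share


session_mbps = Fraction(500 * 8, 1000)  # 500 KB/s = 4000 Kb/s = 4 Mb/s
pure_relay = sessions_per_server(100, session_mbps)                      # 25
with_traversal = sessions_per_server(100, session_mbps, Fraction(5, 100))  # 500
```

The ratio `with_traversal / pure_relay` is the factor of 20 used above.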
This is just one advantage of NAT traversal equipped peer-to-peer systems.