Tuesday, 16 June 2009

USENET, only FASTER: tsunami-udp FTW!

Downloading data via USENET has become the default FAST track for most DSL/cable users. The data is nicely placed 'locally' and, for a small fee, one gets priority access over 4, 8 or however many connections. Nice. Much faster than the old single-source HTTP access or the newer multiple-source BitTorrent [too many cheaters in BitTorrent country who do not obey the rule that one should keep a share ratio of at least 1:1.20].

Cool, so now we finally get to use all the bandwidth we pay the ISP for. But since the USENET servers are usually close to the endpoint, the case for a connection-oriented protocol like TCP is hard to make. UDP is a cheaper protocol and thus could increase the effective bandwidth, since it requires less 'overhead'. The old-fashioned reasoning for using TCP over UDP is that UDP is only useful for transmissions where order isn't important and you don't need all of the messages to reach the other machine.
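To put a rough number on that 'overhead', here is a small back-of-the-envelope sketch. It only counts minimum header sizes (IPv4 20 bytes, TCP 20 bytes, UDP 8 bytes, no options) and assumes a 1500-byte Ethernet MTU; the function name is just made up for the illustration.

```python
# Minimum header sizes in bytes, without any options.
IPV4_HEADER = 20
TCP_HEADER = 20
UDP_HEADER = 8
MTU = 1500  # assumption: plain Ethernet MTU


def payload_fraction(transport_header: int, mtu: int = MTU) -> float:
    """Fraction of each IP packet that is left for application payload."""
    payload = mtu - IPV4_HEADER - transport_header
    return payload / mtu


tcp = payload_fraction(TCP_HEADER)
udp = payload_fraction(UDP_HEADER)
print(f"TCP payload share: {tcp:.4f}")
print(f"UDP payload share: {udp:.4f}")
```

Under these assumptions the header saving alone is well under one percentage point, so most of any real gain would have to come from skipping TCP's acknowledgement and congestion handling rather than from the smaller header.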

Another reason for using TCP over UDP is that the application on top needs less state awareness, and since we like our coders dumb, we take that burden off them.

But since our USENET servers are close, packet loss is not much of an issue; losing packets is the exception. In the rare case we do, we could simply ask for a resend of that particular packet [or block].

So I went to BING [I have to admit I am impressed by the Google-like results!] and asked "when is tcp better then udp". Hardly anything interesting showed up at first glance. I put the same question to GOOGLE; more 'good' material was listed at first, but still not what I was looking for. BING has been set up to give me 100 results, so I took a second look and found tsunami-udp at hit 38.

In pseudo-code, the server and client operate approximately like this:

Server:

while(running) {
    wait(new incoming client TCP connection)
    fork server process:
        check_authenticate(MD5, "kitten");
        exchange settings and values with client;
        while(live) {
            wait(request, nonblocking)
            switch(request) {
                case no request received yet: { send next block in sequence; }
                case request_stop:            { close file, clean up; exit; }
                case request_retransmit:      { send requested blocks; }
            }
        }
}

Client:

start, show command line
while(running) {
    read user command;
    switch(command) {
        case command_exit:    { clean up; exit; }
        case command_set:     { edit the specified parameter; }
        case command_connect: { TCP connect to server; auth; protocol version compare;
                                send some parameters; }
        case command_get && connected: {
            send get-file request containing all transfer parameters;
            read server response - filesize, block count;
            initialize bit array of received blocks, allocate retransmit list;
            start separate disk I/O thread;
            while (not received all blocks yet) {
                wait for the next UDP block;
                if timeout { send retransmit request(); }

                if block not marked as received yet in the bit array {
                    pass block to I/O thread for later writing to disk;
                    if block nr > expected block { add intermediate blocks to retransmit list; }
                }

                if it is time {
                    process retransmit list, send assembled request_retransmit to server;
                    send updated statistics to server, print to screen;
                }
            }
            send request_stop;
            sync with disk I/O, finalize, clean up;
        }
        case command_help:    { display available commands etc; }
    }
}

Tsunami combines the strengths of TCP [reliable data transfer] with the efficiency of UDP [no handshakes etc].
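The client's bookkeeping from the pseudo-code (the array of received blocks plus the retransmit list) can be sketched roughly like this. It is my own minimal illustration, not the tsunami source: the class and method names are invented, blocks are numbered from 0 here, and a bytearray stands in for the real bit array.

```python
class BlockTracker:
    """Tracks which blocks have arrived and which must be re-requested."""

    def __init__(self, block_count: int) -> None:
        self.received = bytearray(block_count)  # one flag per block, 0 = missing
        self.expected = 0                       # next block number expected in sequence
        self.retransmit = []                    # block numbers to ask the server for again
        self.remaining = block_count

    def on_block(self, nr: int) -> bool:
        """Record an arriving block; return True if it was new."""
        if self.received[nr]:
            return False  # duplicate, nothing to do
        self.received[nr] = 1
        self.remaining -= 1
        if nr > self.expected:
            # A gap: every block between the expected one and this one was skipped.
            self.retransmit.extend(range(self.expected, nr))
        if nr >= self.expected:
            self.expected = nr + 1
        return True

    def pending_requests(self) -> list:
        """Drop blocks that have arrived in the meantime; return what is still missing."""
        self.retransmit = [b for b in self.retransmit if not self.received[b]]
        return list(self.retransmit)

    def done(self) -> bool:
        return self.remaining == 0


tracker = BlockTracker(6)
for nr in [0, 1, 4, 5]:  # pretend blocks 2 and 3 got lost on the way
    tracker.on_block(nr)
print(tracker.pending_requests())  # the blocks to put in request_retransmit
```

Feeding blocks 2 and 3 into `on_block` afterwards empties the retransmit list and makes `done()` return True, which is the point where the client in the pseudo-code would send request_stop.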

How It Works:
Tsunami performs a file transfer by splitting the file into numbered blocks, usually 32 kB each. Communication between the client and server applications flows over a low-bandwidth TCP connection. The bulk data is transferred over UDP.
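The block arithmetic is simple enough to sketch. The helper names below are hypothetical and the blocks are numbered from 0 for convenience, which may differ from how tsunami itself counts; an in-memory buffer stands in for a real file.

```python
import io

BLOCK_SIZE = 32 * 1024  # the usual 32 kB block size mentioned above


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed, rounding up for a short final block."""
    return (file_size + block_size - 1) // block_size


def read_block(f, nr: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Read block `nr` (0-based in this sketch) from a seekable binary file object."""
    f.seek(nr * block_size)
    return f.read(block_size)


data = io.BytesIO(bytes(100_000))  # stand-in for a 100 000 byte file
n = block_count(100_000)
last = read_block(data, n - 1)
print(n, len(last))  # 4 blocks, the last one only 1696 bytes long
```

Because every block has a fixed offset, the server can resend any block on request without keeping per-block state: it just seeks and reads again.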

Most of the protocol intelligence is worked into the client code - the server simply sends out all blocks, and resends blocks that the client requests. The client specifies nearly all parameters of the transfer, such as the requested file name, target data rate, blocksize, target port, congestion behaviour, etc, and controls which blocks are requested from the server and when these requests are sent.
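To show how little the server has to think, here is a toy, in-memory version of its send loop. Nothing here touches the network: the requests are scripted per loop iteration, and the function name, the request tuples and the extra exit condition are all my own assumptions for the demo.

```python
def serve(block_count, requests_by_tick):
    """Return the block numbers in the order a simple server would send them.

    requests_by_tick maps a loop iteration number to a scripted request:
    ("retransmit", [block numbers]) or ("stop", None).
    """
    sent = []
    next_block = 0
    last_tick = max(requests_by_tick, default=-1)
    tick = 0
    while True:
        kind, blocks = requests_by_tick.get(tick, (None, None))
        if kind == "stop":
            break                      # client says it has everything
        if kind == "retransmit":
            sent.extend(blocks)        # resend exactly what the client asked for
        elif next_block < block_count:
            sent.append(next_block)    # no request: send next block in sequence
            next_block += 1
        elif tick > last_tick:
            break                      # demo only: nothing left to send or to wait for
        tick += 1
    return sent


order = serve(5, {2: ("retransmit", [0]), 7: ("stop", None)})
print(order)
```

In this run the client asks for block 0 again on the third iteration, so the send order becomes 0, 1, 0, 2, 3, 4. All the decisions about what to request and when sit on the client side, just as described above.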