Today "zip" is almost the magic word for packing files together, and zip files these days are mostly used for distribution. Yet zip is by far not the best way to do this, and it certainly wasn't the first. There are multiple ways to do this…
I do not know what the oldest archiver may be, but originally the entire concept existed most of all for archiving. What first comes to mind dates back to the mainframe era, in which gigantic computers were used to process loads of data. The workstations, aka dumb terminals, were nothing but a keyboard and a monitor. I've worked on one of these dumb terminals, and man, that was quite an awkward experience. Gigantic tapes were used to back up all data at the end of the day, the predecessors of the later tape streamers. The OS I was on was most likely Multics, as that OS was specifically designed for these kinds of computers. (And Multics came from the same people who later brought us the C programming language; Unix was later invented as the "single-computer" variant of Multics.)

A special tool was available back then to back up data to tapes. Unlike floppies, diskettes and hard drives (USB sticks didn't exist back then; heck, USB itself didn't exist back then), tapes cannot easily access data randomly, so all files were simply put into one big kind of "string" onto those tapes, with a header at the start of each file, so that when the backups had to be restored the tape-reading software knew how to put all the data back into files. Although such tapes are no longer used today, the way the data was backed up, I mean the way the data was "stringed" together, does still exist, and Linux users in particular must have encountered this system on countless occasions. Yes, I am speaking of the "TAR" system. "TAR" stands for "Tape ARchive". The entire byte order in a TAR file is exactly the same as it was back in the days this format was used for tapes, hence the name.
Unlike a zip file, where each entry inside the archive is compressed, TAR does not support compression at all. Back when this format was invented, they simply couldn't spare the memory and CPU resources to even consider it. ZIP files also have a central file table at the end of the file, allowing unzippers to easily find entries at random, but TAR doesn't have that either. It's tape, remember; tapes have no random access. And the header of each file is also a bit odd by modern standards: all numbers, like the file size for example, are written as strings containing octal digits. This was done to avoid endianness issues. There has never been a good consensus about whether to use big-endian or little-endian. Both Multics and its successor Unix had to work on both, and writing numbers in binary format could therefore make things spooky. Converting between endiannesses is easy and fast now (not to mention that most computers are little-endian now, not really by consensus but rather by how the market evolved), but back then it was a real attack on your system's performance, so a big no-no. Basically everything about the TAR format is outdated, and yet it has retained its popularity, as we even see it widely used on Linux, which may well be the youngest of the three big OSes today. Please note that TAR.GZ is not a feature of TAR. It's just a TAR file compressed with the gzip tool, which was never part of TAR, and which was invented much later, for that matter.
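You can see those octal text fields for yourself. Below is a minimal sketch in Python that builds a tiny TAR in memory and then reads the raw header bytes by hand: the name sits at offset 0 (100 bytes) and the size at offset 124 is a 12-byte, NUL-terminated octal string. The file name `greeting.txt` and its contents are of course just made up for the demonstration.

```python
import io
import tarfile

# Build a tiny TAR archive in memory so the example is self-contained.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
    data = b"hello tape world"
    info = tarfile.TarInfo(name="greeting.txt")
    info.size = len(data)
    tf.addfile(info, io.BytesIO(data))

raw = buf.getvalue()

# Each entry starts with a 512-byte header block.
# The name lives at offset 0 (100 bytes); the size at offset 124
# is a 12-byte, NUL-terminated string of OCTAL digits.
header = raw[:512]
name = header[0:100].split(b"\0", 1)[0].decode()
size = int(header[124:136].split(b"\0", 1)[0].strip(), 8)

print(name, size)  # greeting.txt 16

# The file data follows the header, padded up to the next 512-byte block.
content = raw[512:512 + size]
print(content.decode())
```

Note how no binary integers appear anywhere in the header: everything is plain ASCII, exactly the endianness dodge described above.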
Of course, things changed when Bill Gates conquered the market with MS-DOS, which formed the basis of PCs as we know them today, and yes, even Windows 10 still has leftovers from MS-DOS. DOS came with the BACKUP tool, which was used to back up folders on a hard drive. (Although I have to note that hard drive support came later in DOS, as the first DOS machines had no hard drive, but that is not relevant now.) The MS-DOS file system works in clusters, and that makes reading files faster (that was an issue back then), but for backups it could make you lose loads of disk space holding no information, as every file occupies whole clusters. So BACKUP "packed" it all into one big file, and that saved clusters and thus disk space, and thus you needed fewer floppies (and later diskettes) to make backups. With the RESTORE tool you could unpack these files. That was already a big deal back then.
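The cluster waste is easy to put into numbers. A quick sketch, with a hypothetical 4096-byte cluster size and a thousand imaginary 100-byte files (actual FAT cluster sizes varied with disk size):

```python
CLUSTER = 4096        # hypothetical cluster size; real FAT sizes varied
files = [100] * 1000  # a thousand imaginary 100-byte files

def clusters(n, cluster=CLUSTER):
    # Even a 1-byte file occupies a full cluster: ceiling division.
    return -(-n // cluster)

# Stored separately, each tiny file burns a whole cluster.
separate = sum(clusters(size) for size in files) * CLUSTER
# Packed into one big file, only the last cluster has slack.
packed = clusters(sum(files)) * CLUSTER

print(separate, packed)  # 4096000 102400
```

Forty times less disk space for the same data, which is exactly why packing files before writing them to floppies mattered so much, even without any compression.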
Archiving became more serious when SEA came with ARC. The ARC tool did not only pack files together, it also compressed them using various compression methods. Phil Katz saw much potential in this format and came with PKARC, which could also create ARC files, and PKXARC, which could unpack them. SEA didn't like this and sued Phil Katz, but the entire audience stood behind Katz and drove SEA a bit to its "deathbed". Phil Katz released PKPAK and PKUNPAK as the final versions of his ARC tools, and then came up with a new tool about which SEA could do nothing. That was the end of the ARC format. The new tool was called "PKZIP", and it came with the unpacker "PKUNZIP". "PK" stands for Phil Katz, the name of the inventor of the format. And so the format we still use today was born. Phil Katz has since passed away, and the ZIP format has been put in the public domain, which is why most OSes can so easily support this format by default.
A lot of other tools and formats appeared back in the DOS era: DWC by Dean W. Cooper, ZOO by Rahul Dhesi, and LHA, which has many authors depending on the platform, though the DOS version that was used a lot by commercial game designers for their installers was written by Haruyasu Yoshizaki. Most of the success went to ARJ, by Robert K. Jung. ARJ and LHA had the best compression ratios back then, but more importantly, ARJ was the first archiver with a built-in file splitter. Back then we still had diskettes of only 1.44 MB, which was nothing, even back then, so ARJ's file splitter made it possible to put large files onto multiple diskettes, and particularly in the illegal-games-copying scene the tool was extremely popular.
One archiver never got really popular in the DOS era, but did get very popular after the transition to Windows: Eugene Roshal's RAR. (The modern version is maintained by Alexander Roshal.) RAR too has file splitting, it has a very powerful compression algorithm, and it also supports solid archiving, with which RAR took a bit of a dive into the past. Technically all TAR.GZ archives are solid archives, but where a TAR.GZ is just one big compressed file, RAR could group multiple files and compress them together, and RAR could even sort out which files were best compressed together.
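Why solid archiving pays off is easy to demonstrate. When similar files are compressed one by one (as ZIP does), the compressor cannot reuse redundancy shared between files; compress them as one stream and it can. A small sketch using DEFLATE via Python's `zlib`, with fifty artificial near-identical "files":

```python
import zlib

# Fifty artificial "files" with lots of shared content between them.
files = [b"The quick brown fox jumps over the lazy dog. " * 20
         for _ in range(50)]

# Per-file compression, ZIP-style: each entry compressed on its own.
per_file = sum(len(zlib.compress(f, 9)) for f in files)

# Solid compression: one stream over all files together.
solid = len(zlib.compress(b"".join(files), 9))

print(per_file, solid)
assert solid < per_file  # the solid archive is much smaller
```

The trade-off RAR accepts here is the same one TAR.GZ has: to extract one file from a solid group you must decompress everything before it, which is why RAR lets you tune the solid block size.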
7z by Igor Pavlov is the youngest of the widely used archivers, and it uses solid archiving like RAR as a standard. Pavlov improved the Lempel-Ziv compression method and "chained" it with Markov chains, and that became the Lempel-Ziv-Markov-chain Algorithm, also known as LZMA. Now, LZMA would never have been possible in the old days, as LZMA compression takes a lot of RAM and CPU power to pack, although unpacking is very light compared to other methods.
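Python ships LZMA in its standard library, so the difference from the older DEFLATE (the method behind zip and gzip) is easy to try for yourself. A sketch with some made-up repetitive sample text, not a serious benchmark:

```python
import lzma
import zlib

# Made-up, highly compressible sample data; not a real benchmark.
data = b"the rain in spain stays mainly in the plain\n" * 10_000

deflated = zlib.compress(data, 9)      # DEFLATE, as used by zip/gzip
lzmaed = lzma.compress(data, preset=9)  # LZMA, as used by 7z

print(len(data), len(deflated), len(lzmaed))

# Round-trip works, and decompression is cheap compared to compression.
assert lzma.decompress(lzmaed) == data
```

On typical data LZMA compresses noticeably tighter than DEFLATE, at the cost of far more memory and CPU while packing; at `preset=9` the compressor alone wants hundreds of megabytes of RAM, which, as said above, was simply unthinkable in the DOS days.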
In a certain way my own JCR6 tool shares similarities with these formats, although it was never meant for backing up, so it lacks some features a true archiver has. In return it comes with features that make it powerful for packing compressed data for games, which is what I primarily designed it for. JCR6 has also been inspired by the WAD system (although JCR6 is by far more sophisticated).
Now, the joke with ZIP is that it was created in order to end the legal fight between Katz and SEA over the ARC format, in which the audience was on Katz's side, actually only because Katz was believed to be one man and SEA a big company. The latter was not true: SEA too was very, very small, but the name made people assume things. ZIP was not the first tool of its kind (heck, ARC wasn't either), but it was the tool that made packing and compressing files a standard still used today. Originally for backup reasons, now most of all used for distribution.
#article #zip #tar #arj #dwc #7z #rar #archive #jcr6 #Katz #Phil #PhilKatz #history #wad #arc #sea #roshal #pavlov #jung