Reflections on Broken Token Ring
#insert "someoneontheinternetiswrong.meme"
ES Raymond, the open source advocate and the bloke responsible, among other things, for your GPS working, recently reXeeted approvingly someone else’s Xeet about a dead network technology called Token Ring. That Xeet is wrong, and since Token Ring paid for me to travel the world (it literally caused me to visit every continent except Africa and Antarctica), I feel I should do an old-fashioned fisking.
Token Ring was always obsolete.
Token Ring is obsolete now, but it wasn’t always. In fact Ethernet and TCP/IP borrowed a ton of ideas from it. It failed to rule the world for two reasons: cost, and the development of switched networking, which, ironically, was greatly assisted by the IBM and Token Ring best practice of structured networking.
Back in the 1970s, IBM accounted for the majority of the computer industry, including networking. The famous “OSI Model” is a model for how IBM did networking, not actually how network works today.
Even ESR thinks this is wrong. IBM skipped several layers of the OSI model as did everyone because it was an academic abstraction. IBM did SNA, which some academics pounded into the OSI model, and IBM was cool with them doing that, but IBM didn’t use the OSI model for anything except marketecture to governments who had been convinced by academics that the OSI model was the way to go.
Then along came Ethernet, which broke the IBM model of networks. Instead of an expensive mainframe at the center of the network ruling everything else, Ethernet was democratic, allowing anybody to put any machine onto an Ethernet segment. Instead of “client-server” computing, it allowed “peer-to-peer” computing.
Weirdly, despite Ethernet and TCP/IP allowing for “peer-to-peer” computing, almost all traffic on the internet today, including how you are reading this article, is “client-server”. Moreover, while other parts of SNA were very much “client-server”, Token Ring was not, and that was deliberate, because IBM used it for some mainframe-to-mainframe communication as well as user-to-mainframe communication.
It was also cheap, compared to other options, and started to become very popular.
That is true. It was relatively cheap and thus became popular. IBM, as I mention below, did a horrible job of lowering the cost of its products and other Token Ring manufacturers preferred to keep margins up by only being 10-20% less than IBM even when they could have been a lot cheaper.
There were a lot of competing technologies that sprung up around this time as well, like “ARCnet” and “LocalTalk”. Basically, the ability to network cheap computers became really cheap.
Also true. A lot of these technologies turned out to have massive scaling problems, so they worked fine to connect, say, a dozen computers but could not usefully be extended to support an office building with hundreds. Ethernet did better than these others, which was another reason why organizations started to adopt it, but it also had scaling problems.
The IEEE decided to standardize Ethernet, now known as the 802.3 series of standards.
IBM couldn’t allow this, so they created their own alternative and pushed for the IEEE to include that in the standards, “Token Ring”. This is defined in 802.5.
There’s also a “Token Bus” standard, 802.4, but is meaningless. It was only included to pretend IBM wasn’t trying to disrupt and dominate the standard.
Also true. The fun bit is that one of the more popular Ethernet networking stacks at the time - Novell Netware - ended up stomping all over part of the Ethernet standard in order to get a few more bytes of packet data. That was because Ethernet had packet size issues…
The trick to the IEEE 802 standards is that all three alternatives used the same 48 bit MAC address that we know and love. This allowed us to build bridges between Ethernet and Token Ring.
True and not true. There was a big-endian/little-endian issue that needed to be fixed up and which broke, for example, the ARP part of IP unless you added smarts to the bridge. But bridging between the two was possible and that helped when other low level formats like FDDI and then wireless networking showed up.
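For the curious, the endianness fix-up amounts to reversing the bit order within every byte of the MAC address: Ethernet puts each byte on the wire least-significant bit first, while Token Ring sends the most-significant bit first. A minimal sketch in Python (illustrative only - no real bridge ran anything like this):

```python
def swap_bit_order(mac: bytes) -> bytes:
    """Reverse the bit order within each byte of a MAC address.

    Ethernet transmits each byte least-significant bit first (the
    "canonical" order); Token Ring transmits most-significant bit
    first, so a bridge must flip every byte when copying addresses
    between the two frame formats.
    """
    return bytes(int(f"{b:08b}"[::-1], 2) for b in mac)

mac = bytes.fromhex("00a0c91423c8")       # an arbitrary example address
print(swap_bit_order(mac).hex())          # prints "00059328c413"
```

Note the operation is its own inverse - flipping twice gets the original address back - and, as mentioned above, the “smarts” in the bridge came from having to apply this same flip to MAC addresses embedded inside ARP payloads, not just to the frame headers.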
Now the thing about Ethernet at the time was that everything was attached to the same wire. That meant if two devices transmitted at the same time, their packets would “collide”, and corrupt each other. Each would detect this, then stop transmitting and backoff for a random period of time before transmitting again.
IBM pretended this was unreliable. The feature of passing a “token” around a “ring” was that it was deterministic, with nothing wasted due to collisions. It meant that a network could run at 100% of theoretical capacity, whereas Ethernet started experiencing problems as it reached max capacity with everyone colliding with each other.
IBM was correct. As I’ll explain below, Ethernet could degrade very quickly. One of the reasons why bridging was developed as a concept was that in a large, busy network of several dozen or more devices, Ethernet became unusable. Another problem with traditional Ethernet was that it essentially had a single path - the coax wire - which meant that if the path broke then you were disconnected. One fix for this was to put a bridge at each end of the coax, but that required the bridges to talk spanning tree protocol (STP)1 so as not to end up causing a broadcast storm. Spanning tree worked, but in situations of dynamic network topology change (e.g. someone plugging in a flaky bridge) it could take a very long time for the spanning tree to get back to stability, and network connectivity between segments was broken while it did. With the original coax cabling, one of the classic ways to have Ethernet collapse in a heap was to fail to properly connect the bridge so that it was intermittently connected. Back in the mid 1980s this was a persistent problem with part of the Cambridge University Computer Science network.
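The core of the spanning tree idea - elect a root bridge, then keep only least-cost paths toward it and block every other link - can be sketched in a few lines. This is a toy model (real STP reaches the same result through ongoing BPDU exchanges and timers, which is exactly why reconvergence after a topology change was so slow), and the bridge IDs and link costs here are made up for illustration:

```python
import heapq

def spanning_tree(bridges, links):
    """Toy spanning tree: elect the lowest bridge ID as root, then
    keep only each bridge's least-cost path toward that root; every
    other link would be put in "blocking" state.
    `links` maps an (a, b) pair to a cost; the graph is undirected."""
    root = min(bridges)                       # lowest ID wins the election
    adj = {b: [] for b in bridges}
    for (a, b), cost in links.items():
        adj[a].append((b, cost))
        adj[b].append((a, cost))
    dist, parent = {root: 0}, {}
    heap = [(0, root)]
    while heap:                               # Dijkstra outward from the root
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue                          # stale queue entry
        for nbr, cost in adj[node]:
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], parent[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    tree = {(min(a, b), max(a, b)) for a, b in parent.items()}
    return root, tree

bridges = [3, 1, 2, 4]
links = {(1, 2): 1, (2, 3): 1, (1, 3): 3, (3, 4): 1, (2, 4): 5}
root, tree = spanning_tree(bridges, links)
print(root, sorted(tree))   # bridge 1 is root; links (1,3) and (2,4) end up blocked
```

Note the blocked links still physically exist - they are pure standby capacity, which is the contrast with Token Ring source route bridging discussed below.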
As it turns out, Ethernet’s reliability problems were overstated and Token Ring’s reliability understated.
Collisions were only a problem when transmitting high rates of tiny packets. When transmitting large packets, collisions were rare, and allowed the network to run at 99% capacity. Once any network exceeds capacity, everyone needs to slow down and wait on the network. So in the end, you wouldn’t notice the collision problem as being anything remarkable.
That is untrue. Ethernet worked fine for client-server communications with all clients talking to one server (or bridge). It broke as soon as you had asynchronous communications by different clients to more than one server - i.e. peer-to-peer communications. One of the reasons Novell Netware took off in the late 1980s was that they usually specced routing between relatively small Ethernet segments, and that solved the problem at the cost of some higher level processing. The same applied for TCP/IP, which is why even today you have a “default gateway” for most client TCP/IP networks. If everyone talks to one thing then the limiting factor is the ability of the one thing to respond, and you end up with a series of interleaved requests and responses and no collisions. As soon as you add a second server that also wants to respond you end up in a mess, because the two servers and their clients try to talk over each other. A classic example of this was when a department bought a new HP Laserjet printer that had its own Ethernet card instead of connecting to the server. Then if user A wanted to print her 50-page document while user B wanted to copy his PowerPoint presentation to the server there would be problems. As a Token Ring vendor representative I used to demo this all the time, until Doom was released and we discovered that networked versions of Doom with 4 or more players would smash the average Ethernet network while Token Ring ones worked just fine.
Prior to switching, which is a degenerate case where every client has its own network segment, excessive collisions were a big problem with Ethernet and they got worse when you tried to build large networks connecting entire office buildings together. Bridging helped, but bridges could become congested because you could only have one bridge active connecting anywhere to anywhere.
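You can see why adding independent talkers hurt so badly from a crude slotted-contention model: a slot carries useful traffic only if exactly one station transmits in it. The per-slot transmit probability below is an arbitrary illustrative load, and real CSMA/CD with carrier sense and backoff does better than this model, but it degrades the same way:

```python
def slotted_throughput(n, p):
    """Chance that exactly one of n stations transmits in a time slot,
    i.e. the useful fraction of slots in a crude slotted-contention
    model of a shared Ethernet segment."""
    return n * p * (1 - p) ** (n - 1)

# One saturated talker (the all-clients-one-server pattern
# approximates this): there is nothing to collide with.
print(slotted_throughput(1, 1.0))                 # prints 1.0

# Add independently chatty stations and useful slots collapse:
for n in (2, 5, 10, 20):
    print(n, round(slotted_throughput(n, 0.5), 4))
```

With two stations half the slots are still useful, but by ten stations under this load fewer than 1% of slots carry a clean frame - everything else is collisions or silence.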
Conversely, on Token Ring peer-to-peer really did work, and you could run a network reliably at 90+% utilization even when running multiple games of network Doom on the same Token Ring network. Moreover, one feature of Token Ring that Ethernet lacked was “source route bridging”, which removed the need for spanning tree and allowed multiple bridges to forward traffic in parallel when needed. It also allowed for tricks like having the same server (or mainframe) be reachable two or more different ways on different Token Ring networks, which allowed for both load balancing and fault tolerance. If you were running, say, a trading floor, high availability was a mandatory requirement, and Ethernet couldn’t do it before the development of reliable Ethernet switches.
Finally, not mentioned by the OP, is that Token Ring had a larger packet size of 4.5KB vs Ethernet’s 1518 bytes, and that it had a faster data rate (16Mbps vs 10Mbps) until the development of 100Mbps Ethernet. The higher data rate combined with the lack of collisions really did allow Token Ring to handle higher sustained network loads compared to Ethernet, and the larger packet size allowed for more efficient server operations as you could send an entire 4KB disk block in a single response. In the early to mid 1990s these factors allowed organizations to spend less on server infrastructure with Token Ring even though the network interface cards were more expensive2.
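The disk-block arithmetic is easy to check. The payload figures below are rough ceilings (exact maximums depend on headers and, for Token Ring, the ring speed), but the ratio is the point:

```python
import math

BLOCK = 4096                # one 4KB disk block
ETHERNET_PAYLOAD = 1500     # rough payload ceiling from the 1518-byte frame limit
TOKEN_RING_PAYLOAD = 4472   # one common figure for the "4.5KB" Token Ring maximum

for name, payload in (("Ethernet", ETHERNET_PAYLOAD),
                      ("Token Ring", TOKEN_RING_PAYLOAD)):
    frames = math.ceil(BLOCK / payload)
    print(f"{name}: {frames} frame(s) per 4KB block")
```

Ethernet needs three frames where Token Ring needs one, and three frames means three interrupts and three trips through the server’s protocol stack where one would do - which is where the server-side savings came from.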
Conversely, IBM chose the same connector for Token Ring as was already in use for video ports and serial ports. If a desktop user plugged the cable into the wrong port, it would crash the Token Ring. In other words, it had a serious reliability problem that “tokens” couldn’t fix.
This is completely false. Yes, it was the same DE-9 connector as RS-232, but a required voltage had to be applied across two pins in order to open a relay on the Token Ring hub. If you didn’t have that voltage you were isolated from the network, and RS-232 didn’t generate it on the right pins. I suppose technically it was the same connector as old CGA and EGA video cards, but VGA took over the world so quickly in the mid/late 1980s that I never heard of anyone plugging a Token Ring cable into a video port. I doubt either video standard generated the right voltage either.
As any old timer can tell you, they were in a constant battle against this, trying to fix “beaconing” (crashed) rings. It was hilariously unreliable.
No they weren’t - beaconing was extremely rare. There was a problem where, under relatively obscure conditions, the active monitor negotiation process could fail. This prevented new computers joining the ring, and it also caused issues with certain hubs that tried to troubleshoot the problem and broke themselves doing it. But (and I spent months going to customer sites to debug this, so trust me) one of the more remarkable facts about this was that networks could continue to function for days with this process broken and no one would notice. It is true, however, that this problem was caused by bugs in the NIC code, which could not have occurred on Ethernet NICs because Ethernet lacks the whole token monitor requirement.
It was also expensive. Ethernet hardware used dumb, and cheap, chips. Token Ring adapters needed their own CPUs, separate from the main CPUs. Humorously, the early network cards from IBM included a 16-bit CPU that was more powerful than the 8088 CPU of the IBM PCs into which you inserted these adapters.
That is mostly true. But the difference in price was driven in part by IBM not trying to reduce prices to match the competition.
The additional CPU, however, allowed for a bunch of interesting things that, as I say above, meant that the total cost of a token ring network could be less.
For example, back in the day the 640KB memory limit was a major limiting factor for DOS-based computers, including of course Windows 95 ones. That problem only really went away with NT and Win 2000. If you had, as was relatively common in the pre-NT days, a requirement to talk to IBM mainframes and Novell Netware servers and also do this new TCP/IP thing, you could start hitting issues with available memory below 640KB, because all the network drivers had to run there. Unless, that is, you offloaded most of that to the NIC and just had a relatively small shim in that lower 640KB.
You could also do a bunch of tricks on the server side that meant the NIC did a good deal of the calculations for DMA data transfers between NIC and main memory which significantly reduced the CPU load on the server.
What these two things meant was that you could save on expensive (at the time) RAM in user computers and have fewer servers to handle the load from your users. That meant the TCO could be less.
Now it is true that IBM didn’t do most of those things, and that the standard NIC had many more expensive chips on it than an Ethernet one did. But that wasn’t inherent to the protocol. Token Ring competitors could make higher performing cards for less than IBM, and at some point around 1994 Madge Networks came out with a NIC that put almost the whole card on one chip, with just the memory being external (and that was a single chip IIRC). That two-chip card could have been sold for very little more than the price of an Ethernet one, but they didn’t do so because they preferred to make a larger margin on the card.
The point is that IBM had an argument for why they were “better”, but the technology actually was dramatically worse, even it weren’t more expensive. It was all part of IBM’s fight to avoid losing control of the industry.
I don’t disagree that it was more expensive, but I strongly disagree with the statement that it was worse. In fact the revealed preference of Ethernet and IP vendors copying aspects of Token Ring shows that the technology was actually better. IBM did a horrible job of marketing it properly, and it failed at the classic business challenge of growing a niche product with high margins into a mass market one with lower margins, but the technology was fine for the time.
What do I mean by “Ethernet and IP vendors copying aspects of Token Ring”?
The simplest was structured wiring. Traditional Ethernet was a length of coax cable going from computer to computer. It took until (I think) about 1992 for Ethernet hub vendors like SynOptics and 3Com to release hubs that allowed for structured wiring, where each computer had its own dedicated cable to the hub. Token Ring always had structured wiring. Structured wiring allowed for smart hubs that could do fault isolation and bridging between multiple segments inside the same piece of tin, which reduced the collision issue. It also paved the way for switching, because once you had a dedicated wire and port for each computer it was relatively easy to make each wire its own dedicated network and thus totally eliminate collisions, as every port is bridged to every other one.
Then there are “jumbo frames” i.e. packets larger than the 1518 bytes of the Ethernet standard but which Token Ring had from day 1. Then in the high availability space concepts like IP anycast look remarkably similar to the IBM SNA/Token Ring multiple NICs on different networks. In fact while they don’t use source routing quite a few high availability tricks of Ethernet switches use concepts derived in part from Token Ring bridging and SNA high availability.
Finally there are smart network adapters for servers which offload much of the TCP/IP stack bits from the server’s CPUs. This looks amazingly like what Madge Networks did back in the early 1990s (possibly even the late 1980s).
IBM customers bought a lot of Token Ring from IBM because they were IBM customers and IBM told them to. But it never really went anywhere outside of IBM shops. Few believed IBM’s marketing nonsense.
It never went far outside of IBM shops because of deliberate choices by IBM not to compete on price. In particular IBM chose not to create a low cost single-chip version, which was something it could easily have done in the early 1990s. Had it done so, then quite possibly we’d have seen larger layer 2 networks and less demand for layer 3 switches and complex IP routing protocols, because they would not have been needed. I don’t know if it would have been cheaper, but I don’t see why it would not have been.
The point is that old timers like me shouldn’t be bragging about having once built Token Ring networks. It’s a badge of shame, not pride. It was bad tech from the very beginning.
Actually you should. And not just token ring. There are lots and lots of issues that show up again and again as a new generation develops a new thing and fails to apply the lessons learned in the deployment of an old thing. If you have a few old timers around they can help you avoid making the mistakes they made.
Now if you want to discuss bad technology then I’d argue that ATM was the poster child. Unlike Token Ring, it really did originate in the backrooms of Telcos and academic research and was a standards body compromise that suited nobody. But even ATM can teach useful lessons, albeit of the “do not do this” sort.
I think that I shall never see
A graph more lovely than a tree.
A tree whose crucial property
Is loop-free connectivity.
A tree that must be sure to span
So packets can reach every LAN.
First, the root must be selected.
By ID, it is elected.
Least cost paths from root are traced.
In the tree, these paths are placed.
A mesh is made by folks like me,
Then bridges find a spanning tree.
— Radia Perlman “Algorhyme”
Radia Perlman invented Spanning Tree among other important internet protocols. I actually met her, back when she joined Novell.
I doubt I still have the documents I worked on at Madge to prove it, but we had actual customer numbers and TR could be extremely price competitive in certain environments despite appearing more expensive at the network level.




I won't pretend to be a networking guru, but I've had to set up office networks in a few startups. One thing I learned early on was the technical differences between a "router", a "bridge", and a "hub".
Routers create a sort of network partition, and are smart enough to keep traffic inside their local network local, while forwarding traffic that needs to go out to the "outbound" network.
Hubs are just dumb repeaters. They often look like routers - and are a lot cheaper because they're dumber - but repeat traffic out to everything that's connected to them.
Bridges are basically bigger, more scalable, and more expensive routers that you need if you have multiple office networks that need to interact with each other.
Too many people would see a $15 hub and just fill their office network with them, and wonder why their office networking was so slow, especially when 10baseT was what most people used. A few times I pretended to be a networking guy just by figuring out where routers should be instead of hubs, buying a few $50 routers, deploying them, and suddenly the network is decently fast.
As for IBM worlds, the one thing that scaled better than IBM networks was IBM expense. A couple of startups I was at briefly chatted with IBM sales reps, got basically "the $2000 you are budgeting for your 50 device office network will be like $15K for an IBM network", and that was that.
Loved reading this. Not a programmer, but when I went (back) to grad school, a workstudy job was running the LAN for my dept, thru which I met a kindly gent running everything in the Dook dungeons. My job was boring and MG suggested I type "rn" at the *nix prompt. What a world! When I say the list of ahem newsgroups could be printed on about 10pp, you'll know it wasn't the beginning of the world but I could see it right behind & talk/listen to people who had a hand in building it.
Among others, I read alt.folklore.computers daily for years; frankly everyone I read conversed easily in these topics (so I understand more than I can do), and your piece took me right back to those days. Disclosure, big fan of esr, not least for his ability to explain things and willingness to change his mind, e.g., https://x.com/esrtweet/status/1910809356381413593 led to an epiphany of my own. In the thread you cite, perhaps you'd agree he implicitly acknowledges general correctness of what I take to be at least one main point of your comments (responding to https://x.com/CDoombeard/status/1982168304379728078). Anyway, thanks for the read.