User:Locke Cole/IEC units are bad

From Wikipedia, the free encyclopedia

This page outlines what IEC units are, how they came about, how infrequently they are used in the real world, and why they should be avoided in article content (with very rare exceptions).

What are IEC units, how are they bad, and what do we do instead?

The IEC units were cooked up in 1997 to deal with a discrepancy between how hard disk drive manufacturers used units like "megabyte" and "gigabyte" and how RAM manufacturers used them. Hard drive manufacturers defined a gigabyte as one billion (1,000,000,000) bytes. RAM manufacturers defined it as 1,024 × 1,024 × 1,024 (1,073,741,824) bytes. This is the difference between the decimal and binary variations of the units. Because of this discrepancy (which becomes even more significant as you move up from gigabyte to terabyte to petabyte and so forth), the IEC units were created with similar names: instead of kilobyte or megabyte, you had kibibyte and mebibyte, and instead of GB or TB, you had GiB and TiB.
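To see how the gap widens with each larger prefix, here is a minimal Python sketch. The function name discrepancy_percent and the PREFIXES list are illustrative assumptions, not taken from any standard or source; the only inputs are the 1,000 and 1,024 multipliers described above.

```python
# Prefix names shared by the decimal and binary interpretations (illustrative list).
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]


def discrepancy_percent(power: int) -> float:
    """Percent by which 1024**power (binary) exceeds 1000**power (decimal)."""
    decimal = 1000 ** power
    binary = 1024 ** power
    return (binary - decimal) / decimal * 100


# Print the gap for each prefix, from kilobyte (power 1) up to petabyte (power 5).
for power, name in enumerate(PREFIXES, start=1):
    print(f"{name}byte: binary value is {discrepancy_percent(power):.1f}% larger")
```

At the gigabyte level the binary value is about 7.4% larger than the decimal one, and the gap keeps growing with each step up.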

The problem is that the vast majority of sources (primary and secondary), software, manufacturers and scholarly writings still use the traditional metric-derived units for both meanings. Because our sources don't typically use the new IEC units, our articles can end up with a single term (e.g. gigabyte/GB) carrying multiple meanings within the same article. The solution, as devised in WP:COMPUNITS, is to disambiguate conflicting meanings using footnotes or precise definitions within parentheses.

We do this because the IEC units do not have widespread acceptance. Most people are unfamiliar with these units (heck, most people barely understand megabyte/gigabyte/terabyte to begin with), so burdening readers with learning the difference between a GB and a GiB does them a disservice. Further, it deviates from the sources, which already makes using IEC units a non-starter before you consider that the terms are generally unknown outside of people with a purely technical background like programmers or engineers.

The Wall Street Journal summed it up best in a 2003 article on the issue (one of the two instances of a real-world print publication ever referring to the IEC units): "In 1997 a standards body tried to clarify things by renaming the binary-derived units as gibibytes, which tells you all you need to know about the usefulness of engineers tinkering with language."[1]

Finally, it's worth noting the commentary of Donald Knuth, author of The Art of Computer Programming (from What is a kilobyte?; emphasis added):

Now to my astonishment, I learn that the committee proposals have actually become an international standard.

Still, I am extremely reluctant to adopt such funny-sounding terms; Jeffrey Harrow says "we're going to have to learn to love (and pronounce)" the new coinages, but he seems to assume that standards are automatically adopted just because they are there. Surely a huge number of standards for other computer things, like networking protocols, have been replaced by better ideas when they came along. Thus I hope it still isn't too late to propose what I believe is a significantly better alternative, and I still think it unlikely that people will automatically warm to "mebibytes".

Unsurprisingly, Knuth was right.

IEC unit usage in sources

Detailed below are some examples of IEC unit usage versus common usage as found in various source types. For some, a sanity check is included in the form of the word "fatberg", so you can see just how uncommon IEC units are in the wild.

IEC usage in print media

List taken from List of newspapers in the United States, List of newspapers in the United Kingdom by circulation, List of newspapers in Australia by circulation and List of newspapers in Canada by circulation.
Google query results
Newspaper | site:<URL> "gibibyte" | site:<URL> "gigabyte" | site:<URL> "fatberg"[2]
usatoday.com | 0 | 745 | 26
wsj.com | 1[1] | 2,890 | 2
nytimes.com | 1 | 4,610 | 168
nypost.com | 0 | 234 | 130
latimes.com | 0 | 1,390 | 5
washingtonpost.com | 0 | 971 | 47
startribune.com | 0 | 410 | 1
newsday.com | 0 | 270 | 0
chicagotribune.com | 0 | 952 | 41
bostonglobe.com | 0 | 278 | 3
Extra entries for fun
seattletimes.com | 0 | 611 | 23
seattlepi.com | 0 | 383 | 0
International (English-speaking)
timesofindia.com | 0 | 490 | 4
metro.co.uk | 0 | 109 | 2,290
thesun.co.uk | 0 | 137 | 135
dailymail.co.uk | 0 | 710 | 318
www.standard.co.uk | 0 | 88 | 17,900[3]
mirror.co.uk | 0 | 154 | 170
thetimes.co.uk | 0 | 439 | 65
www.telegraph.co.uk | 0 | 407 | 72
theguardian.com | 0 | 546 | 3,270[4]
heraldsun.com.au | 0 | 76 | 26
www.dailytelegraph.com.au | 0 | 84 | 45
www.couriermail.com.au | 0 | 49 | 34
www.smh.com.au | 0 | 1,030 | 29
www.theaustralian.com.au | 0 | 364 | 8
www.theglobeandmail.com | 0 | 602 | 2
www.thestar.com | 0 | 387 | 22
nationalpost.com | 0 | 58 | 49
No "site:" filtering, just the terms in quotes
(all sites) | 213,000 | 72,400,000 | 238,000

IEC units in scholarly writings

At User:Thunderbird2/The case against deprecation of IEC prefixes there is a Google Scholar link that is used to determine how many articles use IEC units. Currently, for the 2020-2022 period, there are 582 hits for MiB/GiB. The same search run with MB/GB returns 44,900. Granted, some of those may be false positives (since MB/GB are more likely to occur as initials for other terms), so for clarity I ran the search using mebibytes/gibibytes and megabytes/gigabytes. There were 28 hits for the IEC units and 1,560 for the traditional metric units.

It would seem that, even among research papers, IEC units make up a small fraction of usage: about 1.76% of the spelled-out results, against 98.24% for the metric units. As I've already explained above, the wider media at large does not use the IEC units whatsoever, and their use in academic circles appears to be vastly outnumbered by the traditional metric units. And just to continue the "fatberg" sanity check from above, that term returned 49 results for the same period.
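The percentages can be checked from the two spelled-out hit counts reported above (28 and 1,560). This is only a sketch of the arithmetic; the helper name share_percent and the variable names are assumptions made for illustration.

```python
def share_percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    return count / total * 100


# Hit counts quoted in the text for the 2020-2022 period.
iec_hits = 28        # "mebibytes"/"gibibytes"
metric_hits = 1560   # "megabytes"/"gigabytes"
total_hits = iec_hits + metric_hits

iec_share = share_percent(iec_hits, total_hits)
metric_share = share_percent(metric_hits, total_hits)

print(f"IEC units:    {iec_share:.2f}%")
print(f"Metric units: {metric_share:.2f}%")
```

The two shares are computed against the same total, so they sum to 100%.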

IEC usage by specific companies

Bytes (first four unit columns), bits (last four unit columns)
Site | kilobyte | kibibyte | terabyte | tebibyte | kilobit | kibibit | terabit | tebibit
intel.com | 1,500 | 3[5] | 2,240 | 4 | 668 | 0 | 301 | 0
microsoft.com | 4,370 | 135 | 8,210 | 91 | 784 | 2 | 553 | 0
amd.com | 75 | 0 | 252 | 0 | 3 | 0 | 6 | 0
apple.com | 2,620[6] | 359[6] | 6,180[6] | 281[6] | 1,130[6] | 8[6] | 765[6] | 6[6]
netgear.com | 58 | 1[5] | 349 | 9[5] | 4[5] | 0 | 4[5] | 0
crucial.com | 25 | 0 | 63 | 0 | 1 | 0 | 1 | 0

IEC units in our sources

To be clear, an editor could deliberately cite sources that make a topic appear to use IEC units more than it really does, so claims that a topic's sources favor IEC units should be met with significant skepticism until the true nature of sources on the topic can be sussed out.

Typically, where no shenanigans have taken place, our sources almost exclusively utilize the traditional metric units for both binary and decimal meanings, and often do so interchangeably with little in the way of disambiguation. Some companies, like Apple or Western Digital, will make clear that their storage products (or products with storage included in them) use terms like GB and TB to refer to products where a GB is one billion bytes and a TB is one trillion bytes, but that is typically the most they will do. They do not use GiB or TiB. When we force these IEC units into our articles, we are giving them undue weight and promoting something that the wider world simply has not adopted. This causes unnecessary confusion for our readers, and forces them to learn about this obscure unit so they can continue reading our article where a simple footnote or parenthetical explanation would have sufficed.

IEC units encourage bad behaviors in our editors

As IEC units are rarely used in our sources, using them in any widespread way may give new editors the impression that reliable sources can be deviated from in significant ways. Sometimes editors deviate from the unit used in a source, believing they're "fixing" something, and perform a calculation on a value that was actually correct as it was. This can take whatever minor discrepancy exists between the binary and decimal values and amplify it. Using the units that are used in the wider world and in the vast majority of sources encourages good editing behavior by not introducing original research that could go undetected until a more experienced editor corrects the mistake.
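As a hedged illustration of the kind of mistake described here, suppose a source gives a drive capacity of 500 GB in the decimal sense, and an editor wrongly assumes the figure is binary and "converts" it. The 500 GB figure, the function name misapplied_conversion and the constants are all hypothetical, chosen only to show the arithmetic.

```python
GB_DECIMAL = 1000 ** 3   # bytes in a decimal gigabyte
GB_BINARY = 1024 ** 3    # bytes in a binary gigabyte (what IEC calls a gibibyte)


def misapplied_conversion(value_gb: float) -> float:
    """Treat an already-decimal GB figure as binary and 'convert' it to decimal GB."""
    return value_gb * GB_BINARY / GB_DECIMAL


source_value = 500  # hypothetical figure from a source, already in decimal GB
edited_value = misapplied_conversion(source_value)
error_percent = (edited_value - source_value) / source_value * 100

print(f"Source says {source_value} GB; the 'fixed' article says {edited_value:.1f} GB")
print(f"Introduced error: {error_percent:.1f}%")
```

The source value was correct as written; the unnecessary conversion is what introduces an error of roughly 7.4%.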

WP:COMPUNITS annotated

Some editors appear to have difficulty parsing the meaning of the end of WP:COMPUNITS. This section attempts to annotate and explain how this applies to articles, and how the exceptions should be applied. First, the full text as it existed on 2021-07-05T05:31:06.

Text

The IEC prefixes kibi- (symbol Ki), mebi- (Mi), gibi- (Gi), etc., are generally not to be used except:[a]

  • when the majority of cited sources on the article topic use IEC prefixes;
  • in a direct quote using the IEC prefixes;
  • when explicitly discussing the IEC prefixes; or
  • in articles in which both types of prefix are used with neither clearly primary, or in which converting all quantities to one or the other type would be misleading or lose necessary precision, or declaring the actual meaning of a unit on each use would be impractical.
  1. ^ Wikipedia follows common practice regarding bytes and other data traditionally quantified using binary prefixes (e.g. mega- and kilo-, meaning 2²⁰ and 2¹⁰ respectively) and their unit symbols (e.g. MB and KB) for RAM and decimal prefixes for most other uses. Despite the IEC's 1998 international standard creating several new binary prefixes (e.g. mebi-, kibi-, etc.) to distinguish the meaning of the decimal SI prefixes (e.g. mega- and kilo-, meaning 10⁶ and 10³ respectively) from the binary ones, and the subsequent incorporation of these IEC prefixes into the ISO/IEC 80000, consensus on Wikipedia in computing-related contexts favours the retention of the more familiar but ambiguous units KB, MB, GB, TB, PB, EB, etc. over use of unambiguous IEC binary prefixes. For detailed discussion, see WT:Manual of Style (dates and numbers)/Archive/Complete rewrite of Units of Measurements (June 2008).

Annotation

Quotation | Commentary
The IEC prefixes kibi- (symbol Ki), mebi- (Mi), gibi- (Gi), etc., are generally not to be used ... | Fairly straightforward: the English-language Wiktionary defines "generally" as "Popularly or widely" and "As a rule; usually".
except: | It goes without saying that "except" applies to abnormal or uncommon situations, but let's soldier on.
when the majority of cited sources on the article topic use IEC prefixes; | This is completely reasonable, though it does give rise to editors being selective with their sources to sway the ratio one way or the other. Care should be taken when someone claims a significant number of sources support IEC prefixes; as demonstrated above, this is highly unlikely except in very specific article topics (say, software which actually uses the units and documents that in a significant way).
in a direct quote using the IEC prefixes; | Obviously we aren't going to change a direct quote, though here again: is the quoted source representative of all sources on the topic? Is this source being deliberately chosen to use IEC prefixes against the aforementioned "generally not to be used" rule?
when explicitly discussing the IEC prefixes; | Clearly this makes sense in an article which is discussing them (either directly, or comparatively in another article). Again, care must be taken to ensure such uses are well sourced, and that the sources used are representative of the topic overall and not simply cherry-picked.
or | We have one last potential exception to the "generally not to be used" rule, and we'll take this last sentence a piece at a time.
in articles in which both types of prefix are used with neither clearly primary | Fair. That said, nothing in this obviates the need for our sources to match what the article is presenting. WP:NOR and WP:V still apply, and the MoS cannot override them.
or in which converting all quantities to one or the other type would be misleading or lose necessary precision | Remembering that we must be in an article with mixed units in use, and that the sources must support such mixed unit usage, this case is truly exceptional at present. And with the recommended disambiguation at WP:COMPUNITS, a conversion would not lose necessary precision or be misleading.
or declaring the actual meaning of a unit on each use would be impractical. | And so we reach the end, and really the nail in the coffin for those seeking a pass on the "except" part of "The IEC prefixes ... are generally not to be used except". Again, with the disambiguation provided at WP:COMPUNITS we already have the tools necessary to use the common units our readers are familiar with, while making it clear to them when a unit refers to a slightly different value than they may expect. WP:COMPUNITS is certainly not exhaustive on methods of disambiguation; using simple language such as what Apple uses ("1 GB = 1 billion bytes", "1 TB = 1 trillion bytes", etc.) as a footnote to relevant entries should suffice without needlessly using IEC-prefixed units.

At the end of the day, if we follow our sources (absent any cherry picking) we shouldn't usually run into problems. Perhaps one day IEC units will be accepted by the computing and technology industry, and when that day arrives this discussion won't be nearly as controversial as it seems to be. But in the world as it exists as this is written, IEC units simply are not used to any significant degree, and we should not force readers who are already grappling with a topic they came to learn about to also contend with units they've never heard of.

References / Notes

  1. ^ a b Hanrahan, Tim; Fry, Jason (September 22, 2003). "Finding the Dogs on the Net; Case of the Missing PC Bytes". The Wall Street Journal. Retrieved 2021-05-01. (In 1997 a standards body tried to clarify things by renaming the binary-derived units as gibibytes, which tells you all you need to know about the usefulness of engineers tinkering with language.)
  2. ^ Inspired by A 330-ton fatberg is clogging an English city's sewer, and it won't move for weeks; fatberg was first coined in 2008 according to Merriam-Webster Online Dictionary.
  3. ^ Clearly The Evening Standard believes fatberg stories to be particularly newsworthy...
  4. ^ Seriously, guys... stop putting cooking oil down the drains...
  5. ^ a b c d e Some or all of these are end-user community/forum posts, not company documents.
  6. ^ a b c d e f g h A not insignificant number of these appear to be Apple App Store, Apple Music, or forum posts.