Wikipedia talk:Pruning article revisions

This page is within the scope of WikiProject Wikipedia essays, a collaborative effort to organise and monitor the impact of Wikipedia essays. If you would like to participate, please visit the project page, where you can join the discussion. For a listing of essays see the essay directory.
This page has been rated as Low-impact on the project's impact scale. The rating was automatically assessed using data on pageviews, watchers, and incoming links.

Why?

Why is the large number of revisions a problem? --Explodicle (T/C) 17:59, 20 November 2008 (UTC)

  • 26-Nov-2008: Good point, new users might not realize the impact of multiple revisions: flooding an article's history list with "25" one-phrase revisions. However, as you know, some users also save a phrase every 2-3 minutes (for 2 hours) and, by doing so, block other users from making large edits as a simple edit+save. Any other users just see "Edit conflict" and lose everything, on each attempt. When that is happening, changes must be re-copied quickly into the edit-buffer (from another window), so that an edit+save is completed within the repeated 2-3 minute intervals set by the other change-a-phrase-and-save writer. Editing just one isolated section can slip past those repeated whole-article edits. I really don't think many repeated-save editors even realize that repeated 2-3 minute saves are completely derailing all other writers who want to save a 4-minute edit (but each continually gets "Edit conflict"). No wonder so many new writers quit Wikipedia within weeks (how many repeated edit-conflicts could they stand?). Anyway, that is an excellent point to add to the essay: why numerous revisions are a problem. That could be added as a separate section. Thanks. -Wikid77 (talk) 03:28, 26 Nov 2008
  • If I merge a whole bunch of small edits together, and in the meantime someone else successfully makes that 4-minute edit, won't I get hit by the edit conflict instead? Assuming a user would rather have someone else deal with an edit conflict, I think people have an incentive to make smaller, faster edits. I don't think any social mechanism will solve this, because people don't want to change their editing style. I think a MediaWiki improvement that reduces the occurrence of edit conflicts in general would be best... but I have no idea how the dev team could do that. --Explodicle (T/C) 15:28, 26 November 2008 (UTC)
  • But you don't lose everything on an edit conflict, you just copy your text back in and perform the merge. Not a problem unless someone is changing exactly the same line every minute for 2 hours. OrangeDog (talk • edits) 01:15, 12 May 2009 (UTC)
  • You can do that, but it's a pain in the neck. There's got to be some better way to handle this. --Explodicle (T/C) 16:33, 20 May 2009 (UTC)

Changes needed in MediaWiki software

27-Nov-2008: I agree that the current edit-conflict handling does promote quick revisions: a writer quickly saves each revision, hoping it will "stick" without an edit conflict, but thereby unwittingly causes an edit conflict for every other user attempting a 4-minute edit. The writer has created "40" revisions during 2 hours and, perhaps, even thinks other users simply lost interest in all the new updates, totally unaware that other users had continually lost all their 4-minute changes in repeated edit-conflict dumps. It is a bad problem, called "collisions" in network conflicts, but there are some possible solutions:

  • avoid whole-page edits: there is less likelihood of collisions when editing each section separately;
  • sacrifice one revision to tag an article as "<<Under Development>>" for a few hours, or such, to allow one writer more time to expand a broader revision;
  • change the MediaWiki software to put a side edit-button ("[edit]") for the top section of a file; this section can already be edited separately (as section# "0") by URL suffix "&action=edit&section=0" but that is too obscure for many users (people, not aware of section=0, have been editing a whole article to revise just the intro text);
      • This is already an option in preferences. Go to Special:Preferences, then click "Gadgets", and under "User interface gadgets" you'll see a checkbox for "Add an [edit] link for the lead section of a page". I use it all the time. Maybe it should be on by default, but it's already coded in. --Explodicle (T/C) 14:49, 27 November 2008 (UTC)
  • change the MediaWiki software to put a user limit for revisions-per-hour on each article; once the limit is reached, the article is available for other users to save (now allowing 4-minute edits);
      • I am strongly against this. It discourages users from being bold and bites newcomers. If there is no other user, then improvements stop being made. If someone is really annoying you, why not just politely ask them on their talk page if you can take turns focusing on certain sections? --Explodicle (T/C) 14:56, 27 November 2008 (UTC)
  • then teach users about the (hypothetical) revisions-per-hour limit, and it won't take long for them to change their editing habits;
      • Instead of changing how people edit to accommodate the software, we should change the software to accommodate how people edit. --Explodicle (T/C) 14:58, 27 November 2008 (UTC)
  • change the MediaWiki software's page edit-tab (at the top of each article) to, instead, display an edit-menu (not immediately edit the whole article), limiting edits by a section-list menu to choose which section to edit, and put choice "all sections" at the bottom of the list (discouraging whole-page edits).
      • That's a really good idea, I like where you're going with this. We should also make the section edit buttons much more clear, sometimes they can be practically hidden. --Explodicle (T/C) 15:04, 27 November 2008 (UTC)
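For anyone unsure what the "&action=edit&section=0" suffix mentioned in the list above looks like in practice, here is a minimal sketch that builds such a link. The base URL (the English Wikipedia index.php entry point) and the helper name lead_section_edit_url are assumptions chosen for illustration; only the action=edit and section=0 parameters come from the discussion.

```python
from urllib.parse import urlencode

# Assumed entry point for English Wikipedia; other wikis use a different host.
BASE_URL = "https://en.wikipedia.org/w/index.php"


def lead_section_edit_url(title: str, base: str = BASE_URL) -> str:
    """Return a link that opens only the lead section (section 0) for editing."""
    # urlencode keeps the dict order, so the result ends with action=edit&section=0.
    params = {"title": title, "action": "edit", "section": "0"}
    return f"{base}?{urlencode(params)}"


print(lead_section_edit_url("Water"))
```

Opening a link of this shape edits the intro text alone, which is the same effect as the gadget described in the reply above.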

I have been developing computer software for many years, and I must emphasize that 95% of all today's usability problems can be solved by simple changes to software. It's like a super-charged version of the "80/20 rule" in Quality Control: 95% of all computer problems can be fixed by changing perhaps 10% of the source code. Example: look how easily Apple Computer or Linux made most computer viruses obsolete (for Apple or Linux computers); stopping computer viruses has been so simple, it appeared that viruses were purposely fostered (as planned obsolescence), because many people bought new computers/software when a virus fried their PC. Just like the simple end of the "Computer Virus Era", many other computer problems can be fixed by extremely simple solutions. I'm not saying the problem is college-dropout computer billionaires can't fix problems; however, perhaps ideas like wiki-collaboration have been hampered by purposely frozen technology (such as a computer giant that buys/dissolves a budding wiki-software company). Better wiki-software systems would reduce edit-conflict problems, and also provide revision-erase capability to undo hacked edits that just need to go away, not be immortalized in edit-history listings forever. So, anyway, yes, do think of ways the developers could improve the MediaWiki software to reduce those problems. -Wikid77 (talk) 03:12, 27 November 2008 (UTC)

You don't address dummy edits in your essay. What is your position on them? They inflate the revision count, but is their contribution sufficiently significant to outweigh that? If not, how can they be incorporated into your scheme or made unnecessary? 72.65.227.112 (talk) 01:15, 18 February 2009 (UTC)

Disagree

I do not agree with this essay. Lots of revisions do little harm, and less harm than condensing disconnected edits to the detriment of useful edit summaries. If it remains a disputed single author essay, then it should be userfied. --SmokeyJoe (talk) 10:16, 21 November 2008 (UTC)

26-Nov-2008: That's another good point: instead of documenting each separate change as a small revision edit-summary, numerous changes should be explained, together, as a talk-page topic. It is, indeed, better to have 3 large edits to an article, plus 4 revisions in the talk-page explanations, rather than revise the article 30-40 times. The new topic on the talk-page can then explain, as a coherent whole, the entire collection of 30-40 changes that occurred in those 3 revisions. I have recently explained a similar extensive revision, as a new talk-page topic, when I made 137 changes in one revision, plus another 47 changes in the next revision. The focus here is big-picture editing of Wikipedia. Anyway, that is a great idea: make useful edit summaries, by linking those edit-summaries to a separate talk-page topic about all those changes. Thanks again for mentioning that aspect. -Wikid77 (talk) 03:28, 26 Nov 2008

Inter-city vs. Inner-city

"Intercity" means "between cities" and is commonly used to refer to transportation between cities. The slums of a city center, often associated with lower-income living conditions and increased crime, are called "inner city." I'm sorry, this is just a usage pet peeve of mine, and I didn't realize the blatant disregard for the purpose of the essay that this edit entailed until after I had hit save. Now this self-contradiction is permanently suspended in the Wikipedia edit history log, and there's nothing I can do about it. :-C 72.65.227.112 (talk) 00:57, 18 February 2009 (UTC)

Not likely a long-term problem

An additional argument could be made that article revisions are not going to be a technical problem in the long run. See this graph. Since late 2006, the time required to generate 10 million new revisions has been hovering between 40 and 50 days. In other words, the number of revisions grows linearly, while the computing power is growing exponentially (and will continue to grow exponentially, at least in the foreseeable future), which would imply that the relative overhead from additional revisions will actually become smaller and smaller. GregorB (talk) 01:25, 9 March 2009 (UTC)
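The linear-versus-exponential argument above can be illustrated with a rough sketch. Only the rate of 10 million revisions per 40 to 50 days comes from the comment (the midpoint of 45 days is used); the starting revision count and the two-year doubling period for computing power are hypothetical values picked purely to show the shape of the curve, not measurements.

```python
# From the comment: about 10 million new revisions every 40-50 days (midpoint used).
REVISIONS_PER_DAY = 10_000_000 / 45

# Hypothetical assumptions, for illustration only.
START_REVISIONS = 270_000_000   # assumed revision count at day 0
DOUBLING_DAYS = 730             # assumed doubling period for computing power


def relative_overhead(days: float) -> float:
    """Revisions per unit of computing power, normalised to 1.0 at day 0."""
    revisions = START_REVISIONS + REVISIONS_PER_DAY * days   # linear growth
    capacity = 2 ** (days / DOUBLING_DAYS)                   # exponential growth
    return revisions / capacity / START_REVISIONS


for years in (0, 2, 5, 10):
    print(f"after {years:2d} years: {relative_overhead(years * 365):.3f}")
```

Under these assumptions the ratio falls steadily; a slower doubling period or a smaller starting count would change the numbers, but the linear term still loses to the exponential one over time.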

Storing Wikipedia's revisions could not possibly cost Wikimedia a lot of money

Are you on hallucinogenics? The majority of Wikimedia's storage space is used for multimedia. For example, the 'water' page on Wikipedia is 70 KB long and has had about 5500 edits. Even if it had been 70 KB from the beginning, that's 70*5500/1024 ≈ 376 MB. That's really not that much. The latest database dump for all page history (which includes the text of every revision) was done in March 2008 (less than a year ago) and, compressed, took up 147 GB (proof).

The Wikipedia database dump page says their .bz2 compressed files may expand to up to 20 times their compressed size. So it seems like, worst case, about 3000 GB of space is required to store ALL English Wikipedia page revisions EVER. Do you know how much a 500 GB data drive costs? $100 these days. Granted, they have to buy fancier drives and keep more copies so they can respond to more requests at once, but worst case this costs about $2.5k. But seriously, they are not having storage problems because of revision history... it's because of video! People are going to want to store video on Wikimedia Commons, because that is one of the few ways it can 100% not be a copyright violation to use here and share and share alike.
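The back-of-the-envelope figures in the two paragraphs above can be checked with a short sketch. All inputs are the numbers quoted in the comment; the variable names are made up for illustration, and the cost covers a single copy on plain 500 GB drives only (the comment's $2.5k figure allows for extra copies and better hardware, which this does not model).

```python
import math

# Figures quoted in the comment above.
ARTICLE_KB = 70               # size of the 'water' page
EDITS = 5500                  # revisions of that page
DUMP_GB_COMPRESSED = 147      # March 2008 full-history dump, compressed
EXPANSION_FACTOR = 20         # worst-case expansion of the .bz2 dump
DRIVE_GB = 500
DRIVE_COST_USD = 100

# Upper bound for one article: every revision stored at full size.
water_history_mb = ARTICLE_KB * EDITS / 1024

# Worst-case uncompressed size of every revision of every page.
all_history_gb = DUMP_GB_COMPRESSED * EXPANSION_FACTOR

# Drives and cost for a single copy of that data.
drives_needed = math.ceil(all_history_gb / DRIVE_GB)
single_copy_cost = drives_needed * DRIVE_COST_USD

print(f"'water' history, upper bound: {water_history_mb:.0f} MB")
print(f"all revisions, worst case:    {all_history_gb} GB")
print(f"one copy: {drives_needed} drives, about ${single_copy_cost}")
```

On these inputs one copy needs six drives, so the $2.5k estimate leaves room for roughly four copies at the quoted drive price.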

Furthermore, these revisions, while not taking up much space, also do not take up much bandwidth, as nine times out of ten, when you're using Wikipedia, you're looking at current articles, not past revisions. Past revisions do not greatly increase the number of computers requesting pages from Wikipedia, or the frequency of the requests, or the size of the files requested.

Text is, storage-wise, actually really cheap. It's multimedia which is the issue, especially multimedia bandwidth, which is why Wikipedia has recently bought a truckload of Sun servers. Besides, if you look at operating expenses, in 2008, operating expenses were about $1 million, compared to the $1.1 million for salaries, compared to $7 million raised in donations last year. Our revisions are certainly not trampling Wikipedia. —Preceding unsigned comment added by 129.64.131.25 (talk) 13:34, 21 March 2009 (UTC)

GFDL requires all to be logged

What about the GFDL? All changes must be logged. OrangeDog (talk • edits) 01:12, 12 May 2009 (UTC)

I'm under the impression that the number of revisions made (not recorded) is what will be reduced. --Explodicle (T/C) 16:38, 20 May 2009 (UTC)
Not according to the "copy not move" suggestions. Doing such things is a clear breach of licensing requirements. OrangeDog (τ • ε) 23:04, 19 January 2011 (UTC)

NUMBEROFPAGES or NUMBEROFARTICLES?

Looking at the wikitext, I noticed that NUMBEROFPAGES and NUMBEROFARTICLES are both used to calculate the number of revisions per article, which explains why at the beginning of the article it's currently about 19 revisions per article, and further down it's 119. Which is correct? -Paul1337 (talk) 02:42, 14 July 2010 (UTC)

Neither. To get that figure you need to add up the revisions of articles only and divide by the number of articles. 19.15 is the number of revisions per page. 119 represents only a measure of edits to the wiki, per article, and hence in some ways of the total work required for the average article, including cats, templates, talk pages, vandal warnings, WP1.0Bot's 1 million edits to assessment lists/tables, this page, arbcom, etc. Rich Farmbrough, 14:29, 23 February 2011 (UTC).
I was going to have a stab at extracting this figure locally, but decided not to. A way to do it is: 1. download pages-articles.xml; 2. extract the titles of all the article pages by skipping non-mainspace and anything that is a redirect or dab or SIA (lists are a moot point); 3. count them; 4. download stub-meta.xml; 5. add together the revision counts for all those titles; 6. do a "share" sum (you will need to use long division for this, almost certainly). Rich Farmbrough, 15:04, 23 February 2011 (UTC).
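The final steps of that recipe (counting the articles, summing their revisions and dividing) could be sketched as follows. This deliberately does not parse the dump files; it assumes the article titles and per-page revision counts have already been extracted, and the function name and sample data are hypothetical.

```python
from __future__ import annotations


def revisions_per_article(revision_counts: dict[str, int],
                          article_titles: set[str]) -> float:
    """Average revisions per article, counting only pages listed as articles."""
    if not article_titles:
        raise ValueError("no article titles supplied")
    # Step 5: add together the revision counts for the article titles only.
    total = sum(revision_counts.get(title, 0) for title in article_titles)
    # Step 6: the "share" sum, i.e. total revisions divided by article count.
    return total / len(article_titles)


# Hypothetical sample: two articles plus a talk page that must be ignored.
sample_counts = {"Water": 5500, "Inner city": 300, "Talk:Water": 1200}
sample_articles = {"Water", "Inner city"}

print(revisions_per_article(sample_counts, sample_articles))
```

Because talk pages, templates and other non-article pages are left out of both the sum and the divisor, the result differs from either the per-page or the edits-per-article figure discussed above.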

Deleting old revisions is against policy and generally bad practice

Have you read stuff like Template:copied and such? There are some people who are sticklers for keeping track of copied text, taking it at times I would say to excess. Using renames to dump the page history as you suggest goes to the absolute opposite extreme, depriving contributors of their fair attribution as actually required by the GFDL or CC licensing. If you managed to do that successfully, the copyright fanatics would be telling us we have to delete the whole article and start from scratch. (Which would also cut the number of revisions, I suppose)

My feeling is that this essay should bear a warning note that it is actually advocating things directly contrary to existing policy. Wnt (talk) 19:06, 14 November 2010 (UTC)

I strongly agree with the above assessment. Further, I would suggest that all revisions potentially contain important information which could motivate or inform future wikipedians or other researchers. Just because information has been removed from the current version of the article does not erase its value. If there are technical challenges associated with keeping all revisions to the greatest extent that is reasonable, then they must be met and planned for by the Wikimedia Foundation. I feel this is a core duty that must be fulfilled. --Dfred (talk) 05:34, 17 December 2010 (UTC)
The only thing that needs to be done is to give us the option to hide bot edits for cryin out loud! They're the only useless revisions that pollute the history and make tracking changes a nightmare. We can do it in Recent changes so why not in the Revision history?? -- œ 21:29, 20 February 2011 (UTC)
In fairness some of the proposals are a little more extreme. Hiding (or better, collapsing) revisions is a great idea, but it shouldn't merely cover bot edits, it should cover reversion pairs or sets too. Also, if one person makes 30 successive edits, those could usefully be collapsed. Rich Farmbrough, 14:32, 23 February 2011 (UTC).
Oh and incidentally a very significant proportion of bot edits are interwikis; transcluding the interwikis from a common repository (will be possible once bugzilla "reasonably cheap method of interwiki transclusion" is completed) would make these unnecessary. Rich Farmbrough, 14:36, 23 February 2011 (UTC).
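The idea of collapsing successive edits by one person in the history view could look something like the sketch below. It is an illustration only, not how MediaWiki displays histories: the revision format (user, edit summary) and the function name are assumptions, and it does not attempt to detect reversion pairs.

```python
from itertools import groupby


def collapse_runs(history):
    """Collapse consecutive revisions by the same user.

    `history` is a list of (user, summary) tuples, oldest first.
    Returns a list of (user, number_of_consecutive_edits) tuples.
    """
    return [
        (user, len(list(run)))
        for user, run in groupby(history, key=lambda revision: revision[0])
    ]


# Hypothetical history: one editor saving repeatedly, interrupted by a bot.
sample_history = [
    ("Alice", "fix typo"),
    ("Alice", "reword intro"),
    ("Alice", "add reference"),
    ("InterwikiBot", "adding interwiki link"),
    ("Alice", "another tweak"),
]

for user, count in collapse_runs(sample_history):
    print(f"{user}: {count} edit(s)")
```

Only adjacent edits are merged, so the full history stays intact underneath and every revision remains attributable; the collapsing affects display alone.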

I've decided to pick up the glove and add a warning to this page. Reasons:

  • old revision histories shouldn't be deleted
  • minor edits shouldn't be avoided - does the author have anything against wikignomes?
  • users are advised not to worry about server performance. ʝunglejill 10:19, 17 June 2012 (UTC)

Bold major update

I've been bold and updated this to something that is technically accurate, not conflicting with law or policy but also giving tips on efficient and considerate use of multiple edits. Back in late 2004 we were getting close to running out of about 200 gigabytes of disk storage so we implemented the compression scheme I describe in the article and moved the article text off the main database servers. It took a few months for scripts to do all of the internal moving and compressing work. There's no need at all to be concerned about the space consumed by article text these days. There is some smallish cost just for each revision's summary information so it's still nice technically not to create huge numbers of pointless edits. Jamesday (talk) 15:20, 29 July 2013 (UTC)

Dealing with edit conflicts

I'm tempted to update this to remind people how to reduce edit conflicts: never go more than ten minutes without saving your work, and whenever possible edit by section rather than edit the whole article. Obviously on high profile "busy" articles you need to save much more frequently. I've seen an article get to 25 edits per minute... ϢereSpielChequers 06:40, 21 January 2022 (UTC)