User:Smallbones/Wales method

On September 6, 2015, Bob K31416 asked a deceptively simple question at User talk:Jimbo Wales/Archive 194#Getting better?

"Is Wikipedia getting better?"

The following day Jimmy Wales proposed a simple method to answer the question.

My favorite way of checking this is to "click random article" on 10 articles, and go back and look at them a year ago, 5 years ago, 10 years ago. Every time I have tried, it's unambiguous: Wikipedia is getting better by this test.--Jimbo Wales (talk) 08:28, 7 September 2015 (UTC)

More than one editor had objections to this method, which I'll call the "Wales method" unless Jimbo Wales objects. The simplicity of the method should be its strength. A very basic "test of proportions" should give a credible answer if we can operationally define quality and randomly sample enough articles. If the proportion of articles that have improved in quality exceeds the proportion of those that have not by more than 2 × the standard error, then we can conclude that the typical article is improving in quality. With a sample size of n=100, the standard error is at most 5%; at n=400 it is at most 2.5%. While n=400 would obviously be a more powerful test, the usefulness of this test also depends on the time editors have to spend on it. For this preliminary study, I'll use n=100.
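The standard-error figures above can be checked with a short sketch. It assumes the conservative case p = 0.5, which gives the largest standard error for a sample proportion; the function names and the counts in the usage note are illustrative and not part of the study. The decision rule follows the rule of thumb stated above rather than a formal paired test.

```python
import math


def standard_error(n: int, p: float = 0.5) -> float:
    """Standard error of a sample proportion.

    p = 0.5 is an assumption: it is the worst case and yields the
    5% (n=100) and 2.5% (n=400) figures quoted in the text.
    """
    return math.sqrt(p * (1 - p) / n)


def typical_article_improving(improved: int, n: int) -> bool:
    """Rule of thumb from the text: the share of articles that improved
    must exceed the share that did not by more than 2 standard errors."""
    p_improved = improved / n
    p_not_improved = 1 - p_improved
    return (p_improved - p_not_improved) > 2 * standard_error(n)


print(round(standard_error(100), 4))  # 0.05
print(round(standard_error(400), 4))  # 0.025
```

For example, with hypothetical counts, `typical_article_improving(65, 100)` is true (a gap of 30 points against a threshold of 10), while `typical_article_improving(52, 100)` is false.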

Also to save time, just one time interval, 2 years, is used to compare quality.

Defining quality

I'm skeptical that the currently defined quality classes are consistently applied to articles, and that they are updated as needed, so I'll define a different measure below. The current class system ranges from Stub, Start, C, B, Good article (GA), A, to Featured article (FA). The table shows how the classes were assigned in 2010 and 2015, for both total articles and percentage of articles. It also includes "featured lists" and "lists" as classes.

Since over half of all articles are classified as stubs, I'd like to divide this class roughly in half. To conserve time, I'll define 2 additional classes that can be quickly observed. Note for n=100, there will be 200 quality classifications needed. If each takes 1.5 minutes, 5 hours will be required to make the classifications.

Quality definitions

  • Level 0 - Doesn't meet the requirements for Level 1, a "sub-stub"
  • Level 1 - Roughly a "good stub"
    • In 3 or more sentences, describes and defines the topic and places it in context in the body of knowledge (e.g. mentions a parent topic, or a general subject such as physics and 2 more closely related topics)
    • has at least two references, bibliographic entries or external links to sources of additional information
  • Level 2 - meets Level 1 requirements and
    • includes at least 3 passages or sections of at least 2 sentences each that cover a subtopic (e.g. history, current use)
    • has at least 5 inline citations
    • may not have a large warning tag at the top of the article (e.g. for POV, citations needed, notability)
    • text is clearly written and contains no obvious contradictions, or has at least one of the following
      • a photo or video
      • a bibliography or further reading section of at least 5 items
      • at least 10 inline citations
  • Level 3 - meets Level 2 requirements and
    • includes at least 5 passages of at least 2 sentences each that cover a subtopic
    • has at least 10 inline citations
    • text is clearly written and contains no obvious contradictions
    • has at least one of the following
      • a photo or video
      • an extensive bibliography
      • at least 20 inline citations
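
The definitions above can be written as a small checklist function. This is only a sketch of one reading of the rubric: the field names are hypothetical, "clearly written" and "extensive bibliography" remain the rater's judgment, and the Level 2 line "clearly written ..., or has at least one of the following" is read here as a single either/or condition.

```python
from dataclasses import dataclass


@dataclass
class ArticleFeatures:
    """Observations a rater records for one article version (hypothetical fields)."""
    intro_sentences: int          # sentences describing and defining the topic
    placed_in_context: bool       # mentions a parent topic or related topics
    sources: int                  # references, bibliographic entries, or external links
    subtopic_passages: int        # passages of at least 2 sentences on a subtopic
    inline_citations: int
    warning_tag: bool             # large tag at the top (POV, citations needed, ...)
    clearly_written: bool         # rater's judgment, no obvious contradictions
    has_media: bool               # a photo or video
    bibliography_items: int       # bibliography / further reading entries
    extensive_bibliography: bool  # rater's judgment


def quality_level(a: ArticleFeatures) -> int:
    """Return 0-3; each level requires all lower levels to be met."""
    level1 = a.intro_sentences >= 3 and a.placed_in_context and a.sources >= 2
    if not level1:
        return 0
    level2 = (
        a.subtopic_passages >= 3
        and a.inline_citations >= 5
        and not a.warning_tag
        and (a.clearly_written or a.has_media
             or a.bibliography_items >= 5 or a.inline_citations >= 10)
    )
    if not level2:
        return 1
    level3 = (
        a.subtopic_passages >= 5
        and a.inline_citations >= 10
        and a.clearly_written
        and (a.has_media or a.extensive_bibliography or a.inline_citations >= 20)
    )
    return 3 if level3 else 2
```

Under this reading, an otherwise strong article with a large warning tag at the top drops to Level 1, because the tag blocks Level 2 and therefore Level 3 as well.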

Work flow

  • Click the Random article link in the left hand column
  • If the "article" is a disambiguation page or list, click the Random article link again without recording anything
  • Otherwise, record the article name and permanent link (click in left hand column) in the article column of the table
  • Record subject from list below
  • Check to see whether a version of the article was available two years ago; if not, click the Random article link again
    • If a 2-year-old version is available, record its permanent link in the "Old version" column of the table. Rate that version's quality and record it in the "Old quality" column.
  • Return to the current version, rate its quality, and record it in the "Current quality" column
  • Start again (or take a break)

Other variables may be recorded later, using the permanent versions.
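
The lookups in the work flow above were done by hand, but they could also be scripted. The sketch below only builds the request URLs and makes no network calls; it assumes the standard MediaWiki Action API parameters (`list=random`, `prop=revisions` with `rvstart` and `rvdir=older`) on English Wikipedia. The function names and the revision ID in the usage note are hypothetical, and skipping disambiguation pages and lists is still left to the rater.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"
INDEX = "https://en.wikipedia.org/w/index.php"


def random_article_url() -> str:
    """Request one random page from the article namespace (namespace 0)."""
    params = {"action": "query", "list": "random",
              "rnnamespace": 0, "rnlimit": 1, "format": "json"}
    return f"{API}?{urlencode(params)}"


def old_revision_url(title: str, now: datetime, years_back: int = 2) -> str:
    """Request the newest revision at or before `years_back` years ago.

    If the response contains no revision, no version existed at that time.
    """
    try:
        cutoff = now.replace(year=now.year - years_back)
    except ValueError:  # Feb 29 mapped onto a non-leap year
        cutoff = now.replace(year=now.year - years_back, day=28)
    params = {"action": "query", "prop": "revisions", "titles": title,
              "rvlimit": 1, "rvdir": "older",
              "rvstart": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ"),
              "rvprop": "ids|timestamp", "format": "json"}
    return f"{API}?{urlencode(params)}"


def permanent_link(title: str, oldid: int) -> str:
    """Permanent link to one specific revision of an article."""
    params = {"title": title.replace(" ", "_"), "oldid": oldid}
    return f"{INDEX}?{urlencode(params)}"
```

For instance, `permanent_link("Holyrood Park", 12345)` yields `https://en.wikipedia.org/w/index.php?title=Holyrood_Park&oldid=12345`, where 12345 is a made-up revision ID.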

Data

As of Sept 13, 2015: 113 observations, 100 of which compare the current version with a version from 2 years ago

| # | Article | Subject | Quality (new) | 2 y.o. version | Quality (old) | Class new | Class old | Page views Aug 2015 | Notes to self |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Roszkowice, Choszczno County | GEO, E | 0 | [1] | 0 | Start | Start | 23 | |
| 2 | TB Alliance | BIO | 1 | [2] | 1 | un | un | 224 | |
| 3 | Aenigmina tiresa | BIO | 0 | [3] | 0 | Stub | Stub | 57 | |
| 4 | 2006 Edegem 4 Nations Futsal Tournament | SOC | 0 | [4] | 0 | un | un | 35 | |
| 5 | Coesite | SCI | 3 | [5] | 3 | Start | Start | 1041 | |
| 6 | The Headless Lady | CA, 1990- | 0 | [6] | 0 | Stub | Stub | 47 | |
| 7 | Twinfield | BUS | 1 | [7] | 1 | un | un | 102 | |
| 8 | Camillus Erie Canal Park | GEO, W | 1 | [8] | 1 | Stub | Stub | 99 | |
| 9 | Tangu Subdistrict | GEO, E | 0 | [9] | 0 | Stub | Stub | 36 | |
| 10 | Seashell Trust | SOC | 1 | [10] | 1 | Start | Start | 111 | |
| 11 | Gonzalo Garavano | BLP, M | 1 | [11] | 1 | Stub | Stub | 67 | |
| 12 | Dostana 2 | CA, 1991+ | 1 | | | Stub | | 4707 | |
| 13 | Antithrixia | BIO | 0 | [12] | 0 | Stub | Stub | 50 | |
| 14 | Northwest Biotherapeutics | BUS | 1 | [13] | 1 | Start | un | 585 | |
| 15 | Kingswood, Virginia | GEO, W | 0 | [14] | 0 | Stub | Stub | 58 | |
| 16 | Branson City | GEO, W | 1 | [15] | 1 | un | un | 306 | |
| 17 | Kill Rock 'n Roll | CA, 1991+ | 0 | [16] | 0 | Stub | Stub | 596 | |
| 18 | Mullsjö Pentecostal Church | SOC | 1 | | | Stub | | 28 | |
| 19 | Mary Francis Shura | BDP, W | 1 | [17] | 1 | Start | Start | 191 | |
| 20 | Joseph J. Taluto | BLP, M | 2 | [18] | 2 | Start | Start | 231 | |
| 21 | Juan José Expósito Ruiz | BLP, M | 2 | [19] | 1 | Stub | Stub | 293 | |
| 22 | Metapterygota | BIO | 0 | [20] | 0 | Stub | un | 294 | |
| 23 | Baron Trimlestown | SOC | 1 | [21] | 1 | Start | Start | 159 | barely 1 on "refs", not blp |
| 24 | Robert Fullerton | BDP, M | 1 | [22] | 1 | Start | Stub | 149 | |
| 25 | IUPAC Inorganic Chemistry Division | SCI | 2 | [23] | 1 | Start | at AfC | 120 | Earlier article at AfC |
| 26 | Detroit Baptist Theological Seminary | SOC | 3 | [24] | 3 | un | un | 189 | |
| 27 | Mary, Mary (play) | CA, 1990- | 1 | [25] | 1 | Stub | Stub | 344 | |
| 28 | Fartman (Howard Stern) | CA, 1990- | 3 | [26] | 3 | Start | Start | 1060 | |
| 29 | Ac3 Company | BUS | 0 | [27] | 1 | Stub | Stub | 840 | Ad, should be deleted |
| 30 | Mark Smith (novelist) | BLP, M | 1 | [28] | 1 | Stub | Stub | 102 | |
| 31 | Delphi Lawrence | BDP, W | 1 | [29] | 1 | Stub | Stub | 468 | |
| 32 | Rino Romano | CA, 1991+ | 1 | [30] | 1 | Stub | Stub | 2940 | |
| 33 | Commercial policy | HIST | 0 | [31] | 0 | Stub | Stub | 2159 | |
| 34 | Clayton, Michigan | GEO, W | 2 | [32] | 1 | Start | Start | 158 | |
| 35 | Kern National Wildlife Refuge | GEO, W | 1 | [33] | 1 | Stub | Stub | 250 | |
| 36 | Rosato & Associates | CA, 1991+ | 1 | | | Start | | 142 | |
| 37 | Laksmikanta Roy Choudhury | BDP, M | 3 | | | un | | 111 | quite new big, but sb checked! |
| 38 | Thelypteris verecunda | BIO | 1 | [34] | 1 | Stub | Stub | 49 | No edits since 2011 |
| 39 | Michael Cudahy (electronics) | BLP, M | 1 | [35] | 1 | Start | un | 249 | |
| 40 | Ulrika Karlsson (footballer) | BLP, W | 2 | | | Start | | 77 | |
| 41 | Pain and pleasure | SOC | 1 | [36] | 1 | Start | Start | 1505 | |
| 42 | Semera | GEO, E | 2 | [37] | 2 | Stub | Stub | 494 | |
| 43 | Academy of Sedan | HIST | 1 | [38] | 1 | un | un | 111 | |
| 44 | Zandokht Shirazi | BDP, W | 0 | [39] | 0 | Stub | Stub | 74 | |
| 45 | Smilax pseudochina | BIO | 1 | | | Stub | | 44 | |
| 46 | Dip (dance move) | CA, 1990- | 0 | [40] | 0 | Start | Start | 521 | maybe SOC? |
| 47 | Ginny Wright | BLP, W | 0 | [41] | 0 | Stub | Stub | 149 | |
| 48 | Pamonha | SOC | 1 | [42] | 1 | Start | Start | 761 | |
| 49 | Troy and Boston Railroad | HIST | 0 | [43] | 0 | Stub | Stub | 92 | |
| 50 | My Learned Friend | CA, 1990- | 1 | [44] | 1 | Stub | Stub | 361 | |
| 51 | Piper cutucuense | BIO | 0 | [45] | 0 | Stub | Stub | 46 | No edits since May 2013 |
| 52 | Cervilissa | BIO | 0 | [46] | 0 | Stub | Stub | 24 | |
| 53 | Streats | BUS | 1 | [47] | 1 | Stub | Stub | 123 | Greatly expanded, earlier barely=1, later prob a very good ad (with no refs) |
| 54 | Norman Coke-Jephcott | BDP, M | 0 | [48] | 0 | un | un | 97 | now has 1 ref, 0 before |
| 55 | Alonnisos Marine Park | BIO | 1 | [49] | 1 | Start | Start | 304 | |
| 56 | Camillo Olivetti | BDP, M | 0 | [50] | 0 | Stub | Stub | 255 | |
| 57 | Cocycle category | SCI | 1 | | | Start | | 50 | Started Oct 2013 |
| 58 | Georgia's at-large congressional district special election, 1801 | HIST | 0 | Same | 0 | un | un | 32 | No edits since March 2013 |
| 59 | Assembly of Manitoba Chiefs | HIST | 1 | [51] | 0 | Stub | Stub | 95 | |
| 60 | Ichirō Ōkouchi | BLP, M | 0 | [52] | 0 | Stub | Stub | 1917 | Authority control doesn't count as a ref |
| 61 | Kamionka, Nowe Miasto County | GEO, E | 0 | Same | 0 | Start | Start | 21 | no edits since March 2013 |
| 62 | Mõisaküla, Torgu Parish | GEO, E | 0 | [53] | 0 | Stub | Stub | 34 | |
| 63 | Alfred Kropp series | CA, 1991+ | 0 | [54] | 0 | Stub | Stub | 211 | "good" text was deleted, but never any refs |
| 64 | South African republic referendum, 1960 | HIST | 2 | [55] | 2 | C | C | 926 | |
| 65 | Sundog Two-Seater | BUS | 2 | | | C | | 0 | New article Sept 9, 2015, 99 page views in Sept |
| 66 | Andreas Panayiotou | BLP, M | 0 | [56] | 0 | Stub | Stub | 192 | |
| 67 | Reginald Brett, 2nd Viscount Esher | BDP, M | 3 | [57] | 1 | Start | Start | 675 | |
| 68 | Isla Tres Perros | GEO, W | 2 | [58] | 2 | un | un | 64 | no talk page |
| 69 | Ptolemy (somatophylax) | BDP, M | 0 | [59] | 0 | Stub | un | 142 | |
| 70 | Southeast, Washington, D.C. | GEO, W | 1 | [60] | 1 | Start | Start | 1256 | |
| 71 | M. David Mullen | BLP, M | 1 | [61] | 1 | Stub | Stub | 254 | |
| 72 | Dating Naked | CA, 1991+ | 2 | | | un | | 18123 | TV series started in 2014 |
| 73 | Charles C. Moore | BDP, M | 2 | [62] | 2 | Stub | Stub | 117 | |
| 74 | Organi-cultural deviance | SOC | 1 | Same | 1 | C | C | 147 | Has "merge tag" other might be a 3! No edits since Feb, 2013 |
| 75 | Dixieland Jass Band One-Step | CA, 1990- | 1 | | | Start | | 233 | Started 2014, no inline, but good biblio |
| 76 | James Fiorentino | BLP, M | 1 | [63] | 1 | un | un | 48 | |
| 77 | James E. O'Neill, Jr. | BDP, M | 1 | | | Stub | | 41 | Started Nov 2013, almost a 2 but ... |
| 78 | Om Prakash Munjal | BDP, M | 2 | [64] | 2 | Start | Start | 8319 | |
| 79 | Ljubezen.si | CA, 1991+ | 0 | [65] | 0 | Stub | Stub | 40 | |
| 80 | Alister Munro Campbell | BLP, M | 1 | [66] | 1 | Stub | Stub | 83 | |
| 81 | Titus (soundtrack) | CA, 1991+ | 2 | [67] | 2 | B | B | 397 | text quality, long quote and near blp vio |
| 82 | Gaana Bajaana | CA, 1991+ | 0 | [68] | 0 | Stub | Stub | 220 | |
| 83 | Holyrood Park | GEO, E | 3 | [69] | 3 | Start | Start | 1999 | |
| 84 | Lobal Orning | BUS | 1 | [70] | 1 | Stub | Stub | 276 | Small town store closed in 2008 |
| 85 | Art Gallery of Burlington | CA, 1990- | 1 | [71] | 1 | un | un | 113 | |
| 86 | Pir Bukhsh | BDP, M | 0 | [72] | 0 | un | un | 46 | no talk page |
| 87 | Tseten Samdup Chhoekyapa | BLP, M | 2 | [73] | 2 | Stub | Stub | 79 | |
| 88 | Colombo Town Guard | HIST | 0 | [74] | 0 | Start | Start | 192 | 3 marked dead links |
| 89 | Garínoain | GEO, E | 0 | [75] | 0 | Stub | Stub | 121 | |
| 90 | Dominican general election, 1990 | HIST | 0 | Same | 0 | un | un | 59 | Last edit May 2013 |
| 91 | Level sensor | SCI | 1 | [76] | 1 | un | un | 6167 | |
| 92 | Hareh Pak | GEO, E | 0 | | | Stub | | 64 | First edit Oct, 2013 |
| 93 | Kłodzko Valley | GEO, E | 0 | [77] | 0 | Start | un | 324 | |
| 94 | It Was the Best of Times | CA, 1991+ | 1 | [78] | 1 | Start | Start | 744 | |
| 95 | Charles Goutzwiller | BDP, M | 0 | | | Stub | | 67 | 1st started March 2015 |
| 96 | Konradowo, Nowa Sól County | GEO, E | 0 | Same | 0 | Start | Start | 43 | Last edit Aug, 2013 |
| 97 | Cinci Hoca | BDP, M | 1 | [79] | 1 | Start | Start | 90 | |
| 98 | Julie and Ludwig | CA, 1991+ | 0 | [80] | 0 | Stub | Stub | 159 | |
| 99 | Zman Tel Aviv | BUS | 0 | [81] | 0 | Stub | Stub | 176 | |
| 100 | Barry Roche | BLP, M | 2 | [82] | 1 | Start | Start | 371 | |
| 101 | Mario Aburto Martínez | BLP, M | 1 | [83] | 1 | Stub | Stub | 580 | |
| 102 | Grande Tarantelle (Gottschalk) | CA, 1990- | 0 | [84] | 0 | un | un | 110 | |
| 103 | Peter Freuchen | BDP, M | 3 | [85] | 2 | Start | Start | 864 | |
| 104 | Suffolk | GEO, E | 1 | [86] | 1 | B | B | 24864 | tag at top, maybe 3 otherwise |
| 105 | Antonio Nibby | BDP, M | 1 | [87] | 1 | Start | Start | 136 | |
| 106 | Presidential Citizens Medal | SOC | 1 | [88] | 1 | Start | Start | 2295 | |
| 107 | Staines Urban District | HIST | 1 | [89] | 0 | C | C | 158 | |
| 108 | Hassan N'Dam N'Jikam | BLP, M | 2 | [90] | 2 | Stub | Stub | 3227 | |
| 109 | Ptychitidae | BIO | 1 | [91] | 1 | Stub | Stub | 40 | |
| 110 | John P. White | BLP, M | 1 | [92] | 1 | Start | Start | 241 | |
| 111 | Thawun | GEO, E | 0 | Same | 0 | un | un | 45 | Maps don't count as refs. Last edit 2011 |
| 112 | Moonlight Point | GEO, W | 0 | [93] | 0 | Stub | Stub | 28 | |
| 113 | Kumiko Takeda | BLP, W | 0 | [94] | 0 | Stub | Stub | 368 | |
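
Once the ratings are recorded, the comparison between old and new quality can be tallied mechanically. The sketch below is one way to do that, using five rows copied from the data above as a small sample; the tuple layout and the function name are assumptions made for illustration, and `None` stands for an article with no 2-year-old version.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# (article, quality_new, quality_old); None means no 2-year-old version exists
Row = Tuple[str, int, Optional[int]]

SAMPLE_ROWS = [
    ("Coesite", 3, 3),
    ("Dostana 2", 1, None),
    ("Juan José Expósito Ruiz", 2, 1),
    ("Ac3 Company", 0, 1),
    ("Peter Freuchen", 3, 2),
]


def tally(rows: Iterable[Row]) -> Counter:
    """Count improved, unchanged, and declined articles.

    Rows without an old version are counted separately and
    excluded from the old-versus-new comparison.
    """
    counts: Counter = Counter()
    for _article, new, old in rows:
        if old is None:
            counts["no old version"] += 1
        elif new > old:
            counts["improved"] += 1
        elif new < old:
            counts["declined"] += 1
        else:
            counts["unchanged"] += 1
    return counts


result = tally(SAMPLE_ROWS)
print(dict(result))
```

For this five-row sample the tally is 2 improved, 1 unchanged, 1 declined, and 1 with no old version; running it over all 113 rows would give the counts needed for the test of proportions.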

Topic categories

I would like to record a subject area with each random article, to see in general whether quality has been changing differently across subjects. I'd also like to be able to reuse these mutually exclusive subject areas in further studies, with each covering from about 5% up to a maximum of 15% of all articles. Based on the above table and chart, I'll use the following subject categories and sub-categories:

  • Biography
    • BLP, M; BLP, W; BDP, M; BDP, W
  • Geography
    • GEO, W (for Western hemisphere); GEO, E
  • Culture and arts
    • CA, 1990-; and CA, 1991+ ("classical" vs. current)
  • Business, products and services (BUS)
  • History, politics and government (HIST)
  • Other society, sports, religion, philosophy and social science (SOC)
  • Hard sciences, technology, and math (SCI)
  • Biology, health, and medicine (BIO)
  • Other/unclassifiable
