Template:Did you know nominations/Suffix automaton

From Wikipedia, the free encyclopedia
The following is an archived discussion of the DYK nomination of the article below. Please do not modify this page. Subsequent comments should be made on the appropriate discussion page (such as this nomination's talk page, the article's talk page or Wikipedia talk:Did you know), unless there is consensus to re-open the discussion at this page. No further edits should be made to this page.

The result was: promoted by Cwmhiraeth (talk) 06:22, 13 November 2020 (UTC)

Suffix automaton

Suffix automaton of abbcbc
Suffix automaton of abbcbc
  • ... that Weiner's 1973 suffix tree construction algorithm while building suffix tree of the string constructs suffix automaton of the reversed string as an auxiliary structure? (pictured)
    • ALT1: ... that Weiner's 1973 suffix tree construction algorithm constructs suffix automaton of the reversed string as an auxiliary structure? (pictured)
    • ALT2: ... that suffix automaton (pictured) appeared as an auxiliary structure in 1973 suffix tree scholarly article, 14 years before it was discovered and introduced to scientific community as an independent concept?

5x expanded by Adamant.pwn (talk). Self-nominated at 04:39, 4 September 2020 (UTC).

  • Hi, before I do a review, could you simplify the hook please? You have the word suffix three times, and I don't really understand it because there doesn't seem to be a verb. Thanks, Yoninah (talk) 13:01, 13 September 2020 (UTC)
  • Hi, Yoninah! I changed it a bit, hope it's clearer now. The most interesting part here is that Weiner's algorithm is of 1973, but suffix automaton was formally introduced as an independent concept only in 1987. But I'm not sure how to include this part without stepping in too much of verbosity. The word suffix is a part of "suffix tree" and "suffix automaton" titles, it stands for suffix as for the last part of arbitrary string in computer science. adamant.pwncontrib/talk 13:14, 13 September 2020 (UTC)
  • Thank you, but please do not overwrite hooks. It makes it impossible for other editors and administrators to follow the discussion. I have restored the thread. Yoninah (talk) 13:22, 13 September 2020 (UTC)
  • @Adamant.pwn: The alt hook is still too technical and the lead of your article is very difficult to read for someone who is not familiar with this subject. Could you find a hook angle that, while mentioning a technical term, would still appeal to a broad readership and make readers want to click on the article? Yoninah (talk) 13:24, 13 September 2020 (UTC)
  • Yoninah, I hope new version is more appealing to a broad audience, what do you think? If it's still too technical, I would be grateful if you point on some specific areas for improvement. adamant.pwncontrib/talk 16:30, 13 September 2020 (UTC)
  • @Adamant.pwn: thank you. In the hook and the article, you are neglecting to use the definite article ("the") and the indefinite article ("a") before nouns, which makes it hard to read. Please have someone edit your article with an eye to English grammar. I'm wondering if you can work the words computer science into your hook? Like:
  • ALT2a: ... that a suffix automaton, a string of symbols used in computer science, appeared as an auxiliary structure in a 1973 scholarly article 14 years before it was discovered to be an independent concept? Yoninah (talk) 16:52, 13 September 2020 (UTC)
  • @Yoninah: Sorry for (in)definite article thing, my native language does not have them, so I often forget about them... I left a request on WP:GOCER, but it might take time to be processed. I'll try to ask some friends who are more familiar with this stuff as well. Meanwhile, I've updated the hook with your recommendations.
  • ALT3: ... that a suffix automaton (pictured), a data structure used in computer science, appeared as an auxiliary structure in a 1973 scholarly article 10 years before it was discovered as an independent concept? Source: "A clean version of Weiner's linear-time compact-subword-tree construction simultaneously constructs the smallest deterministic finite automaton recognizing the reverse subwords." (Efficient and Elegant Subword-Tree Construction )
  • adamant.pwncontrib/talk 17:49, 13 September 2020 (UTC)
  • Article has been listed at WP:GOCE, thank you. Yoninah (talk) 13:31, 14 September 2020 (UTC)


General: Article is new enough and long enough
Policy: Article is sourced, neutral, and free of copyright problems
Hook: Hook has been verified by provided inline citation
Image: Image is freely licensed, used in the article, and clear at 100px.
QPQ: Done.

Overall: This review is in respect of the new image (which appears in the article), and ALT3 which was contributed by the creator. ALT3 is a bit long, but I accept that it cannot be shortened without sacrificing clarity for the general reader. The article had been expanded enough and recently enough on 4 September when the nom was listed. I have had to take all sources AGF whether inline or not because I am not qualified to do otherwise, however I have no reason to mistrust this editor. GOCE completed the copyediting on 7 October (confirmed on Adamant's talkpage). No QPQ is required. The only remaining issues are the DYK image and missing definite/indefinite articles in the text. (1) The DYK image does not appear in the text; please add it to the article OR add the article image here. (2) Examples of missing articles include "prefix trees are special kind of deterministic finite automata" and "Suffix trie" of the word {\displaystyle S}S is a prefix tree recognizing set of its suffixes". I accept that we need a computer scientist to check for missing articles, because (for all I know) in some cases the technical terms may require the absence of articles. So could somebody please call for a computer scientist to copyedit for missing articles only? When those issues are resolved, this nomination should be good to go. Note: I have struck out all previous ALTs to prevent confusion. Storye book (talk) 14:33, 16 October 2020 (UTC)

  • Thanks for the review! As the hook doesn't mention suffix trees anymore, I think we may go with the following image instead. I will try to ask someone to copyedit articles and update here afterwards. adamant.pwncontrib/talk 01:11, 20 October 2020 (UTC)
Suffix automaton of abbcbc
Suffix automaton of abbcbc
  • Thank you, @Adamant.pwn:. We now have an image which appears both in the article and with the DYK. Having thought about it, I now have a worry about the clarity of the image. I can hardly see it at DYK-size on my pc monitor, although it is clear at full size. Are you able to re-draw it with thick black lines, using e.g. Gimp or photoshop? I can do it for you, but I am not a computer scientist, so it would be best if you had full control over it. I look forward to seeing the missing words added to the text. (@Yoninah: please could you now remove the image at the top of the template? Thanks.) Storye book (talk) 10:27, 20 October 2020 (UTC)
  • Thank you, @Yoninah:. All that remains now is making the DYK image clearer (with thick black lines?) and the matter of the missing definite/indefinite articles ("the", "a", etc.). Storye book (talk) 12:53, 20 October 2020 (UTC)
  • Thank you, but the image needs to be in the article too. Yoninah (talk) 19:39, 24 October 2020 (UTC)
  • Yes, I guess we must wait for Adamant. Storye book (talk) 11:05, 25 October 2020 (UTC)
  • Hi! Unfortunately, I don't really like your version because it's not scalable (not SVG) and not completely black (one of arrows is slightly blue, I think). I would suggest using this version (placing it to the article as well). adamant.pwncontrib/talk 19:33, 25 October 2020 (UTC)
Suffix automaton of abbcbc
Suffix automaton of abbcbc
  • Thank you, Adamant. Please accept my apologies; I gave the artist both svg and jpg to choose from, not knowing the advantage of svg. Thank you for finding a useable image. I have removed the unwanted one. We still have the problem of missing definite/indefinite articles. If you like, I could offer to correct it for you according to normal English grammar, and you could then look at the diff and correct my changes where they interfere with scientific terminology? Storye book (talk) 20:30, 25 October 2020 (UTC)
  • Yes, please do. I couldn't find anyone to help unfortunately. Thanks! adamant.pwncontrib/talk 21:35, 25 October 2020 (UTC)
  • @Yoninah: please could you kindly remove the top image and replace it with the new image? Thank you. Storye book (talk) 20:30, 25 October 2020 (UTC)
  • Replaced image at top. Yoninah (talk) 20:32, 25 October 2020 (UTC)
  • I have tentatively copyedited the article for missing definite/indefinite articles (diff).
  • Thank you, @Adamant.pwn:, for agreeing to check this. If you are not sure about any part of my edit, please correct it. Storye book (talk) 22:48, 25 October 2020 (UTC)
  • I see that the article has now been checked. Thank you both for all your hard work and patience with this one. Good to go with the above image and ALT3. Storye book (talk) 12:12, 26 October 2020 (UTC)
  • @Storey book: Did you check that all the equations are cited? I see a few of them that aren't. Also, it's not clear from the article that the 1987 paper defined it as an independent concept (with a cite). Yoninah (talk) 15:35, 3 November 2020 (UTC) Re-ping @Storye book: Yoninah (talk) 15:36, 3 November 2020 (UTC)
  • @Yoninah. Sorry I didn't realise that equations had to be cited. I'm rubbish at maths, but I believe an equation is any set of numbers and squiggles balanced either side of an equals sign? If that's the case, then the article would need a trillion new citations, because there are equals signs throughout. Maybe you mean the theorems in boxes need citations? I count nine of those. @Adamant.pwn:? Storye book (talk) 17:42, 3 November 2020 (UTC)
  • I mean the introductory sentence to each equation, which tells us where the page creator found the equation. Some are cited, some not. Yoninah (talk) 17:45, 3 November 2020 (UTC)
  • @Yoninah: Suffix automaton itself was introduced in 1983 work of Blumer et al., then, also in 1983, Chen and Seiferas showed that the same structure was used in Weiner's 1973 paper. It's seemingly a typo on my side to say it's 14 years, it should be 10 years. My apologies for that. The key fact here is, although some earlier papers touched suffix automaton tangentially, Blumer et al's work was the first one to focus attention on it, so it is perceived by scholars to be the work which introduced it. As for the equations I need to ask you to be more specific. I don't think it would be reasonable to add a citation after each and every "=" sign. Generally, most paragraphs have citations for them at the end of the paragraph unless there is a need to verify some specific sentence. adamant.pwncontrib/talk 18:18, 3 November 2020 (UTC)
  • Edit conflict: I had written the following: If I read each section as a whole as a representation of a piece of logic, then as far as I can see, each section has a citation (mostly at the end of section). I can't understand the maths enough to be able to argue with it, but I can see where it comes from. Sort of. Forgive me for not being able to see the problem. @Adamant.pwn:? Storye book (talk) 18:27, 3 November 2020 (UTC)
I amended ALT3 per comment above. adamant.pwncontrib/talk 22:27, 6 November 2020 (UTC)
Thank you, Adamant. @Yoninah: I am happy that each sentence introducing a set of maths logic is now cited. Please could you kindly help me by checking that your query (it's not clear from the article that the 1987 paper defined it as an independent concept (with a cite).) has now been resolved? Once that is resolved, I understand that I can then give the green tick. Thank you. Storye book (talk) 10:44, 7 November 2020 (UTC)
  • @Storye book: the page creator may be explaining it here on this template, but the article does not say that the 1987 paper defined it as an independent concept, nor does it provide a cite for that fact. In fact, the 1973 date also is not explicitly stated in the article; it is buried in the cite. This does not satisfy WP:DYK#Cited hook a: The hook should include a definite fact that is mentioned in the article.
  • I have added "citation needed" tags to the paragraphs that lack them.
  • I must say that the whole article is very densely written and rather difficult for the average reader to understand, making it impossible for us to adequately review it. Should we ask someone in a math department to review it for accuracy? Yoninah (talk) 17:52, 7 November 2020 (UTC)
  • @Yoninah. Thank you for checking that for us. In answer to the question of bringing in a mathematician as a second reviewer, yes I'm happy to agree to that. I would have taken the subject matter AGF, but since you have presented a different view I think we should have it checked. Storye book (talk) 18:31, 7 November 2020 (UTC)
  • Please could we have a second reviewer to check the calculations, theorems and formulae in this article? I am not withdrawing as a general reviewer, but we need an expert opinion on the accuracy of the article content (see above comment by Yoninah). Thank you. Storye book (talk) 18:31, 7 November 2020 (UTC)
  • Dates are accurate in the History section:
  • "The concept of suffix automaton was introduced in 1983 by a group of scientists..."
  • "In 1983, Mu-Tian Chen and Joel Seiferas independently showed that Weiner's 1973 suffix-tree construction algorithm ... constructs a suffix automaton of the reversed string as an auxiliary structure.".
  • Wording about independent structure is just explanatory here, we may drop it and rewrite hook to align exactly with what's presented in the article:
  • I also placed the citations where you placed "citations needed" templates. adamant.pwncontrib/talk 20:10, 7 November 2020 (UTC)
  • Thank you for taking care of the cites and the wording under History. I think ALT4 reads well and will help readers find the hook fact in the article. Perhaps David Eppstein could help us with a quick look-over of the math? Yoninah (talk) 20:55, 7 November 2020 (UTC)
  • (Edit conflict twice) I have struck out ALT3 in accordance with the above discussion. As a reviewer, I am happy with all aspects of the DYK nomination except for the accuracy of the article content (as queried by Yoninah) and ALT4 which I'm not scientifically qualified to judge. I look forward to hearing from a maths expert. Thank you Adamant for your patience with this process. Update: I accept Yoninah's judgement of ALT4, please consider it approved. Storye book (talk) 21:02, 7 November 2020 (UTC)
  • Re the request to review the article for correctness: It's a standard concept in the study of algorithms on strings, one I'm reasonably familiar with, and I didn't see any obvious mistakes either in its history or its technical content. I do think that, despite the technical nature of the content, the article could be written more accessibly than it is, and if this were a good article review I would fault it for that, but for DYK rules the level of writing isn't as much an issue as whether a general audience would find something of interest in the hook and the article. I don't think I'm really the best judge of that as I'm not really "general audience" for this topic. —David Eppstein (talk) 21:12, 7 November 2020 (UTC)
  • Thank you very much for looking at this, David Eppstein. While I understand that most of the article is technical, the lead shouldn't be. At the very least, the first sentence should be understandable! But it's not. Yoninah (talk) 21:49, 7 November 2020 (UTC)
  • Looks like the guideline agrees with me. Yoninah (talk) 21:50, 7 November 2020 (UTC)
  • I swapped formal definition and its informal explanation in the lead. Do you think it's more understandable this way? I wouldn't like the idea to delve further in explanation of the very basic computer science concepts, such as "graph" and "concatenation", for example. adamant.pwncontrib/talk 10:01, 8 November 2020 (UTC)

Thank you, David Eppstein, Yoninah and Adamant.pwn. In the light of the above discussion, I have just checked the article's leader for at least one sentence of general accessibility, and: aaaaggghh. The general audience that we can reasonably aim at on WP is an intelligent person with an average-to-good education, who may know a lot about some things but knows little about some other things (such as higher maths and computer science, in this case). I belong in that category as you know, so if you can give me at least a vague idea of what a suffix automaton is then we can get somewhere.

Let's make a start, then. Would it make sense if the header started with something like: In computer science a suffix automaton is a type of graph which shows connecting paths between (what, exactly?) That's as far as I can get. If we can leave the string out of that first explanatory sentence, that would help, because the article title is only "Suffix automaton" on its own. What worries me is that as an outsider I can't even tell whether the suffix automaton refers to hardware (electrical connections?) or software (logical paths?) or a combination between the two? Or something else completely? If you can put a sentence clarifying basic subject matter at the beginning of the header, you can follow it with all the material that is already there, without sacrificing anything. I accept that the rest of the header and the article should not include explanations for the unitiated because that would undo the precision of the thing. Storye book (talk) 11:25, 8 November 2020 (UTC)

I've added a sentence to the lead which hopefully explains in very basic and generic terms what it is (an efficient data structure in computer science) and what it's used for (some stuff with substrings of a given string). I hope that clarifies it a bit, please let me know if any further clarification is needed. adamant.pwncontrib/talk 00:49, 9 November 2020 (UTC)
Thank you, Adamant. It looks to me as if a suffix automaton one of the ways of understanding the logic of data patterns, and being able to locate bits of data within that. If I'm sort of roughly right, then your explanation has worked. What do you think? Storye book (talk) 11:26, 9 November 2020 (UTC)
  • It's correct to some extent, yes. Not of understanding though, but rather exploring it programmatically, as understanding implies some human being involved. adamant.pwncontrib/talk 13:03, 9 November 2020 (UTC)
  • (ec)Thanks. The first sentence is a wee bit more understandable, though I'm wondering if the grammar is correct. Should it read:
  • In computer science, a suffix automaton is an efficient data structure for representing the substring index of a given string, which allows the storage, processing, and retrieval of compressed information about all its substrings.
  • But the second sentence makes no sense to the layman at all. Yoninah (talk) 11:30, 9 November 2020 (UTC)
  • I'm happy about the second sentence (and all that follows) being like somebody doing maths behind a locked door. We outsiders have had our simplified explanation in the first sentence, and in the second sentence the people with the key to the locked door get their precise explanation, unsullied by outsider approximations. Do the WP rules say that all of the header of a science article should be aimed at people who don't get science? Storye book (talk) 11:44, 9 November 2020 (UTC)
  • I amended the first sentence per your suggestion, thanks! adamant.pwncontrib/talk 13:03, 9 November 2020 (UTC)
  • @Storye book: sorry, but the lead is still incomprehensible. But since we got the okay from our resident computer scientist as to the accuracy of the article, we might as well promote it. Caveat emptor! Yoninah (talk) 16:50, 9 November 2020 (UTC)
  • Thank you, Yoninah. I know what you mean. I have a friend who is a senior lecturer in physics, and when I ask him to explain any physics whatsoever, he always demands three whiteboards and several daysworth of listening, because he knows I'll run away. Sigh. So fingers crossed for this one, then. Storye book (talk) 17:34, 9 November 2020 (UTC)
  • with ALT4. and the above image. Storye book (talk) 17:34, 9 November 2020 (UTC)