Module talk:Language/Archive 1

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Early discussion

Module:Language is meant to house various (sub-)modules for templates that depend(ed) on {{ISO 639 name}}. The sandboxes of these presently use components of this module:

{{ISO 639 name/sandbox}} – uses Module:Language/name (complete; not for use in article space, so the template should probably go if/when it's deployed)
- Usage: {{ISO 639 name/sandbox|{language code}}}
  - Example: {{ISO 639 name/sandbox|fr}} → French
{{lang/sandbox}} – uses Module:Language/text (meant for replacing lang and all lang-x templates; complete, but does not italicise Latin text automatically like lang-x templates do)
- Usage: {{lang/sandbox|{language code}|{text}{|name=yes}}}
  - Example: {{lang/sandbox|fr|''bonjour''|name=yes}} → [bonjour] Error: {{Lang}}: text has italic markup (help)
{{link language/sandbox}} – uses Module:Language/external links (WIP)
- Usage: {{link language/sandbox|{language codes, |-delimited}}}
  - Example: {{link language/sandbox|it}} → {{link language/sandbox|it}}

For the language data, see here. — lfdder 22:16, 15 April 2014 (UTC)

I don't know what #1 is used for ((edit conflict)?), so I can't comment on that. For #2, I find it annoying that some languages are currently italicized, and others are not, and agree it's better to be consistent. But I think italics should be the default, ~~not just~~ for Latin~~, but for Cyrillic and Greek as well~~. There should of course be an override, for example for tables or block quotations, but in general foreign words are italicized per the MOS, and IMO the template should reflect that standard. Either way, implementation will require us to review all transclusions and fix up italicization. — kwami (talk) 22:39, 15 April 2014 (UTC)

#1 is meant for use in other templates -- all it does is grab the name for a code. I thought we're not meant to italicise anything but Latin, though? Has the MOS been changed? Any chance of getting a bot to fix them up when it's deployed do you think? — lfdder 22:45, 15 April 2014 (UTC)

I never heard of italicizing only Latin. Maybe you're thinking of taxonomic nomenclature / Latin names? The MOS just says, "use italics for phrases in other languages and for isolated foreign words that are not common in everyday English."

Yes, this is exactly the kind of thing we'd want a bot for. If the template generates italics automatically, we'd want to remove any italic formatting around the word. If we don't, we'd want to add italics if it's not already there. But with the latter we might get objections at the bot request about needing manual review of every case; removing redundant formatting shouldn't have that complication. — kwami (talk) 23:52, 15 April 2014 (UTC)

What it says at MOS:FOREIGN: "Text in non-Latin scripts (such as Greek, Cyrillic or Chinese) should not be italicized at all—even where this is technically feasible; the difference of script suffices to distinguish it on the page." I've never requested a bot before, so I'm gonna have to look into it. — lfdder 00:12, 16 April 2014 (UTC)

Oh, sorry, I thought you meant the Latin language! Duh.

Yes, I suppose so. Though Cyrillic looks so much like Latin I don't always notice the difference if it's not italicized. — kwami (talk) 01:56, 16 April 2014 (UTC)

It's not a good idea to automatically italicise, even where we know the script. Firstly, the text could be in a table. Secondly, there can be stylistic reasons for italicising, for example the use/mention distinction; this could result in un-italicised text where italicised is meant (depending on how it is coded). Finally, items may need to be un-italicised to contrast with the surrounding italic text:

Meads towards Haslingfield and Coton// Where das Betreten's not verboten.

All the best: Rich Farmbrough, 22:59, 21 April 2014 (UTC).

Yes, but where lang-x tpl's are used it nearly always is the case that the text needs to be italicised. It may be seen as a regression in functionality by those who have now been spoilt. I couldn't even have all the x icon tpl's deleted last I tried 'cause people couldn't come to terms with having to use a vertical bar. — lfdder 23:55, 21 April 2014 (UTC)

Module:Language/text overhaul

@Lfdder: I've overhauled Module:Language/text, implementing a few of the things I talked about on my talk page, and a few other things. The overhaul includes dedicated functions for other Lua modules so that we don't have to load Module:Arguments each time, more of the helper functions being made public so that they can be used in test cases, and using pcall on mw.title.makeTitle to avoid gremlins such as script errors when we are over the expensive parser function count. Feel free to tweak it as much as you want, and let me know if you have any questions about my alterations.

Also, I have a couple of questions for you:

1. If the user specifies an invalid language code, e.g. |code=abcde, what do you want the module to do? I made it return args.text, like if args.code or args.text are not present, but is that the best way of doing things?
2. Where did you get the second table values in Module:Language/data/wp languages from? Are they official MediaWiki fallback language codes, or do they mean something else?

Best — Mr. Stradivarius ^{♪ talk ♪} 11:17, 28 April 2014 (UTC)
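
The pcall guard mentioned in the overhaul notes above might look roughly like this. It is a minimal sketch, not the code in Module:Language/text: makeTitle is a hypothetical stand-in for mw.title.makeTitle so that the snippet runs outside Scribunto, and the failure it raises is simulated.

```lua
-- Hypothetical stand-in for mw.title.makeTitle; it raises an error on an
-- empty title to simulate a failure inside the real function.
local function makeTitle(namespace, title)
	if title == "" then
		error("simulated failure")
	end
	return { prefixedText = namespace .. ":" .. title }
end

-- Call makeTitle through pcall so that a failure yields nil instead of
-- aborting the whole module with a script error.
local function safeMakeTitle(namespace, title)
	local ok, result = pcall(makeTitle, namespace, title)
	if ok then
		return result
	end
	return nil
end
```

Callers then test the result for nil before using it, which keeps the page rendering even when the title lookup fails.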

Thanks for this. How is Module:Arguments different from the other two modules in this respect?

1. Is that not how it was before? I don't have a preference (where I assume the other option would be to throw an error), but maybe we should look into adding maintenance cats. By the way, if not args.name on line 56 won't work 'cause Language/name returns an empty string (rather than nil or false) when no match has been found for compatibility with {{ISO 639 name}}.
2. They're valid lang tags for invalid ones. For example, the ISO code gkm's been proposed since 2006 but hasn't yet been accepted (or rejected) and people are prone to using it for printing "Medieval Greek" -- we let them do that, but the code in the tag gets replaced with grc.

What's line 137 for? — lfdder 14:52, 28 April 2014 (UTC)
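
The gkm-to-grc substitution described in this reply can be pictured with a small lookup table. This is only a sketch under assumed names: the table layout, the field names, and the resolve function are hypothetical and do not reflect the actual structure of Module:Language/data/wp languages.

```lua
-- Illustrative data only: codes accepted for naming purposes, each paired
-- with the valid tag to emit in the lang attribute.
local fallbacks = {
	gkm = { name = "Medieval Greek", tag = "grc" },
}

-- Return the display name (nil if the code has no fallback entry) and the
-- code to use in lang="...".
local function resolve(code)
	local entry = fallbacks[code]
	if entry then
		return entry.name, entry.tag
	end
	return nil, code
end

local name, tag = resolve("gkm")
print(name, tag)
```

The point of returning two values is that the printed name and the emitted tag can differ, which is exactly the gkm case.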

I'll take those points in order:

- The other modules always have to be called in every code path. However, if you're using this module from another Lua module, you don't need to get the arguments from the frame, you can just supply them directly to p._plain or p._named, making Module:Arguments unnecessary.
- Previously you were checking that args.code existed, then making args.name equal to getName{code=args.code}. You then concatenated args.name to the language wikilink string in the f.named function. I have to admit that I didn't trace through Module:Language/name to see what happened if args.code was invalid - I just assumed that there should be a check for it, as the usual Lua idiom for that would be to return nil. (And it's usually a good idea to check data you get from other modules if you can, as someone might edit them and subtly break things.) I tried it a few moments ago, and it returns the blank string. This means that you wouldn't have produced an "attempted concatenation with nil" error, but it does mean that the resulting wikitext would have looked like this: [[ language|]]: <span lang="abcde">foo</span>.
- Makes sense, thanks. :)
- Those are the functions that other Lua modules call that I mentioned above. They wouldn't be able to access the f table functions directly - they'd have to go through p.plain and p.named (and therefore Module:Arguments). So we make these extra functions available just for other Lua coders to access. (Although, really it's just a new pointer to the same function, because of how Lua encodes things internally.)

— Mr. Stradivarius ^{♪ talk ♪} 15:21, 28 April 2014 (UTC)
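
The split between template entry points and Lua entry points discussed here follows a common Scribunto pattern. A minimal sketch, assuming a simplified module in which frame.args is read directly (the real module goes through Module:Arguments, which is not reproduced here):

```lua
local p = {}

-- Core function: takes a plain table of arguments, so other Lua modules
-- can call it directly without any frame handling.
function p._plain(args)
	if not args.code or not args.text then
		return args.text
	end
	return '<span lang="' .. args.code .. '">' .. args.text .. '</span>'
end

-- Template entry point: extracts the arguments from the frame first.
-- Reading frame.args directly is a simplification for this sketch.
function p.plain(frame)
	return p._plain(frame.args)
end

-- In an actual module the file would end with: return p
```

Another module would call p._plain{code = "fr", text = "bonjour"}, while a template would reach the same logic through p.plain.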

Sorry, just properly parsed what you said about args.name returning a blank string. Yes, that will need to be changed on line 56. I have to say that I'm tempted to get rid of the args table from the f table functions entirely - the more Lua-like idiom would be to have something like f.named(code, text, options), where code and text are strings, and options is an optional table containing the nocat argument, the type argument, plus anything else that you decide to add in later. Then you can assign the name to a local variable and pass it through to the other functions as needed. That way you don't have to keep track of how the args table has been changed - it always houses the original arguments, and the changes to those arguments are made obvious through the variable names. Well, that's what I usually try and do to keep things obvious, anyway. — Mr. Stradivarius ^{♪ talk ♪} 15:32, 28 April 2014 (UTC)
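
The proposed signature, together with the blank-string check, could be sketched as follows. Everything here is illustrative: getName is a hypothetical stand-in for the lookup in Module:Language/name, and the category text is an assumption modelled on the "Articles containing x-language text" naming rather than what the module actually emits.

```lua
-- Hypothetical stand-in for Module:Language/name, which returns an empty
-- string (not nil) when the code is unknown.
local names = { fr = "French" }
local function getName(code)
	return names[code] or ""
end

-- code and text are strings; options is an optional table.
local function named(code, text, options)
	options = options or {}
	local name = getName(code)
	-- An empty string is truthy in Lua, so "if not name" would never fire.
	if name == "" then
		return text
	end
	local out = "[[" .. name .. " language|" .. name .. "]]: "
		.. '<span lang="' .. code .. '">' .. text .. '</span>'
	if not options.nocat then
		-- Assumed category format, for illustration only.
		out = out .. "[[Category:Articles containing " .. name .. "-language text]]"
	end
	return out
end
```

Because name is a local variable, the original arguments are never mutated, which is the readability benefit argued for above.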

That's fine with me. Also, rather than change the conditional, would it not be a good idea to have Lua-dedicated function aliases in Module:Language/name like you did here? — lfdder 15:46, 28 April 2014 (UTC)

Yep, that's probably the way to go. Module:Language/name will have to go on the to-do list for now, though. — Mr. Stradivarius ^{♪ talk ♪} 15:55, 28 April 2014 (UTC)

I'll see if I can bring Module:Language/name in line with the changes you made to /text if that's all right then. — lfdder 02:31, 29 April 2014 (UTC)

That would be great, thank you. :) — Mr. Stradivarius ^{♪ talk ♪} 02:55, 29 April 2014 (UTC)

When to italicise in Template:wikt-lang

I've only looked at the notice about {{wikt-lang}} at WP:WikiProject Languages and it seems that italicising is decided based on whether the language in question is written in the Latin script. Fair enough, but languages do get written using different scripts, so won't it make more sense to italicise based on which particular script is used on the occasion? Uanfala (talk) 22:24, 6 October 2016 (UTC)

Yes. At the moment, the module can only support languages written in a single script. I ran into this problem when adding scripts for Punjabi, which is written with Arabic and Indic scripts. I think what the module needs is a function to detect which script the input belongs to: that way, script does not have to be specified in another parameter. And it should be able to understand language-script code combinations such as sh-Latn or sh-Cyrl (and perhaps add the script code when it is not specified). I will look into it (perhaps soon, perhaps not), though I'm new to module coding. — Eru·tuon 23:24, 6 October 2016 (UTC)
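
Splitting a language-script combination such as sh-Latn or sh-Cyrl can be done with a single Lua pattern. A rough sketch with a hypothetical function name; it handles only the two-part "language-Script" form and passes anything else through unchanged:

```lua
-- Split a tag such as "sh-Cyrl" into its language and script subtags.
-- Returns the language code and the script code (nil if none is present).
local function splitTag(tag)
	local lang, script = tag:match("^(%l%l%l?)%-(%u%l%l%l)$")
	if lang then
		return lang, script
	end
	return tag, nil
end

print(splitTag("sh-Cyrl"))
```

When no script subtag is given, the caller would fall back to the language's default script or to detection from the text itself.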

Module:scripts/data on Wiktionary contains a list of scripts and, for some of them, the characters they contain, which can provide the data for a script detection function. — Eru·tuon 00:11, 7 October 2016 (UTC)

Now, if the script is not defined in a language data file, the module detects whether the word in the template is in the Latin script, and italicizes it unless the parameter |i= tells it not to. And the module prevents non-Latin scripts from being italicized. — Eru·tuon 08:48, 4 February 2017 (UTC)
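
The behaviour described here (italicise Latin-script text unless told otherwise, never italicise other scripts) can be approximated in plain Lua. This is a rough sketch and not the module's code: the function names are hypothetical, the Latin test is a crude code-point heuristic rather than a lookup in script data, and the italic parameter is modelled as a boolean rather than the template's |i= argument.

```lua
-- Decode one UTF-8 encoded character (1 to 4 bytes) into a code point.
local function codepoint(char)
	local b1, b2, b3, b4 = char:byte(1, -1)
	if not b2 then return b1 end
	if not b3 then return (b1 - 192) * 64 + (b2 - 128) end
	if not b4 then return (b1 - 224) * 4096 + (b2 - 128) * 64 + (b3 - 128) end
	return (b1 - 240) * 262144 + (b2 - 128) * 4096 + (b3 - 128) * 64 + (b4 - 128)
end

-- Crude heuristic: code points below U+0370 (ASCII, Latin letters, IPA,
-- combining marks) and U+1E00 to U+1EFF (Latin Extended Additional)
-- count as Latin; anything else makes the text non-Latin.
local function isLatin(text)
	for char in text:gmatch("[\1-\127\194-\244][\128-\191]*") do
		local cp = codepoint(char)
		if cp >= 0x370 and not (cp >= 0x1E00 and cp <= 0x1EFF) then
			return false
		end
	end
	return true
end

-- Italicise Latin-script text unless italic is explicitly false;
-- other scripts always get a plain span.
local function markUp(code, text, italic)
	if isLatin(text) and italic ~= false then
		return '<i lang="' .. code .. '">' .. text .. '</i>'
	end
	return '<span lang="' .. code .. '">' .. text .. '</span>'
end
```

A known limitation of the heuristic is that typographic punctuation outside these ranges, such as curly quotes, would make otherwise Latin text count as non-Latin.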

Template:wikt-lang falls over on Proto-Germanic

I was just editing something and noticed that while {{lang}} can handle gem and gem-x-proto as language codes, {{wikt-lang}} cannot. Compare:

{{lang|gem|þiudiskaz}}: þiudiskaz

{{lang|gem-x-proto|þiudiskaz}}: *þiudiskaz

{{wikt-lang|gem|þiudiskaz}}: þiudiskaz

{{wikt-lang|gem-x-proto|þiudiskaz}}: þiudiskaz

Now I know that linking to wikt:Reconstruction:Proto-Germanic/þiudiskaz rather than just wikt:þiudiskaz#Proto-Germanic is different logic, but that doesn't seem very complex. We do have a workaround:

{{lang|gem|[[wikt:Reconstruction:Proto-Germanic/þiudiskaz|þiudiskaz]]}}: þiudiskaz

{{lang|gem-x-proto|[[wikt:Reconstruction:Proto-Germanic/þiudiskaz|þiudiskaz]]}}: *þiudiskaz

but is this an update that can be looked into at some point? — OwenBlacker (talk; please {{ping}} me in replies) 19:36, 2 April 2018 (UTC)

@OwenBlacker: What the module needs is a system to ensure that a Wikipedia code gem-x-proto (used in language tagging) is paired with the corresponding Wiktionary code gem-pro (used when looking up language information for linking). It would be most convenient if either code could be used in the template, while the module decides which to use for which function.

gem should not be used here, though that's the practice on Wikipedia; it properly refers to a language group (the Germanic languages) rather than a proto-language (Proto-Germanic). But perhaps the current incorrect practice should be supported nevertheless; it can be corrected in other ways. — Eru·tuon 21:04, 2 April 2018 (UTC)

@Erutuon: Makes sense. Yes, I agree that gem shouldn't really be used here; though note that it has a slightly different effect; it doesn't add the * to indicate reconstruction. — OwenBlacker (talk; please {{ping}} me in replies) 21:16, 2 April 2018 (UTC)

@OwenBlacker: Yeah, {{lang}} treats gem correctly as "a Germanic language" rather than "Proto-Germanic", so it doesn't add an asterisk. — Eru·tuon 21:28, 2 April 2018 (UTC)

Okay, so now Module:Language/data has a table of "redirects" where you can enter the Wikipedia code and the Wiktionary code that it should redirect to. An asterisk is, however, still required when linking to an entry for a reconstructed word, because on English Wiktionary any language may have entries in the Reconstruction namespace, even if it is not a reconstructed language: for example, wikt:Reconstruction:Latin/blancus ({{wikt-lang|la|*blancus}}). — Eru·tuon 02:05, 3 April 2018 (UTC)
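
The redirect table and the asterisk rule can be illustrated together. This is a sketch under assumptions: the table names and the linkTarget function are hypothetical, and only the link formats quoted earlier in this thread (wikt:Reconstruction:Language/word and wikt:word#Language) are taken from the discussion.

```lua
-- Illustrative data: Wikipedia code -> Wiktionary code, and Wiktionary
-- code -> canonical language name.
local redirects = { ["gem-x-proto"] = "gem-pro" }
local names = { ["gem-pro"] = "Proto-Germanic", la = "Latin" }

-- Build the Wiktionary link target for a word in the given language.
-- Returns nil when the language is not in the illustrative data.
local function linkTarget(code, word)
	local name = names[redirects[code] or code]
	if not name then
		return nil
	end
	-- A leading asterisk marks a reconstructed form, which lives in the
	-- Reconstruction namespace under the language name.
	local reconstructed = word:match("^%*(.+)$")
	if reconstructed then
		return "wikt:Reconstruction:" .. name .. "/" .. reconstructed
	end
	return "wikt:" .. word .. "#" .. name
end
```

Keying the namespace choice on the asterisk rather than on the language is what lets an attested language such as Latin still link to a reconstructed entry.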

How to refer to non-English wiktionary

So, how do I refer to for example the Turkish Wiktionary for the word 'ökse'? I can't seem to figure it out. --Ahmedo Semsurî (talk) 14:19, 21 November 2017 (UTC)

@Ahmedo Semsurî: This template only links to the English Wiktionary. In which article do you want to link to Turkish Wiktionary? — Eru·tuon 01:57, 3 April 2018 (UTC)

@Erutuon: Thanks for the reply. It is the article Kurdish_phonology#Facultative_vowels. --Ahmedo Semsurî (talk) 16:42, 3 April 2018 (UTC)

@Ahmedo Semsurî: I see that English Wiktionary doesn't have an entry for ökse, but Turkish Wiktionary does. But still I don't see the purpose of linking to a Wiktionary that is not in the same language as this Wikipedia, aside from indicating that the word probably exists. Probably most people who are reading English Wiktionary can't understand Turkish. I can't understand the Turkish entry (except through Google Translate, and then only imperfectly because it's machine translation). Admittedly, the percentage of Turkish speakers reading an article about Kurdish phonology is probably greater than elsewhere on Wikipedia, because there are many Kurds in Turkey. But it would be better to write a short entry in the English Wiktionary and link to that instead. — Eru·tuon 19:53, 3 April 2018 (UTC)

Categorisation on Template:Wikt-lang

I'm just editing something and noticed that — unlike {{lang}} — {{wikt-lang}} does not add categories like Category:Articles containing Old French-language text. Could someone take a look at the Lua and fix that, please? — OwenBlacker (talk; please {{ping}} me in replies) 20:01, 4 May 2018 (UTC)

@Trappist the monk and Eru·tuon: Any thoughts on this one, btw? — OwenBlacker (talk; please {{ping}} me in replies) 13:56, 29 June 2018 (UTC)

Missing language code for Potawatomi

For some reason, presumably a mismatch between Module:Language/data and Module:Lang/data, pot does not equate to Potawatomi language in {{wikt-lang}} but does in {{lang}}. Is this intentional? — OwenBlacker (talk; please {{ping}} me in replies) 19:33, 28 July 2018 (UTC)

Old language-collective categories

I just noticed that the old categories, like Category:Articles containing Germanic-language text still exist and, presumably, should be put up for deletion, given any articles that would have gone there will now be in either Category:Articles with text from the Germanic languages collective or Category:Articles containing Proto-Germanic-language text.

Is there an easy way of enumerating the categories that should now be up for deletion? — OwenBlacker (talk; please {{ping}} me in replies) 21:14, 26 June 2018 (UTC)

(I'm becoming something of a frequent visitor here, aren't I? 😳)

You should post about this on Template talk:Lang; it's {{lang}} that adds the "Articles containing x-language text" categories. This module doesn't add any categories (perhaps a problem). — Eru·tuon 21:18, 26 June 2018 (UTC)

@Erutuon: Oh sorry, for some reason I had thought that redirected to here. My mistake. — OwenBlacker (talk; please {{ping}} me in replies) 13:56, 29 June 2018 (UTC)

@OwenBlacker: It's confusing, but Module:Language and Module:Lang are different. The former handles {{wikt-lang}} and {{wt}}, the latter {{lang}} and the various {{lang-xx}} templates. — Eru·tuon 17:15, 29 June 2018 (UTC)

@Erutuon: Is there a reason not to merge the 2 templates? Surely they're doing roughly the same thing? — OwenBlacker (talk; please {{ping}} me in replies) 22:47, 15 July 2018 (UTC)

@OwenBlacker: You mean, merge {{lang}} and {{wikt-lang}}? (There are more than two templates mentioned here.) — Eru·tuon 23:16, 15 July 2018 (UTC)

@Erutuon: Yes, that's what I was suggesting; sorry didn't spot the potential for confusion there. — OwenBlacker (talk; please {{ping}} me in replies) 19:33, 28 July 2018 (UTC)

@OwenBlacker: It might be possible to make {{wikt-lang}} use some functions in Module:Lang, but merging them completely would be complex and I don't know what the benefit would be. {{wikt-lang}} links to Wiktionary and only allows language subtags (en) and Wiktionary language tags (ine-pro). {{lang}} doesn't link to Wiktionary and allows script, region, and variant subtags along with language subtags (en-GB for British English, en-emodeng for Early Modern English, ru-Latn for Russian in Latin script). Merging the templates would mean adding a parameter to turn on linking, which would require more wikitext than a separate template name. (For instance, {{lang|en|word|wikt=1}} or {{lang|en|word|wikt=yes}} are longer than {{wikt-lang|en|word}}.) It's best to stick with the shorter of the two options.

It would be more useful to figure out how to allow {{wikt-lang}} to support more language subtags: for instance, {{wikt-lang|sh-Cyrl|реч}} to link to a Serbo-Croatian word in Cyrillic script, or {{wikt-lang|en-emodeng|hath}} to link to an Early Modern English word. Then this template could output text with as much language information as {{lang}}. Categorization would be another good feature to add. — Eru·tuon 21:25, 28 July 2018 (UTC)

Sorry, I was expecting that they'd use the same underlying Lua, not that they'd necessarily be called using the same Template. But yes, adding subtags and categorisation to {{wikt-lang}} would definitely be A Good Thing™. Surely all of these would be easier to do if they were all together in the same Lua, called by different templates invoking different start points, no? — OwenBlacker (talk; please {{ping}} me in replies) 22:20, 28 July 2018 (UTC)

It's a good idea to do each task in just one module, if possible. It would be easier to maintain that way. But at the moment, I'm not sure I want to work on major changes to this module and try to get it to work with another module written by someone else. (There's also the practical issue that I don't have template editor rights, so I can't edit Module:Lang.) — Eru·tuon 22:56, 28 July 2018 (UTC)

@Erutuon: Those would be good reasons not to merge them. Fair enough :) — OwenBlacker (talk; please {{ping}} me in replies) 11:00, 1 August 2018 (UTC)

Proto-Balto-Slavic

I tried adding ine-bsl-pro for Proto-Balto-Slavic to the data module, but I still get an error when I try to use it, like on the Proto-Balto-Slavic article. Rua (mew) 18:02, 27 May 2019 (UTC)

@Rua: Fixed. There simply wasn't a code with that format yet so the code-matching function didn't recognize it. — Eru·tuon 19:21, 27 May 2019 (UTC)

The error is gone, but it links to Proto-Indo-European on Wiktionary instead. And now the Slavic link next to it shows a big red error as well... Rua (mew) 19:35, 27 May 2019 (UTC)

Ahh, I should have paid more attention. Now {{wikt-lang|ine-bsl-pro|*test}} links to the Proto-Balto-Slavic entry, and the module correctly recognizes sla-pro. — Eru·tuon 00:22, 28 May 2019 (UTC)

Not all diacritics for Slovene are stripped

kọ́st and črẹ̑vo only have the underdot stripped, leaving the acute and inverted breve in the link. Rua (mew) 16:45, 29 May 2019 (UTC)

@Rua: Fixed. — Eru·tuon 22:08, 29 May 2019 (UTC)
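
For readers wondering what stripping these marks involves: one way is to remove a fixed list of combining characters from decomposed text. The sketch below is an assumption about the general approach, not the fix that was applied. It expects input that is already decomposed (in Scribunto, mw.ustring.toNFD could be used for that step) and strips only the three marks named in this thread, so letters such as č keep their caron.

```lua
-- Combining marks to strip, written as UTF-8 byte sequences.
local marks = {
	"\204\129", -- U+0301 combining acute accent
	"\204\145", -- U+0311 combining inverted breve
	"\204\163", -- U+0323 combining dot below
}

-- Remove the listed marks; the input must already be in decomposed form.
local function entryName(text)
	for _, mark in ipairs(marks) do
		text = text:gsub(mark, "")
	end
	return text
end
```

Stripping each mark independently avoids the original symptom, where only one mark in a stacked sequence was removed.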

Errors on Early Germanic calendar

@Erutuon: There's a few problems here:

There's a big red error where Old Saxon text should be.
The macron isn't stripped from Old High German.
The Gothic term isn't italic.

Rua (mew) 13:02, 26 December 2019 (UTC)

@Rua: The Gothic thing was a flaw in the module, which I fixed. As for osx, apparently mw.language.fetchLanguageName doesn't know about it, so it has to be added in Module:Language/data. And Module:Language/data doesn't have a full complement of entry name replacements for all languages; at the moment they have to be added as needed. — Eru·tuon 20:32, 26 December 2019 (UTC)

I was puzzled why it understood goh but not osx, despite neither of them being defined in that module. Is it because fetchLanguageName knows about one but not the other? Rua (mew) 21:33, 26 December 2019 (UTC)

Exactly. It's puzzling because osx is a regular ISO 639-3 code and I'd've thought mw.language.fetchLanguageName would have the English names for it since it's available online. (The function is defined here and apparently involves code here and here.) It might be worth filing a bug report for. — Eru·tuon 21:53, 26 December 2019 (UTC)

Slavic languages ISO 639-2 / 5 code absent

sla, the ISO 639-2 / 5 code for the Slavic languages,^[1] isn't entered here and I get an error when trying to format things as that in {{Wikt-lang}}. I have no idea how to enter it, otherwise I would've tried. —I'llbeyourbeach (talk) 13:43, 9 November 2020 (UTC)

This template is for linking to language headers in Wiktionary entries, and "Slavic languages" is not a valid language header (nor would "Germanic languages", gem, be). Are you trying to link to the Wiktionary entry for a word in a particular Slavic language or something else? — Eru·tuon 19:49, 9 November 2020 (UTC)

References

^ "sla | ISO 639-3". iso639-3.sil.org. Retrieved 2020-11-09.

Language rkt

A recent edit at T–V distinction added entries for Kamtapuri language which display an error discussed at User talk:Msasag#Language rkt. The issue is that previewing the following wikitext in a sandbox gives the error message shown.

{{wikt-lang|rkt|তুই}}
Lua error in Module:Language at line 197: Name for the language code "rkt" could not be retrieved with mw.language.fetchLanguageName, so it should be added to Module:Language/data.

I'm hoping someone understands how to fix this. If it helps, the discussion points out that Module:Language/data/ISO 639-3 contains:

["rkt"] = {"Kamta", "Rangpuri"},

Johnuniq (talk) 02:47, 2 May 2020 (UTC)

@Johnuniq: At the moment the way to get a new language code working is to add Wiktionary's language name in data["languages"] in Module:Language/data; in this case

["rkt"] = {
	["name"] = "Kamta",
},

To find the language name, I looked for the language code or language name in the search box in wikt:Module:languages, and found the most appropriate entry in one of the modules that start with Module:languages/data (in this case, wikt:Module:languages/data3/r), and got its field 1, which is the "canonical name" used in language headers. Here the Wiktionary and ISO codes agree and Wiktionary has one of the ISO 639-3 names, but that's not always the case.

This process is not especially obvious to people who aren't familiar with Wiktionary's language data. I think it could be simplified by maintaining a local copy of the necessary data (language name and entry name replacements) here and creating a mapping from ISO code (or maybe multi-part IETF language tag) to Wiktionary language, though I'm not sure of all the details. — Eru·tuon 21:08, 9 November 2020 (UTC)
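
The lookup order implied by the error message (local data, then the MediaWiki function, then an error) might be sketched like this. The names are hypothetical: fetchLanguageName stands in for mw.language.fetchLanguageName and is assumed to return an empty string for unknown codes, and the order of the two lookups is a guess rather than a description of Module:Language.

```lua
-- Illustrative local data, shaped like the entry shown above.
local data = {
	languages = {
		rkt = { name = "Kamta" },
	},
}

-- Hypothetical stand-in for mw.language.fetchLanguageName, assumed to
-- return an empty string for codes it does not know.
local known = { fr = "French" }
local function fetchLanguageName(code)
	return known[code] or ""
end

-- Prefer the local data table, then the fetch function; raise an error
-- if neither knows the code.
local function languageName(code)
	local entry = data.languages[code]
	if entry and entry.name then
		return entry.name
	end
	local name = fetchLanguageName(code)
	if name ~= "" then
		return name
	end
	error('Name for the language code "' .. code .. '" could not be retrieved')
end
```

With the rkt entry in place the first branch succeeds, so the fetch function is never consulted for that code.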

@Erutuon: Thanks, I'll bear this in mind and your post is a record of what to do in the future. However, the fix is not currently needed and it's above my language understanding, so I'll leave it. Johnuniq (talk) 23:27, 9 November 2020 (UTC)

WP's wikt-lang should support Wiktionary's language codes

See wikt:Wiktionary:List of languages/special.

Example: Old Irish is designated sga.

- WP template IPA-* accepts it: Old Irish pronunciation: [/ˈol͈aṽ/]
- WP template wikt-lang does not: ollam

Sai ¿?✍ 11:02, 15 March 2021 (UTC)

Hm. It seems to have worked here. But it didn't when I was editing (in draft) Ollam just now; gave an error about unsupported language code. Unable to replicate. Sai ¿?✍ 11:05, 15 March 2021 (UTC)

lang for Proto-Slavic

*bogatьstvo ({{wikt-lang|sla-pro|*bogatьstvo}}) should emit HTML similar to *bogatьstvo ({{lang|sla-x-proto|bogatьstvo}}) but in the first case the lang attribute has the illegal sla-pro and the second has sla. --Error (talk) 12:50, 25 June 2021 (UTC)

Italics

The documentation for Template:Wikt-lang states that italics can be disabled, but this functionality doesn't seem to actually work and I can't figure out why. Regardless of whether |i=, |italic=, or |italics= is used, and whether it is set to "no" as the documentation suggests or to 0, the italics are not disabled.

Can someone fix this so that the italics actually can be disabled? – Scyrme (talk) 13:29, 1 April 2022 (UTC)

Fixed. Jberkel (talk) 11:35, 7 September 2022 (UTC)

Thanks! – Scyrme (talk) 12:44, 7 September 2022 (UTC)
