Wikipedia:Bots/Requests for approval/WikiCleanerBot 18
This is the current revision of this page, as edited by AnomieBOT (talk | contribs) at 17:13, 19 June 2020 (Removing Category:Open Wikipedia bot requests for approval from closed BRFA). The present address (URL) is a permanent link to this version.
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: NicoV (talk · contribs · SUL · edit count · logs · page moves · block log · rights log · ANI search)
Time filed: 13:40, Friday, June 12, 2020 (UTC)
Function overview: Fix some nowiki tags after internal links (cf. Wikipedia:CHECKWIKI/WPC 553 dump).
Automatic, Supervised, or Manual: Automatic
Programming language(s): Java (WPCleaner)
Source code available: On GitHub (especially algorithm 553)
Links to relevant discussions (where appropriate):
Edit period(s): Twice a month
Estimated number of pages affected: About 10k pages found during the dump analysis; not all can be fixed automatically, so expect a few thousand edits.
Namespace(s): Main
Exclusion compliant (Yes/No): Yes
Function details: Tools like VE or CX tend to create internal links with incorrect formatting (the hyperlink does not cover all the letters), because the user doesn't always select exactly the text the link should apply to. Some of these errors can be fixed automatically (see for example what my bot did on frwiki for several thousand articles). Examples of situations where the bot can automatically fix the internal link:
- ’Ori tahiti: [[Eugène Caillot|Eugène Caillo]]<nowiki/>t replaced by [[Eugène Caillot]]: displayed text is the same as the target of the link.
- Şabran (raion): [[forêt]]<nowiki/>s replaced by [[forêt]]s: "s" is configured on frwiki as a possible extension (plural). Configuration for enwiki will also include "s"; after a first pass, I will see from what is left whether other extensions can be added.
- Œdipe et le Sphinx: [[Jean-Auguste-Dominique Ingres|Ingres]]<nowiki/> replaced by [[Jean-Auguste-Dominique Ingres|Ingres]]: whitespace after the nowiki makes it useless.
- İbrahim Tatlıses: [[Divorce|Divorcé]]<nowiki/>s replaced by [[Divorce|Divorcés]]: "s" is configured on frwiki as a possible extension (plural).
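The four situations listed above can be sketched as a small set of rewrite rules. This is only an illustrative sketch, not the actual WPCleaner algorithm 553: the class name `NowikiAfterLinkFixer`, the `EXTENSIONS` set, and the regular expression are assumptions made for the example, and details such as first-letter case-insensitivity of link targets are deliberately ignored.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NowikiAfterLinkFixer {

    // Assumption: suffixes configured as valid link extensions ("s" per the request).
    private static final Set<String> EXTENSIONS = Set.of("s");

    // [[target]] or [[target|text]], followed by <nowiki/> and any trailing letters.
    private static final Pattern LINK = Pattern.compile(
            "\\[\\[([^\\[\\]|]+)(?:\\|([^\\[\\]|]+))?\\]\\]<nowiki\\s*/>(\\p{L}*)");

    public static String fix(String wikitext) {
        Matcher m = LINK.matcher(wikitext);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean followedBySpace = m.end() == wikitext.length()
                    || Character.isWhitespace(wikitext.charAt(m.end()));
            String replacement = rewrite(m.group(1), m.group(2), m.group(3), followedBySpace);
            // A null result means "no safe automatic fix": keep the original text.
            m.appendReplacement(out, Matcher.quoteReplacement(
                    replacement != null ? replacement : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String rewrite(String target, String text, String suffix,
                                  boolean followedBySpace) {
        if (suffix.isEmpty()) {
            // Whitespace (or end of text) after the tag makes it useless.
            if (!followedBySpace) {
                return null;
            }
            return text == null ? "[[" + target + "]]" : "[[" + target + "|" + text + "]]";
        }
        if (text != null && (text + suffix).equals(target)) {
            // Displayed text plus trailing letters equals the target: plain link.
            return "[[" + target + "]]";
        }
        if (EXTENSIONS.contains(suffix)) {
            // Configured extension: let the link absorb the suffix.
            return text == null
                    ? "[[" + target + "]]" + suffix
                    : "[[" + target + "|" + text + suffix + "]]";
        }
        return null;
    }
}
```

Anything that does not match one of the rules is left untouched, which mirrors the statement that not all pages can be fixed automatically.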
Discussion
- Comment: Thanks for taking this on. It looks uncontroversial. Do you know if there is a phabricator bug report so that this can get fixed in VE? – Jonesey95 (talk) 17:35, 12 June 2020 (UTC)
- Hi Jonesey95. I don't know if there's a specific phabricator bug report for this, but I know the subject of incorrect links created by VE has been a long-standing issue... For example, you also have many links that are to an unrelated article (see for example, the list I'm generating on each dump analysis for frwiki for internal links like [[1999|2000]]). --NicoV (Talk on frwiki) 14:24, 14 June 2020 (UTC)
- I haven't seen a bug report for that issue. I will be happy to file one. Do you have links to diffs? We don't link to years on en.WP, but I imagine that there are incorrect links being generated somewhere, given all of the other link-related bugs with VE. – Jonesey95 (talk) 15:27, 14 June 2020 (UTC)
- Hi Jonesey95. A few examples gathered from Recent changes with the nowiki tag, just by looking at the last 20 edits:
- Mesta: grazing lands replaced by [[grazing land]]<nowiki/>s
- Chilean Army: [[mounted band]]<nowiki/>s added.
- Manuel Romero Rubio: Mexico City2 5, replaced by [[Mexico City|Mexico Cit]]<nowiki/>y<ref name=":1" /><ref name=":3" />.
- Fox Networks Group: [[Fox Corporation]] and [[Walt Disney Television]] while replaced by [[Fox Corporation]]<nowiki/>while
- Attica Scott: [[Shooting of Breonna Taylor|shooting death of Breonna Taylo]]<nowiki/>r added.
- New Democratic Party: [[Indo-Canadians|Indo-Canadian]]<nowiki/>to and [[Third party (Canada)|fourth-largest party]]<nowiki/>in added.
- As you can see, it's quite frequent (and most of the other nowiki tags are just different problems...). I gave up on reporting this kind of thing to the VE team; I reported them years ago... --NicoV (Talk on frwiki) 06:07, 15 June 2020 (UTC)
- Jonesey95. If you were speaking about examples of links with an incorrect target, I don't have diffs. I noticed articles with such problems when doing some trial edits, but I didn't try to find where they come from:
- 2015 New South Wales Cup: [[Campbelltown Stadium|Newcastle Sports Ground]]
- Akaoni Studio: [[Nintendo DSiWare|iPhone]]
- They are hard to track by a bot (except for the dates, that's why I added #526 for frwiki). --NicoV (Talk on frwiki) 06:21, 15 June 2020 (UTC)
- Jonesey95. Even if you're not supposed to link to years on en.WP, I just started a dump analysis for #526, and it quickly found articles with such problems... Maybe some are false positives.
- Australian Labor Party: [[1943 Australian federal election|1946]]
- Clement Attlee: [[1956 Walthamstow West by-election|1955]]
- Ducati Motor Holding S.p.A.: [[1995 British Superbike Championship season|1999]]
- European Free Trade Association: [[1972 Norwegian European Communities membership referendum|1973]] and [[1994 Norwegian European Union membership referendum|1995]]
- Spenser (character): [[2006 in literature|2007]]
- William Ewart Gladstone: [[1846 Newark by-election|1845]]
- 549: [[619|624]]
- Wikipedia:CHECKWIKI/WPC 526 dump should be generated in a few hours. --NicoV (Talk on frwiki) 06:53, 15 June 2020 (UTC)
- Jonesey95. More than 6k pages listed in Wikipedia:CHECKWIKI/WPC 526 dump. --NicoV (Talk on frwiki) 18:06, 15 June 2020 (UTC)
- Most of those links in the WPC 526 dump look OK to me, per WP:YEARLINK. (Sorry, I misread the first few links; I see that most of them appear to link to the wrong year.) It is links like [[1999|1999]] that are typically (but not always) discouraged, per WP:YEARLINK. The real problem links are the ones like [[2006 in literature|2007]]. Over 6,000! Wow. – Jonesey95 (talk) 18:57, 15 June 2020 (UTC)
- One note: I believe that many of the links related to sports seasons are intentional, like [[1998 Pro Bowl|1997]], because, as the article says, "The 1998 Pro Bowl was the NFL's all-star game for the 1997 season." In the US, American football seasons take place almost entirely in the second half of a given year, with the post-season games at the beginning of the following year but designated as part of the previous year's "season". If that makes sense. If there is any way to avoid changing links where the link text is one number lower than the target year, please do so pending further discussion. – Jonesey95 (talk) 03:33, 16 June 2020 (UTC)
- Hi Jonesey95. I can try to ignore [[xxxx ...|yyyy]] when xxxx=yyyy+1. Do you think the same reason applies to the election links (2 in the above examples), or would real problems be missed? Or do I need to configure the list of "..." for which xxxx=yyyy+1 should be ignored? The incorrect-link problem for years is just the tip of the iceberg for incorrect links, but I don't know how I can find all the other ones... --NicoV (Talk on frwiki) 09:22, 16 June 2020 (UTC)
- The election links generally take the form xxxx=yyyy-1, like 1836 United States presidential election, where the election took place in one year (in November), but the dispute over it took place while votes were being counted in the following months. I think the bot might need to ignore all cases where the years are different by one (higher or lower), since it will run into context problems. The links that differ by more than one look like they are mostly typos and copy/paste errors. – Jonesey95 (talk) 14:10, 16 June 2020 (UTC)
- Hi Jonesey95. I've modified the detection to allow configuring the minimum difference, so next time the list is generated, it will be trimmed down a bit. I think we should continue the discussion elsewhere, like Wikipedia talk:WPCleaner. I don't think it's possible to fix this error automatically (sometimes the link is correct, sometimes the displayed year is correct): on frwiki, I'm just adding a template after the link to request help from editors to fix the link. --NicoV (Talk on frwiki) 06:00, 17 June 2020 (UTC)
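For illustration, the year check discussed in this thread (a link of the form [[xxxx ...|yyyy]] reported only when the two years differ by at least a configurable minimum) could look roughly like the sketch below. The class name `YearLinkChecker`, the method `isSuspicious`, and the regular expression are hypothetical and are not taken from WPCleaner's detection for #526.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YearLinkChecker {

    // [[xxxx ...|yyyy]]: the target starts with a year, the displayed text is only a year.
    private static final Pattern YEAR_LINK = Pattern.compile(
            "\\[\\[(\\d{3,4})(?:\\s[^\\[\\]|]*)?\\|(\\d{3,4})\\]\\]");

    /**
     * Returns true when the link matches [[xxxx ...|yyyy]] and the two years
     * differ by at least minDifference (hypothetical configuration value).
     */
    public static boolean isSuspicious(String link, int minDifference) {
        Matcher m = YEAR_LINK.matcher(link);
        if (!m.matches()) {
            return false;
        }
        int targetYear = Integer.parseInt(m.group(1));
        int textYear = Integer.parseInt(m.group(2));
        return Math.abs(targetYear - textYear) >= minDifference;
    }
}
```

With a minimum difference of 2, links such as [[1998 Pro Bowl|1997]] are skipped while [[619|624]] is still reported; the check only flags a link for human review, since either the target or the displayed year may be the correct one.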
Approved for trial (50 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. Primefac (talk) 23:55, 15 June 2020 (UTC)
- Thanks Primefac. Trial complete. I've done the 50 edits, and bot behaved as expected. --NicoV (Talk on frwiki) 18:46, 16 June 2020 (UTC)
- I looked through all 50 test edits, and they all looked fine to me. In diff 1, I would have changed the link to "Wake Forest's" (I think this is the expected format on en.WP, although I can't find the guideline at the moment; I don't think you'll get any complaints), but the bot's "Wake Forest's" is acceptable. – Preceding unsigned comment added by NicoV (talk • contribs) 06:00, 17 June 2020 (UTC)
- Approved. Primefac (talk) 17:12, 19 June 2020 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.