Welcome to the talk page for WikiProject Linguistics. This is the hub of the Wikipedian linguist community; like the coffee machine in the office, this page is where people get together, share news, and discuss what they are doing. Feel free to ask questions, make suggestions, and keep everyone updated on your progress.

Please comment on discussion

There is a discussion proposing that part of the Phoenician alphabet article might need to be split off into an article that is to be named Canaanite scripts. Please comment at Talk:Phoenician_alphabet#This_page_might_need_to_be_split. Debresser (talk) 12:09, 26 July 2020 (UTC)

Could someone undo the May 16th edit on article "Nabatean Alphabet"?

See the May 16th edit. It is wrong for reasons explained at Talk:Nabataean alphabet#Problems in the table. Due to a combination of Coronavirus isolation and the stupid encryption protocol upgrade, I'm editing with a non-fully-Unicode-compliant tool, so I can't do it myself right now... AnonMoos (talk) 21:54, 26 July 2020 (UTC)

New dab page Islamic language

FYI I have opened a discussion about the rationale of the new dab page in Talk:Islamic language. –Austronesier (talk) 07:13, 28 July 2020 (UTC)

Draft:Powari language

Could somebody look at Draft:Powari language? It's been kicking around WP:AfC for two years. We're trying to figure out if it's worth keeping, but it needs a subject-matter expert. -- RoySmith (talk) 13:35, 28 July 2020 (UTC)

Links to DAB pages

I have collected some articles which have language- or linguistics-related links to DAB pages, where expert attention would be welcome. Search for "disam" in read mode and for "d" in edit mode, and if you solve any of these puzzles remove the {{dn}} tag and post {{done}} here.

Thanks in advance, Narky Blert (talk) 14:03, 28 July 2020 (UTC)

Third-person pronoun

Need more eyeballs at Third-person pronoun, which is an OR disaster. I'm particularly interested in the table in this section, especially the portion under the header, Gender-neutral singular pronouns. This entire portion of the table should come out; it is populated by hapax and other obscure science fiction trivia, and is completely unencyclopedic. Thanks, Mathglot (talk) 08:25, 4 August 2020 (UTC)

I have seen this page a while ago and closed it with mixed feelings of shudder and cringe, for various reasons. It's not about third-person pronouns in general. Much of it is about gender-distinctions, with a gender-distinction-normative bias: e.g. pronouns in gender-neutral languages are called "gender-inclusive", even if these languages never had an exclusion problem. Plus lengthy stray material that is not about third-person at all, e.g. the subsection Rapa; not to mention the non-notable hapax from hobby conlangs. –Austronesier (talk) 11:18, 4 August 2020 (UTC)

I moved it to Gender neutrality in languages with gendered third-person pronouns and removed all the sections on languages that didn't fit the topic, which left only Swedish and Norwegian and examples of the opposite trend in CJK. Please don't hesitate to try something else. — kwami (talk) 18:48, 10 August 2020 (UTC)

Is this reference on the Hajong language reliable?

The reference is:

Publisher: SIL

Context: A citation to this was flagged ([1]). The issue at the heart is the alphabet used for this language. The language is strongly associated with an ethnic group and there is an effort to preserve and advance it into a literate form. Currently a number of different scripts are being used by the speakers, who are distributed in Assam in India and Bangladesh. The scripts in use are Latin, and Bengali-Assamese script, and from among these both the alphabets, Bengali and Assamese, are being used.

The point here is, is this reference reliable for the claim that the Assamese alphabet is one of the alphabets used to write the Hajong language, from the examples in the book and from the sentence - "Each word in the word lists is written first in Roman script followed by Assamese script in brackets." (p.1)

Thanks! Chaipau (talk) 15:13, 4 August 2020 (UTC)

@Chaipau: Sorry, I have just noticed your question here. The answer is yes, it is just as reliable as the source (Ethnologue) for the use of the Bengali alphabet in the same box of the table. Both sources are published by SIL. But actually, I find this SIL source[2] most enlightening (p. 27), which is also cited in Hajong language. This survey indicates that some Hajong speakers actually are less sensitive to these details than Bengalis/Assamese would expect. –Austronesier (talk) 09:48, 14 August 2020 (UTC)
@Austronesier: You have gone ahead and responded to the tag as well. Thank you! Chaipau (talk) 11:50, 14 August 2020 (UTC)

Do IPA click letters require velar/uvular symbols?

Kwamikagami edited {{IPA non-pulmonic consonants}} to show all click letters with velar/uvular symbols preceding them, saying that ⟨ʘ ǀ ǃ ǂ ǁ⟩ only symbolize releases according to the Handbook of the IPA, on the grounds that pp. 20–1 of the Handbook use ⟨k͡ʘ k͡ǀ k͡ǃ k͡ǂ k͡ǁʰ⟩ to illustrate the sounds through examples from ǃXóõ and Xhosa. P. 10 of the Handbook says:

'Velaric' airstream sounds, usually known as 'clicks', again involve creating an enclosed cavity in which the pressure of the air can be changed, but this time the back closure is made not with the glottis but with the back of the tongue against the soft palate, such that air is sucked into the mouth when the closure further forward is released. The 'tut-tut' or 'tsk-tsk' sound, used by many English speakers as an indication of disapproval, is produced in this way, but only in isolation and not as part of ordinary words. Some other languages use clicks as consonants. A separate set of symbols such as [ǂ] is provided for clicks. Since any click involves a velar or uvular closure, it is possible to symbolize factors such as voicelessness, voicing, or nasality of the click by combining the click symbol with the appropriate velar or uvular symbol: [k͡ǂ ɡ͡ǂ ŋ͡ǂ q͡ǃ].

This doesn't strike me as saying the letters must be used with velar or uvular symbols, especially when the letters are seen by themselves in works, including the IPA Illustration for Sandawe.

Is it true that ⟨ʘ ǀ ǃ ǂ ǁ⟩ can only represent releases? How should our {{IPA non-pulmonic consonants}} be arranged? Nardog (talk) 08:27, 10 August 2020 (UTC)
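
For readers unfamiliar with how these transcriptions are put together, here is a minimal Python sketch of the combination the Handbook passage describes: a velar or uvular letter joined to a click letter with a tie bar. The helper name `with_rear_symbol` is hypothetical and purely illustrative; the sketch only shows the character sequences involved and takes no position on which convention the template should follow.

```python
import unicodedata

# The tie bar written above two letters (Unicode: COMBINING DOUBLE INVERTED BREVE).
TIE_BAR = "\u0361"

# The five click letters discussed in this thread: ʘ ǀ ǃ ǂ ǁ
CLICK_LETTERS = ["\u0298", "\u01C0", "\u01C3", "\u01C2", "\u01C1"]


def with_rear_symbol(rear: str, click: str) -> str:
    """Join a velar/uvular letter and a click letter with a tie bar.

    Hypothetical helper for illustration only.
    """
    return rear + TIE_BAR + click


if __name__ == "__main__":
    # Bare click letter next to the explicit tie-bar form with <k>.
    for click in CLICK_LETTERS:
        print(click, with_rear_symbol("k", click), unicodedata.name(click))

    # Voiceless, voiced, and nasal accompaniments of one click type,
    # following the pattern of the Handbook's example [k͡ǂ ɡ͡ǂ ŋ͡ǂ].
    print([with_rear_symbol(rear, "\u01C2") for rear in ("k", "\u0261", "\u014B")])
```

The bare letter and the tie-bar form are simply different character sequences, so the template could display either; the question above is which one matches IPA convention.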

I don't see that the Handbook says affricates "must" be written with two symbols either; they just give a couple of examples written that way on p. 22. The examples are all we have to go on.
It's very common to leave out the accompaniment if it's <k>. However, it's less common to do so if there's a velar-uvular distinction to worry about.
Before the Handbook, the illustrations of the IPA implied that the clicks needed two letters but then wrote the tenuis clicks with only one. That was evidently cleaned up for the Handbook.
As for the Sandawe illustration, note that many of the illustrations in the Handbook omit the tie bars from the affricates (Igbo even uses the tie bar for labial-velars but not for affricates!), but it's still proper to write them with tie bars.
When giving prescriptive explanations, we should be careful not to take shortcuts that might confuse the reader. We can always explain common shortcuts and use them ourselves in the language articles, but when readers refer back to the IPA articles they should be clear on official usage.
For several years, there was a shift toward using the simple click letters for the complete consonant, with diacritics for anything else except for the velar-uvular distinction. Now usage seems to be swinging back. There's a phonetics volume on clicks coming out, and much of the transcription is explicit about both places, though typically using superscripts rather than tie bars (a convention also commonly seen for affricates and labial-velars). — kwami (talk) 08:45, 10 August 2020 (UTC)
The official IPA chart only says "Affricates and double articulations can be represented by two symbols joined by a tie bar if necessary" (emphasis added). I don't understand how that's germane even if what you say about clicks were true. Nardog (talk) 08:51, 10 August 2020 (UTC)
It's germane because it's exactly parallel. You object that the IPA doesn't explicitly require two letters for a click, but then it doesn't explicitly require two letters for an affricate either. So, should we assume that the second letter is optional? We can only go by the examples. The Chart doesn't give any, the Handbook does, and in the 'Guide to IPA notation' all clicks and affricates are written by two letters joined with tie bars. — kwami (talk) 08:56, 10 August 2020 (UTC)
It's possible to do all sorts of things with the IPA. It's possible to use ⟨c⟩ for an affricate, for example, but that doesn't mean we should define it as one. — kwami (talk) 08:56, 10 August 2020 (UTC)
What the IPA doesn't explicitly require for an affricate is a tie bar, not two letters. Nardog (talk) 09:02, 10 August 2020 (UTC)
It never says that two letters are required, and indeed in some of the illustrations affricates are written with only one letter (e.g. Korean, Sindhi).
Note also with Sandawe that they incorrectly transcribed the glottalized nasal clicks as ejective. They just adopted common conventions, they weren't being precise.
It's also quite common to use ⟨ɾ ɽ⟩ for laterals, e.g. in Indic languages, but that doesn't mean [ɾ ɽ] are lateral. — kwami (talk) 09:05, 10 August 2020 (UTC)
(edit conflict) But you can't possibly represent a double articulation with one letter, with a tie bar or not. So "if necessary" is clearly in reference to the use of a tie bar, not two symbols. I'm not saying you can't represent an affricate with one letter, just that a tie bar is only optional when representing an affricate according to the official chart. What the Igbo illustration does is actually a very sensible choice: [tʃ, dʒ] are in fact transitions from [t, d] to [ʃ, ʒ], while [k͡p, ɡ͡b] not so much, following closely the IPA Principle #4 (c).
By this analogy, which you brought up, then, the Handbook is saying that ⟨ʘ ǀ ǃ ǂ ǁ⟩ can indeed represent entire clicks. That's not quite the same as "The Handbook requires them. What the Chart calls 'clicks' aren't consonants at all, just the releases", which is what you said at Template talk:IPA non-pulmonic consonants#velar vs uvular clicks. Nardog (talk) 09:25, 10 August 2020 (UTC)
You're reading a lot of your own opinions into the motivations of the writers of the Handbook. It makes just as much sense to write a labial-velar stop as ⟨kp⟩ in a language that doesn't contrast that with k + p as it does to write an affricate ⟨ts⟩ in a language that doesn't contrast that with t + s. And I don't see how you can possibly think that Principle 4(c) is any more relevant to one than to the other. — kwami (talk) 09:38, 10 August 2020 (UTC)

The question is whether you wish to be precise in your presentation of how the IPA works. In explaining the IPA, I feel that we should be precise. When being precise, [t͡s] and [k͡p] are one segment, [ts] and [kp] are two. [c] is a plosive, not an affricate. [ɽ] is central, not lateral. [ɨ] is a central vowel, not back. [k] is pulmonic, not ejective. [ǂ] is a click type, not a velar click. Outside pedagogy, it's fine to use the alphabet more broadly -- [ts] and [c] can both be affricates, [ɽ] can be lateral, [ɨ] can be a back vowel, [k] can be ejective, and [ǂ] can be a velar click. But you don't start off presenting them that way to people who are not familiar with how the IPA works. — kwami (talk) 09:35, 10 August 2020 (UTC)

As for the phrasing "it is possible to symbolize factors such as voicelessness, voicing, or nasality of the click by combining the click symbol with the appropriate velar or uvular symbol", they're saying that it's convenient to transcribe features that have nothing to do with the rear articulation as if they were part of that articulation rather than of the entire consonant. The voicing and nasalization don't belong to the rear articulation -- that's only specified for uvular-velar, affrication, ejection and the like. The IPA transcription of clicks is weird, like writing labial-velars as *⟨k͡p, g͡p, ŋ͡p⟩. A lot of the variability and debate in transcription is related to this fact, that the IPA letters for clicks don't really fit in with the rest of the alphabet. I suppose that one might address this by transcribing clicks as e.g. ⟨k͡ǂ̥, ɡ͡ǂ̬, ŋ͜ǂ̃⟩, but I've never seen anyone do that. — kwami (talk) 09:55, 10 August 2020 (UTC)

The Chinese Wikipedia's IPA article uses the diacritics for voicelessness and nasalization to modify click symbols, see zh:國際音標#非肺部氣流音. Love —LiliCharlie (talk) 04:51, 12 August 2020 (UTC)
@LiliCharlie: I'm sure that's just copied from a version of our {{IPA non-pulmonic consonants}} before recent edits. What I want to know is the community's opinion on how that template should be presenting the links to articles about clicks. I for one think the articles themselves (most of which are unreferenced) are hardly notable and should be merged into just Bilabial click, Dental click, Alveolar click, Retroflex click, and Palatal click, so that then the template can simply have a row of ⟨ʘ ǀ ǃ ǂ ‼ ǁ (ʞ)⟩ much like the actual IPA chart. But even barring that, do we need both velar and uvular symbols, making the rows twice as tall? My understanding is that the use of velar symbols has been far more prevalent, even if the actual posterior closure of such clicks may be more accurately described as uvular. Nardog (talk) 05:07, 12 August 2020 (UTC)
Following Ladefoged & Maddieson (1996:265–266) velar and uvular symbols are both required to account for a phonemic contrast in ǃXóõ. However Miller et al. (2007) say in their study of Nǀuu that "evidence suggests that the contrast between “velar” and “uvular” clicks proposed for the related language ǃXóõ is likely also one of airstream and that a contrast solely in terms of posterior place would be articulatorily impossible." Love —LiliCharlie (talk) 05:50, 12 August 2020 (UTC)
@LiliCharlie: The Cornell link is dead, here's the new URL: [3] –Austronesier (talk) 10:11, 12 August 2020 (UTC)
Good spot, thanks for pointing out and providing a working link, Austronesier. My outdated link was actually to the earlier 2007 version of the study that was submitted to JIPA where it was published in 2009. I've managed to find the 2007 version archived on the Wayback Machine, but the 2009 JIPA version is even better and certainly more accessible. Love —LiliCharlie (talk) 17:07, 12 August 2020 (UTC)

Pace Miller, there's at least one language that distinguishes velar from uvular clicks without any airstream contour. You hear it in the vowel rather than in the release of the click. (Miller discovered when working on N|uu that in that language where a velar-uvular distinction had been posited, all clicks were uvular and the distinction was one of timing, e.g. [q͡ǂ] vs [ǂ͡q] -- that is, whether or not you could hear the uvular release. She suggested that all languages had uvular clicks only in this fashion, and that "velar" clicks simply had an inaudible back release. But it turns out that not all of them do -- some have only velar clicks, and some have both. Why she should say that such an easy distinction might be "articulatorily impossible" is beyond me. They're easy to articulate and the spectrograms are pretty clear, with e.g. a velar pinch after velar clicks.) But regardless, the question here is what is the IPA convention, not what is Miller's. Lots of people use the bare click letter for a tenuis velar click. And that's fine, if you're not sticking to strict IPA. The IPA itself did that when it introduced the Beech letters in 1923. But the 1999 Handbook -- the replacement for the 1949 Principles so they could accommodate the Kiel convention that had replaced the original click letters with the ones we see now -- doesn't take such shortcuts in the examples it gives.

(Side note, Sandawe and Hadza have only (somewhat backed) velar plosives and ejectives, and clicks at the same rear place of articulation. Some Khoe langs have both velar and uvular plosives and ejectives, and clicks at both places of articulation. If you were looking only at those languages, it would be natural to conclude that clicks are doubly articulated. But Xhosa has only velar plosives and ejectives and only uvular clicks. If you were to look only at that language, it would be natural to conclude that the uvular closure is part of the airstream mechanism, not a place of articulation. So there's plenty of reason for theoretical differences between phoneticians, which are reflected in how they choose to symbolize clicks.)

As for merging the articles, that would effectively be saying that click consonants aren't important enough to bother distinguishing. It's as if a French-speaker were to say that our articles on affricates should be merged into the corresponding fricatives because affricates aren't important -- after all, they don't occur in French and there are no IPA letters for them. — kwami (talk) 04:51, 14 August 2020 (UTC)

Wikipedia:Reliable sources/Noticeboard#etymonline could use input from this project's participants. Nardog (talk) 13:35, 13 August 2020 (UTC)

etimo aut no etimo

Hey there,
Many psychology pages have no etymology whatsoever.
E.g. "Panic attack" does not refer to Pan, and the goddess Psyche is never mentioned anywhere.
Please upgrade the articles.
Thanks, linguists. --Wittgenstein51 (talk) 19:08, 15 August 2020 (UTC)

Please go ahead and add what you deem to be missing. −Woodstone (talk) 07:33, 16 August 2020 (UTC)

Unicode chart template references

Regarding templates within Category:Unicode charts, would there be a reason that the superscript numbers at the top could not be replaced by, for example, letters, to better distinguish them from article references? This would make them more clearly linked to the template notes they apply to. (Drmccreedy,BabelStone) CMD (talk) 14:52, 19 August 2020 (UTC)

@Chipmunkdavis: Personally, I find these less confusing/irritating than the notes/references in Help:IPA/English :) But sure, there is no reason not to convert them into something that better meets common expectations, even if not prescribed by MOS. –Austronesier (talk) 15:31, 19 August 2020 (UTC)

Scots Wikipedia

FYI: meta:Requests for comment/Disruptive editing on sco.wikipedia on an unparalleled scale. Visite fortuitement prolongée (talk) 14:53, 26 August 2020 (UTC)

Are an "alphabet" and a "script" the same?

Are an "alphabet" and a "script" the same thing? I know this is probably not a strictly linguistic issue, but I can think of no other expert group that can help with this. If this is not the forum, please point me to the right one.

Context: Today we have Bengali-Assamese script and the two alphabets: Bengali alphabet and Assamese alphabet. The "script" article came about because the then Bengali script was too language specific and after some discussion it was decided that a "parent" article was required (Talk:Bengali_alphabet#Merge_with_Assamese_script?, 2006-2007). After some meandering the article name settled on "Bengali-Assamese script".

The immediate context is the discussion at Talk:Rangpuri_language#Writing_system. In short the discussion is on whether we should link the script of the Rangpuri language as [[Bengali-Assamese script]] or [[Bengali alphabet|Bengali script]].

I am tagging the other interested parties: user:Za-ari-masen and user:Msasag. And also user:SameerKhan who was instrumental in the 2006/2007 decision.

Thank you!

Chaipau (talk) 12:53, 29 August 2020 (UTC)

Terminology varies. A good starting point might be the Glossary of Unicode Terms. In their strict terminology the Bengali-Assamese script is a script ("Bengali script") of the abugida type (and not of the alphabet type), and the Bengali writing system as well as the Assamese writing system use the Bengali script. HTH. Love —LiliCharlie (talk) 13:08, 29 August 2020 (UTC)
Does Unicode provide names of scripts or blocks of codes? There is in fact a proposal to change the block to "Bengali-Assamese" ("It may be possible to change the block header name, though the block property values cannot. The most neutral and least disruptive name would be “Bengali-Assamese”. This is an editorial, not a normative, matter." [4]) Nevertheless, the script is already called "Bengali-Assamese" in Salomon (1998) Bengali–Assamese_script#cite_note-1. Chaipau (talk) 13:34, 29 August 2020 (UTC)
Unicode provides a lot. The latest standard has over 1000 pages. And they also host ISO 15924. Love —LiliCharlie (talk) 13:50, 29 August 2020 (UTC)
I don't think Unicode's stability policy allows script or block name changes, not even if the names contain obvious spelling errors, but formal name aliases are allowed. Love —LiliCharlie (talk) 14:07, 29 August 2020 (UTC)
But Unicode does not encode scripts per se, according to their FAQ. For instance, Bengali uses the "danda" defined in the Devanagari block, so does it mean that Bengali encoded in Unicode uses a hybrid Bengali-Devanagari script? Yes we started with Unicode, but we need to move on. Chaipau (talk) 14:24, 29 August 2020 (UTC)
Blocks are handy, but they don't determine script. Characters have character properties, and one value of the script property is Zyyy for "undetermined script" aka "common". (Many scripts share punctuation, numerals, diacritics, etc.) Love —LiliCharlie (talk) 14:35, 29 August 2020 (UTC)
I respect the knowledge and opinions of the editors who joined that discussion of 2006-07 that Chaipau showed, but it just appears to be a case of WP:OR where the editors came up with the term "Bengali-Assamese script" which now seems like a WP:NEOLOGISM as there are some visible efforts to popularize the term. All the relevant sources call it "Bengali script" including the Unicode glossary that LiliCharlie showed and in the context of Rangpuri language, Ethnologue states the writing system of Rangpuri as "Bengali script". The sources use "Bengali script" and "Bengali alphabet" interchangeably, even the article on Bengali alphabet uses "script" numerous times in its description. I think the best way to solve this issue is to rename Bengali-Assamese script to Bengali script. Za-ari-masen (talk) 09:34, 30 August 2020 (UTC)

A script and an alphabet aren't the same imo. A script is a set of characters and an alphabet is based on one or more scripts and the characters have certain sound values and other rules. This script we are talking about isn't just used for Assamese and Bengali but also for Maithili, Meitei Manipuri, Kamtapuri, Bishnupriya Manipuri, Sylheti, Hajong, Santali, Chittagonian etc. And this script is known by many different names. This script currently has two Unicode blocks: Bengali and Tirhuta. The Tirhuta block is, as of now, only used for the Maithili language; it's also known as Mithilakshar. And the Bengali block is used for many different languages like Assamese, Bengali, Rangpuri/Kamtapuri etc. Unicode has three names for the script: Tirhuta script, Bengali script and Assamese script. Though since this is one script, a unified name should be used. I prefer the name Eastern Nagari. The Siddham script has two descendants, 1) Nagari or Devanagari or Western Nagari and 2) Eastern Nagari. So the term Eastern Nagari is suitable for the script since it's used in the Eastern region. Scripts like Odia and Nepalese script came from early Eastern Nagari that emerged in the 13th-14th century. We cannot choose any of the regional names like Bengali or Assamese or Tirhuta. That is because people from other regions don't accept any specific regional name. For example, if it's renamed as Bengali script, then people from Assam, Bihar, Jharkhand, Kamtapur region will feel offended. They feel offended and disadvantaged when their script, languages, culture etc. are mistaken to be Bengali. This leads to hatred among different groups. So this issue will never be solved; people will keep demanding to change the name "Bengali script". So I think it's best not to favour any specific regional term and we should use a unified term like "Eastern Nagari" for the script. Outside Wikipedia, the unified name "Eastern Nagari" or "Purvinagari" is quite accepted as I've seen.
Only a few people opposed this term; it seems that they prefer a term to which their cultural identity is associated. I'm a Bengali and I've many Bengali friends from Bangladesh and West Bengal who have no issues using the term Eastern Nagari. Msasag (talk) 10:41, 30 August 2020 (UTC)

@Za-ari-masen: No. When user:SameerKhan suggested the name "Bengali-Assamese" in 2006 it was already prevalent ("Indian Epigraphy", Salomon 1998). It was to accommodate the non-Assamese/Bengali languages that for a time this article was named "Eastern Nagari script". (Manipuri language uses the Bengali and the Assamese for example. - addendum) We know from Brandt 2014 that the academic community rightly prefers "Eastern Nagari script" for the very same reason. [5] Chaipau (talk) 14:24, 30 August 2020 (UTC)
Ethnic pride is not among our five criteria for article titles and we will continue to call Serbo-Croatian by that name in spite of animosities between fervent Serbs and Croats who hate their linguistic varieties to be described as varieties of a common language, and in spite of Bosnians and Montenegrins who hate not to be mentioned. We should be guided by our existing policy and not invent new ad hoc rules to cater for the taste of people who lack scientific objectivity (i.e., maximum distance between observer and the observed). Love —LiliCharlie (talk) 15:30, 30 August 2020 (UTC)
Chaipau, these are just one or two sources where the terms "Bengali-Assamese" or "Eastern Nagari" are mentioned but there are thousands of sources that describe the script as "Bengali script". Even Brandt himself notes that "Bengali script" is the most common and popular term for this script, hence, it seems to be the most suitable title per WP:COMMONNAME. You should see what LiliCharlie stated above: ethnic pride is not a criterion for suggesting article titles. Za-ari-masen (talk) 09:13, 31 August 2020 (UTC)
@Za-ari-masen: What applies here is WP:NAMINGCRITERIA, not WP:COMMONNAME. The point is that academics and others have recognized the name "Bengali script" is problematic. You are misquoting Brandt—this is what she says: "In fact, the term 'Eastern Nagari' seems to be the only designation which does not favour one or the other language. However, it is only applied in academic discourses, whereas the name 'Bengali script' dominates the global public sphere." In other words, she (and the academic community) is rejecting the dominant name ("Bengali script") and is preferring quite another name ("Eastern Nagari script").
And the claim to WP:COMMONNAME is a little misleading. In determining what is WP:COMMONNAME it recommends: "In determining which of several alternative names is most frequently used, it is useful to observe the usage of major international organizations, major English-language media outlets, quality encyclopedias, geographic name servers, major scientific bodies, and notable scientific journals." Just a search on the web is not enough. Again, I point back to Brandt's statement preferring "Eastern Nagari script" over the popular "Bengali script".
Chaipau (talk) 11:23, 31 August 2020 (UTC)
Addendum: Using solely WP:COMMONNAME, one should use "Bengali-Assamese script" rather than "Bengali script". This is because there are significant works that mention the script as Assamese or Asamiya script (e.g. in "Indo-Aryan Languages, Cardona") and it improves recognizability. Chaipau (talk) 11:33, 31 August 2020 (UTC)

Za-ari-masen I don't see any reason to consider the "popularity" of a word, rather we should use a name that is acceptable to all (not just an individual). We should also keep in mind the publication dates of those "thousand" sources. Mohsin274 (talk) 10:20, 31 August 2020 (UTC)

For what it is worth, Unicode does call it "Bengali and Assamese"[6]. Chaipau (talk) 12:06, 31 August 2020 (UTC)

"It"? No. Unicode calls the script "Bengali script", and the block starting at U+0980 "Bengali", cf. chapter 12.2 of the current standard which also mentions "Bangla script", "Asamiya", and "Assamese" as synonyms for the script. What you are citing is a page to help users find charts of Unicode blocks rather than scripts. Love —LiliCharlie (talk) 12:45, 31 August 2020 (UTC)
@LiliCharlie: I think we have addressed these issues earlier.
  • The block header name will never change in Unicode. It will break too many things and it was designed not to change.
  • Blocks encode codes, not scripts (look at the FAQ link I provided above). It says they do not encode scripts, per se. I also gave you an example why every complete sentence used in Bengali Unicode is a hybrid Devanagari-Bengali code.
  • Furthermore, look up the answer to the FAQ: "Can I determine the script of a character by the character or block name?" Ans: "No, not at all. The character names and block names are not reliable indicators of the script of a character." In other words, the name "Bengali script" may or may not determine the name of the script to which the characters in the block belong. For example, take the letter called "BENGALI LETTER RA WITH LOWER DIAGONAL": this letter does not even exist in the Bengali alphabet, and it is not "RA" but "WO".
In this case at least, we cannot go by Unicode naming conventions.
Chaipau (talk) 14:07, 31 August 2020 (UTC)
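
The point about names can be checked directly. The following is a minimal sketch, assuming only Python's standard `unicodedata` module, which exposes character names but, as far as I know, not the Script property itself. The helper `name_prefix` and the sample string are illustrative assumptions, not part of any Unicode tooling.

```python
import unicodedata


def name_prefix(ch: str) -> str:
    """Return the first word of a character's Unicode name.

    Hypothetical helper: the prefix reflects how the character was named,
    which is not a reliable indicator of its script.
    """
    return unicodedata.name(ch).split()[0]


# Three letters/signs from the Bengali block followed by the danda
# (U+0964), which is used as a full stop in Bengali and Assamese text.
SAMPLE = "\u0986\u09AE\u09BF\u0964"

if __name__ == "__main__":
    for ch in SAMPLE:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))

    # The letter mentioned above, used for Assamese rather than Bengali.
    print(unicodedata.name("\u09F1"))
```

Going by names alone, the danda would look like a Devanagari character inside a Bengali sentence, which is why the names are a poor guide to script identity.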
@LiliCharlie: I don't know if these will help but you should read these news articles once: [7], [8], [9] [10]. Mohsin274 (talk) 14:19, 31 August 2020 (UTC)
@Mohsin274: You are aware that the whining Sentinel editorial is utter BS that conflates script with language? –Austronesier (talk) 14:48, 31 August 2020 (UTC)
@Austronesier: I don't know. You may/may not be right, but I personally don't have any issues with the Sentinel editorial. I am just showing a few articles about "The London sitting of the International Organization for Standardization... held between June 18 and June 22, 2018." And, if you think the article from the Sentinel is unreliable or biased then you can read the other 3 from The Assam Tribune, NE Now, and Indian Express. Mohsin274 (talk) 15:17, 31 August 2020 (UTC)
Don't get me wrong, but we need peer-reviewed scholarly articles rather than newspaper articles by people who seem involved. Love —LiliCharlie (talk) 15:27, 31 August 2020 (UTC)
@LiliCharlie: I agree. Opinion columns are the bane of Wikipedia in many instances. The Indian Express reports are also too opinionated. It would be better to look at the Unicode ad hoc committee report, which is some kind of a peer-review of the submission made by the BIS. Here they are:
  • The proposal: [11]
  • The Ad Hoc Committee report: [12]
  • The Working Group Report: [13]
Please note the Recommendation M67.25b from the Working Group (page 5): Change the block header from Bengali to Bengali-Assamese. Obviously, the WG did not accept everything the BIS submitted.
Chaipau (talk) 15:47, 31 August 2020 (UTC)
I never said we should follow Unicode or ISO 15924 practice. What I said was we should be guided by our own five criteria for article titles, and not consider ethnic pride. And I now add: We shouldn't try to settle any political issues. Love —LiliCharlie (talk) 14:46, 31 August 2020 (UTC)
@LiliCharlie: Yes, I agree with you. We should apply WP:NAMINGCRITERIA diligently here. We have seen that the old usage has some problems, and the Unicode, the academics and scholars are moving in a certain direction. We are best off being mindful of that direction. Not doing so is political. We should not overstep them either. This debate has been going on for some time in different talk pages, and I believe the experts in this Linguistic forum are possibly the best equipped to take the nuances into consideration and resolve the issue. Chaipau (talk) 15:28, 31 August 2020 (UTC)
My above example of Serbo-Croat was chosen because issues are involved that led to atrocious wars with massacres and many casualties. I refuse to fuel tensions by taking sides; neither the Bengali-speaking nor the Assamese-speaking nor any other linguistic or ethnic group has the right to demand considerateness that might result in hurting somebody else's feelings. I prefer to remain completely neutral by not agreeing with any of the parties involved. And certainly not with the loudest one. Love —LiliCharlie (talk) 16:04, 31 August 2020 (UTC)
@LiliCharlie: According to user:Za-ari-masen, reliable sources like Ethnologue use "Bengali script" (not "Bengali-Assamese script"). And, according to Ethnologue, they use ISO Standard 15924 for identifying writing systems or scripts (as stated here). Therefore, we are indirectly following ISO 15924. But if ISO renamed "Bengali script" to "Bengali-Assamese script", then we should use the same. Mohsin274 (talk) 15:42, 31 August 2020 (UTC)

Wikipedia policy on including etymology information?

Hello etymology friends. I often consider adding an etymology section to articles without one, but I'm never sure if that's acceptable. It's not clear to me that there's a consistent threshold, if you will, even for what one would imagine to be the most vetted topics. For example, Tree, Future, and March (music) don't have etymology info, but Animal, History, and March (month) all do. What gives? What is the policy/common practice/tradition for including etymology on a topic's page?

  • Does it depend on how notable the page is?
  • How important the topic is?
  • How "obvious" and/or well-known it is what the etymology is?
  • How attested the etymology is?
  • Whether the origin is Germanic, Latin/French, or other?
  • Whether the word is shared by other languages?
  • How abstract the topic is?
  • Should etymology be explained in a parenthetical in the lede? Or in its own section?
  • Is there a policy at all?

CampWood (talk) 02:09, 30 August 2020 (UTC)

I have no idea if there is any guideline hidden somewhere, but intuitively I find an etymology section helpful if it explains how the concept described by the term developed, as in the case of History. For Tree, etymological information adds little to our understanding of what a tree is, so should be left out per WP:NOTDICTIONARY. –Austronesier (talk) 07:51, 31 August 2020 (UTC)

Discussion of example number formatting on helpdesk

I'm just gonna leave a link to this discussion about using running numbering schemes for linguistic examples. The idea would be basically to have Wikimarkup support something like the LaTeX/linguex "\label" and "\ref" system. Botterweg14 (talk) 12:42, 31 August 2020 (UTC)
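For anyone unfamiliar with the "\label" and "\ref" idea, here is a minimal sketch of how running example numbers with cross-references could be resolved. It is only an illustration in Python, not a proposal for the actual implementation: the markers `{{ex|label}}` and `{{exref|label}}` are hypothetical names invented for this sketch and do not correspond to any existing template.

```python
import re


def number_examples(text: str) -> str:
    """Number {{ex|label}} markers in order and resolve {{exref|label}} references.

    Both marker names are hypothetical, chosen only for this sketch.
    """
    labels = {}

    def define(match):
        label = match.group(1)
        if label in labels:
            raise ValueError(f"duplicate example label: {label}")
        labels[label] = len(labels) + 1
        return f"({labels[label]})"

    # First pass: assign running numbers to every example definition.
    numbered = re.sub(r"\{\{ex\|([^}|]+)\}\}", define, text)

    def resolve(match):
        label = match.group(1)
        if label not in labels:
            raise KeyError(f"reference to unknown example label: {label}")
        return f"({labels[label]})"

    # Second pass: replace references, so forward references also work.
    return re.sub(r"\{\{exref\|([^}|]+)\}\}", resolve, numbered)


sample = "See {{exref|b}}.\n{{ex|a}} One.\n{{ex|b}} Two."
print(number_examples(sample))
```

The two passes are the point of the design: numbers are fixed by the order of the definitions, so inserting a new example renumbers everything after it and every reference follows along.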

Discussion about Late Greek

There is an ongoing discussion about Late Greek in Talk:Late Greek -- is it a "period" of Greek? is it a "register"? should it have a standalone article, or be part of some other article? Kindly help us out! --Macrakis (talk) 17:05, 31 August 2020 (UTC)

Is Ethnologue reliable for the Kamta group of languages?

The Ethnologue seems to give classifications and names in a very different system, at variance with accepted knowledge and recent findings. Here are some examples:

  1. Ethnologue calls Rangpuri language a language [14], whereas Masica 1991, p 25 calls it Rajbangsi ("Thus the Rajbangsi dialect of the Rangpur District (Bangladesh), and the adjacent Indian Districts of Jalpaiguri and Cooch Behar, has been classed with Bengali because its speakers identify with the Bengali culture and literary language, although it is linguistically closer to Assamese.")
  2. Ethnologue, on the other hand, calls Rajbangsi a different language from Nepal [15].
  3. Ethnologue calls Kamtapuri an alternative name for Rangpuri [16], whereas Toulmin (PhD 2006) finds "However, with a sizeable number of speakers now located within a different country to Rangpur, and lacking any special historical reason for choosing Rangpuri over Kamta, it is unlikely that this term will catch on further afield."

It seems Ethnologue is at complete variance with linguists and their findings and reports.

Could we then consider Ethnologue, at least for these entries, reliable?

Chaipau (talk) 17:39, 1 September 2020 (UTC)

The Ethnologue is a tertiary source because it is a compendium of other secondary sources (which in turn rely on primary sources). As such, it may be helpful, but proper secondary sources should be preferred. For our policy regarding primary, secondary and tertiary sources, see WP:PSTS.
Regarding the Ethnologue, I do not know about the Kamta group of languages, but I know cases where the Ethnologue does not reflect the best consensus in Linguistics, namely when it comes to the differentiation between Western Upper German varieties (which is what I know about the most), which includes entries such as “Swiss German” (not a linguistic division, but rather a cultural or national one), “Walser” (various Highest Alemannic German varieties, but not the only ones), but scandalously lacks Alsatian.
I think it is problematic that the ISO has basically copied the Ethnologue classifications. Of course, a hard classification scheme like ISO 639-3 is a necessity for computers, and it has many benefits. However, it obscures the inherent fuzziness of linguistic classifications and perpetuates one classification system, in this case the Ethnologue’s. Also, the Ethnologue now has a hard paywall. --mach 🙈🙉🙊 18:54, 1 September 2020 (UTC)
"I think it is problematic that the ISO has basically copied the Ethnologue classifications."
It's the other way round, see Ethnologue's The Problem of Language Identification page where it says: "Since the fifteenth edition (2005), Ethnologue has followed the ISO 639-3 inventory of identified languages ( as the basis for our listing of distinct languages." (A more direct link to the language identification policy of ISO 639-3 is See articles SIL International, Ethnologue, and ISO 639-3 for the relationship between Ethnologue and ISO 639-3.) Love —LiliCharlie (talk) 19:39, 1 September 2020 (UTC)
P.S. The starting point to request an ISO 639-3 entry for Alsatian is their Introduction to the Code Change Process page. Love —LiliCharlie (talk) 19:57, 1 September 2020 (UTC)
Oh-oh, shows that it’s better to research first and rant later. Thanks for the corrections. --mach 🙈🙉🙊 22:10, 1 September 2020 (UTC)
@LiliCharlie: Yes. In the Indo-Aryan context, where "The speech of each village differs slightly from the next, without loss of mutual intelligibility, all the way from Assam to Afghanistan.", Masica 1991 p.21 has a very comprehensive description of the language/dialect problem. This is a much bigger problem that cannot be adequately captured by the mutually exclusive categories of Ethnologue. Chaipau (talk) 10:04, 2 September 2020 (UTC)
This is typical of dialect continua, of course, and by no means restricted to Indo-Aryan. Mach's Western Upper German example within the Continental West Germanic continuum is of the same kind. A language is a dialect with an army and navy. Love —LiliCharlie (talk) 10:30, 2 September 2020 (UTC)

LiliCharlie, J. 'mach' wust could an unpublished thesis be considered a reliable source over Ethnologue? Za-ari-masen (talk) 09:26, 2 September 2020 (UTC)

Sources are required to be verifiable, and our verifiability policy rules that "content is determined by previously published information". Love —LiliCharlie (talk) 09:43, 2 September 2020 (UTC)
What do you mean by "unpublished"? If you're talking about a PhD thesis that has been submitted and accepted then it counts as published (WP:SCHOLARSHIP). Nardog (talk) 10:05, 2 September 2020 (UTC)
  • The current setup on Ethnologue for the Rajbanshi dates to 2008, and like with other recent changes it's got a paper trail that you can follow [17] (you will recognise the name of Toulmin somewhere in there). My experience with similar code changes in this part of the world is that they're usually based on the results of a sociolinguistic survey. Of course, conclusions could be different if other methods were used, and even the same sociolinguistic data is often open to different interpretations. Also, a recent survey can paint a different picture from the one gleaned from a three-decades-old reference text. – Uanfala (talk) 10:30, 2 September 2020 (UTC)
Yes, I agree with user:Nardog on the general principle. Furthermore, the PhD thesis in question, Toulmin 2006, is open-access published by the University: [18]. Therefore, it satisfies WP:V too, as required by user:LiliCharlie. Chaipau (talk) 10:40, 2 September 2020 (UTC)
So it looks like Toulmin himself was part of the team at Ethnologue to create the database for Rangpuri, so shouldn't we follow Ethnologue over Toulmin's earlier thesis? Za-ari-masen (talk) 11:09, 2 September 2020 (UTC)

Comments requested

Please come and make your voice heard at Talk:Eskimo#Racial slur?. Trying to discuss what direction, if any, the article should take. I have notified all projects listed at the top of Talk:Eskimo. CambridgeBayWeather, Uqaqtuq (talk), Huliva 22:42, 16 September 2020 (UTC)

Request for example numbering

Thought I'd let the community know I've made a feature request for an example numbering tool that would generate numbers automatically and allow cross-referencing. I'd appreciate any comments on my proposal, or just editors chiming in with their support if they agree that this would be useful. Botterweg14 (talk) 12:28, 21 September 2020 (UTC)

RfC on Sylheti language - Family tree

What could the family tree be for the Sylheti language? Ethnologue uses the following tree:

  • Indo-European→Indo-Iranian→Indo-Aryan→Outer Languages→Eastern→Bengali-Assamese→Sylheti

Chatterji (1926), on the other hand, uses a combination of names and regions to come up with this tree:

  • Magadhi Prakrit and Apabhramsa→Vanga Dialects

Here he splits Vanga Dialects into two parts and names Sylhet (probably the region, not the language) in two different branches (we can probably assume that E Sylhet represents Sylheti language, but I am giving out both the branches for reference):

  • Western and S W Vanga in which he includes NW Sylhet
  • Eastern and S E Vanga in which he includes E Sylhet

(Chatterji's tree is reproduced for reference in Toulmin's thesis (2006) p302)

Could we combine these two different sources, insert Vangiya in the tree from Ethnologue, and come up with a tree as follows?

  • Indo-European→Indo-Iranian→Indo-Aryan→Outer Languages→Eastern→Bengali-Assamese→Vangiya→Sylheti

@Za-ari-masen:, UserNumber, Kmzayeem, Aditya Kabir, Austronesier.

Chaipau (talk) 17:48, 4 October 2020 (UTC)

@Chaipau: I think you need to start the RfC at the article talk page, and leave a message here leading people to the discussion there. (see: Wikipedia:Requests for comment) Aditya(talkcontribs) 18:21, 4 October 2020 (UTC)
@Aditya Kabir: Pinging somebody requires you to add new lines of text and sign your contribution in the same edit, see the "Usage" section of {{Ping}}. — Chaipau is watching this page anyway, I think. Love —LiliCharlie (talk) 18:48, 4 October 2020 (UTC)
Thanks. I guess if I add a ping later, I would also have to sign the comment again. I re-sign a lot anyway (because of a poor connection and strong ADHD). Here, let me pour you a hot cup of fine darjeeling. See you at the RfC. Aditya(talkcontribs) 18:59, 4 October 2020 (UTC)
@Aditya Kabir: this is a technical issue (in Linguistics) that might involve different language databases and the relative weights experts give them. If we sort it out here, we may not have to go through the formal RfC. Chaipau (talk) 19:13, 4 October 2020 (UTC)
Seriously!? I don't think I need to be a linguistic specialist to understand that a "language" (i.e. Sylheti) can't be a subset of a "dialect supercluster" (i.e. Bengali/Vangiya, whatever that is). Also, obsolete taxonomy belongs in the history section or alternatives section, not the infobox. In my humble but slightly tickled opinion, not-being-a-moron is talent enough to deal with this "technical issue". Aditya(talkcontribs) 19:25, 4 October 2020 (UTC)

This is not a language/dialect issue. Sylheti's status as a language/dialect is itself disputed, but that's not the point. I don't see any harm in combining the two sources to form a family tree, as has been done on Rangpuri language by combining Chatterji, Toulmin and Ethnologue to insert "Kamrupic" and "Western Kamrupic", which the OP himself has supported. If there are problems with such family trees, they should be avoided on both Sylheti language and Rangpuri language. Za-ari-masen (talk) 08:53, 5 October 2020 (UTC)

  • Comment at the risk of not-being-not-a-moron per Aditya Kabir: It takes a layman to believe that a "language" (e.g. Sylheti) can't be a subset of a "dialect supercluster". "Language" and "dialect" are fluid concepts. "Bengali-Assamese" is a complex dialect continuum, with some of its varieties having a literary tradition and thus being considered "languages" (Bengali, Assamese, Sylheti). Many varieties don't have a literary tradition, and as a rule, these non-literary (including "aspiring" literary) variants are "roofed" by a traditional literary language: e.g. Chittagongian, Rangpuri/Kamta by Bengali, Kamrupi by Assamese, or Surjapuri by Hindi.
Ethnologue is agnostic with regard to the internal classification of Bengali-Assamese. This does not, however, mean that earlier classification proposals are invalid/obsolete. Chatterji has divided the Bengali-Assamese dialect continuum into four branches ("Radha", "Varendra", "Kamarupa", "Vanga"). Only "Kam(a)rupa" has been studied in detail by Toulmin (who btw calls Bengali-Assamese "Gauda-Kamrupa", with a question mark because of its unclear relation to Odia). Unlike Chatterji, he preliminarily proposes that all non-Kamrupa varieties can be assigned to a single sister branch of Kamrupa, viz. "Gauda-Baŋga". Note that the internal structure of "Gauda-Baŋga" is not discussed by Toulmin at all. Toulmin's classification of non-Kamrupa varieties of Bengali-Assamese does not invalidate Chatterji's classification; the matter clearly requires further research.
Since the internal structure of Bengali-Assamese is still not yet fully understood, and Chatterji's and Toulmin's classifications of non-Kamrupa variants are conflicting, I suggest to place Sylheti directly under "Bengali-Assamese" in the infobox, and mention the details in prose. –Austronesier (talk) 09:00, 5 October 2020 (UTC)
So shouldn't we be consistent on both Sylheti and Rangpuri if we are to mention the details in prose and keep the family tree up to Bengali-Assamese? Za-ari-masen (talk) 09:18, 5 October 2020 (UTC)
We don't have to be consistent between apples and pears. The place of Rangpuri is uncontroversial (Kamrupa is "established"), so there's no harm to have more solid info in the infobox. –Austronesier (talk) 09:28, 5 October 2020 (UTC)
Austronesier, does this mean that we can use dialects and languages interchangeably? What is the purpose of structuring a taxonomic tree for concepts that are inherently unstructured? In a layman's view this looks like building with bricks made of water. Aditya(talkcontribs) 11:49, 5 October 2020 (UTC)
@Aditya Kabir: Trees are not about taxonomy, nor about languages vs. dialects, but about the historical relations between individual language varieties. For "language" vs. "dialect" see: Abstand and ausbau languages, A language is a dialect with an army and navy, Dialect#Dialect_or_language. –Austronesier (talk) 12:56, 5 October 2020 (UTC)
Aren't those historical relations of a mother-daughter variety? As for the army and navy... every Bangladeshi village talks a little differently than the next one, and there are over 100 thousand of them. Thank the Lord that they don't all have access to armies and navies. While I understand the lack of research, a structure can't be a way to explain things that are not structured. Can it? (By the way, I must state that my comments from the last one onwards have nothing to do with the dispute. With new enlightenment I am just wondering about the extreme subjectivity of the thing in dispute.) Here, a cup of nice hot darjeeling to compensate for the distraction. Aditya(talkcontribs) 13:18, 5 October 2020 (UTC)

@Aditya Kabir: I don't see any subjectivity here. Historical linguistics is pretty rigorous and can be safely relied upon. It is technical and unfortunately any advanced technology looks like magic. Chaipau (talk) 15:45, 5 October 2020 (UTC)

  • Comment - Are there any established guidelines or a manual of style specifically for such linguistic articles to create the language family trees? --Zayeem (talk) 16:43, 5 October 2020 (UTC)
I will pocket the insult and remind you that any magic looks like science to believers. When all the definitions of things and their relations depend upon fluidity open to interpretation and not established facts or accepted hypotheses, it really looks more like an interpretation of scripture than asserting facts. The high attitude against laymen is also not uncommon among scriptural interpreters. That kind of interpretation also has a valid claim of rigour. (By the way, history without linguistics, including historiology and historiography, happens to be my key interest and that discipline has no pretension to be a science. My comment still has nothing to do with the dispute.) Apologising again for further distraction. Aditya(talkcontribs) 17:34, 5 October 2020 (UTC)
  • Unless there is a verifiable scholarly consensus on which classification is better, we should report both per WP:WEIGHT. We may not combine the two hypotheses as suggested as it violates WP:SYNTH. Wug·a·po·des 17:46, 5 October 2020 (UTC)
    • Or neither, if we don't want to overload the infobox. –Austronesier (talk) 18:02, 5 October 2020 (UTC)
Wugapodes, I presume this also applies to Rangpuri language which also has a classification combining two hypotheses? Za-ari-masen (talk) 18:11, 5 October 2020 (UTC)
@Austronesier: oh right, infoboxes. Including neither might not be the best course, but I don't know much about Indo-Aryan languages. IIUC, it looks like both hypotheses agree with classification up to Indo-Aryan, so we may want to include that part of the tree and then refer to the text for further hypothetical sub-classifications. @Za-ari-masen: if the infobox at Rangpuri language combines two classification hypotheses to produce a new hypothesis that does not exist in the literature, then it is likely original research (synthesis of sources) and should be revised to comply with WP:V and WP:OR. As mentioned, I'm not familiar with this language group, so I trust your decision-making on the specifics. Wug·a·po·des 19:08, 5 October 2020 (UTC)
@Za-ari-masen: FWIW, Rangpuri has a classification based on non-conflicting sources. It's not synthesis when multiple sources state the same. –Austronesier (talk) 19:12, 5 October 2020 (UTC)
@Wugapodes: Yes, that was my idea, to cut off the tree at the bottom where the hypotheses diverge. The upper consensus part can fit in the infobox, while conflicting proposals must be explained in the prose part of the article. –Austronesier (talk) 19:18, 5 October 2020 (UTC)
  • Agreed, combining the two trees is synthesis; simply list both unless a reliable source that combines them is found. Gbear605 (talk) 18:24, 5 October 2020 (UTC)
  • A tentative summary of independent comments so far:
    • We cannot combine the two trees (SYN)
    • We need to give due weight to the two trees (WEIGHT)
  • The question remains—do we give both the trees in the Infobox? The suggestions seem to be:
    • Provide the Ethnologue in the Infobox
    • Explain the two in the text.
Also, Za-ari-masen, Rangpuri is out of scope for this. Please use WP:LOP for a solution of Rangpuri, and not the solution to the problem here as input. We cannot have a chain of individual solutions to determine resolution.
Chaipau (talk) 18:49, 5 October 2020 (UTC)
@Chaipau: see this edit I made. I listed the IE and IA macro-families and replaced the disputed parts with "Disputed, see text" and a link to the classification section. What do others think? Wug·a·po·des 19:16, 5 October 2020 (UTC)
@Wugapodes: thank you. As Austronesier has pointed out, there is no conflict between the two sources up to Bengali-Assamese languages. So maybe we can retain it up to that point and then say "disputed"? Also, is "disputed" too strong? Maybe "[more details in text]" or something? Chaipau (talk) 20:53, 5 October 2020 (UTC)
@Chaipau: And then link the "see [section name]" in the infobox to the section. When something doesn't have one answer, it is prudent to lead readers from the infobox to a section that discusses the differing opinions in greater detail. Aditya(talkcontribs) 02:02, 6 October 2020 (UTC)
  • Additional Comment: It seems Chatterji's grouping hypothesis has been reconstructed by Pattanayak (1966). From Toulmin (2009) p212 "Chatterji’s subgrouping hypothesis has been subjected to detailed comparative reconstruction by Pattanayak (1966)." The reason there is a disagreement between Ethnologue and Chatterji is that Ethnologue (and Glottolog) likely follow Pattanayak, which is the more updated tree. At this point I wonder whether we should mention Chatterji at all in the Sylheti article in this context. Chaipau (talk) 10:13, 6 October 2020 (UTC)
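To make the "cut the tree where the sources diverge" suggestion from this thread concrete, here is a minimal sketch in Python. It simply takes the longest shared prefix of two lineages. The two paths are the ones quoted at the top of this section (the Ethnologue tree and the proposed combined tree); the function and variable names are made up for the illustration, and nothing here decides which classification is right.

```python
from itertools import takewhile


def shared_lineage(path_a, path_b):
    """Return the nodes on which two root-to-leaf lineages agree, from the root down."""
    agreeing_pairs = takewhile(lambda pair: pair[0] == pair[1], zip(path_a, path_b))
    return [a for a, _ in agreeing_pairs]


# Lineages as quoted in this section (root first).
ethnologue_path = [
    "Indo-European", "Indo-Iranian", "Indo-Aryan",
    "Outer Languages", "Eastern", "Bengali-Assamese", "Sylheti",
]
proposed_path = [
    "Indo-European", "Indo-Iranian", "Indo-Aryan",
    "Outer Languages", "Eastern", "Bengali-Assamese", "Vangiya", "Sylheti",
]

infobox_tree = shared_lineage(ethnologue_path, proposed_path)
print(" → ".join(infobox_tree))
```

With these two inputs the shared part stops at Bengali-Assamese, which matches the suggestion to keep the infobox tree up to that node and explain the rest in prose.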

Implementing the solution

I have implemented a solution: [19]. Effectively, I have removed the "Vangiya", as defined in Chatterji, and replaced it with "Eastern Bengali" from Glottolog. This definition of Glottolog is based on the identification of Bengali-Assamese (Ethnologue) with Gauda-Kamrupa (Glottolog) as we have discussed in this section: Wikipedia_talk:WikiProject_Linguistics#What_is_WP:SYNTH_when_using_multiple_sources_about_language_classification?. Chaipau (talk) 10:33, 18 October 2020 (UTC)

I have removed Glottolog from the infobox since it is still WP:SYNTHESIS, combining two sources to form a family tree. The consensus in the discussion was to keep the family tree up to Bengali-Assamese languages; I have added "Disputed" after it, as recommended by Wugapodes. Za-ari-masen (talk) 16:30, 18 October 2020 (UTC)

What is WP:SYNTH when using multiple sources about language classification?

Although the point was brought up mainly as a result of a WP:BATTLEGROUND situation in a range of articles about languages in NE South Asia, I want to elicit your thoughts about the general problem.

Building an article based on multiple sources is obviously not synthesis, as long as we do not draw new conclusions based on the material in various reliable sources. Drawing new conclusions is SYNTH. Typical cases are

  • Families A and B are members of the proposed macro-family AB; another source says that family C is related to B: including C in macro-family AB is SYNTH.
  • Paleolinguists propose that the distribution of language family A is associated with the spread of Haplogroup Foo. A paleogenetic paper claims that Haplogroup Foo originates from area X. Saying that family A originates from area X is SYNTH.

But what about "vertical" grafting (piping) of trees? Consider this situation:

  • Source A discusses the division of a language family X into larger subgroups, one of them is "Fooic". Source B deals with the internal classification of Fooic. NB, there is no disagreement about the validity of Fooic. Is the combination of this information already synthesis? Or, more concretely: Is the tree information Family X → Fooic → Southwest-Foo language SYNTH?

I dare to say that we do this everywhere on WP. I can hardly think of a source that provides the full tree information e.g. of Yorkshire dialect.

Another, more problematic example:

  • R. M. W. Dixon is a staunch opponent of the Pama–Nyungan family. OTOH, he has contributed a lot to the classification of smaller units of Australian languages usually included in Pama–Nyungan. Now, the mainstream of specialists accepts Pama–Nyungan. Are we then barred (per SYNTH) from using Dixon's micro-classifications in the presentation of the internal classification of Pama–Nyungan, because he opposes the latter? E.g. there is no disagreement between Dixon and the rest about the validity of Yolngu.

We all know that full tree information is easiest to get from Ethnologue and Glottolog. We also all know that this is just a default choice, and that where better sources exist, we should make use of them. Usually, specialized sources will not provide full tree data. I want to use such sources without running the risk of producing contestable synth content.

PS: Is taking the birth date and death date for Alfred E. Neuman from two different sources SYNTH? Austronesier (talk) 10:46, 6 October 2020 (UTC)

A very pertinent issue!
  • SYNTH should be OK as long as no new information is generated.
  • For a concrete example, where is the new information if we grafted the Gauda-Kamarupa tree from Glottolog to the Bengali-Assamese node of Ethnologue? Given that Ethnologue gives no sub-tree to Bengali-Assamese and Gauda-Kamarupa (Glottolog) = Bengali-Assamese (Ethnologue) is accepted.
Chaipau (talk) 11:44, 6 October 2020 (UTC)
Allow me to give another example. Say a source describes how a volcanic basalt rock formation has lain beneath basin X since Jurassic times, and another source says the basalt formation beneath basin X yields phosphate minerals. It is okay to write "the Jurassic-era volcanic rock formation beneath basin X yields phosphate minerals", because the connection is explicit.
It should be alright to use one tree up to Bengali-Assamese and another from Gauda-Kamarupa, if it is explicitly asserted by an RS that Bengali-Assamese is indeed Gauda-Kamarupa (especially so if the first one doesn't follow the tree beyond Bengali-Assamese). But it also needs to establish that Bengali-Assamese is Gauda-Kamarupa. Aditya(talkcontribs) 09:03, 7 October 2020 (UTC)
@Aditya Kabir: I think "Bengali-Assamese = Gauda-Kamarupa" can be established from Toulmin's dissertation (who btw considers it a weakly supported clade), but we can discuss this in depth in Talk:Bengali-Assamese languages once it becomes more peaceful there (#prayforNESouthAsia); you can bring darjeeling, I'll provide some bandrek. –Austronesier (talk) 18:36, 8 October 2020 (UTC)
  • The other side of the WP:SYNTH coin is WP:NOTSYNTH. Not all synthesis is against WP:OR, and I think it's more productive to discuss what constitutes "original research" when listing classifications. Say we have source 1 which says "if A then B" and source 2 says "if B then C" then saying "A -> B -> C" and citing 1 and 2 should be fine unless there is some other prominent hypothesis in the literature. If you submit an article with that claiming it's original research, you'll get laughed at because that's the obvious logical conclusion. It would not be okay to say "A -> C" since neither source connects A and C without B and claiming A directly implies C would constitute an original claim. If you sent that to a journal, explicitly cutting out B, that requires evidence and is OR. Wug·a·po·des 00:31, 8 October 2020 (UTC)
Excellent explanation. Can I offer you a cup of hot darjeeling in appreciation? Or would you prefer beer instead? Aditya(talkcontribs) 17:46, 8 October 2020 (UTC)
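The "A -> B -> C" rule described above can be sketched mechanically. The following Python snippet is only an illustration under stated assumptions: the link table and source names are placeholders, the function name is invented here, and the check covers only whether each parent-child link is attested by some source. It does not test for a competing hypothesis in the literature, which the rule also requires.

```python
def unattested_links(chain, attested):
    """Return the (parent, child) links in a chain that no cited source states directly."""
    return [link for link in zip(chain, chain[1:]) if link not in attested]


# Placeholder data: which source states which direct link.
attested = {
    ("A", "B"): "source 1",
    ("B", "C"): "source 2",
}

# Every step is backed by a source, so the chain is a plain combination of the two.
print(unattested_links(["A", "B", "C"], attested))

# Skipping B creates a link that neither source makes; this is the original claim.
print(unattested_links(["A", "C"], attested))
```

An empty result means every step can be cited; any link that comes back is one that would need its own source.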

Requested article: Language Question (Italy)

I recently wrote an article about the Language Question (Malta), and while I was researching it I came across a somewhat similar linguistic debate which took place in Italy. This is covered by a fairly decent article on the Italian Wikipedia (it:Questione della lingua) but there's no article about it on the English Wikipedia. Would someone from this project be interested in translating the article from Italian and perhaps improving upon it?

I am also making this request at WikiProject Languages and WikiProject Italy. --Xwejnusgozo (talk) 19:36, 9 October 2020 (UTC)

Limburgish short close-mid front rounded vowel

Hello. I'm about to change ⟨ʏ⟩ to ⟨ø⟩ in our Limburgish transcriptions. The reason for that is [ʏ] is heard by Dutchmen and Belgians as a variant of /y/, rather than /ʏ/ which is phonetically [ø] or [ɵ] in Limburgish (as it is in Standard Dutch). In conversations with native speakers of Dutch, at least two of them have complained to me about the misleading use of ⟨ʏ⟩ in IPA transcriptions of Dutch.

Gussenhoven (1992) reports a lowered [ʉ̞] as the norm in Northern Standard Dutch, whereas Collins & Mees (2003) report [ʏ] as the norm. Both describe /ʏ/ as close-mid, the former source describes it as closer to central [ɵ], whereas the latter closer to front [ø]. [ʉ̞, ɵ] for /y, ʏ/ have been reported to occur in the Limburgish dialect of Hamont (by Verhoeven 2007), whereas [ʏ, ɵ] for /y, ʏ/ (with [y] being a word-final allophone of the former) have been reported to occur in the Ripuarian dialect of Kerkrade (by SKD 1987), which is often treated as a Limburgish variety.

It's clear to me that wherever there's a contrast between a short /y/ and a short /ʏ/, the latter is typically not closer than close-mid, whereas the former is not necessarily fully close.

The symbol I've chosen is ⟨ø⟩, used by Peters (2006). It's also in line with how the related West Frisian vowel is transcribed. In order to completely bring the transcription of the close-mid vowels in line with West Frisian, I'm also going to change ⟨ʊ⟩ to ⟨o⟩ and leave ⟨ɪ⟩ as it is. I'm sure that both vowel symbols (meaning ⟨ʊ⟩ and ⟨o⟩) are used in Limburgish dialectology, as they are in Dutch dialectology.

⟨ø⟩ is superior to ⟨ɵ⟩ in that it clearly shows that the vowel in question is the phonological short counterpart of /øː/. I also haven't seen ⟨ɵ⟩ used for the Limburgish vowel, though it has been used for the Dutch vowel and even for the West Frisian vowel. Even if it tends to be more central than front, in fact any of the so-called front rounded vowels in Limburgish can be central, and so can /œy/.

Full citations can be found on the following pages: Dutch phonology, Hamont-Achel dialect, Kerkrade dialect and Hasselt dialect.

I'm now going to WP:BOLDLY introduce the changes (⟨ʏ⟩ -> ⟨ø⟩ and ⟨ʊ⟩ -> ⟨o⟩). Sol505000 (talk) 12:11, 10 October 2020 (UTC)
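For anyone who wants to apply the same two substitutions to a batch of transcriptions, here is a minimal Python sketch. It assumes the input strings contain only Limburgish IPA transcriptions; it is not the method used for the edits above, and the names in it are invented for the illustration. It uses only the standard `str.maketrans` and `str.translate`.

```python
# Single-character substitutions announced above: ⟨ʏ⟩ -> ⟨ø⟩ and ⟨ʊ⟩ -> ⟨o⟩.
SYMBOL_CHANGES = str.maketrans({"ʏ": "ø", "ʊ": "o"})


def update_transcription(ipa: str) -> str:
    """Apply the two vowel-symbol changes; every other character is left untouched."""
    return ipa.translate(SYMBOL_CHANGES)


# A bare list of symbols rather than real words, to show which ones change.
print(update_transcription("ʏ ʊ ɪ y"))
```

Because the mapping works on single characters, ⟨ɪ⟩, ⟨y⟩ and diphthongs such as ⟨œy⟩ pass through unchanged. Each result would still need a manual check, since the same symbols in non-Limburgish transcriptions on the same page must not be touched.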

I'm neutral on this, but I notice we don't have a dedicated {{IPA-li}} transcription template, which would help pave the way to creating an IPA help page and allow for a centralized place for discussions like this related to Limburgish. I'm a little short on time, but anyone who feels up to it can create the template and even look at Sol505000's recent edits (whose edit summaries graciously link to this talk page section) to convert the transcriptions they've identified as Limburgish and change them from IPA-all to IPA-li. — Ƶ§œš¹ [lɛts b̥iː pʰəˈlaɪˀt] 16:06, 10 October 2020 (UTC)
@Aeusoes1: An IPA help page would be great, but I'm not sure how to deal with the pitch accent as it varies from region to region (or at least the way it's analyzed varies from source to source, if that makes sense). I understand that superscript numbers are disliked by many of the editors, and I think that I share that sentiment as well. Ooswesthoesbes could be of help here. Sol505000 (talk) 16:45, 10 October 2020 (UTC)
Before we create a help page, we generally start out by creating the IPA transcription template so that we can have an idea if there are enough transcriptions to merit the creation of such a help page. One step at a time. — Ƶ§œš¹ [lɛts b̥iː pʰəˈlaɪˀt] 17:16, 10 October 2020 (UTC)
@Aeusoes1: I see. I can create it, that's no problem. Just out of curiosity: how many transcriptions would have to exist to merit a help page? Sol505000 (talk) 17:46, 10 October 2020 (UTC)
I don't think we have a hard and fast agreement on that. IMHO, one or two transcriptions wouldn't be enough, but a couple of dozen would be enough to pass such a threshold. — Ƶ§œš¹ [lɛts b̥iː pʰəˈlaɪˀt] 18:08, 10 October 2020 (UTC)

The true vowel [ʏ] does not exist anywhere in the Limburgish linguistic area (even in the broad sense which includes Kleverlandic) as far as I know; it is, however, used very often due to tradition and alignment with Dutch. Another reason is that the short "u" is not pronounced the same in all places. My knowledge of Belgian Limburg is limited, so I can only talk about the Dutch dialects. Maastrichts, Roermonds and Weerts have a clear tendency towards [ɵ], while the rural dialects of Midden-Limburg use a simple short version of eu, that is [ø]. /ɪ/ is generally more closed than in Dutch, so it would be best to stick with that. The use of [o] over [ʊ] is debatable, as it actually can be very close to [u] in some dialects. For simplicity's sake, [o] is a good choice, however, as it would make the phonology table more streamlined: (/i i: y y: u u: - ɪ (odd one out) - e̞ e: ʊ o: - ə - æ ɛ: œ œ: ɒ ɒ: - ɑ a:/) The big sidenote to this is that there are eccentric dialects that make a three- to four-way contrast /æ a (ɒ) ɑ/ or even contrast /ɪ (e:) e̞ ɛ æ/. The transcriptions I have created thus far are mainly based on my own dialect, Montforts, which is also well described by Pierre Bakkes. Here, I chose to use [ø], but [ʊ], as in some words/derivatives [ʊ] and short [u] are interchangeable due to their proximity (bók "buck" > boekketig "buck-like" etc.). And indeed, [y] does not exist either; the best way to describe it would be [ʉ] ([ʉ̞] or [ʉ̜] specifically, depending on the dialect), but again, the Dutch tradition is to use [y]. Another note: /øy/ in Dutch usually ends in /øi/ in Limburgish; an often-cited example is the pronunciation of the Dutch word truien as /truiwen/ in Northern Dutch vs. /truijen/ in Limburg.

When it comes to pitch accent, the most neutral way is to use simple diacritics: á for sleiptoean/drag tone, à for stoeattoean/push tone, and none for a neutral accent. The exact pronunciation varies, and in some cases is even reversed; e.g., in Venlo the tones seem to be the opposite of those in Roermond. In the literature, "a~" for drag tone and "a\" for push tone are often used, but they do not work well in combination with IPA (as the slash actually appears to close (\ vs. /) the transcription in IPA). --OosWesThoesBes (talk) 06:38, 11 October 2020 (UTC)

Zeugma (and syllepsis)

Wanting a quick working definition of zeugma and being too lazy to dig out my copy of Crystal's excellent Dictionary of Linguistics and Phonetics (let alone essay a definition by myself), I looked in Wikipedia for Zeugma. What I found amazed me, and not in a good way. I suppose that this is what happens when editors entrust linguistics matters to vaguely literary sources that demonstrate no knowledge of (post-18th-century) linguistics. I mean as a ferinstance:

"He works his work, I mine" (Tennyson, "Ulysses") [...] is ungrammatical from a grammarian's viewpoint, because "works" does not grammatically agree with "I": the sentence "I works mine" would be ungrammatical.

which I might rephrase as

"He works his work, I mine" (Tennyson, "Ulysses") [...] is ungrammatical from the viewpoint of a grammar-obsessed ignoramus, because the sentence "I works mine" would be ungrammatical.

The talk page sports a template that says something-something about applied linguistics. How zeugma (or syllepsis) is a matter of applied linguistics eludes me, and I hope that the content of articles such as this one isn't applied anywhere. (Rant over.) -- Hoary (talk) 00:12, 15 October 2020 (UTC)

Error (Linguistics)

Does anyone know if someone is currently working on this article? It seems unfinished. Below is my evaluation of it.

I chose this article because I thought Error meant that there was something wrong with the article that needed to be fixed. Once I started reading it, I found the topic interesting.


The article includes an introductory paragraph that explains the main topic and briefly refers to the content covered by the sections in the outline. This paragraph could be more concise, and it does include information that is not later covered in the article.


The content covered is relevant to the topic, but some of the information could be more up to date and the author relies heavily on one source. The introduction refers to social perceptions and value claims that are not covered anywhere else in the article. The article does not address Wikipedia’s equity gaps.

Tone and Balance:

The article appears to be neutral in tone.

Sources and References:

Sources are cited for the facts presented in the article; several are fairly recent. One link is broken and most of the citations are for books, not journal articles. I am unsure how to discern if the individual works cited are from historically marginalized individuals.


What is provided in the article is written clearly and is easy to understand, with no grammatical or spelling errors noted. It is quite short and only includes 2 sections, leaving the impression that it is unfinished.

Images and Media:

There are no images or media included.

Talk page:

There is no talk page for the article. Users are directed to the WikiProject Linguistics portal to leave feedback. The article is rated start class on the quality scale and has not been rated on the importance scale.

Overall Impressions:

The article is a good start, but seems incomplete. The comment about social perceptions and value claims should be removed if the statement isn't going to be expanded on by adding another section. Canonlvr (talk) 07:47, 15 October 2020 (UTC)

Hi Canonlvr, and thanks for the feedback! In general, it's better to post feedback like this on the article talk page since it will be easier to find when an interested editor wants to improve the article. But no worries this time, I copied it over there for you. Looking at the page history, no one is actively working on it. Based on your feedback, you might want to try copyediting the introduction or adding the {{One source}} banner at the top of the article. Wug·a·po·des 22:02, 15 October 2020 (UTC)

Does the history of the first print type for Bengali language belong in the history of the Bengali-Assamese script?

Bengali-Assamese script is used to write both Bengali and Assamese besides a host of other languages. The first printable types were produced for the Bengali language. Does it mean that since the types were made for the Bengali language, an account of this part of the history does not belong in the article? The issue here seems to be "nationalistic" in the sense that what "belongs" to Bengali cannot belong to anything that is named "Bengali-Assamese". Chaipau (talk) 10:32, 17 October 2020 (UTC)

It looks like there's a discussion on the talk page about this. If you would like community input, you might want to ask plainly. — Ƶ§œš¹ [lɛts b̥iː pʰəˈlaɪˀt] 22:35, 17 October 2020 (UTC)
Indeed, and since I asked, we have made some progress towards resolution on the talk page: Talk:Bengali–Assamese_script#Printing. But I apologize if this was not done plainly. Chaipau (talk) 07:54, 18 October 2020 (UTC)

Sylheti language - how should we address the language vs dialect issue?

There seems to be some debate associated with the issue of whether the Sylheti language is a dialect or an independent language. So how should we address it in the lead of the article? There are currently two different versions:

  • Address the issue frontally and state it in the lead itself: is an Eastern Indo-Aryan language,[7][8] generally considered to be a dialect of the Bengali language.[9]
  • Not mention "language" and point to "variety" instead: is an Eastern Indo-Aryan variety,[7] generally considered as a part of the Vangiya dialect group of the Bengali language.[8]

Furthermore, regarding the opinion of a professional linguist (who happens to be a Wikipedian) in a [newspaper] that "they are almost universally considered by linguists to be separate languages on their own":

  • Is this a reliable source to claim that Sylheti is a separate language?
  • Should the paraphrase of the quoted sentence be "some linguists may consider Sylheti to be a separate language" or "linguists almost universally consider Sylheti to be a separate language"?

We have tried to discuss these issues here: Talk:Sylheti_language#Language_vs_Dialect.

@Za-ari-masen, UserNumber, and Abu Ayyub:

Thank you for your inputs.

Chaipau (talk) 12:36, 19 October 2020 (UTC)

The linguist is clearly pro-language especially with his terminology like "universally". UserNumber (talk) 13:34, 19 October 2020 (UTC)
Are you trying to apply a Wikipedia NPOV standard on a source? It can either be RS or not, and NPOV standards don't apply there. And by "universally" he is referring to the universe of linguists—he is eminently qualified to speak for them. Chaipau (talk) 18:21, 19 October 2020 (UTC)
  • Try something like is an Eastern Indo-Aryan language variety. Generally considered to be a dialect of the Bengali language, linguists classify it as a separate language. Wug·a·po·des 18:25, 19 October 2020 (UTC)
    • Thanks. This sounds very neutral and accurate to me. Chaipau (talk) 18:45, 19 October 2020 (UTC)
  • Where is the linguistic controversy here? Just start the article the way you would start any other language article, and some way further down in the lede mention that Bengalis, as well as some (most?) Sylhetis, consider it to be a dialect of Bengali. The word "variety" is a perfectly neutral word when used in properly linguistic discourse. It is much less than perfectly neutral when used in general contexts aimed at lay readership, like the lede sections of Wikipedia articles. Also, there are a number of Sylheti varieties, so unless you're narrowing your discussion to a particular one, calling Sylheti "a variety" (in the singular) can be misleading. – Uanfala (talk) 19:00, 19 October 2020 (UTC)
    • There seems to be no controversy among linguists—all including Chatterji (1926) maintain Sylheti is independent/distinct. Contemporary linguists are making that point more emphatically. But there is some resistance here to stating the linguists' position and giving more weight to the general belief that Sylheti is a dialect of Bengali. Interestingly, one of the sources calls it a minoritized language, which seems to be at play here in Wikipedia. I agree that linguistically "variety" could be misleading, but is there an alternative? I would think "Sylheti is an Eastern [[Indo-Aryan languages|Indo-Aryan language]]" (cite [20][21]), but these edits are quickly reverted by other editors. Chaipau (talk) 08:19, 20 October 2020 (UTC) Addendum: The compromise text above seems acceptable (none of the other editors have supported that yet) because the second sentence, as given by Wugapodes above, clarifies the situation accurately. Chaipau (talk) 08:30, 20 October 2020 (UTC)
  • There is still enough dispute among linguists on the language/dialect issue: Grierson, Chatterji, Rasinger and several others have called Sylheti a dialect of Bengali, while in several works it has been described as "Sylheti Bangla". What Chaipau is quoting here is an opinion piece from a newspaper. Although it is written by a linguist, the focus of the content is mainly the ethnic demography of Bangladesh, where Sylheti is only mentioned in passing; it's clearly not reliable enough to cite for a contentious fact per WP:CONTEXTMATTERS. Za-ari-masen (talk) 08:35, 20 October 2020 (UTC)
    • It should be pointed out that Grierson (1903) and Chatterji (1926) are old; and Rasinger's work (2007, based on the thesis) is not on Sylheti per se, with the claim made more in passing. Even so, Chatterji specifically calls the dialects independent [22] and mentions that Sylheti is further away from Bengali than Assamese, a different language, is [23]. Besides Simard et al. (2020, linked above), contemporary linguists have called it a language: Sen (2020) [24], Khan (2018, above) and others. The claim that linguists themselves are divided is not true, as clearly stated by Khan: "they are almost universally considered by linguists to be separate languages on their own." That linguists are progressively endorsing the position that Sylheti is a language can be seen here: Gope&Mahanta (2014) "Sylheti is generally considered to be one of the varieties of Bangla"; Mahanta&Gope 2018 "Sylheti is an Indo-Aryan language spoken by about 11 million people in India and Bangladesh (Hammarström et al., 2017)." with the qualification "Along the linguistic continuum of eastern Indic languages, Sylheti occupies an ambiguous position, where it is considered a distinct language by many and also as a dialect of Bengali or Bangla by some others." The positions of linguists seem to have converged after 2017. Chaipau (talk) 09:28, 20 October 2020 (UTC)
      • What all the modern linguists agree on is that Sylheti's classification is still ambiguous; that's even shown in the quotes you provided. Khan (2018) is an opinion piece and not a reliable source. Za-ari-masen (talk) 09:41, 20 October 2020 (UTC)
  • (edit conflict) This is a recurrent problem. I agree with Uanfala that for the lay reader, obviously "language" is a more recognizable term than "language variety". And also, calling e.g. Sylheti (or another illustrative case, Low German) a (=one) language variety is inaccurate, because the term variety/lect implies a uniformity which is not given in this case with numerous local varieties of Sylheti. I agree to sacrifice recognizability in the lede for the sake of consensus, but not at the expense of accuracy. So Wugapodes's otherwise perfect suggestion still needs a minor tweak in that direction.
    As for the "dispute among linguists", that's mostly a matter of WP:DATED. Grierson and Chatterji are still reliable in the field of genealogical micro-classification (unless proven wrong by later studies), but not for the dialect/language question. In their times, "dialect" was the default term for a "language variety" without literary tradition; most current linguists prefer "language" as the default term for language varieties that are sufficiently distinct to the point of low or zero mutual intelligibility. The Daily Star op-ed is nice, but there are of course better sources, such as this volume related to the SOAS Sylheti Project. –Austronesier (talk) 09:37, 20 October 2020 (UTC)