August 24th, 2012

On the paper by R. Bouckaert et al. Mapping the Origins and Expansion of the Indo-European Language

RIA Novosti asked me to comment on yesterday's paper in Science on the homeland of the Indo-European family. Since I wrote several pages of text and only a couple of my sentences made it into the final RIA piece, I am posting the full text on the blog.

A. S. Kassian
Institute of Linguistics of the Russian Academy of Sciences / Center for Comparative Linguistics, Institute for Oriental and Classical Studies, RSUH (Moscow)

On the paper by R. Bouckaert, P. Lemey, M. Dunn, S. J. Greenhill, A. V. Alekseyenko, A. J. Drummond, R. D. Gray, M. A. Suchard, Q. D. Atkinson. Mapping the Origins and Expansion of the Indo-European Language Family // Science, Vol 337, 24 August 2012.

In my view, studies like the paper under review are at this stage most interesting as an experiment in applying methods from neighboring sciences, such as biology, to historical linguistics. It is still too early, however, to say that the problem of the Proto-Indo-European homeland, of its dating, or even of constructing the genealogical tree of the Indo-European languages has been settled.


UPD. 12.09.2012. A short talk for the discussion with Profs. J.P.Mallory, D.W.Anthony, P. Heggarty et al. at Indo-European Homeland and Migrations: Linguistics, Archeology and DNA: N. Ya. Merpert Memorial Round Table, Moscow, RSUH, 12 Sept. 2012

Dear colleagues, I would like to briefly summarize several general objections to the paper under review.

1) Any radiocarbon or molecular analysis starts with sample cleaning, a rather difficult and costly procedure. A clean sample, however, is a necessary condition for reliable results. In historical linguistics, the analogue of a clean sample is a set of wordlists that are accurately compiled and accurately etymologized.

The topology of Bouckaert’s tree seems very strange. I have not checked it thoroughly, but several points immediately catch the eye. Two instances:

Within the IE tree: Tocharian is linked to Armenian.

Within the Slavic tree: Polish is linked to Byelorussian.

These affiliations are certainly wrong. Such a topology can only indicate one thing: the original wordlists used by the authors under review are unfortunately of low quality. This is not surprising, however, because the authors' basic lexicographic source is Isidore Dyen’s database, as well as various not very reliable secondary publications, including even the anonymous wordlists available in the current version of English Wikipedia(!).

Such an approach contrasts sharply with the high quality standards of, for example, our Global Lexicostatistical Database project. Let me show an example.

This is the database of the Lezgian linguistic group, which I am currently compiling. As you can see, all relevant primary sources (dictionaries, grammars and text collections) are used, all forms are accompanied by direct references, and all important phonetic, morphological and semantic nuances are explicitly discussed. This is indeed hard work: compiling the list for a single language can take two or three weeks.

Summing up, Bouckaert’s tree is unreliable topologically and, as a result, unreliable chronologically.
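
How strongly a grouping depends on wordlist quality can be illustrated with a toy sketch. The Python snippet below is not the Bayesian procedure of the paper under review, nor the workflow of the Global Lexicostatistical Database; it only compares pairwise shares of matching cognate classes. The languages A, B and C, their cognate-class codes and the function names are all invented for illustration.

```python
def shared_cognate_share(a, b):
    """Share of meanings for which two wordlists carry the same cognate class.

    Meanings missing (None) in either list are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def nearest_neighbour(target, wordlists):
    """Name of the language sharing the most cognate classes with `target`."""
    others = [name for name in wordlists if name != target]
    return max(
        others,
        key=lambda name: shared_cognate_share(wordlists[target], wordlists[name]),
    )


# Hypothetical cognate-class codes for 10 meanings (invented data).
clean = {
    "A": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "B": [1, 1, 1, 1, 1, 1, 1, 1, 2, 2],  # 8 of 10 classes match A
    "C": [1, 1, 1, 1, 1, 1, 3, 3, 3, 3],  # 6 of 10 classes match A
}

# The same lists, but three true cognates in B are miscoded as a separate
# class 4, as might happen with a careless or unetymologized source.
noisy = dict(clean)
noisy["B"] = [1, 1, 1, 1, 1, 4, 4, 4, 2, 2]  # now only 5 of 10 match A

print(nearest_neighbour("A", clean))  # B
print(nearest_neighbour("A", noisy))  # C
```

With only three miscoded items out of ten, the closest relative of A changes from B to C. Real lists are longer and real methods are more elaborate, but any method can only work with the cognate judgments it is given.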

2) The assumption that a virus is an analogue of a language is not obvious. Such a statement needs serious arguments. I suppose, however, that the analogy is incorrect. Firstly, a human being is normally a native speaker of one language, whereas a human organism hosts many viruses. Secondly, language shift is only possible under very special socioeconomic conditions, whereas virus transmission is a mechanical process.

One could recall the previous publication by Q. Atkinson in the same journal, Science, a year ago, in which the author compares the phonemic inventory of a human language with the gene pool of a population. This precedent demonstrates that, starting from incorrect methodological assumptions and incorrect raw data, one can “prove” any hypothesis with the help of a heavy mathematical apparatus.

So perhaps we are currently dealing with a scientific phantom of the same kind.

3) The authors evaluate only two homeland theories: the Steppe scenario and the Anatolian scenario. In fact, however, there is a third theory: the Carpatho-Balkan Metallurgical Province.

As one can see, the authors’ maps suggest that Anatolia is indeed the probabilistic center, but the Carpatho-Balkans, the South Caucasus and North Syria cannot be excluded as an IE homeland.

The left map assumes a flat landscape; that is, the Proto-Tocharians go directly across the Caspian Sea, whereas the Proto-Indo-Iranians go directly across the Zagros.

The right map reflects the scenario in which waterways are prohibited. We see that the probabilistic center has moved from Central Anatolia to the west. So I suspect that if we forbid the Proto-Indo-Iranians to fly over the Zagros and, e.g., separate Tocharian from Armenian, the probabilistic center might move from Anatolia to the Carpatho-Balkans.

In any case, such a virological geographical method is hardly applicable to linguistics, because many ancient IE languages are undocumented or too poorly documented. In theory, these lacunae do not falsify our phylogenetic tree, but they should seriously affect any geographic model. For example, I suppose that the large territory of Scythia (which the authors did not take into account for lack of data on the Scythian language) could dramatically change the picture.