Talk:Universal Dependencies

From Wikipedia, the free encyclopedia

Lead rewrite flagged in 2019

Yes, the lead needs a rewrite, as correctly flagged. I did manage to make one giant step forward by removing the middle sentence of the quoted material (too much, wrong source, wrong tone, wrong place) and placing the same material in the lead text itself, while supplying it with three clarifying citations, none of which are ideal: one to arXiv, two to project home pages. But at least these track efficiently to the specific identities of the referenced projects. Which is more than the lead used to do.— MaxEnt 20:57, 8 May 2020 (UTC)

I ended up lingering and along the way rewrote the lead enough to remove the rewrite flag, though it now verges on stub territory. Hasty, but probably a good start. — MaxEnt 21:26, 8 May 2020 (UTC)

Function word nontroversy

I enjoyed reading the passage presently devoted to this, though I think it places more emphasis on this controversy than is warranted. It's not even really a proper controversy; it's more a matter of taste and evolution.

There's the quote you often hear about the transition from old-school MT to the modern, more statistical approach: every time I fire a linguist, my accuracy goes up.

One of the problems here is that the human cortex is known to predict in parallel across all domains simultaneously. No-one told the human brain about the box model: lexicography, morphology, syntax, grammar, semantics.

And while many insights have been gained through formal syntactic analysis, it has always remained a dog's breakfast in the applied setting. The applied people would put up with this if the human brain had ever heard about the box model, but it hasn't. So then you're more in the domain of striving for portability and consistency across languages (and entire language families) and this is largely going to be a statistical beast, where your objective function concerning aggregate consistency is relatively remote from traditional content/function word analysis.

But speaking here from the perspective of a writer: rare is the sentence that can be rescued from having been miscast around a flabby or infelicitous main verb. Some words simply bear more of the weight than others. Call these "content" words if you wish. Absolutely, the applied people downstream are going to want the heavy nodes placed at the top of the annotation tree.

The human mind, when generating language—especially as a writer—needs some way to navigate among the myriad of closely related syntactic forms. Because in premeditated communication—where you often lack the interpersonal side-channel—it really pays to hitch subject and theme to strength of language when navigating word placement, syntax be damned (though you try to squeeze that in, too).

And if your sentence doesn't work? How shall I recast thee? Let me count the ways ...

Sometimes this involves a fiddle, other times wholesale revision, and yet other times you simply toss it all away and begin again with an entirely new armload of material. Our mental versatility in this space is truly amazing. And I think this is closer to the job that syntax actually performs: helping us to quickly navigate the mental menu of similar forms.

For my money, our sentence production in ordinary speech is only quasi-syntactic: much is built out of commonplace chunks and fragments, and then you try not to break the worst rules (like subject-verb agreement). In a hot discussion where sentences are changing horses mid-flight, you can't even count on that entirely.

Screenwriting 101: Try listening to how people really speak sometime. You'll be (anti)amazed.

"Controversy" is a slippery word sometimes. There was only ever one thing controversial about Einstein's theories of SR and GR: whether they described the actual physical world we live in, rather than merely being an inspired, consistent, and mathematically potent exploration of physical geometry. The first, last, and only refuge of a relativity refusenik was to wish these theories far, far away into some other hypothetical universe.

I see very little controversy here: syntax had its long day in the sun, and failed to deliver the kinds of systems we now wish to build—largely because syntax rules the box model, and the box model was a misguided hope in the first place.

I've largely come to view syntax as a set of cognitive limbering exercises around best rhetorical practice (e.g. Winston Churchill).

The article in the Chicago Tribune on the same day was very similar, but the words attributed to Churchill were re-ordered:

"This is the kind of tedious nonsense with which I will not put up." The prime minister underscored "up" heavily.

In the rush to copy, no syntactic mutilation is spared by the saints above. That's a significant tell, if ever I saw one. — MaxEnt 22:53, 8 May 2020 (UTC)

A long-winded comment from atop the soapbox. Come to the point, and perhaps a meaningful exchange can ensue that might improve the article.--Tjo3ya (talk) 06:01, 9 May 2020 (UTC)