
The Canonical What I Learned On My Summer Vacation Post: SNL in Helsinki

In retrospect, we can identify the beginnings of contemporary neurolinguistics with Neville et al. 1991 (Neville, H., Nicol, J. L., Barss, A., Forster, K. I., & Garrett, M. F. (1991). Syntactically based sentence processing classes: Evidence from event-related brain potentials. Journal of Cognitive Neuroscience, 3(2), 151-165). At the time, a dominant approach to language processing envisaged a system whereby people would predict the next word in a sentence by applying their statistical knowledge of word n-grams, based on their experience with language. Grammatical knowledge, as studied by linguists, was somehow an emergent property of string learning and not part of the cognitive model of linguistic performance. On this view, every ungrammatical string involved a point at which one’s string knowledge predicted zero likelihood of the encountered word, and ungrammaticality was on a continuum with unacceptability and very low Cloze probability. What the Neville et al. paper demonstrated was that different types of ungrammaticality had different neural “signatures,” and these differed from that of an unexpected word (with low Cloze probability). One can have many quibbles with this paper. The generalization from the specific sentence types studied to classes of syntactic violations (as in the title), for example, is suspect. But the paper launched a ton of work examining the connection between detailed aspects of sentence structure and neural responses as measured by EEG. In recent years, there has been a move away from violation studies toward work varying grammatical sentences along different dimensions, but experiments have found a consistent correlation between interesting linguistic structure and brain responses.
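To make the string-based picture concrete, here is a minimal sketch of lexical surprisal under a model of the sort just described: surprisal is the negative log probability of a word given its context. The toy corpus, the bigram estimator, and the add-one smoothing are all illustrative assumptions, not the actual models used in this literature (which would use larger n-grams trained on large corpora).

```python
import math
from collections import Counter

# Toy corpus; real models would be trained on large text collections.
corpus = "the dog chased the cat . the cat chased the dog .".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def surprisal(prev, word):
    """Lexical surprisal: -log2 P(word | prev), with add-one smoothing."""
    p = (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))
    return -math.log2(p)

# An expected continuation has low surprisal; an unexpected one, high --
# the dimension along which a low-Cloze (but grammatical) word varies.
print(surprisal("the", "dog"))     # relatively low
print(surprisal("the", "chased"))  # relatively high
```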

So it was a bit of a surprise to hear Ev Fedorenko, at the just-finished Neurobiology of Language meeting in Helsinki, claim that the “language network” in the brain wasn’t particularly interested in the sort of details that the electrophysiologists (ERP and MEG researchers) among us have been studying. In particular, she was explicitly equating the type of computation involved in Cloze probability (lexical surprisal) with syntactic computations. Fedorenko’s gold standard for localizing the language network within an individual’s brain is an fMRI paradigm contrasting the brain’s response to listening to grammatical, coherent sentences with its response to listening to lists of pronounceable non-words. The activity in this network, for example, seems equally well predicted by syntactic and lexical surprisal modulations.

Given that the ERP/MEG literature details specific differences between, e.g., lexical prediction and grammatical computations, if Fedorenko’s language network were in fact responsible for language processing, then perhaps the same areas of the brain are performing different tasks – i.e., separation in brain space would perhaps not be informative for the neurobiology of language. Fedorenko was asked this question after her talk, but she didn’t understand it. However, Riitta Salmelin’s talk in the same session of the conference did address Fedorenko’s position. Salmelin has been investigating whether ERP/MEG responses in particular experimental paradigms might yield different localizations for the source of language processing than fMRI activation from identical paradigms does. Her work demonstrates that this is in fact the case, and she presented some ideas about why. She also remarked to me at the conference that Fedorenko’s “language network” does not include areas of interest for linguistic processing that she studies with MEG.

Of interest for our Blog notes is the nature of Fedorenko’s “syntactic surprisal” measure – the one that is supposed to correlate with activity in the same network as lexical surprisal, where lexical surprisal is computed via word 5-grams. Fedorenko’s syntactic surprisal measure comes from a delexicalized probabilistic context-free grammar, i.e., from a consideration of the likelihood of a syntactic structure independent of any lexical items. We asked in a previous post whether this kind of syntactic surprisal is likely to represent speakers’ use of syntactic knowledge for parsing words, given the importance of lexically specific selection in word processing, but the same question could be asked about sentence processing. A recent paper from our lab (Sharpe, V., Reddigari, S., Pylkkänen, L., & Marantz, A. (2018). Automatic access to verb continuations on the lexical and categorical levels: Evidence from MEG. Language, Cognition and Neuroscience, 34(2), 137–150) clearly separates predictions from verbs about the following syntactic category from predictions about the following word. However, the predictions are grounded in the identity of the verb, so this is a mix of lexical and syntactic prediction (the predictor is lexicalized but the predicted category is syntactic, modulo the prediction of particular prepositions). What is clear is that syntactic surprisal as measured by a delexicalized probabilistic context-free grammar is not the be-all and end-all of possible variables that might be used to explore the brain for areas performing syntactic computation. In particular, the status of delexicalized syntactic structure for syntactic parsing is up in the air. Nevertheless, in a proper multiple regression analysis of MEG brain responses to naturalistic speech, I’m willing to go out on a limb and predict that different brain regions will be sensitive to word n-gram surprisal and to syntactic surprisal as measured via a delexicalized probabilistic context-free grammar.
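For concreteness, here is a minimal sketch of what “delexicalized” means in this setting. The rule probabilities are invented for illustration, and the sketch scores whole trees rather than computing the incremental, word-by-word prefix probabilities used in actual surprisal analyses — but it shows the key property: only categories, never lexical items, enter the computation.

```python
import math

# Invented rule probabilities for a toy delexicalized PCFG; in practice
# these would be estimated from a treebank with the words stripped out.
pcfg = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("NP", "PP")): 0.4,
    ("VP", ("V", "NP")):  0.7,
    ("VP", ("VP", "PP")): 0.3,
}

def tree_surprisal(rules_used):
    """-log2 P(tree) = sum of rule surprisals; lexical identity plays no role."""
    return sum(-math.log2(pcfg[r]) for r in rules_used)

# "the dog chased the cat" as [S [NP Det N] [VP V [NP Det N]]]:
tree = [("S", ("NP", "VP")), ("NP", ("Det", "N")),
        ("VP", ("V", "NP")), ("NP", ("Det", "N"))]
print(tree_surprisal(tree))  # identical for any Det-N-V-Det-N sentence
```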

A final note of clarification: Fedorenko in her talk suggested that there were linguistic theories that might predict no clear separation between, I believe, word meaning and syntax. Thus, somehow, e.g., Jackendoff’s Parallel Architecture and Construction Grammar would predict a lack of separation between lexical and syntactic surprisal. For both Jackendoff and Construction Grammar – and all serious current linguistic frameworks I know of – the ontology of lexical semantics and the ontology of syntactic categories are distinct. So Jackendoff has parallel syntactic and semantic structures, not no distinction between syntax and word meaning. Construction Grammar is similar in this respect. The question of whether speakers use delexicalized probabilistic syntactic knowledge in processing is a question for any syntactic theory, and all theories I can think of would survive a yes or no answer.

On Features

In the Jakobson/Halle tradition, morphological features were treated on a par with phonological features. Binary features cross-classified a set of entities: phonemes in the case of Phonology and perhaps morphemes in the case of Morphology. Jakobson was clear that binary features project a multidimensional space for phonemes or morphemes. An alternative to cross-classificatory binary features would be a unidimensional linear hierarchy. Applied to the geometry of case, and to the issue of expected syncretism across cases in a language, the linear hierarchy predicts syncretism across continuous stretches of the hierarchy, while the binary feature approach predicts syncretism across neighbors in multidimensional space. Three binary features project a cube, with each element (say, a case) at a vertex and syncretism predicted between elements connected by an edge.
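A quick sketch of the two geometries and their syncretism predictions may help. The feature names and case labels below are placeholders for illustration, not Jakobson’s actual proposals:

```python
from itertools import product

# Three binary features project a cube: each of the 2**3 vertices is a
# possible case, and an edge connects vertices differing in one feature.
vertices = list(product((0, 1), repeat=3))

def cube_neighbors(v):
    """Cases one feature-flip away: the predicted syncretism partners."""
    return [w for w in vertices if sum(a != b for a, b in zip(v, w)) == 1]

print(cube_neighbors((0, 0, 0)))  # [(0,0,1), (0,1,0), (1,0,0)]

# A linear hierarchy instead predicts syncretism only across contiguous
# stretches: a case can pattern with its immediate neighbors, nothing else.
hierarchy = ["NOM", "ACC", "GEN", "DAT", "INS", "LOC"]

def hierarchy_neighbors(case):
    i = hierarchy.index(case)
    return hierarchy[max(0, i - 1):i] + hierarchy[i + 1:i + 2]

print(hierarchy_neighbors("GEN"))  # ['ACC', 'DAT']
```

A syncretism between cases that are non-adjacent on the line but adjacent on the cube is exactly the kind of observation, discussed below, that pushed Jakobson toward the multidimensional space.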

Catherine Chvany describes Jakobson’s experiments with features for Slavic case in her paper (Chvany, Catherine V. “Jakobson’s fourth and fifth dimensions: On reconciling the cube model of case meanings with the two-dimensional matrices for case forms.” Case in Slavic (1986): 107-129), which we’ll read for my fall morphology course. Apparently, Jakobson explored a linear hierarchy of cases to account for case syncretism but moved to binary features, and a multi-dimensional case space, because observed syncretisms involved non-adjacent cases on the linear hierarchy. Morris Halle and I reached a similar conclusion from a paradigm of Polish cases in our “No Blur” paper.

Generative phonology has continually questioned whether shared behavior among phonological segments is best captured via cross-classifying binary features of the traditional sort or via some other representational system. Particle and Government Phonologies exploit privative unary features, and linear and more complicated hierarchies of such features have been explored in the “feature geometries” of standard theories.

For morphology, linear hierarchies of monovalent features of the sort Jakobson abandoned have re-emerged most notably in Nanosyntax, for the analysis of case, of person, gender and number, and of tense/aspect. I will blog about Nanosyntax later in the fall; for now, one is tempted to remark that, as far as I can tell, Nanosyntacticians have not sufficiently tackled the sorts of generalizations that led Jakobson away from linear case hierarchies or that motivated Halle & Marantz’s analysis of Polish. Here I would like to highlight a couple of issues concerning the distribution of morphological features in words and phrases.

DM claims that some sets of features are not formed via syntactic merge. In Halle & Marantz 1993, these sets include the person/number/gender (PNG) values of agreement morphemes and the features defining cases like nominative or dative.

From the point of view of canonical DM, the features of, say, person/number/gender and their organization could be investigated apart from the “merge and move” principles of syntactic structure building. The peculiarities of features in PNG bundles or case bundles might relate to the role of the features in semantic interpretation. Maybe some relevant features would be monovalent, and organized in a linear hierarchy, while others might be binary and cross-classificatory. The internal structure of such bundles might involve a theory like feature geometry in phonology — a fixed structure in which the individual features would find their unique positions. In phonology, it would seem strange to build a phoneme by free merge of phonetic features, checking the result of merge against some template — although perhaps this might be explored as an option.

If there is a fixed template of PNG features, or a strict linear hierarchy of monovalent case features, one needs to ask why syntactic merge should build this structure. In any case, the leading idea in DM would be that fixed hierarchies of features are internal to morphemes while the hierarchies of syntactic merge would be constrained by syntactic selection and by interpretation at the interfaces. I hope to explore later in this Blog the question of whether the mini-tree structures implied by selectional features are really equivalent to what’s encoded in a templatic hierarchy. In the recent history of DM, though, the working distinction between morpheme-internal templatic structure and syntactic hierarchies of morphemes has played a role in research.

Teaching Halle & Marantz (1993)

I can’t remember a phone number for even a second, and when I’m introduced to people, I lose the beginning of their names by the time they reach the end (even for one syllable names, it seems). So any recounting of the origins of Halle & Marantz will necessarily involve rational reconstruction of what must have been going on in the early 1990’s. That being said, attention to the text reveals the many forces that led to the structure and content of the paper, and thus to the structure of canonical Distributed Morphology. Here I want to concentrate on the relationship between the goals of the paper and the various technical pieces of early DM — why there’s morphological merger, vocabulary insertion, impoverishment, fission and fusion.

It should be clear from the number of pages H&M devote to Georgian and, in particular, to Potawatomi, that a main thrust of the paper is a response to Steve Anderson’s A-Morphous Morphology. Following the lead of Robert Beard’s insights into “Separationist” morphology, we wanted to show that item and arrangement morphology could have its realizationism (separation of the syntactic and semantic features of morphemes from their phonological realization) and eat it, too. So, as we stated more directly in “Key Features of Distributed Morphology,” the aim was a marriage of Robert Beard on separation and Shelly Lieber on syntactic word formation — Late Insertion, and Syntax All The Way Down.

However, one shouldn’t forget that I wrote a very long dissertation-turned-book in the early 1980’s that concerned the relationship between word formation and syntax. The work is titled “On the Nature of Grammatical Relations” because it is, in a sense, a paean to Relational Grammar. It reconsiders some of the bread-and-butter issues in RG (causative clause union, applicatives (advancement to 2), ascensions) within a more standard generative framework, with a particular emphasis on the connection between word formation and syntax. So, morphological merger between a higher causative head and the embedded verb might be the cause both of word formation (a verb with a causative suffix) and of the structure reduction associated with causative clause union (or “restructuring”). Within this framework, morphological merger is distinct from traditional affix-hopping and from head raising, which don’t by themselves cause structure reduction. Baker’s subsequent work on “Incorporation” tried — and, in my opinion, failed — to unify head raising with the structure reduction associated with morphological merger. These issues are still quite live — see, e.g., Matushansky’s work on head movement.

These days, it might be useful to review the work from the early 1980’s on word formation and syntax. Richard Sproat’s papers are exemplary here. Those of us thinking hard about the issues explicitly connected the “bracketing paradoxes” of inflection (affix hopping creates a local relation between a head and inflection when the syntactic and semantic scope of the inflection is phrasal, not head to head) to similar mismatches between morphological and syntactic/semantic scope exemplified by clitics in particular (so the tense suffix in “played” raises the same kind of scope mismatch as the possessive clitic in “the Queen of England’s hat”). While it’s possible to think of all these bracketing mismatches as arising post-syntactically from a PF-side morphological merger operation, my book explored the possibility that the same word formation operation of merger could feed the syntax, yielding syntactic restructuring in the case of causative constructions, for example. This may or may not be on the right track, but, as Matushansky makes clear, any phase-based Minimalist Program-type syntax adopts an approach to cyclicity that would allow PF-directed morphological merger to feed back into the syntax.
It’s quite remarkable that, given my own preoccupation with morphological merger and its potential interaction with the syntax, H&M write as if the field had coalesced around the conclusion that syntactic word formation was largely the result of head movement (raising) and adjunction. I’ll blog about this later, but, pragmatically, adoption of this assumption about word formation allowed for a straightforward comparison between DM and Chomsky’s lexicalist syntactic theory of the time in the last section of the paper. Nevertheless, H&M assume that something like morphological merger/affix-hopping/lowering was necessary to create phonological words. So, for the “syntactic structure all the way down” key feature of DM, H&M are promoting head movement and adjunction as well as morphological merger. H&M leave aside any question about whether morphological merger might feed syntax.

For the “late insertion” key feature, H&M propose a particular technology for Vocabulary Insertion. The empirical target here is contextual allomorphy and (local) blocking relations. One could conclude, then, that the core of DM consists of the mechanisms of syntactic word formation and the mechanisms of PF realization — and the mechanisms proposed in H&M have been the topic of continuous research for the last 25 years.

What, then, about Fission, Fusion and Impoverishment? For these mechanisms, there were two driving forces at play: empirical domains of interest to Morphologists and the particular research of Eulalia Bonet and Rolf Noyer, which convinced us. Fission is a particular approach to the appearance of multiple exponence and was expertly employed by Noyer in his analysis of Semitic verbal agreement. Fusion involves a head-on tackling of apparent portmanteau vocabulary items: to derive our analysis of syncretism, we required a one-to-one connection of terminal nodes to vocabulary items, and Fusion was in essence a brute-force mechanism for covering situations in which arguably multiple terminal nodes feed the insertion of a single vocabulary item. Impoverishment accounts for two types of phenomena. The first is exemplified in Bonet’s work on Catalan clitics: the use of an unmarked Vocabulary Item in a marked environment. I still believe that the Impoverishment analysis is required to separate standard contextual allomorphy, where a marked VI appears in a marked environment (and a more general VI occurs elsewhere), from situations in which a more general VI occurs in a particular environment — the main argument is that the environment for VI, and thus contextual allomorphy, is local, while Impoverishment can occur at a distance. The other use of Impoverishment is for systematic paradigmatic gaps — where, for example, gender distinctions are lost in the plural. Here, the feature designations of VIs are sufficient to generate the forms without Impoverishment, but Impoverishment explicitly states the underlying generalization (e.g., no gender distinctions in the context of plural).
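To make the division of labor concrete, here is a minimal sketch of Vocabulary Insertion plus an Impoverishment rule. The feature bundles and exponents are invented for illustration and correspond to no particular language, but the logic is the one at issue: the most specific matching item wins, and Impoverishment deletes features before insertion so that a more general item surfaces in a marked environment.

```python
# Vocabulary: ordered from most to least specific; each entry pairs the
# features an item realizes with its (invented) phonological exponent.
vocabulary = [
    ({"pl", "fem"}, "-as"),  # marked VI
    ({"pl"},        "-s"),   # more general VI
    (set(),         "-0"),   # elsewhere item
]

def insert(node):
    """Subset principle: the first (most specific) VI whose features are
    a subset of the terminal node's features is inserted."""
    for feats, exponent in vocabulary:
        if feats <= node:
            return exponent

def impoverish(node):
    """Toy Impoverishment rule: delete gender features in the plural."""
    return node - {"fem", "masc"} if "pl" in node else node

node = {"pl", "fem"}
print(insert(node))              # -as: marked VI matches the full bundle
print(insert(impoverish(node)))  # -s: retreat to the more general VI
```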

Jochen Trommer and others have shown that, if one plays with the mechanisms of Vocabulary Insertion and with the assumptions about syntactic structure, none of these mechanisms is required to cover the empirical domains for which they were exploited in H&M. That they’re not necessary does not entail that they’re not actually part of the grammar — maybe they were the right approach to the phenomena to which they were applied. Personally, I believe the evidence for Impoverishment is strong, but I no longer adopt Fission and Fusion in my own work (although I’ll happily endorse them in the work of others).

To summarize, H&M lays the foundation for the syntactic word-building and late insertion theory of DM by describing the mechanisms of head movement and adjunction and Morphological Merger for word formation and the mechanisms of Vocabulary Insertion for late insertion. There’s way more of interest going on in the paper, which is, in bulk, a response to A-Morphous Morphology and to Chomsky’s then-current version of lexicalism for inflectional morphology. What’s unfortunately largely missing are the concerns of “On the Nature of Grammatical Relations” — the precise interaction of word formation and syntax.
