
Contextual Allosemy in DM

So, Neil Myler and I are supposed to be writing a chapter on the topic of Contextual Allosemy for a DM volume. I thought I could Blog what I think is at stake here, so the enormous Blogosphere can let us know if we’re missing anything. All three of you readers. To our minds, the topic of contextual allosemy divides in two: contextual meanings of roots, and contextual meanings of functional morphemes. Both types of contextual allosemy, whether or not they reduce to a single phenomenon, should be subject to two sorts of locality constraints. Within the first phase in which they meet the interfaces, the trigger of allosemy – the context for contextual allosemy – must be structurally local to the target item whose meaning is being conditioned. In Marantz 2013 (Marantz, A. (2013). Locality domains for contextual allomorphy across the interfaces. In Distributed Morphology Today: Morphemes for Morris Halle, 95-115), I suggested that the locality constraint here was adjacency, where semantically null items are invisible to the computation of the relevant “next to” relationship. Additionally, since the meaning of an element should be computed when it first hits the semantic interface, anything outside its first phase of interpretation could not serve to trigger a special meaning.

If Embick (perhaps Embick and I) is right, roots need to be categorized in the syntax – they won’t emerge bare at the semantic interface. So in a sense roots are always subject to contextual allosemy; they don’t have a bare semantic value. For functional morphemes, we’re inspired by Neil’s work on possession, where the little v that will be pronounced “have” is given a null interpretation in predicate possessive constructions. What’s suggested in Marantz 2013 is that contextual allosemy for functional morphemes involves a toggle between a specific meaning – say, introducing an event variable for little v – and no meaning. The “no meaning” option creates situations in which a phonologically overt (but semantically null) morpheme fails to intervene between a trigger of contextual allosemy and a root subject to allosemy, even though the morpheme intervenes phonologically (and thus would block contextual allomorphy between the trigger and the root).
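For what it’s worth, here is a minimal sketch of how the two visibility calculations come apart – my own illustration, with made-up labels and exponents, not anything from the 2013 paper: the same intervening morpheme can be invisible for allosemy, because it is semantically null, while still intervening for allomorphy, because it is phonologically overt.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Morpheme:
    label: str                      # e.g. "ROOT", "v", "n"
    exponent: Optional[str] = None  # phonological realization; None if silent
    meaning: Optional[str] = None   # semantic value; None if semantically null

def adjacent_for_allosemy(seq: List[Morpheme], trigger: int, target: int) -> bool:
    """Trigger and target count as adjacent for allosemy if every morpheme
    between them is semantically null."""
    lo, hi = sorted((trigger, target))
    return all(m.meaning is None for m in seq[lo + 1 : hi])

def adjacent_for_allomorphy(seq: List[Morpheme], trigger: int, target: int) -> bool:
    """For allomorphy, only phonologically null morphemes are invisible."""
    lo, hi = sorted((trigger, target))
    return all(m.exponent is None for m in seq[lo + 1 : hi])

# Toy word: a root, an overt but semantically null v, and a meaningful n head.
word = [Morpheme("ROOT", "root-exponent", "root meaning"),
        Morpheme("v", "v-affix", None),          # overt, semantically null
        Morpheme("n", "n-affix", "n meaning")]

print(adjacent_for_allosemy(word, 2, 0))    # True: the meaningless v does not intervene
print(adjacent_for_allomorphy(word, 2, 0))  # False: the overt v blocks allomorphy
```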

I’ve been thinking more about this topic in light of phonological work by my colleague Juliet Stanton with Donca Steriade (Stanton, J. & Steriade, D. (2014). Stress windows and Base Faithfulness in English suffixal derivatives. (Handout)). S&S argue that, in English derivational morphology, the determination of the pronunciation of a derived form may depend on the pronunciation of a form of the root morpheme not included in the (cyclic) derivation of the form. For example, the first vowel of “atomicity” finds its quality, as a secondarily stressed vowel, in the form “atom” – the first vowel of its stem, “atomic,” is a reduced schwa from which the necessary value for the stressed “a” in “atomicity” cannot be determined. If we’re thinking in DM terms, the adjective “atomic” should constitute a phase for phonological and semantic interpretation, after which the underlying vowel of “atom” in “atomic” would no longer be accessible, e.g., in the phase where the noun “atomicity” is processed.

This argument assumes, reasonably, that “atomicity” has “atomic” as its base. The -ity ending is potentiated by -ic, and the derivation of a noun in -ity from an adjective in -ic is perhaps even productive. But is “atomicity” derived from “atomic” semantically?

Here’s the online definition of “atomic” in the sense most relevant to “atomicity”:

adjective
1. relating to an atom or atoms. “the atomic nucleus”
   - CHEMISTRY: (of a substance) consisting of uncombined atoms rather than molecules. “atomic hydrogen”
   - of or forming a single irreducible unit or component in a larger system. “a society made up of atomic individuals pursuing private interests”

Here’s “atomicity”:

noun
1. CHEMISTRY: the number of atoms in the molecules of an element.
2. the state or fact of being composed of indivisible units.

Note that it’s “atomic individuals” and the “atomicity of society,” not the “atomicity of individuals” or “atomic society” (“atomic society” is post-apocalyptic). I think one can make the case that both “atomic” and “atomicity” (here in their non-nuclear, non-chemistry meanings) are semantically derived directly from “atom.”

Perhaps, then, the non-cyclicity of “atomicity” phonologically is paralleled by its non-cyclicity semantically, as would need to be the case in a strict interpretation of derivation by phase within DM. We would need -ic NOT to trigger a phase, meaning it could not be the realization of a little a node. I believe we’d need to commit to a theory in which the phonological forms of most derivational affixes are the realizations of roots, not of category-determining heads. So -ic in “atomicity” could then be a root attached to a category-neutral head that does not trigger a phase. This conclusion, that derivational affixes include phonologically contentful but a-categorical roots, has already been argued for by Lowenstamm (on phonological grounds) and by De Belder (on syntactic and semantic grounds). De Belder specifically claims that -ic does not have an inherent category; we can point to words like “music,” “attic,” “traffic,” “mimic,” etc., alongside words that are N/Adj ambiguous like “agnostic,” “stoic,” “mystic,” etc.
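To make the structural claim concrete, here is a toy sketch – my own, with invented labels and a made-up phase-counting function – of the two analyses of “atomicity”: if -ic realizes a category head little a, an inner phase should seal off the vowel (and the meaning) of “atom”; if -ic realizes a root attached to a non-phase head, only the outer nominal phase is present.

```python
# Toy structures: ("head-label", subtree-or-leaf, ...). Category heads a/n/v
# trigger phases; roots (marked with a radical sign) and a category-neutral x do not.
CATEGORY_HEADS = {"a", "n", "v"}

atomicity_cyclic  = ("n", ("a", "√atom", "-ic"), "-ity")   # -ic realizes little a
atomicity_acyclic = ("n", ("x", "√atom", "√ic"), "-ity")   # -ic realizes a root

def phase_heads(tree):
    """Count category-defining heads contained in this constituent."""
    if isinstance(tree, str):
        return 0
    head, *rest = tree
    return (head in CATEGORY_HEADS) + sum(phase_heads(t) for t in rest)

print(phase_heads(atomicity_cyclic))   # 2: an inner a-phase plus the outer n-phase
print(phase_heads(atomicity_acyclic))  # 1: only the outer n-phase; "atom" stays visible
```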

In conclusion, although the within-phase domains of contextual allosemy and contextual allomorphy might diverge, because null morphemes don’t intervene for the trigger/target (context/undergoer) relation and what’s null in the phonology may differ from what’s null in the semantics, the phases that define the biggest domains for contextual allosemy/allomorphy might be the same. Standard DM assumes they are: one phase to rule them all.

The Canonical What I Learned On My Summer Vacation Post: SNL in Helsinki

In retrospect, we can identify the beginnings of contemporary neurolinguistics with Neville et al. 1991 (Neville, H., Nicol, J. L., Barss, A., Forster, K. I., & Garrett, M. F. (1991). Syntactically based sentence processing classes: Evidence from event-related brain potentials. Journal of Cognitive Neuroscience, 3(2), 151-165). At the time, a dominant approach to language processing envisaged a system whereby people would predict the next word in a sentence by applying their statistical knowledge of word n-grams, based on their experience with language. Grammatical knowledge, as studied by linguists, was somehow an emergent property of string learning and not part of the cognitive model of linguistic performance. On this view, every ungrammatical string involved a point at which one’s string knowledge predicted zero likelihood of the encountered word, and ungrammaticality was on a continuum with unacceptability and very low Cloze probability. What the Neville et al. paper demonstrated was that different types of ungrammaticality had different neural “signatures,” and that these differed from the signature of an unexpected word (with low Cloze probability). One can have many quibbles with this paper. The generalization from the specific sentence types studied to classes of syntactic violations (as in the title), for example, is suspect. But the paper launched a ton of work examining the connection between detailed aspects of sentence structure and neural responses as measured by EEG. In recent years, there has been a move away from violation studies to work varying grammatical sentences along different dimensions, but experiments have consistently found correlations between interesting linguistic structure and brain responses.
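To make the string-knowledge picture concrete, here is a minimal sketch – my own illustration, not anything from Neville et al. – of n-gram prediction: a word’s surprisal is the negative log probability the model assigns it given the preceding words, and Cloze probability is the corresponding human estimate of that conditional probability. The corpus and counts below are made up.

```python
# Toy bigram illustration of "predicting the next word from string knowledge".
import math
from collections import Counter, defaultdict

corpus = "the dog chased the cat . the cat chased the dog .".split()

bigram_counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigram_counts[w1][w2] += 1

def surprisal(prev: str, word: str) -> float:
    """Negative log2 probability of `word` given `prev` (add-one smoothed)."""
    counts = bigram_counts[prev]
    vocab = set(corpus)
    p = (counts[word] + 1) / (sum(counts.values()) + len(vocab))
    return -math.log2(p)

print(surprisal("the", "cat"))  # lower surprisal: "the cat" is attested in the toy corpus
print(surprisal("the", "the"))  # higher surprisal: "the the" never occurs
```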

So it was a bit of a surprise to hear Ev Fedorenko, at the just-finished Neurobiology of Language meeting in Helsinki, claim that the “language network” in the brain wasn’t particularly interested in the sort of details that the electrophysiologists (ERP and MEG researchers) among us have been studying. In particular, she was explicitly equating the type of computation involved in Cloze probability (lexical surprisal) with syntactic computations. Fedorenko’s gold standard for localizing the language network within an individual’s brain is an fMRI paradigm contrasting the brain’s response to listening to grammatical, coherent sentences with its response to listening to lists of pronounceable non-words. The activity in this network, for example, seems equally well predicted by syntactic and by lexical surprisal modulations.

Given that the ERP/MEG literature details specific differences between, e.g., lexical prediction and grammatical computations, if Fedorenko’s language network were in fact responsible for language processing, then perhaps the same areas of the brain are performing different tasks – i.e., separation in brain space would perhaps not be informative for the neurobiology of language. Fedorenko was asked this question after her talk, but she didn’t understand it. However, Riitta Salmelin’s talk in the same session of the conference did address Fedorenko’s position. Salmelin has been investigating whether ERP/MEG responses in particular experimental paradigms might yield different localizations for the sources of language processing than fMRI activation from identical paradigms. Her work demonstrates that this is in fact the case, and she presented some ideas about why. She also remarked to me at the conference that Fedorenko’s “language network” does not include areas of interest for linguistic processing that she studies with MEG.

Of interest for our Blog notes is the nature of Fedorenko’s “syntactic surprisal” measure – the one that is supposed to correlate with activity in the same network as lexical surprisal, where lexical surprisal is computed via word 5-grams. Fedorenko’s syntactic surprisal measure comes from a delexicalized probabilistic context-free grammar, i.e., from a consideration of the likelihood of a syntactic structure independent of any lexical items. We asked in a previous post whether this kind of syntactic surprisal is likely to represent speakers’ use of syntactic knowledge for parsing words, given the importance of lexically specific selection in word processing, but the same question could be asked about sentence processing. A recent paper from our lab (Sharpe, V., Reddigari, S., Pylkkänen, L., & Marantz, A. (2018). Automatic access to verb continuations on the lexical and categorical levels: Evidence from MEG. Language, Cognition and Neuroscience, 34(2), 137–150) clearly separates predictions from verbs about the following syntactic category from predictions for the following word. However, the predictions are grounded in the identity of the verb, so this is a mix of lexical and syntactic prediction (the predictor is lexicalized, but the predicted category is syntactic, modulo the prediction of particular prepositions). What is clear is that syntactic surprisal as measured by a delexicalized probabilistic context-free grammar is not the be-all and end-all of possible variables that might be used to explore the brain for areas performing syntactic computation. In particular, the status of delexicalized syntactic structure for syntactic parsing is up in the air. Nevertheless, in a proper multiple regression analysis of MEG brain responses to naturalistic speech, I’m willing to go out on a limb and predict that different brain regions will be sensitive to word n-gram surprisal and to syntactic surprisal as measured via a delexicalized probabilistic context-free grammar.
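To make the contrast concrete, here is a toy sketch – my own illustration, not Fedorenko’s pipeline or ours – of what makes a syntactic surprisal measure “delexicalized”: it scores only the phrase-structure rules used in a parse, so any sentence with the same structure gets the same value, whatever words fill the leaves. Real analyses compute the measure word by word with an incremental parser over a treebank-induced grammar; the rule probabilities below are invented.

```python
import math

# Hypothetical rule probabilities for a tiny PCFG; real grammars are induced
# from treebanks and are much larger.
RULE_PROBS = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("N",)):       0.4,
    ("VP", ("V", "NP")):  0.7,
    ("VP", ("V",)):       0.3,
}

def syntactic_surprisal(rules) -> float:
    """-log2 probability of a sequence of rule expansions, with no reference
    to the particular lexical items at the leaves."""
    return -sum(math.log2(RULE_PROBS[r]) for r in rules)

# Parse skeleton for something like "the dog chased a cat":
# S -> NP VP, NP -> Det N, VP -> V NP, NP -> Det N
parse_rules = [("S", ("NP", "VP")), ("NP", ("Det", "N")),
               ("VP", ("V", "NP")), ("NP", ("Det", "N"))]
print(syntactic_surprisal(parse_rules))  # same value for any words in that structure
```

In a regression analysis of brain responses, this kind of structure-only measure and a word n-gram surprisal would enter as separate per-word predictors, which is what makes the question of their separability in the brain testable.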

A final note of clarification: Fedorenko in her talk suggested that there were linguistic theories that might predict no clear separation between, I believe, word meaning and syntax. Thus, somehow, e.g., Jackendoff’s Parallel Architecture and Construction Grammar would predict a lack of separation between lexical and syntactic surprisal. For both Jackendoff and Construction Grammar – and all serious current linguistic frameworks I know of – the ontology of lexical semantics and the ontology of syntactic categories are distinct. So Jackendoff has parallel syntactic and semantic structures, not an absence of any distinction between syntax and word meaning. Construction Grammar is similar in this respect. The question of whether speakers use delexicalized probabilistic syntactic knowledge in processing is a question for any syntactic theory, and all theories I can think of would survive a yes or no answer.

On Features

In the Jakobson/Halle tradition, morphological features were treated on a par with phonological features. Binary features cross-classified a set of entities: phonemes in the case of Phonology and perhaps morphemes in the case of Morphology. Jakobson was clear that binary features project a multidimensional space for phonemes or morphemes. An alternative to cross-classificatory binary features would be a unidimensional linear hierarchy. Applied to the geometry of case, and to the issue of expected syncretism across cases in a language, the linear hierarchy predicts syncretism across contiguous stretches of the hierarchy, while the binary feature approach predicts syncretism across neighbors in multidimensional space. Three binary features project a cube, with each element (say, a case) at a vertex and syncretism predicted between elements connected by an edge.
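To see how the predictions differ, here is a small worked example – my own, and the feature names and case ordering are purely illustrative: with three binary features, the eight possible bundles sit at the vertices of a cube, and natural targets of syncretism are bundles that differ in a single feature value (i.e., share an edge); a linear hierarchy instead predicts syncretism only across contiguous stretches of the ordering.

```python
from itertools import product

FEATURES = ("F1", "F2", "F3")               # placeholder feature names
bundles = list(product((0, 1), repeat=3))   # the 8 vertices of the cube

def edge_connected(a, b) -> bool:
    """Two bundles are cube-neighbors iff they differ in exactly one feature."""
    return sum(x != y for x, y in zip(a, b)) == 1

# Binary-feature prediction: syncretism is natural between edge-connected bundles.
print(edge_connected((0, 0, 0), (0, 1, 0)))   # True: differ only in F2

# Linear-hierarchy prediction: syncretism targets a contiguous stretch of one ordering
# (the case labels here are just an example ordering, not a claim about any language).
hierarchy = ["NOM", "ACC", "GEN", "DAT", "INS", "LOC"]
def contiguous(cases) -> bool:
    idx = sorted(hierarchy.index(c) for c in cases)
    return idx == list(range(idx[0], idx[-1] + 1))

print(contiguous(["ACC", "GEN", "DAT"]))   # True: a contiguous stretch
print(contiguous(["ACC", "DAT"]))          # False: skips GEN
```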

Catherine Chvany describes Jakobson’s experiments with features for Slavic case in her paper, Chvany, Catherine V. “Jakobson’s fourth and fifth dimensions: On reconciling the cube model of case meanings with the two-dimensional matrices for case forms.” Case in Slavic (1986): 107-129, which we’ll read for my fall morphology course. Apparently, Jakobson explored a linear hierarchy of cases to account for case syncretism but moved to binary features, and a multi-dimensional case space, because observed syncretisms involved non-adjacent cases on the linear hierarchy. Morris Halle and I reached a similar conclusion from a paradigm of Polish cases in our “No Blur” paper.

Generative phonology has continually questioned whether shared behavior among phonological segments is best captured via cross-classifying binary features of the traditional sort or via some other representational system. Particle and Government Phonologies exploit privative unary features, and linear and more complicated hierarchies of such features have been explored in the “feature geometries” of standard theories.

For morphology, linear hierarchies of monovalent features of the sort Jakobson abandoned have re-emerged most notably in Nanosyntax, for the analysis of case, of person, gender, and number, and of tense/aspect. I will Blog about Nanosyntax later in the fall; here, one is tempted to remark that, as far as I can tell, Nanosyntacticians have not sufficiently tackled the sorts of generalizations that led Jakobson away from linear case hierarchies or that motivated Halle & Marantz’s analysis of Polish. Here I would like to highlight a couple of issues concerning the distribution of morphological features in words and phrases.

DM claims that some sets of features are not formed via syntactic merge. In Halle & Marantz 1993, these sets include sets for person/number/gender values of agreement morphemes, and features defining cases like nominative or dative.

From the point of view of canonical DM, the features of, say, person/number/gender (PNG) and their organization could be investigated apart from the “merge and move” principles of syntactic structure building. The peculiarities of features in PNG bundles or case bundles might relate to the role of the features in semantic interpretation. Maybe some relevant features would be monovalent and organized in a linear hierarchy, while others might be binary and cross-classificatory. The internal structure of such bundles might involve a theory like feature geometry in phonology — a fixed structure in which the individual features would find their unique positions. In phonology, it would seem strange to build a phoneme by free merge of phonetic features, checking the result of merge against some template — although perhaps this might be explored as an option.
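As a way of picturing the templatic alternative, here is a minimal sketch – my own, with invented slot names: a PNG bundle is well formed if each feature it carries corresponds to a slot in a fixed template, rather than being assembled by unconstrained merge and then filtered.

```python
PNG_TEMPLATE = ("person", "number", "gender")   # hypothetical fixed slots

def well_formed(bundle: dict) -> bool:
    """A bundle conforms to the template if each of its features names one of
    the template's slots (a dict supplies at most one value per slot)."""
    return all(feature in PNG_TEMPLATE for feature in bundle)

print(well_formed({"person": 3, "number": "pl"}))    # True: both features fit the template
print(well_formed({"person": 3, "case": "dative"}))  # False: "case" has no PNG slot
```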

If there is a fixed template of PNG features, or a strict linear hierarchy of monovalent case features, one needs to ask why syntactic merge should build this structure. In any case, the leading idea in DM would be that fixed hierarchies of features are internal to morphemes, while the hierarchies built by syntactic merge would be constrained by syntactic selection and by interpretation at the interfaces. I hope to explore later in this Blog the question of whether the mini-tree structures implied by selectional features are really equivalent to what’s encoded in a templatic hierarchy. In the recent history of DM, though, the working distinction between morpheme-internal templatic structure and syntactic hierarchies of morphemes has played a role in research.