What Marantz 1981 (probably) got wrong

A popular account of what Grimshaw called “complex event nominalizations” (cf. John’s frequent destruction of my ego) postulates that these nominalizations involve a nominalizing head taking a VP complement. When the head V of the VP moves to merge with the nominalizing head, the resulting structure has the internal syntactic structure of an NP, not a VP. For example, there’s no accusative case assignment to a direct object, and certain VP-only complements, like double object constructions (give John a book) and small clauses (consider John intelligent), are prohibited (*the gift of John of a book, *the consideration of John intelligent).

Note that this analysis relies on the assumption that head movement (of V to N) has an impact on syntax. Before head movement applies, the verb phrase has verb phrase syntax, with the possibility of accusative case, small clauses and double object complements. After head movement applies, there is no VP syntax and the internal structure of the NP is that of any NP.

Within the development of Distributed Morphology, these consequences of head movement fit within the general approach of Marantz (1981, 1984) in which the operation of “morphological merger” (later equated with head movement and adjunction) causes structure collapsing. That is, when the verb merges with the nominal head, the VP effectively disappears (in Baker’s 1985 version, the structure doesn’t disappear but rather becomes “transparent”).

In a recent book manuscript (read this book!), Jim Wood (2020) argues that the VP account is not appropriate for Icelandic complex event nominalizations, and probably not right for English either. Among the pieces of evidence that Wood brings to the argument, perhaps the most striking is the observation that verbs in these nominalizations do not assign the idiosyncratic “quirky” cases to their objects that they do in VPs. If the VP analysis of complex event nominalizations is indeed wrong, then one might conclude that morphological merger-driven clause collapsing is simply not part of syntactic theory. It’s worth asking, however, what the motivation was for these consequences of morphological merger (or head movement and adjunction) in the first place, and where we stand today with respect to the initial motivation for these mechanisms.

Allow me a bit of autobiography and let’s take a trip down memory lane to the fall of 1976. That fall I’m a visiting student at MIT, and I sit in on David Perlmutter’s seminar on Relational Grammar (RG). I meet Alice Harris and Georgian, my notebook fills with stratal diagrams, and I’m introduced to the even then rather vast set of RG analyses of causative constructions and of advancements to 2 (which include the “applicative” constructions of Bantu). Mind-blowing stuff. One aspect of RG that particularly stuck in my mind and that I would return to later was the role of morphology like causative and applicative affixes in the grammar. In RG, morphemes were reflective rather than causal; they “flagged” structures. So an affix on a verb was a signal of a certain syntactic structure rather than a morpheme that created or forced the structure.

In an important sense my dissertation involved the importing of major insights of RG into a more mainstream grammatical theory. (In linguist years, the fall of 1976 and the summer of 1981, when I filed my dissertation, are not that far apart.) Consider the RG analysis of causative constructions involving “causative clause union.” In this analysis, a bi-clausal structure, with “cause” as the head (the predicate, P) of the upper clause, becomes mono-clausal. Since the upper clause has a subject (a 1) and the lower clause has a subject (another 1), and there can be only one 1 per clause (the Stratal Uniqueness Law), something has to give when the clauses collapse. In the very general case, if the lower clause is intransitive, the lower subject becomes an object (a 2), now the highest available relation in the collapsed clause.


Stratal diagram for Korean causative of intransitive ‘Teacher made me return’ (Gerdts 1990: 206)

If the lower clause is transitive, its object (a 2) becomes the object of the collapsed clause, and the lower subject becomes an indirect object (a 3), the highest relation available.


Stratal diagram for Korean causative of transitive ‘John made me eat the rice cake’ (Gerdts 1990: 206)

In languages with double object constructions like those of the Bantu family, after clause union with a lower transitive clause, the lower subject, now a 3, “advances” to 2, putting the lower object “en chômage” and creating a syntax that looks grammatically like that of John gave Mary a book in English, which also involves 3 to 2 advancement.


Stratal diagram for Korean ditransitive ‘I taught the students English’ (Gerdts 1990: 210)

Within the Marantz (1981, 1984) framework, applicative constructions involve a PP complement to a verb, with the applicative morpheme as the head P of the PP. Morphological merger of the P head with the verb collapses the VP and PP together and puts the object of the P, the “applied object,” in a position to be the direct object of the derived applicative verb.

My general take in 1981 was that affixation (e.g., of a causative suffix to a verb) was itself responsible for the type of clause collapsing one sees in causative clause union. The lower verb, in a sentence that is the complement to the higher causative verb, would “merge” with the causative verb, with the automatic consequence of clause collapsing. I argued that a general calculus determined what grammatical roles the constituents of the lower clause would bear after collapsing, as in RG. There are many interesting details swirling around this analysis, and I proposed a particular account of the distinction between Turkish-type languages, in which the “causee” in a causative construction built on a transitive verb is oblique (dative, usually), and Bantu-type languages, in which this causee is a direct object in a double object construction.  Read the book (particularly those of you habituated to citing it without reading – you know who you are).

At this point in time, nearly 40 years later, the analysis seems likely to have been wrong-headed. Already by the late 1980s, inspired by a deep dive into Alice Harris’s dissertation-turned-book on Georgian (1976, 1981), I had concluded that my 1981 analysis of causatives and applicatives was on the wrong track. Instead of bi-clausal (or bi-domain, in the case of applicatives) structures collapsing as the result of morphological merger, a more explanatory account could be formulated if the causative and applicative heads were, in effect, heads on the extended projection of the lower verb. Affixation/merger of the verb with these causative and applicative heads would have no effect on the grammatical relations held by the different nominal arguments in these constructions. This general approach was developed by a number of linguists in subsequent decades, notably Pylkkänen (2002, 2008), Wood & Marantz (2017) and, for the latest and bestest, Nie (2020), which I’ll discuss in a later post.

The crucial point here is that the type of theory that underlies the N + VP analysis of complex event nominalizations has lost its raison d’être, thereby leaving the analysis orphaned. If morphological merger has no effect on the syntax, at least in terms of the collapsing of domains, then a nominalization formed by an N head and a VP complement could easily have the internal syntax of a VP under Tense. This does not describe complex event nominalizations, which are purely nominal in structure, but the discussion so far does not rule out a possible class of nominalizations that would show VP syntax internally and NP syntax externally. As we discussed in an earlier post, English gerunds are not examples of such a construction, since they are not nominal in any respect (see Reuland 1983 and Kiparsky 2017). However, maybe such constructions do exist. If they don’t, it would be important to understand whether something in the general theory rules them out. We will return to this issue in a subsequent post.


References

Baker, M.C. (1985). Incorporation, a theory of grammatical function changing. MIT: PhD dissertation.

Gerdts, D.B. (1990). Revaluation and inheritance in Korean causative union. In B. Joseph and P. Postal (eds.), Studies in Relational Grammar 3, 203-246. Chicago: University of Chicago Press.

Harris, A.C. (1976). Grammatical relations in Modern Georgian. Harvard: PhD dissertation.

Harris, A.C. (1981). Georgian syntax: A study in Relational Grammar. Cambridge: CUP.

Kiparsky, P. (2017). Nominal verbs and transitive nouns: Vindicating lexicalism. In C. Bowern, L. Horn & R. Zanuttini (eds.), On looking into words (and beyond), 311-346. Berlin: Language Science Press.

Marantz, A. (1981). On the nature of grammatical relations. MIT: PhD dissertation.

Marantz, A. (1984). On the nature of grammatical relations. Cambridge, MA: MIT Press.

Nie, Y. (2020). Licensing arguments. NYU: PhD dissertation. https://ling.auf.net/lingbuzz/005283

Pylkkänen, L. (2002). Introducing arguments. MIT: PhD dissertation.

Pylkkänen, L. (2008). Introducing arguments. Cambridge, MA: MIT Press.

Reuland, E.J. (1983). Governing –ing. Linguistic Inquiry 14(1): 101-136.

Wood, J., & Marantz, A. (2017). The interpretation of external arguments. In D’Alessandro, R., Franco, I., & Gallego, Á.J. (eds.), The verbal domain, 255-278. Oxford: OUP.

Wood, J. (2020). Icelandic nominalizations and allosemy. Yale University: ms. https://ling.auf.net/lingbuzz/005004


Phrase structure rules within words, Part 2

In the last post, we explored the use of phrase structure rules in accounting for the internal structure of words and concluded, as we did for phrase structure rules and sentences, that phrase structure rules are not part of the explanatory arsenal of current linguistic theory. The word structures described by phrase structure rules are explained by independent principles. In particular, the “label” of a morphologically complex word or word-internal constituent is a function of the labels of its daughter constituents and general principles, including whatever (non-phrase structural) principles are implicated in explaining “extended projections” of lexical categories.

However, it may turn out to be the case that phrase structure rules can serve to explain how morphologically complex words are recognized in language processing. This post will explore some possibilities for the use of (word-internal) phrase structure rules in word recognition and highlight the issues involved.

To begin, let’s look at some possible word structures, given our previous discussions of extended projections, c-selection, s-selection, feature-selection, and the possible different types of what have traditionally been called derivational morphemes. First, consider affixes like –ity, which attach to stems of a certain semantic sort (–ity s-selects for a property) and also feature-select for the identity of the head of the stems. For –ity, the set of heads that it feature-selects for includes stems like sane and suffixes like –able and –al. The structure of sanity and treatability might look as below:

[Tree structures for sanity and treatability]

Rather than place –ity in these trees as a lone daughter to the N node (and –able as the lone daughter of Adj) or give –ity an “N” feature, we show –ity adjoined to N. This would be consistent with the analyses of category heads in Distributed Morphology, with little n replacing N in the trees, and –ity considered a root adjoined to n. This discussion will assume that the details here don’t matter (though they probably will turn out to).

In considering probabilistic context-free phrase structure rules as part of a model of word recognition, the relevant parts of the trees above are at the top. We can ask, for treatability, whether the frequency of all nouns derived from adjectives, independent of any of the specific morphemes in the word, matters for the recognition of the word. In phrase structure rule terms, this would be the frequency of the rule N → Adj + N. For sanity, there are at least a couple of different ways to think of its structure in phrase structure terms. If sane is really a root, rather than an adjective, then it’s not clear that the top phrase structure of sanity is any different from that of cat, consisting of a root adjoined to a category head.

However, one could also ask whether the probability of a derived noun (involving a root like –ity as well as a categorizing head, sketched below) as opposed to a non-derived noun (just the stem and the category suffix, with no additional root) could make a difference in processing.

[Tree structures for derived and non-derived nouns]

Probabilistic context-free phrase structure rules could be shown to make a difference in word recognition, then, if processing turns out to be affected by any of the following (a toy counting sketch follows the list):

  • frequency of categories (nouns vs. verbs vs. adjectives)
  • frequency of derivational frames (when one category is derived from another category)
  • frequency difference between non-derived categories (involving only a root and a category affix) and derived categories (involving at least an additional root tied to the category affix)
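
To make these three variables concrete, here is a minimal counting sketch in Python. The toy lexicon, its annotations, and the field names are hypothetical illustrations of my own, not real corpus data or anyone’s actual implementation.

```python
from collections import Counter

# Hypothetical toy lexicon: each word annotated with its surface category
# and, if derived, the category of its base (None for non-derived words).
lexicon = [
    {"word": "treatability", "category": "N",   "base": "Adj"},  # frame N -> Adj + N
    {"word": "sanity",       "category": "N",   "base": None},   # root + category head only
    {"word": "cat",          "category": "N",   "base": None},
    {"word": "readable",     "category": "Adj", "base": "V"},    # frame Adj -> V + Adj
]

# 1. Frequency of categories (nouns vs. verbs vs. adjectives)
category_freq = Counter(w["category"] for w in lexicon)

# 2. Frequency of derivational frames, independent of the specific morphemes
frame_freq = Counter((w["category"], w["base"]) for w in lexicon if w["base"])

# 3. Derived vs. non-derived categories
derivation_freq = Counter("derived" if w["base"] else "non-derived" for w in lexicon)

print(category_freq, frame_freq, derivation_freq, sep="\n")
```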

In visual lexical decision experiments, we know that by the time a participant presses a button to indicate that yes, they saw a real word as opposed to a non-word, the lexical category of the word makes a difference for reaction time over and above the usual frequency and form variables. In fact, as shown in Sharpe & Marantz (2017), reaction time in lexical decision can be modulated by connections between the phonological/orthographic form of a word and how often that word is used as a noun or a verb. What we don’t yet know is whether lexical category (by itself) or the sort of variable investigated in Sharpe & Marantz can modulate the “M170” – the response measured at 170ms after visual onset of a word stimulus in the visual word form area (VWFA), associated with morphological processing. Similarly, if we find that reaction time in lexical decision is modulated by the frequency of nouns formed from adjectives, we would still not know whether this variable is implicated specifically in morphological processing or in some later stage of word recognition within the lexical decision experimental paradigm.

However, we do know that certain probabilistic variables that don’t seem to implicate phrase structure rules do modulate visual word recognition at the M170. These include “transition probability,” which for the experiments in question was computed as the ratio of the frequency of a given stem + affix combination to the frequency of the stem in all its uses. So the transition probability from sane to –ity is computed as the ratio of the frequency of sanity to the stem frequency of sane (sane in all its derived and inflected forms). But we should investigate whether transition probability works to explain variance in the M170 because it represents something intrinsic to the storage of knowledge about words, or whether it could correlate with a set of variables related to phrase structure.
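
As a concrete illustration, the transition probability just described reduces to a one-line ratio. This is a minimal sketch; the counts below are invented placeholders, not the frequencies used in the actual experiments.

```python
def transition_probability(combination_freq: float, stem_freq: float) -> float:
    """Ratio of the frequency of a given stem + affix combination to the
    frequency of the stem across all its derived and inflected forms."""
    return combination_freq / stem_freq

# e.g., sane -> -ity: if "sanity" occurred 1,000 times and "sane" (in all
# its derived and inflected forms) 10,000 times, the transition
# probability from sane to -ity would be 0.1.
tp = transition_probability(1_000, 10_000)
print(tp)
```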

Compounds represent another class of morphologically complex words for which probabilistic phrase structure rules might be appropriate. Compound structures are subject to a great deal of cross-linguistic variation, and in work from the 1980s, Lieber and others suggested that the phrase structure rules of a language might describe the types of compounds available in the language. So in English, rules like N → {Adj, N} + N might describe nominal compounds (green house, book store), while the lack of a compound rule V → X + V might account for the lack of productive verbal compounding. It’s not clear that the nonhead constituent in an English compound is categorially constrained (keep off the grass sign, eating place), and in any case the syntactic structure of compounds is probably more complicated than it seems on the surface. Nevertheless, experiments should check whether, say, the general frequency of compounds consisting of noun + noun (yielding a noun) modulates morphological processing independently of the specific nouns involved.

Patterning perhaps with compounds are structures with affixes like –ish. Derivation with –ish is productive, which might seem to put –ish with comparative –er in the extended projection of adjectives (smaller, smallish). However, –ish, like compound heads, is not really restrictive as to the category of its stem (Google-ish, up-ish), and of course also has a use as an independent word (Dating one’s ex’s roommate is so ish, according to Urban Dictionary).

In short, it’s an open and interesting question what the relevant probabilistic structural information is for processing compounds and –ish-type derivatives, but we don’t yet know how general phrase structure knowledge might be relevant.

Finally, let’s return to inflection and the derivation we suggested might appear with inflection in the extended projections of lexical categories (e.g., nominalizing –er for verbs). If we treat the category of a word along its extended projection as remaining stable (e.g., Verb, for all nodes along the “spine” of the extended projection of a Verb), then the phrase structure rules for morphemes along extended projections would look something like: Verb → Verb + Tense. Note again that neither phrase structure rules nor standard selectional features are good tools for deriving the (relatively) fixed sequence of functional heads in an extended projection. But we could ask whether encoding knowledge of extended projections in phrase structure rules like Verb → Verb + Tense could aid in explaining morphological processing in some way. That is, could the processing of a tensed verb depend on the frequency of tensed verbs in the language, independently of any knowledge of the particular verb and tense at hand?

Other than phrase structure-type probabilities, what other probabilistic information about extended projections might modulate the processing of an inflected word independently of the specific morphemes in the word? In an interesting series of papers, Harald Baayen and colleagues have suggested that processing might be modulated by variables associated with probability distributions over paradigms (see, e.g., Milin et al. 2009a,b). In addition to exploring the effects on processing of what we have called transition probability (the probability of the inflected word given the stem, in one direction, or the probability of the inflected word given the affix, in the other), they propose that processing is also affected by the relative frequency of the various inflected forms of a word, computed as “paradigm entropy.” Transition probabilities and paradigm entropy are both variables associated with particular stems. Interestingly, they also employ variables involving probabilities from the language beyond the statistics of particular stems. Milin et al. (2009a) suggest that the relative entropy of the paradigm of a stem also modulates processing. Relative entropy involves a comparison of the paradigm entropy of the stem of a word with the average entropy of all the stems in the same inflectional class. The idea is information theoretic: how much additional information do you gain from identifying a specific stem (with its own paradigm entropy) once you know to which inflectional class the stem belongs? Figure 1 below from Milin et al. (2009a) shows the paradigm entropies of three Serbian nouns (knjiga, snaga, pucina) and the frequencies of the inflectional class (feminine a-class) to which they belong.
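
To make the two paradigm variables concrete, here is a minimal sketch using the standard information-theoretic definitions (Shannon entropy and Kullback-Leibler divergence). The counts and class proportions are invented placeholders, not Milin et al.’s Serbian data.

```python
import math

def paradigm_entropy(form_freqs):
    """Shannon entropy (in bits) of the distribution over a stem's inflected forms."""
    total = sum(form_freqs)
    return -sum((f / total) * math.log2(f / total) for f in form_freqs if f > 0)

def relative_entropy(stem_dist, class_dist):
    """KL divergence D(stem || class): the extra information carried by a
    stem's own paradigm distribution beyond that of its inflectional class."""
    return sum(p * math.log2(p / q) for p, q in zip(stem_dist, class_dist) if p > 0)

# e.g., a noun whose six case/number forms occur with these (made-up) counts:
freqs = [120, 40, 30, 20, 10, 5]
h = paradigm_entropy(freqs)

# ...compared against hypothetical class-wide proportions for its inflectional class:
stem_dist = [f / sum(freqs) for f in freqs]
class_dist = [0.4, 0.2, 0.15, 0.1, 0.1, 0.05]
d = relative_entropy(stem_dist, class_dist)
print(round(h, 3), round(d, 3))
```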

Relative entropy is a variable like the one explored in Sharpe & Marantz, which involved a comparison of the relationship between the form of a word and its usage as a noun vs. a verb with the average relationship between forms and usage across the language. What’s particularly interesting in the present context about the sort of paradigm variables identified by Milin et al. becomes clear if we recall the connection between paradigms and extended projections, and the identity between extended projections in the context of inflected words and extended projections in the context of sentences. As I suggested before, sentences in an important sense belong to verbal paradigms, which in English consist of a verb and the set of modals and auxiliaries associated with the “spine” of functional elements as summarized in Chomsky’s Tense-Modal-have-be-be-Verb sequence. If Milin et al. are on the right track, the relative entropy of these extended verbal “paradigms” should also be considered as a variable in sentence processing.


References

Milin, P., Filipović Đurđević, D., & Moscoso del Prado Martín, F. (2009a). The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian. Journal of Memory and Language 60(1): 50-64.

Milin, P., Kuperman, V., Kostic, A., & Baayen, R.H. (2009b). Paradigms bit by bit: An information theoretic approach to the processing of paradigmatic structure in inflection and derivation. In Blevins, J.P. & Blevins, J. (eds.), Analogy in grammar: Form and acquisition, 214-252. Oxford: OUP.

Sharpe, V., & Marantz, A. (2017). Revisiting form typicality of nouns and verbs: a usage-based approach. The Mental Lexicon 12(2): 159-180.

Phrase structure rules within words

In a recent posting, we examined the nature and history of phrase structure rules in syntax, which describe the distribution of phrasal categories according to their internal structure. However, it is clear that phrases aren’t distributed within sentences according to their internal structure, i.e., according to the category of their “lexical” head (N, V, Adj, P) and their associated “arguments” and modifiers. For example, English gerunds like Mary’s winning the race have the internal structure of verb phrases (including the subject) but distribute like noun phrases, appearing as subjects, objects, objects of prepositions, etc. Eric Reuland argued in 1983 that this behavior of gerunds could be attributed essentially to a feature associated with the –ing morpheme in a sentential structure, a feature that requires case marking. As long as gerunds meet the s-selectional properties of the verbs or prepositions for which they serve as objects or subjects, they can appear in the positions of noun phrases if these positions are associated with case. In a recent paper, Paul Kiparsky (2017) makes the same point: English gerund phrases contain none of the usual constituents of a noun phrase (determiners, adjectives, quantifiers, etc.), with the possible exception of the possessive subject. Given the variety of ways that overt subjects of non-finite clauses appear cross-linguistically, however, the possessive subject of a gerund would only be taken as indicating that gerunds were noun phrases if this correlated with something else (determiners, quantifiers, adjectives, etc.). In the absence of any corroborating evidence, the possessive subjects are just that – possessive (perhaps “genitive”) marked subjects of a non-finite gerund clause. Kiparsky thus revives Reuland’s analysis that associates the distribution of gerunds with the case-bearing property of the –ing morphology, not the possibly verbal categorial status of gerunds.

Our conclusion, then, was that standard phrase structure rules really play no role in current syntactic theory. Syntactic phrase structures have nonetheless been used to describe and explain the internal structure of words probably since the introduction of phrase structure rules to linguistic thinking. For morphological theory, the development of X-bar theory led to various applications of the theory to word structure. Selkirk and Lieber provide some early examples, with Lieber’s (1980) dissertation being the best guide to the thinking here. The general idea is that words, like phrases, are endocentric, with the category of the word being the category of its head morpheme. Inflectional morphology then could be without category, as long as the lexical head (N, V, Adj) to which the inflection attaches serves as the head of the word, or some default mechanism “percolates” the category label of the stem to be the label of the inflected word. In this approach, category-changing derivational morphemes like English agentive –er would serve as heads and determine the category of the words they derive. Derivational affixes like un– that do not change category would either not be heads (perhaps they would be “adjuncts”) or would be category-less affixes, with the category of the stem “percolating” up by default to be the category of the derived form.

In a Lieber-style approach, phrase structure rules are superfluous. N → V + N, for example, with the N on the right dominating a derivational suffix, simply follows from the existence of a derivational suffix with the category N and a selectional requirement to attach to verb stems. For inflectional morphology, Lieber (1992) exploits the possibilities of “levels” given by X-bar theory to partially account for the templatic nature of inflection. In English, there is just one inflectional “slot” on the verb for tense and agreement information. One can account for this by establishing hierarchical levels for verbs: a lower stem level within which derivational morphology appears and a higher word level that is formed via the attachment of any tense or agreement suffix to a stem-level verb. In more highly inflected languages, there might be separate levels for, say, subject agreement and object agreement, which would allow both to occur on a single verb while prohibiting the occurrence of multiple subject agreement affixes on the same verb.

Inkelas (1993) extends this “levels” approach in her analysis of Nimboran, in a paper that explicitly contrasts Lieber’s phrase structure analysis of complementary distribution among morphemes with a linear template model of morphology. Inkelas organizes the prefixes and suffixes of the verbal system of Nimboran into levels in a fixed morphological hierarchy, shown below from her (68).

[Inkelas’s (1993) morphological level hierarchy for Nimboran verbal affixes, her (68)]

As Noyer (1998) and others point out, the main problem with the Lieber/Inkelas approach here is that, in the level system, there is no explicit connection between the sets of morphemes that appear at a given level and are thus in complementary distribution with one another (say, object agreement morphemes) and the feature(s) they express. In point of fact, for the most part the affixes that appear at any given level are featurally coherent (see in this light the recent work on feature clustering by Mansfield et al. 2020). Within standard phrase structure assumptions, if we want to say that a set of morphemes express tense and attach to a verbal stem, we would use the rule X → Verb + Tense, where the Tense on the right side of the rule is the node for the tense morphemes. But what’s on the left side of the rule? Lieber would have another “level” of V.

One can see, then, that Lieber’s approach is a notational variant of the notion of an “extended projection” of a verb, where functional material appears, level by level, in a fixed hierarchical order above the verb stem. We noted in a previous post that within these extended projections, the “arguments” of “lexical” categories like V and N appear according to s(emantic)-selection of heads and requirements like those studied under the name of “case theory” (noun phrases need to appear in positions in which they can acquire case). Aside from the hierarchy of extended projections and the requirements of s-selection, the distribution of constituents is heavily restricted by feature selection – the requirement of heads for features on the constituents with which they combine. For example, the perfect auxiliary have in English requires a perfect participle feature on the verb phrase with which it combines.

Lieber’s approach shows that much of inflectional morphology can be seen as generated as part of the extended projection of verbs, nouns and adjectives. Certain types of productive derivational morphology, such as agentive –er, appear to pattern with inflection and could therefore also be seen as part of the extended projection of verbs. This would avoid having, for instance, a nominal element like –er c-select for the lexical category Verb. Furthermore, other derivational morphemes that are restricted to attach to a specific list of heads could be seen as s-selecting for the semantic category of the stems to which they attach and feature-selecting for a particular list of stems, rather than c-selecting for the category of these stems.

Putting node labels on X, Y, and Z in the phrase structure rule X → Y + Z is thus redundant. If we have instead x → y + z, where x, y, and z are variables over categories, then the categories for x, y, and z should be determined by examining the features of the constituents y and z that form x. This observation is what supports Chomsky’s (1995) move from a phrase structure theory to a “merge” theory of constituent structure. In a “merge” theory, the general rule is to combine or “merge” two constituents, y and z in our example, to form x, and have the features of y and z determine the features (or “label”) of x by general principles. Chomsky’s move was already anticipated in Lieber’s theory of word structure.
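
As a toy illustration of the labeling idea, the sketch below computes the label of a merged constituent from the features of its daughters rather than reading it off a phrase structure rule. The “selecting head projects” rule and all the class and field names are simplifications of my own, not Chomsky’s actual formulation.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Constituent:
    label: str                    # category feature, e.g. "V", "N"
    selects: str | None = None    # category this head selects for, if any
    daughters: tuple = ()

def merge(a: Constituent, b: Constituent) -> Constituent:
    """Combine two constituents; the one whose selectional feature is
    satisfied projects, determining the label of the result."""
    if a.selects == b.label:
        head = a
    elif b.selects == a.label:
        head = b
    else:
        raise ValueError("neither constituent selects the other")
    return Constituent(label=head.label, daughters=(a, b))

# e.g., a nominalizing suffix that selects a verb stem projects N:
reader = merge(Constituent("V"), Constituent("N", selects="V"))
print(reader.label)  # "N" -- the label follows from the daughters' features
```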

However, although phrase structure rules for a language can be derived from basic principles of constituent combination and of “labeling” (the manner in which the features of a constituent are determined from the features of its constituents), speakers might still use phrase structure rules as part of their knowledge of language in morphological processing. Recall that the “surprisal” of a morphologically complex word, as indexed, for instance, by the neural M170 response from the visual word form area, is not well modelled by the surface frequency of the word. Oseki (2018) suggests that a syntactic processing model relying on probabilistic context-free phrase structure rules might provide a better model of surprisal for visually presented complex words. In particular, such grammars assign an importance to the frequency of “frames” of categories, for example, Adj → Verb + Adj for deverbal adjective formation as in readable. The frequency of a frame would be relevant independent of the particular lexical items (read, –able) that form a particular word. The next installment will discuss this possibility.
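
As a sketch of how frame frequencies would translate into a processing prediction, the snippet below computes the surprisal of a derivational frame under a toy probabilistic context-free grammar. The rule probabilities are invented, and this is only a schematic stand-in for Oseki’s actual model.

```python
import math

# Toy PCFG: probabilities of expanding a parent category into a frame of
# daughter categories (invented numbers, not estimated from a corpus).
pcfg = {
    ("Adj", ("V", "Adj")): 0.02,   # deverbal adjectives, e.g. read + -able
    ("N",   ("Adj", "N")): 0.05,   # deadjectival nouns, e.g. treatable + -ity
}

def frame_surprisal(parent, frame):
    """Surprisal (in bits) of a categorial frame, independent of the
    particular lexical items that instantiate it."""
    return -math.log2(pcfg[(parent, frame)])

print(frame_surprisal("Adj", ("V", "Adj")))  # rarer frames -> higher surprisal
```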


References

Chomsky, N. (1995). The Minimalist Program. MIT Press.

Inkelas, S. (1993). Nimboran position class morphology. Natural Language & Linguistic Theory 11(4): 559-624.

Kiparsky, P. (2017). Nominal verbs and transitive nouns: Vindicating lexicalism. In C. Bowern, L. Horn & R. Zanuttini (eds.), On looking into words (and beyond), 311–346. Berlin: Language Science Press.

Lieber, R. (1980). On the organization of the lexicon. MIT: PhD dissertation.

Lieber, R. (1992). Deconstructing morphology: word formation in syntactic theory. University of Chicago Press.

Mansfield, J., Stoll, S., & Bickel, B. (2020). Category clustering: A probabilistic bias in the morphology of verbal agreement marking. Language 96(2): 255-293.

Noyer, R. (1998). Impoverishment theory and morphosyntactic markedness. In Lapointe, S.G., Brentari, D.K., & Farrell, P.M. (eds.), Morphology and its relation to phonology and syntax. CSLI.

Oseki, Y. (2018). Syntactic structures in morphological processing. New York University: PhD dissertation.

Reuland, E.J. (1983). Governing –ing. Linguistic Inquiry 14(1): 101-136.
