Author: Alec Marantz

Phrase structure rules within words, Part 2

In the last post, we explored the use of phrase structure rules in accounting for the internal structure of words and concluded, as we did for phrase structure rules and sentences, that phrase structure rules are not part of the explanatory arsenal of current linguistic theory. The word structures described by phrase structure rules are explained by independent principles. In particular, the “label” of a morphologically complex word or word-internal constituent is a function of the labels of its daughter constituents and general principles, including whatever (non-phrase structural) principles are implicated in explaining “extended projections” of lexical categories.

However, it may turn out that phrase structure rules can serve to explain how morphologically complex words are recognized in language processing. This post will explore some possibilities for the use of (word-internal) phrase structure rules in word recognition and highlight the issues involved.

To begin, let’s look at some possible word structures, given our previous discussions of extended projections, c-selection, s-selection, feature-selection, and the possible different types of what have traditionally been called derivational morphemes. First, consider affixes like –ity, which attach to stems of a certain semantic sort (–ity s-selects for a property) and also feature-select for the identity of the head of the stems. For –ity, the set of heads that it feature-selects for includes stems like sane and suffixes like –able and –al. The structure of sanity and treatability might look as below:

[Tree diagrams for sanity and treatability]

Rather than place –ity in these trees as a lone daughter to the N node (and –able as the lone daughter of Adj) or give –ity an “N” feature, we show –ity adjoined to N. This would be consistent with the analyses of category heads in Distributed Morphology, with little n replacing N in the trees and –ity considered a root adjoined to n. This discussion will assume that the details here don’t matter (though they probably will turn out to).

In considering probabilistic context-free phrase structure rules as part of a model of word recognition, the relevant parts of the trees above are at the top. We can ask, for treatability, whether the frequency of all nouns derived from adjectives, independent of any of the specific morphemes in the word, matters for the recognition of the word. In phrase structure rule terms, this would be the frequency of the rule N → Adj + N. For sanity, there are at least a couple of different ways to think of its structure in phrase structure terms. If sane is really a root, rather than an adjective, then it’s not clear that the top phrase structure of sanity is any different from that of cat, consisting of a root adjoined to a category head.

However, one could also ask whether the probability of a derived noun (involving a root like –ity as well as a categorizing head, sketched below) as opposed to a non-derived noun (just the stem and the category suffix, with no additional root) could make a difference in processing.

[Tree diagrams contrasting derived and non-derived nouns]

Probabilistic context-free phrase structure rules, then, could be shown to make a difference in word recognition if processing turns out to be affected by any of the following (a schematic computation is sketched after this list):

  • frequency of categories (nouns vs. verbs vs. adjectives)
  • frequency of derivational frames (when one category is derived from another category)
  • frequency difference between non-derived categories (involving only a root and a category affix) and derived categories (involving at least an additional root tied to the category affix)
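
To make these variables concrete, here is a minimal sketch of how they might be estimated from a morphologically parsed word list. Everything here is hypothetical – the toy lexicon, the frame notation, and the function names – and a real implementation would estimate these counts from a large parsed corpus.

```python
from collections import Counter

# Hypothetical parsed lexicon: (word, category, frame, token frequency).
# A "frame" records the top of the word's tree:
#   ("Adj", "N")  = noun derived from an adjective (N -> Adj + N)
#   ("root", "N") = non-derived noun (a root plus a category head)
LEXICON = [
    ("treatability", "N",   ("Adj", "N"),    12),
    ("sanity",       "N",   ("root", "N"),   85),
    ("cat",          "N",   ("root", "N"), 4200),
    ("readable",     "Adj", ("V", "Adj"),   310),
]

def category_frequency(lexicon):
    """Token frequency of each lexical category (nouns vs. verbs vs. adjectives)."""
    counts = Counter()
    for _, category, _, freq in lexicon:
        counts[category] += freq
    return counts

def frame_frequency(lexicon):
    """Token frequency of each derivational frame, e.g. N -> Adj + N,
    independent of the particular morphemes that instantiate it."""
    counts = Counter()
    for _, _, frame, freq in lexicon:
        counts[frame] += freq
    return counts

print(category_frequency(LEXICON))  # category frequencies (first bullet)
print(frame_frequency(LEXICON))     # derivational frames (second bullet); comparing
                                    # ("root", "N") with derived frames gives the third
```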

In visual lexical decision experiments, we know that by the time a participant presses a button to indicate that yes, they saw a real word as opposed to a non-word, the lexical category of the word makes a difference for reaction time over and above the usual frequency and form variables. In fact, as shown in Sharpe & Marantz (2017), reaction time in lexical decision can be modulated by connections between the phonological/orthographic form of a word and how often that word is used as a noun or a verb. What we don’t yet know is whether lexical category (by itself) or the sort of variable investigated in Sharpe & Marantz can modulate the “M170” – the response measured at 170ms after the visual onset of a word stimulus in the visual word form area (VWFA) and associated with morphological processing. Similarly, if we find that reaction time in lexical decision is modulated by the frequency of nouns formed from adjectives, we would still not know whether this variable is implicated specifically in morphological processing or in some later stage of word recognition within the lexical decision experimental paradigm.

However, we do know that certain probabilistic variables that don’t seem to implicate phrase structure rules do modulate visual word recognition at the M170. These include “transition probability,” which for the experiments in question was computed as the ratio of the frequency of a given stem + affix combination to the frequency of the stem in all its uses. So the transition probability from sane to –ity is computed as the ratio of the frequency of sanity to the stem frequency of sane (sane in all its derived and inflected forms). But we should investigate whether transition probability works to explain variance in the M170 because it represents something intrinsic to the storage of knowledge about words, or whether it could correlate with a set of variables related to phrase structure.
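
Transition probability, in other words, is just a conditional probability estimated from corpus counts. Here is a minimal sketch for the sane/–ity example; the counts are invented for illustration:

```python
def transition_probability(combination_freq, stem_freq):
    """P(stem + affix | stem): frequency of the stem + affix combination
    divided by the frequency of the stem in all its uses."""
    return combination_freq / stem_freq

# Hypothetical corpus counts (illustrative only)
freq_sanity = 1200  # tokens of "sanity"
freq_sane = 9500    # tokens of "sane" across all its derived and inflected forms

print(transition_probability(freq_sanity, freq_sane))  # ~0.126
```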

Compounds represent another class of morphologically complex words for which probabilistic phrase structure rules might be appropriate. Compound structures are subject to a great deal of cross-linguistic variation, and in work from the 1980s, Lieber and others suggested that the phrase structure rules of a language might describe the types of compounds available in the language. So in English, rules like N → {Adj, N} + N might describe nominal compounds (green house, book store), while the lack of a compound rule V → X + V might account for the lack of productive verbal compounding. It’s not clear that the nonhead constituent in an English compound is categorially constrained (keep off the grass sign, eating place), and in any case the syntactic structure of compounds is probably more complicated than it seems on the surface. Nevertheless, experiments should check whether, say, the general frequency of compounds consisting of noun + noun (yielding a noun) modulates morphological processing independently of the specific nouns involved.

Patterning perhaps with compounds are structures with affixes like –ish. Derivation with –ish is productive, which might seem to put –ish with comparative –er in the extended projection of adjectives (smaller, smallish). However, –ish, like compound heads, is not really restrictive as to the category of its stem (Google-ish, up-ish), and of course also has a use as an independent word (Dating one’s ex’s roommate is so ish, according to Urban Dictionary).

In short, what the relevant probabilistic structural information is for processing compounds and –ish-type derivatives is an open and interesting question; it remains to be seen how general phrase structure knowledge might be relevant here.

Finally, let’s return to inflection and the derivation we suggested might appear with inflection in the extended projections of lexical categories (e.g., nominalizing –er for verbs). If we treat the category of a word along its extended projection as remaining stable (e.g., Verb, for all nodes along the “spine” of the extended projection of a Verb), then the phrase structure rules for morphemes along extended projections would look something like: Verb → Verb + Tense. Note again that neither phrase structure rules nor standard selectional features are good tools for deriving the (relatively) fixed sequence of functional heads in an extended projection. But we could ask whether encoding knowledge of extended projections in phrase structure rules like Verb → Verb + Tense could aid in explaining morphological processing in some way. That is, could the processing of a tensed verb depend on the frequency of tensed verbs in the language, independently of any knowledge of the particular verb and tense at hand?

Other than phrase structure-type probabilities, what other probabilistic information about extended projections might modulate the processing of an inflected word independently of the specific morphemes in the word? In an interesting series of papers, Harald Baayen and colleagues have suggested that processing might be modulated by variables associated with probability distributions over paradigms (see, e.g., Milin et al. 2009a,b). In addition to exploring the effects on processing of what we have called transition probability (the probability of the inflected word given the stem, in one direction, or the probability of the inflected word given the affix, in the other), they propose that processing is also affected by the relative frequency of the various inflected forms of a word, computed as “paradigm entropy.” Transition probabilities and paradigm entropy are both variables associated with particular stems. Interestingly, they also employ variables involving probabilities from the language beyond the statistics of particular stems. Milin et al. (2009a) suggest that the relative entropy of the paradigm of a stem also modulates processing. Relative entropy involves a comparison of the paradigm entropy of the stem of a word with the average entropy of all the stems in the same inflectional class. The idea is information theoretic: how much additional information do you gain from identifying a specific stem (with its own paradigm entropy) once you know to which inflectional class the stem belongs? Figure 1 of Milin et al. (2009a) shows the paradigm entropies of three Serbian nouns (knjiga, snaga, pucina) and the frequencies of the inflectional class (feminine a-class) to which they belong.
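
To make these information-theoretic variables concrete, here is a minimal sketch of paradigm entropy and relative entropy. The paradigm cells and counts are invented for illustration; Milin et al.’s actual computations differ in detail (e.g., in how the class distribution is estimated):

```python
import math

def paradigm_entropy(form_freqs):
    """Shannon entropy (in bits) of the distribution over a stem's inflected forms."""
    total = sum(form_freqs.values())
    return -sum((f / total) * math.log2(f / total)
                for f in form_freqs.values() if f > 0)

def relative_entropy(stem_freqs, class_freqs):
    """Kullback-Leibler divergence D(P || Q) between a stem's paradigm
    distribution P and the distribution Q of its inflectional class,
    matched cell by cell."""
    p_total = sum(stem_freqs.values())
    q_total = sum(class_freqs.values())
    divergence = 0.0
    for cell, f in stem_freqs.items():
        p = f / p_total
        q = class_freqs[cell] / q_total
        if p > 0:
            divergence += p * math.log2(p / q)
    return divergence

# Hypothetical case/number cell counts for one noun and for its whole class
knjiga  = {"nom.sg": 100,  "gen.sg": 60,   "dat.sg": 10,   "acc.sg": 80}
a_class = {"nom.sg": 5000, "gen.sg": 4000, "dat.sg": 1500, "acc.sg": 4500}

print(paradigm_entropy(knjiga))           # entropy of this stem's paradigm
print(relative_entropy(knjiga, a_class))  # divergence from the class profile
```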

Relative entropy is a variable like the one explored in Sharpe & Marantz, which involved a comparison of the relationship between the form of a word and its usage as a noun vs. a verb with the average relationship between forms and usage across the language. What’s particularly interesting in the present context about the sort of paradigm variables identified by Milin et al. becomes clear if we recall the connection between paradigms and extended projections, and the identity between extended projections in the context of inflected words and extended projections in the context of sentences. As I suggested before, sentences in an important sense belong to verbal paradigms, which in English consist of a verb and the set of modals and auxiliaries associated with the “spine” of functional elements as summarized in Chomsky’s Tense-Modal-have-be-be-Verb sequence. If Milin et al. are on the right track, the relative entropy of these extended verbal “paradigms” should also be considered as a variable in sentence processing.

 

References

Milin, P., Filipović Đurđević, D., & Moscoso del Prado Martín, F. (2009a). The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian. Journal of Memory and Language 60(1): 50-64.

Milin, P., Kuperman, V., Kostić, A., & Baayen, R.H. (2009b). Paradigms bit by bit: An information theoretic approach to the processing of paradigmatic structure in inflection and derivation. In Blevins, J.P. & Blevins, J. (eds.), Analogy in grammar: Form and acquisition, 214-252. Oxford: OUP.

Sharpe, V., & Marantz, A. (2017). Revisiting form typicality of nouns and verbs: a usage-based approach. The Mental Lexicon 12(2): 159-180.

Phrase structure rules within words

In a recent posting, we examined the nature and history of phrase structure rules in syntax, which describe the distribution of phrasal categories according to their internal structure. However, it is clear that phrases aren’t distributed within sentences according to their internal structure, i.e., according to the category of their “lexical” head (N, V, Adj, P) and their associated “arguments” and modifiers. For example, English gerunds like Mary’s winning the race have the internal structure of verb phrases (including the subject) but distribute like noun phrases, appearing as subjects, objects, objects of prepositions, etc. Eric Reuland argued in 1983 that this behavior of gerunds could be attributed essentially to a feature associated with the –ing morpheme in a sentential structure, a feature that requires case marking. As long as gerunds meet the s-selectional properties of the verbs or prepositions for which they serve as objects or subjects, they can appear in the positions of noun phrases if these positions are associated with case. In a recent paper, Paul Kiparsky (2017) makes the same point: English gerund phrases contain none of the usual constituents of a noun phrase (determiners, adjectives, quantifiers, etc.), with the possible exception of the possessive subject. Given the variety of ways that overt subjects of non-finite clauses appear cross-linguistically, however, the possessive subject of a gerund would only be taken as indicating that gerunds were noun phrases if this correlated with something else (determiners, quantifiers, adjectives, etc.). In the absence of any corroborating evidence, the possessive subjects are just that – possessive (perhaps “genitive”) marked subjects of a non-finite gerund clause. Kiparsky thus revives Reuland’s analysis that associates the distribution of gerunds with the case-bearing property of the –ing morphology, not the possibly verbal categorial status of gerunds.

Our conclusion, then, was that standard phrase structure rules really play no role in current syntactic theory. Syntactic phrase structures have nonetheless been used to describe and explain the internal structure of words probably since the introduction of phrase structure rules to linguistic thinking. For morphological theory, the development of X-bar theory led to various applications of the theory to word structure. Selkirk and Lieber provide some early examples, with Lieber’s (1980) dissertation, On the organization of the lexicon, being the best guide to thinking here. The general idea is that words, like phrases, are endocentric, with the category of the word being the category of its head morpheme. Inflectional morphology then could be without category, as long as the lexical head (N, V, Adj) to which the inflection attaches serves as the head of the word, or some default mechanism “percolates” the category label of the stem to be the label of the inflected word. In this approach, category-changing derivational morphemes like English agentive –er would serve as heads and determine the category of the words they derive. Derivational affixes like un- that do not change category would either not be heads (perhaps they would be “adjuncts”) or would be category-less affixes, with the category of the stem “percolating” up by default to be the category of the derived form.

In a Lieber-style approach, phrase structure rules are superfluous. N → V + N, for example, with the N on the right dominating a derivational suffix, simply follows from the existence of a derivational suffix with the category N and a selectional requirement to attach to verb stems. For inflectional morphology, Lieber (1992) exploits the possibilities of “levels” given by X-bar theory to partially account for the templatic nature of inflection. In English, there is just one inflectional “slot” on the verb for tense and agreement information. One can account for this by establishing hierarchical levels for verbs: a lower stem level within which derivational morphology appears and a higher word level that is formed via the attachment of any tense or agreement suffix to a stem-level verb. In more highly inflected languages, there might be separate levels for, say, subject agreement and object agreement, which would allow both to occur on a single verb while prohibiting the occurrence of multiple subject agreement affixes on the same verb.

Inkelas (1993) extends this “levels” approach in her analysis of Nimboran, in a paper that explicitly contrasts Lieber’s phrase structure analysis of complementary distribution among morphemes with a linear template model of morphology. Inkelas organizes the prefixes and suffixes of the verbal system of Nimboran into levels in a fixed morphological hierarchy (her (68)).

As Noyer (1998) and others point out, the main problem with the Lieber/Inkelas approach here is that, in the level system, there is no explicit connection between the sets of morphemes that appear at a given level and are thus in complementary distribution with one another (say, object agreement morphemes) and the feature(s) they express. In point of fact, for the most part the affixes that appear at any given level are featurally coherent (see in this light the recent work on feature clustering by Mansfield et al. 2020). Within standard phrase structure assumptions, if we want to say that a set of morphemes express tense and attach to a verbal stem, we would use the rule X → Verb + Tense, where the Tense on the right side of the rule is the node for the tense morphemes. But what’s on the left side of the rule? Lieber would have another “level” of V.

One can see, then, that Lieber’s approach is a notational variant of the notion of an “extended projection” of a verb, where functional material appears, level by level, in a fixed hierarchical order above the verb stem. We noted in a previous post that within these extended projections, the “arguments” of “lexical” categories like V and N appear according to s(emantic)-selection of heads and requirements like those studied under the name of “case theory” (noun phrases need to appear in positions in which they can acquire case). Aside from the hierarchy of extended projections and the requirements of s-selection, the distribution of constituents is heavily restricted by feature selection – the requirement of heads for features on the constituents with which they combine. For example, the perfect auxiliary have in English requires a perfect participle feature on the verb phrase with which it combines.

Lieber’s approach shows that much of inflectional morphology can be seen as generated as part of the extended projection of verbs, nouns and adjectives. Certain types of productive derivational morphology, such as agentive –er, appear to pattern with inflection and could therefore also be seen as part of the extended projection of verbs. This would avoid having, for instance, a nominal element like –er c-select for the lexical category Verb. Furthermore, other derivational morphemes that are restricted to attach to a specific list of heads could be seen as s-selecting for the semantic category of the stems to which they attach and feature-selecting for a particular list of stems, rather than c-selecting for the category of these stems.

Putting node labels on X, Y, and Z in the phrase structure rule X → Y + Z is thus redundant. If we have, instead, x → y + z, where x, y and z are variables over categories, then the categories for x, y, and z should be determined by examining the features of the constituents y and z that form x. This observation is what supports Chomsky’s (1995) move from a phrase structure theory to a “merge” theory of constituent structure. In a “merge” theory, the general rule is to combine or “merge” two constituents, y and z in our example, to form x, and have the features of y and z determine the features (or “label”) of x by general principles. Chomsky’s move was already anticipated in Lieber’s theory of word structure.
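
As a toy illustration of the move from phrase structure rules to merge-plus-labeling, here is a minimal sketch in which the mother’s label is computed from its daughters rather than stipulated by a rule. The labeling principle used (the item marked as projecting supplies the label) and all the names are deliberate simplifications, not a worked-out theory:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                    # category features, e.g. "V", "N"
    is_head: bool = False         # does this item project when merged?
    daughters: tuple = field(default_factory=tuple)

def merge(y, z):
    """Combine two constituents; the label of the result is determined
    from the daughters by a general principle, not by a rule like N -> V + N."""
    head = y if y.is_head else z  # simplistic labeling principle
    return Node(label=head.label, daughters=(y, z))

# 'drive' (a verb) merged with agentive '-er', which projects the noun label:
drive = Node("V")
er = Node("N", is_head=True)
driver = merge(drive, er)
print(driver.label)  # -> "N": the rule N -> V + N falls out of labeling
```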

However, although phrase structure rules for a language can be derived from basic principles of constituent combination and of “labeling” (the manner in which the features of a constituent are determined from the features of its constituents), speakers might still use phrase structure rules as part of their knowledge of language in morphological processing. Recall that the “surprisal” of a morphologically complex word, as indexed, for instance, by the neural M170 response from the visual word form area, is not well modelled by the surface frequency of the word. Oseki (2018) suggests that a syntactic processing model relying on probabilistic context-free phrase structure rules might provide a better model of surprisal for visually presented complex words. In particular, such grammars assign an importance to the frequency of “frames” of categories, for example, Adj → Verb + Adj for deverbal adjective formation as in readable. The frequency of a frame would be relevant independent of the particular lexical items (read, –able) that form a particular word. The next installment will discuss this possibility.
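
As a rough sketch of the idea (not Oseki’s actual model, which is richer), the surprisal contributed by a derivational frame under a probabilistic context-free grammar is just the negative log probability of the rule, and is identical for every word built with that frame. The rule probabilities below are invented for illustration:

```python
import math

# Hypothetical PCFG rule probabilities, estimated from a parsed lexicon:
# each expansion's probability is its share of all expansions of the
# left-hand-side category. Numbers are invented for illustration.
RULE_PROBS = {
    ("Adj", ("V", "Adj")):    0.15,  # deverbal adjectives, e.g. read + -able
    ("Adj", ("root", "Adj")): 0.70,  # non-derived adjectives
    ("Adj", ("N", "Adj")):    0.15,  # denominal adjectives
}

def frame_surprisal(lhs, rhs):
    """Surprisal (in bits) of a derivational frame: -log2 P(rule),
    independent of the lexical items that instantiate it."""
    return -math.log2(RULE_PROBS[(lhs, rhs)])

# The frame cost of 'readable' is the same as for any V+Adj derivative:
print(frame_surprisal("Adj", ("V", "Adj")))  # ~2.74 bits
```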

 

References

Chomsky, N. (1995). The Minimalist Program. MIT Press.

Inkelas, S. (1993). Nimboran position class morphology. Natural Language & Linguistic Theory 11(4): 559-624.

Kiparsky, P. (2017). Nominal verbs and transitive nouns: Vindicating lexicalism. In C. Bowern, L. Horn & R. Zanuttini (eds.), On looking into words (and beyond), 311–346. Berlin: Language Science Press.

Lieber, R. (1980). On the organization of the lexicon. MIT: PhD dissertation.

Lieber, R. (1992). Deconstructing morphology: word formation in syntactic theory. University of Chicago Press.

Mansfield, J., Stoll, S., & Bickel, B. (2020). Category clustering: A probabilistic bias in the morphology of verbal agreement marking. Language 96(2): 255-293.

Noyer, R. (1998). Impoverishment theory and morphosyntactic markedness. In Lapointe, S. G., Brentari, D. K., & Farrell, P. M. (eds.), Morphology and its relation to phonology and syntax. CSLI.

Oseki, Y. (2018). Syntactic structures in morphological processing. New York University: PhD dissertation.

Reuland, E.J. (1983). Governing –ing. Linguistic Inquiry 14(1): 101-136.

Understanding sentences, Part 2

In Syntactic Structures, Chomsky (1957) provides an analysis of the English auxiliary verb system that explains both the order of auxiliary verbs, when more than one is present, and the connection between a given auxiliary and the morphological form of the auxiliary or main verb that follows. For example, progressive be follows perfect have and requires the present participle –ing suffix on the verb that follows it, as in John has been driving recklessly. Subsequent work on languages that load more “inflectional” morphology on verbs and use fewer independent auxiliary verbs has revealed that the order of tense, aspect, and modality morphology cross-linguistically generally mirrors the order of English auxiliaries: tense, modal, perfect, progressive, passive, verb (or, if these are realized as suffixes, verb, passive, progressive, perfect, modal, tense). Work on the structure of verb phrases and noun phrases has revealed a set of “functional” categories (for nouns, things like number, definiteness and case) that form constituents with the noun and appear in similar hierarchies across languages.

Grimshaw (1991) was concerned with puzzles that involve the apparent optionality of these functional categories connected to nouns and verbs. For example, a verb phrase may appear only with tense, as in John sings, or it may appear with a number of auxiliaries, in which case tense appears on the topmost/leftmost auxiliary: John was/is singing, John has/had been singing, etc. If tense c-selects for a (main) verb, does it optionally also c-select for the progressive auxiliary, perfect auxiliary, etc.? The proper generalization, which was captured by Chomsky’s (1957) system in an elegant but problematic way, is that the functional categories appear in a fixed hierarchical order from the verb up (Chomsky had the auxiliaries in a fixed linear order, rather than a hierarchy, but subsequent research points to the hierarchical solution). There’s a sense in which the functional categories are optional – certainly no overt realization of aspect or “passive” is required in every English verb phrase. Yet there is also a downward selection associated with these categories. The modal auxiliaries, for example, require a bare verbal stem, while the perfect have auxiliary requires a perfect participle to head its complement, and the progressive auxiliary requires a present participle for its own complement.

Grimshaw suggested that nouns, verbs, adjectives and prepositions (or postpositions) anchor the distribution of “functional” material like tense or number that appears with these words in larger phrases. To borrow her terminology, a “lexical” category (N, V, Adj, P) is associated with an “extended projection” of optional “functional” (non-lexical) heads. This fixed hierarchy of heads is projected above the structure in which the “arguments” of lexical categories, like subjects and objects, appear.

What emerges from this history of phrase structure within generative syntax since the 1950s is an understanding of the distribution of morphemes and phrases in sentences that is not captured by standard phrase structure rules. Lexical categories are associated with an “extended projection,” the grammatical well-formedness of which is governed by a head’s demands for the features of the phrases that it combines with; for example, the perfective auxiliary wants to combine with a phrase headed by a perfect participle, and the verb rely wants to combine with a phrase headed by the preposition on. The requirements of heads are thus governed by properties related to semantic compositionality (s-selection) and not directly by subcategorization (c-selection). The “arguments” of lexical categories similarly have their distribution governed by factors of s-selection and other properties (e.g., noun phrases need case), rather than by c-selection of a particular item or by phrase structure generalizations that refer directly to category (e.g., VP → V NP, where NP is the category of the verb’s direct object).

How does this discussion of constituent structure relate to morphology and the internal structure of words? First, note that the formal statement of a selectional relation between one constituent and a feature of another constituent to which it is joined in a larger structure describes a small constituent structure (phrase structure) tree. For instance, to return to an example from Syntactic Structures, the auxiliary have in English selects for a complement headed by a perfect participle (often indicated by the form of one of the allomorphs of the perfect participle suffix –en). Chomsky formalized this dependency by having have introduced along with the –en suffix, then “hopping” the –en onto the adjacent verb, whatever that verb might be (progressive be, passive be, or the main verb). In line with contemporary theories, we might formalize the selectional properties of have with the feature in (1). This corresponds to, and could be used to generate or describe, the small tree in (1). We can suppose that the “perfect participle” features of –en are visible on the verb phrase node that contains verb-en.

(1) have :  [ __ [ verb+en … ] ]

[Tree corresponding to the selectional feature in (1)]

Extrapolating from this example, we can note that by combining various mini-trees corresponding to selectional features, one can generate constituent structure trees for whole sentences. That is, sentence structure to some extent can be seen as a projection of selectional features.
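
One way to see how mini-trees of selectional features can project sentence structure is to treat each head’s feature as a constraint on its complement and check the constraints recursively. A minimal sketch, with an invented feature inventory and names – an illustration of the idea, not a proposal:

```python
from dataclasses import dataclass

@dataclass
class Tree:
    head: str                         # lexical item at the head
    features: set                     # features visible on this node
    complement: "Tree | None" = None

# Hypothetical selectional features: each head demands a feature on the
# phrase it combines with (cf. have selecting a verb+en complement in (1)).
SELECTS = {
    "have": "perf-part",  # perfect have wants a perfect participle phrase
    "be":   "pres-part",  # progressive be wants a present participle phrase
}

def well_formed(t):
    """Check that every head's selectional demand is satisfied by the
    features visible on its complement."""
    if t.complement is None:
        return True
    wanted = SELECTS.get(t.head)
    if wanted is not None and wanted not in t.complement.features:
        return False
    return well_formed(t.complement)

# John has been driving: have > be (+en) > driving (+ing)
driving = Tree("drive", {"pres-part"})
been = Tree("be", {"perf-part"}, complement=driving)
has = Tree("have", {"tense"}, complement=been)
print(well_formed(has))  # True: each mini-tree's selection is satisfied
```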

Here we can see the connection between the structure of sentences and the internal structure of words. It is standard practice in generative grammar to encode the distributional properties of affixes in selectional features. For example, the suffix –er can attach to verbs to create agentive or instrumental nouns, a property encoded in the selectional feature in (2) with its corresponding mini-tree.

(2) –er : [N verb __ ]

[Tree corresponding to the selectional feature in (2)]

The careful reader may notice an odd fact about the selectional feature in (2): –er, of category N, appears to c-select for the category V. Yet in our discussion of lexical categories above in the phrasal domain, we noted that nouns, verbs and adjectives don’t generally c-select for their complements; rather, lexical categories “project” an “extended projection” of “functional” heads, and s-select for complements.

The term “derivational morphology” can be used to refer to affixes that appear to determine, and thus often appear to change, the category of the stems to which they attach. Derivational affixes in English fall into at least two (partially overlapping) categories: (i) those that are widely productive and don’t specify (beyond a category specification) a set of stems or affixes to which they like to attach, and (ii) those that are non- or semi-productive and only attach to a particular set of stems and affixes. Agentive/instrumental –er is a prime example of the first set, attaching to any verb, with the well-formedness of the result a function of the semantics of the combination (e.g., seemer is odd). The nominalizer –ity is of the second sort, creating nouns from a list of stems, some of which are bound roots (e.g., am-ity), and a set of adjectives ending specifically in the suffixes –al and –able. For this second set of derivational affixes, we can say that they s-select for their complement (-ity s-selects for a “property”) and further select for a specific set of morphemes, in the same way that, e.g., depend selects for on.

But for –er and affixes that productively attach to a lexical category of stems like verbs, we do seem to have some form of c-selection: the affixes seem to select for the category of the stems they attach to. But suppose this is upside-down. Suppose we can say that being a verb means that you can appear with –er. This is very similar to saying that the form verb-er can be projected up from the verb, in the same way that (tensed) verb-s and verb-ed are constructed. That is, –er can be seen as part of the extended projection of a verb.

Extended projections are frequently analyzed as morphological paradigms when the functional material of the extended projection is realized as affixes on the head. By performing an extended projection and realizing the functional material morphophonologically, one fills out the paradigm of inflected forms of the head. On the proposed view that productive derivational morphology associated with categories of stems involves the extended projections of the stems themselves, forms in –er, for example, would then be part of the paradigm of verbs. (This discussion echoes Shigeru Miyagawa’s (1980) treatment of Japanese causatives in his dissertation.) I’ll fill in the details of this proposal, as well as explain the contrast that emerges between the two types of derivation (productive-paradigmatic vs. semi-productive-selectional), in a later post.

Finally, remember that extended projections can be phrasal. That is, the structure of an English sentence, with its possible auxiliary verbs and other material on top of the inflected main verb, is the extended projection of the verb that heads the verb phrase in the sentence. If we view the paradigms of inflected verbs and nouns as generated from the extended projections of their stems, we can view sentences in languages like English as paradigmatic – cells in the paradigm of the head verb generated via the extended projection of that verb. When we look at phonological words in agglutinative languages like Yup’ik Eskimo, we see that these words (i) can stand alone as sentences translated into full phrasal sentences in English and (ii) have been analyzed as part of the enormous paradigm of forms associated with the head verbal root of the word. These types of examples point directly to the connection between parsing words and parsing sentences.

 

References

Chomsky, N. (1957). Syntactic Structures. Walter de Gruyter.

Grimshaw, J. (1991). Extended projection. Brandeis University: Ms. (Also appeared in Grimshaw, J. (2005). Words and Structure. Stanford: CSLI).

Miyagawa, S. (1980). Complex verbs and the lexicon. University of Arizona: PhD dissertation.
