
Words and Rules redux

On the general line of thinking in these blog posts, a word like walked is morphologically complex because it consists of at least some representation of a stem plus a past tense feature (more specifically, a head in the extended projection of a verb). This is also true of the “irregular” word taught. Thus there is an important linguistic angle from which walked and taught are equally morphologically complex, whatever one thinks about how many phonological or syntactic pieces there are in either form.

Steven Pinker, in his (1999/2015) Words and Rules work, proposes a sharp dichotomy between morphologically complex words that are constructed by a rule of grammar and thus are not stored as wholes vs. complex words that are not constructed by a rule and thus are stored as wholes. For Pinker, the E. coli of psycho-morphology is the English past tense, which he began to study (in 1988, with Alan Prince) when preparing a response to Rumelhart and McClelland’s (1987) connectionist model. The idea was that the relationship between teach and taught is a relationship between whole words (like that between cat and dog), while the relationship between walk and walked is rule-governed, such that walked is not stored as a word and must be generated by the grammar when the word is used.

In point of fact, the English past tense is not a particularly good test animal for theories of morphological processing. The type of allomorphy illustrated by English inflection is limited, and it conflates two potentially separable issues: stem allomorphy and affix allomorphy. For example, in the “irregular” past tense felt, we see the special fel- stem, where feel- would be expected, and the irregular –t affix, where –d would be expected (compare peeled). In canonical Indo-European languages with rich inflectional morphology, “irregular” (not completely predictable) forms of a stem can combine with regular suffixes, and unpredictable forms of suffixes can combine with regular stems. From a linguistic point of view, taught could be analyzed as a special stem form taught- with a phonologically zero past tense ending, as a special stem form taugh- with a special ending –t (a widespread allomorph of the English past tense, but not generally predicted after stems ending in a vowel, where –d is the default), or as a “portmanteau” form covering both the stem and the past tense – this last option seems to be what Pinker had in mind. Even mildly complex inflectional systems, then, exhibit a variety of types of “irregularity.” The very notion of “irregularity” – that a pattern is not predictable from the general facts of a language – implies that something needs to be learned and memorized about irregular forms. But the conflation of irregularity with stored whole forms, as in Pinker’s analysis of the English irregular past tense, obscures important issues and questions for morphology and morphological processing.

A textbook case of irregular stems with regular endings occurs in the Latin verb ‘to carry.’ As canonically presented in Latin dictionaries, the three “principal parts” of ‘to carry’ are ferō ‘I carry,’ tulī ‘I carried’ and the participle lātum ‘carried’, with three “suppletive” stems. Crucially, each of these stems occurs with endings appropriate for the inflectional class of the stem (for contrasts like indicative vs. subjunctive and for person and number of the subject). It’s not at all obvious what the general Words and Rules approach would say about such cases, but memorization of whole words here doesn’t seem to be a plausible option. Once we sketch out a general theory of “irregularity,” the proper analysis of the English past tense should fall into line with what’s demanded by the general theory.

Pinker invokes an “add –ed” past tense rule when explaining his approach in general terms, but in his work he sometimes presents a more explicit account of how a grammar might generate past tense forms in a Words and Rules universe. Here, the important concept is that a stored, special form blocks the application of a general rule.
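
To make the blocking logic concrete, here is a minimal sketch in Python (my illustration, not Pinker’s or anyone’s published implementation; the irregulars list and the crude spelling rule are stand-ins):

```python
# A minimal sketch of blocking: a stored irregular form preempts the
# general rule, which otherwise applies freely (even to novel stems).
# The dictionary and the "add -ed" spelling rule are illustrative only.

IRREGULARS = {"teach": "taught", "feel": "felt", "go": "went"}

def past_tense(stem: str) -> str:
    # Lexical lookup first: a stored whole form blocks the general rule.
    if stem in IRREGULARS:
        return IRREGULARS[stem]
    # Default rule: "add -ed" (orthographic; real phonology chooses
    # among /d/, /t/, and a schwa-ful variant by the stem's final sound).
    if stem.endswith("e"):
        return stem + "d"
    return stem + "ed"

print(past_tense("walk"))   # walked (rule applies)
print(past_tense("peel"))   # peeled (rule applies)
print(past_tense("teach"))  # taught (stored form blocks the rule)
```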

The implementation follows the lead of Paul Kiparsky’s version of Lexical Phonology and Morphology from around 1982. At this point in time, Kiparsky’s notion was that a verb would enter into the (lexical) morphology with a tense feature. At “Level 1” in a multi-level morpho-phonology, an irregular form would spell out the verb plus the past tense feature, blocking the affixation of the regular /d/ suffix at a later level. The pièce de résistance of this theory was an account of an interesting contrast between regular and irregular plurals in compound formation. Famously, children and adults find mice-eater to be more well-formed than rats-eater. Kiparsky’s account put compound formation at a morpho-phonological “level” between irregular morphology and regular inflection, allowing irregular inflected forms to feed compound formation, but having compound formation bleed regular inflection (on an internal constituent). The phenomenon here is fascinating and worthy of the enormous literature devoted to it. However, Kiparsky’s analysis was a kind of non-starter from the beginning. The problem was that if irregular inflection occurs at Level 1 in the morpho-phonology, it should not only feed compound formation but also derivational morphology. So, someone who used to teach could be a taughter on this analysis.

To cut to the chase, the Words and Rules approach to morphology isn’t compatible with any (even mildly current) linguistic theory, and as noted above, it’s difficult to apply beyond specific examples that closely resemble the English past tense, where the irregular form may appear to be a portmanteau covering a stem and affixal features. However, Pinker has always claimed that experimental data support his approach, so it’s important to investigate whether his particular proposal about stored vs. computed words makes interesting and correct predictions about experimental outcomes. Here it’s important to distinguish data from production studies and data from word recognition.

For production, the Words and Rules framework was supposed to make two types of prediction, one for non-impaired populations and another for impaired populations. In terms of reaction time, non-impaired speakers were supposed to produce the past tense of a presented verb stem in a time that, for irregulars, correlated with the surface frequency of the past tense verb and, for regulars, correlated with the frequency of the stem. For impaired speakers, the prediction was a double dissociation: impairment to the memory systems would differentially impair irregulars over regulars, while impairment to the motor sequencing system would differentially impair regulars over irregulars.

Michael Ullman took over this research project, using the English past tense as an assay for the type of impairment a particular population might be suffering (see, e.g., Ullman et al. 2005). In his declarative/procedural model, irregulars are produced as independent words, while regulars are produced by the procedural system, which is involved in motor planning and execution. However, for Ullman the story is clearly one about the specifics of production, and not about the grammatical system, as it is for Pinker. For example, his studies find that women are more likely than men to produce regular past tense forms at a speed correlated with surface frequency, which suggests that women memorize these forms, while (most/many) men do not. If Ullman were connecting his studies to the grammatical system, he would predict that women more than men would like rats-eater, for example. But his theory is about online performance rather than grammatical knowledge or use. By sticking with systems like the English past tense, which confounds morphological affixation with phonological concatenation, Ullman can’t distinguish whether the declarative/procedural divide is about the phonological sequencing of the phonological forms of morphemes or about the concatenation of morphemes themselves.

A nice study by Sahin et al. (2009, which includes Pinker as co-author) does explore the neural mechanisms of the production of inflected forms with an eye to distinguishing phonological and morphological processing. Sahin et al. find stages in processing in the frontal lobe that are differentially sensitive to morphological structure (reflecting, say, the process “add inflection”) and phonological structure (reflecting, say, the process “add an overt suffix”), with the former preceding the latter. Interestingly, Sahin et al. found no difference between regular and irregular inflection.

In short, the conclusion from the production studies, no matter how charitable one is to Ullman’s experiments (see Embick and Marantz 2005 for a less charitable view), is that although phonological concatenation in production may distinguish between forms with overt suffixes and forms with phonologically zero affixes, no data from these studies support the Words and Rules theory when interpreted to be about morphological processing.

But what about processing in word recognition or perception? Here, it’s unclear whether there was ever any convincing support for the Words and Rules approach. Pinker and others cite a paper by Alegre and Gordon (1999) as providing evidence for the memorization vs. rule application distinction in lexical decision paradigms. However, Alegre and Gordon’s experiments and their interpretation, even if taken at face value, would hardly be the type of evidence one would want for Words and Rules. Their initial experiment finds no frequency effects for reaction time in lexical decision for regular verbs and nouns (expanding well beyond the past tense to other verb forms and to noun plurals) – neither “surface” frequency of the inflected form nor a type of base frequency (frequency of the stem across all inflections, which they call “cluster frequency”). In subsequent experiments reported in the paper, and in a reanalysis of their data from the first experiment, Alegre and Gordon claim that regularly inflected forms show surface frequency effects in lexical decision if they occur above a certain threshold frequency. If that were true (and subsequent work has shown that the generalization is incorrect), it would severely undermine Pinker’s theory. We’re not just talking about peas and toes here; “high frequency” and putatively memorized inflected forms include deputies, assessing, pretending and monuments. If the Words and Rules approach were the correct explanation of the data, we’d expect monuments-destroying to be as well-formed as monument-destroying. If we are indeed memorizing these not really so frequent inflected forms as wholes, the notion of “memorization” here must be divorced from any connection to grammatical knowledge.

However, Lignos and Gorman (2012) show that Alegre and Gordon’s results and interpretation can’t be taken at face value, pointing out a number of problems in the paper, including the reliance on frequency counts inappropriate for their study. The more robust finding is that the surface frequency effect is stronger, not weaker, in the low surface frequency range for morphologically complex words. Recent work in this area paints a complex picture of the variables modulating reaction time in lexical decision, which include both some measure related to base frequency and some measure related to surface frequency, but no current research in morphologically complex word recognition supports the key predictions of the Words and Rules framework, at least as laid out by Pinker and colleagues.

Recall that if you know the grammar of a language – if you’ve “memorized” or “learned” the rules – you have, in an important sense, memorized all the words (and all the sentences) that are analyzable or generable by the grammar, even the ones you haven’t heard or spoken yet. That is, the “memorized” grammar generates words that you have already encountered or used in the same way it generates words that you haven’t (yet) encountered or used. In other words, when you’ve “memorized” the grammar, you’ve “memorized” both sets of words. From the standpoint of contemporary research in morphological processing, this understanding of “memorization” should replace the thinking of the Words and Rules framework, which makes speakers’ prior experience with words a crucial component of their internal representation.

However, it should be noted that Pinker’s main concern in Words and Rules was with language acquisition and the generalization of “rules” to novel forms. Recent work by Charles Yang (2016), Tim O’Donnell (2015) and others recasts the Words and Rules dichotomy between memorized and constructed forms as a distinction between words following unproductive rules or generalizations – for which you must memorize, word by word, that the rule or generalization applies (or memorize the output without reference to the generalization) – and words following productive rules or generalizations, for which the output is predicted. Key data for these investigations come from wug tests of rule application to novel forms. An issue to which we will return soon is how these theories of the productivity of morphological rules tie into models of morphological processing in word recognition.
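
For concreteness, Yang’s (2016) Tolerance Principle gives one precise version of this recast: a rule applying to N relevant items remains productive only if the number of exceptions does not exceed N/ln N. A minimal sketch (the counts below are invented for illustration, not Yang’s data):

```python
import math

def is_productive(n_items: int, n_exceptions: int) -> bool:
    # Yang's (2016) Tolerance Principle: a rule over N items tolerates
    # at most N / ln(N) exceptions and remains productive.
    return n_exceptions <= n_items / math.log(n_items)

# Invented numbers: ~120 irregular pasts among 1000 verbs would leave
# "add -ed" comfortably productive (threshold ~ 1000/ln(1000) = 144.8).
print(is_productive(1000, 120))  # True
```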

 

References

Alegre, M., & Gordon, P. (1999). Frequency effects and the representational status of regular inflections. Journal of Memory and Language, 40(1), 41-61.

Embick, D., & Marantz, A. (2005). Cognitive neuroscience and the English past tense: Comments on the paper by Ullman et al. Brain and Language, 93(2), 243-247.

Kiparsky, P. (1982). From cyclic phonology to lexical phonology. The structure of phonological representations, 1, 131-175.

Lignos, C., & Gorman, K. (2012). Revisiting frequency and storage in morphological processing. In Proceedings from the Annual Meeting of the Chicago Linguistic Society, 48(1), 447-461. Chicago Linguistic Society.

O’Donnell, T. J. (2015). Productivity and reuse in language: A theory of linguistic computation and storage. MIT Press.

Pinker, S. (1999/2015). Words and rules: The ingredients of language. Basic Books.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28(1-2), 73-193.

Rumelhart, D. E., & McClelland, J. L. (1987). Learning the past tenses of English verbs: Implicit rules or parallel distributed processing?. In B. MacWhinney (ed.), Mechanisms of language acquisition, 195-248. Lawrence Erlbaum Associates, Inc.

Sahin, N. T., Pinker, S., Cash, S. S., Schomer, D., & Halgren, E. (2009). Sequential processing of lexical, grammatical, and phonological information within Broca’s area. Science, 326(5951), 445-449.

Ullman, M. T., Pancheva, R., Love, T., Yee, E., Swinney, D., & Hickok, G. (2005). Neural correlates of lexicon and grammar: Evidence from the production, reading, and judgment of inflection in aphasia. Brain and Language, 93(2), 185-238.

Yang, C. (2016). The price of linguistic productivity: How children learn to break the rules of language. MIT Press.

What Marantz 1981 (probably) got wrong

A popular account of what Grimshaw called “complex event nominalizations” (cf. John’s frequent destruction of my ego) postulates that these nominalizations involve a nominalizing head taking a VP complement. When the head V of the VP moves to merge with the nominalizing head, the resulting structure has the internal syntactic structure of an NP, not a VP. For example, there’s no accusative case assignment to a direct object, and certain VP-only complements like double object constructions (give John a book) and small clauses (consider John intelligent) are prohibited (*the gift of John of a book, *the consideration of John intelligent).

Note that this analysis relies on the assumption that head movement (of V to N) has an impact on syntax. Before head movement applies, the verb phrase has verb phrase syntax, with the possibility of accusative case, small clauses and double object complements. After head movement applies, there is no VP syntax and the internal structure of the NP is that of any NP.

In the development of Distributed Morphology, these consequences of head movement fit within the general approach of Marantz (1981, 1984), in which the operation of “morphological merger” (later equated with head movement and adjunction) causes structure collapsing. That is, when the verb merges with the nominal head, the VP effectively disappears (in Baker’s 1985 version, the structure doesn’t disappear but rather becomes “transparent”).

In a recent book manuscript (read this book!), Jim Wood (2020) argues that the VP account is not appropriate for Icelandic complex event nominalizations, and probably not right for English either. Among the pieces of evidence that Wood brings to the argument, perhaps the most striking is the observation that verbs in these nominalizations do not assign the idiosyncratic “quirky” cases to their objects that they do in VPs. If the VP analysis of complex event nominalizations is indeed wrong, then one might conclude that morphological merger-driven clause collapsing is simply not part of syntactic theory. It’s worth asking, however, what the motivation was for these consequences of morphological merger (or, head movement and adjunction) in the first place, and where we stand today with respect to the initial motivation for these mechanisms.

Allow me a bit of autobiography and let’s take a trip down memory lane to the fall of 1976. That fall I’m a visiting student at MIT, and I sit in on David Perlmutter’s seminar on Relational Grammar (RG). I meet Alice Harris and Georgian, my notebook fills with stratal diagrams, and I’m introduced to the even then rather vast set of RG analyses of causative constructions and of advancements to 2 (which include the “applicative” constructions of Bantu). Mind-blowing stuff. One aspect of RG that particularly stuck in my mind, and that I would return to later, was the role of morphology like causative and applicative affixes in the grammar. In RG, morphemes were reflective rather than causal; they “flagged” structures. So an affix on a verb was a signal of a certain syntactic structure rather than a morpheme that created or forced the structure.

In an important sense my dissertation involved the importing of major insights of RG into a more mainstream grammatical theory. (In linguist years, the fall of 1976 and the summer of 1981, when I filed my dissertation, are not that far apart.) Consider the RG analysis of causative constructions involving “causative clause union.” In this analysis, a bi-clausal structure, with “cause” as the head (the predicate, P) of the upper clause, becomes mono-clausal. Since the upper clause has a subject (a 1) and the lower clause has a subject (another 1), and there can be only one 1 per clause (the Stratal Uniqueness Law), something has to give when the clauses collapse. In the very general case, if the lower clause is intransitive, the lower subject becomes an object (a 2), now the highest available relation in the collapsed clause.


Stratal diagram for Korean causative of intransitive ‘Teacher made me return’ (Gerdts 1990: 206)

If the lower clause is transitive, its object (a 2) becomes the object of the collapsed clause, and the lower subject becomes an indirect object (a 3), the highest relation available.


Stratal diagram for Korean causative of transitive ‘John made me eat the rice cake’ (Gerdts 1990: 206)

In languages with double object constructions like those of the Bantu family, after clause union with a lower transitive clause, the lower subject, now a 3, “advances” to 2, putting the lower object “en chômage” and creating a syntax that looks grammatically like that of John gave Mary a book in English, which also involves 3 to 2 advancement.


Stratal diagram for Korean ditransitive ‘I taught the students English’ (Gerdts 1990: 210)
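
The revaluation calculus just illustrated is simple enough to state mechanically. Here is a toy sketch (my own illustration, not a published formalization of RG): on clause union, the matrix causer keeps its 1, the lower object (if any) claims the highest available relation, and the causee takes the next one down.

```python
# Toy sketch of RG causative clause union. Relations: 1 = subject,
# 2 = direct object, 3 = indirect object. Not a published formalization.

def clause_union(causer, causee, lower_object=None):
    relations = {1: causer}       # the matrix causer keeps its 1
    available = [2, 3]            # relations left for lower dependents
    if lower_object is not None:
        # The lower 2 claims the highest available relation, staying a 2.
        relations[available.pop(0)] = lower_object
    # The causee (the lower 1) takes the next available relation.
    relations[available.pop(0)] = causee
    return relations

# Intransitive lower clause: the causee surfaces as a 2.
print(clause_union("teacher", "me"))            # {1: 'teacher', 2: 'me'}
# Transitive lower clause: lower object stays 2, causee is demoted to 3.
print(clause_union("John", "me", "rice cake"))  # {1: 'John', 2: 'rice cake', 3: 'me'}
```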

Within the Marantz (1981, 1984) framework, applicative constructions involve a PP complement to a verb, with the applicative morpheme as the head P of the PP. Morphological merger of the P head with the verb collapses the VP and PP together and puts the object of the P, the “applied object,” in a position to be the direct object of the derived applicative verb.

My general take in 1981 was that affixation (e.g., of a causative suffix to a verb) was itself responsible for the type of clause collapsing one sees in causative clause union. The lower verb, in a sentence that is the complement to the higher causative verb, would “merge” with the causative verb, with the automatic consequence of clause collapsing. I argued that a general calculus determined what grammatical roles the constituents of the lower clause would bear after collapsing, as in RG. There are many interesting details swirling around this analysis, and I proposed a particular account of the distinction between Turkish-type languages, in which the “causee” in a causative construction built on a transitive verb is oblique (dative, usually), and Bantu-type languages, in which this causee is a direct object in a double object construction.  Read the book (particularly those of you habituated to citing it without reading – you know who you are).

At this point in time, nearly 40 years later, the analysis seems likely to have been wrong-headed. Already by the late 1980s, inspired by a deep dive into Alice Harris’s dissertation-turned-book on Georgian (1976, 1981), I had concluded that my 1981 analysis of causatives and applicatives was on the wrong track. Instead of bi-clausal (or bi-domain, in the case of applicatives) structures collapsing as the result of morphological merger, a more explanatory account could be formulated if the causative and applicative heads were, in effect, heads on the extended projection of the lower verb. Affixation/merger of the verb with these causative and applicative heads would have no effect on the grammatical relations held by the different nominal arguments in these constructions. This general approach was developed by a number of linguists in subsequent decades, notably Pylkkänen (2002, 2008), Wood & Marantz (2017) and, for the latest and bestest, Nie (2020), which I’ll discuss in a later post.

The crucial point here is that the type of theory that underlies the N + VP analysis of complex event nominalizations has lost its raison d’être, thereby leaving the analysis orphaned. If morphological merger has no effect on the syntax, at least in terms of the collapsing of domains, then a nominalization formed by an N head and a VP complement could easily have the internal syntax of a VP under Tense. This does not describe complex event nominalizations, which are purely nominal in structure, but the discussion so far does not rule out a possible class of nominalizations that would show VP syntax internally and NP syntax externally. As we discussed in an earlier post, English gerunds are not examples of such a construction, since they are not nominal in any respect (see Reuland 1983 and Kiparsky 2017). However, maybe such constructions do exist. If they don’t, it would be important to understand whether something in the general theory rules them out. We will return to this issue in a subsequent post.

 

References

Baker, M.C. (1985). Incorporation, a theory of grammatical function changing. MIT: PhD dissertation.

Gerdts, D.B. (1990). Revaluation and Inheritance in Korean Causative Union, in B. Joseph and P. Postal (eds.), Studies in Relational Grammar 3, 203-246. Chicago: University of Chicago Press.

Harris, A.C. (1976). Grammatical relations in Modern Georgian. Harvard: PhD dissertation.

Harris, A.C. (1981). Georgian syntax: A study in Relational Grammar. Cambridge: CUP.

Kiparsky, P. (2017). Nominal verbs and transitive nouns: Vindicating lexicalism. In C. Bowern, L. Horn & R. Zanuttini (eds.), On looking into words (and beyond), 311-346. Berlin: Language Science Press.

Marantz, A. (1981). On the nature of grammatical relations. MIT: PhD dissertation.

Marantz, A. (1984). On the nature of grammatical relations. Cambridge, MA: MIT Press.

Nie, Y. (2020). Licensing arguments. NYU: PhD dissertation. https://ling.auf.net/lingbuzz/005283

Pylkkänen, L. (2002). Introducing arguments. MIT: PhD dissertation.

Pylkkänen, L. (2008). Introducing arguments. Cambridge, MA: MIT Press.

Reuland, E.J. (1983). Governing –ing. Linguistic Inquiry 14(1): 101-136.

Wood, J., & Marantz, A. (2017). The interpretation of external arguments. In D’Alessandro, R., Franco, I., & Gallego, Á.J. (eds.), The verbal domain, 255-278. Oxford: OUP.

Wood, J. (2020). Icelandic nominalizations and allosemy. Yale University: ms. https://ling.auf.net/lingbuzz/005004

 

Phrase structure rules within words, Part 2

In the last post, we explored the use of phrase structure rules in accounting for the internal structure of words and concluded, as we did for phrase structure rules and sentences, that phrase structure rules are not part of the explanatory arsenal of current linguistic theory. The word structures described by phrase structure rules are explained by independent principles. In particular, the “label” of a morphologically complex word or word-internal constituent is a function of the labels of its daughter constituents and general principles, including whatever (non-phrase structural) principles are implicated in explaining “extended projections” of lexical categories.

However, it may turn out that phrase structure rules can serve to explain how morphologically complex words are recognized in language processing. This post will explore some possibilities for the use of (word-internal) phrase structure rules in word recognition and highlight the issues involved.

To begin, let’s look at some possible word structures, given our previous discussions of extended projections, c-selection, s-selection, feature-selection, and the possible different types of what have traditionally been called derivational morphemes. First, consider affixes like –ity, which attach to stems of a certain semantic sort (–ity s-selects for a property) and also feature-select for the identity of the head of the stems. For –ity, the set of heads that it feature-selects for includes stems like sane and suffixes like –able and –al. The structures of sanity and treatability might look as below:

[Tree diagrams for sanity and treatability]

Rather than place –ity in these trees as a lone daughter to the N node (and –able as the lone daughter of Adj) or give –ity an “N” feature, we show –ity adjoined to N. This would be consistent with the analyses of category heads in Distributed Morphology, with little n replacing N in the trees, and –ity considered a root adjoined to n. This discussion will assume that the details here don’t matter (though they probably will turn out to).

In considering probabilistic context-free phrase structure rules as part of a model of word recognition, the relevant parts of the trees above are at the top. We can ask, for treatability, whether the frequency of all nouns derived from adjectives, independent of any of the specific morphemes in the word, matters for the recognition of the word. In phrase structure rule terms, this would be the frequency of the rule N → Adj + N. For sanity, there are at least a couple of different ways to think of its structure in phrase structure terms. If sane is really a root, rather than an adjective, then it’s not clear that the top phrase structure of sanity is any different from that of cat, consisting of a root adjoined to a category head.

However, one could also ask whether the probability of a derived noun (involving a root like –ity as well as a categorizing head, sketched below) as opposed to a non-derived noun (just the stem and the category suffix, with no additional root) could make a difference in processing.

[Tree diagrams for derived vs. non-derived nouns]

Probabilistic context-free phrase structure rules could be shown to make a difference in word recognition, then, if processing is affected by variables such as the following (see the sketch after the list):

  • frequency of categories (nouns vs. verbs vs. adjectives)
  • frequency of derivational frames (when one category is derived from another category)
  • frequency difference between non-derived categories (involving only a root and a category affix) and derived categories (involving at least an additional root tied to the category affix)
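
One way to operationalize these variables is to estimate rule probabilities from a parsed lexicon, just as probabilistic context-free rules are estimated from a treebank. A hypothetical sketch (the toy lexicon and its parses are invented for illustration):

```python
from collections import Counter

# A toy parsed lexicon: each entry records the top-level "rule" that
# builds the word, as a (parent, daughters) pair. Invented for illustration.
parsed_lexicon = [
    ("N", ("Adj", "N")),    # treatability: noun derived from an adjective
    ("N", ("Root", "n")),   # cat: non-derived, root plus category head
    ("N", ("Root", "n")),   # sanity, if sane is a root rather than an adjective
    ("Adj", ("V", "Adj")),  # treatable: adjective derived from a verb
]

rule_counts = Counter(parsed_lexicon)
parent_counts = Counter(parent for parent, _ in parsed_lexicon)

def rule_probability(parent, daughters):
    # Relative frequency of a rule among all expansions of its parent:
    # the standard maximum-likelihood estimate for a probabilistic CFG.
    return rule_counts[(parent, daughters)] / parent_counts[parent]

print(rule_probability("N", ("Adj", "N")))   # 1/3 in this toy lexicon
print(rule_probability("N", ("Root", "n")))  # 2/3
```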

In visual lexical decision experiments, we know that by the time a participant presses a button to indicate that yes, they saw a real word as opposed to a non-word, the lexical category of the word makes a difference for reaction time over and above the usual frequency and form variables. In fact, as shown in Sharpe & Marantz (2017), reaction time in lexical decision can be modulated by connections between the phonological/orthographic form of a word and how often that word is used as a noun or a verb. What we don’t yet know is whether lexical category (by itself) or the sort of variable investigated in Sharpe & Marantz can modulate the “M170” – the response measured at 170ms after the visual onset of a word stimulus in the visual word form area (VWFA), associated with morphological processing. Similarly, if we find that reaction time in lexical decision is modulated by the frequency of nouns formed from adjectives, we would still not know whether this variable is implicated specifically in morphological processing or in some later stage of word recognition within the lexical decision experimental paradigm.

However, we do know that certain probabilistic variables that don’t seem to implicate phrase structure rules do modulate visual word recognition at the M170. These include “transition probability,” which for the experiments in question was computed as the ratio of the frequency of a given stem + affix combination to the frequency of the stem in all its uses. So the transition probability from sane to –ity is computed as the ratio of the frequency of sanity to the stem frequency of sane (sane in all its derived and inflected forms). But we should investigate whether transition probability works to explain variance in the M170 because it represents something intrinsic to the storage of knowledge about words, or whether it could correlate with a set of variables related to phrase structure.
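
In code, the computation is a simple ratio over corpus counts. A sketch with invented counts (real studies would draw these from a corpus):

```python
# Transition probability as described above: the frequency of a given
# stem + affix combination divided by the frequency of the stem across
# all its uses. All counts below are invented for illustration.

stem_frequency = {"sane": 5000}       # sane in all derived and inflected forms
surface_frequency = {"sanity": 1200}  # the specific stem + affix combination

def transition_probability(derived: str, stem: str) -> float:
    return surface_frequency[derived] / stem_frequency[stem]

print(transition_probability("sanity", "sane"))  # 0.24 with these toy counts
```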

Compounds represent another class of morphologically complex words for which probabilistic phrase structure rules might be appropriate. Compound structures are subject to a great deal of cross-linguistic variation, and in work from the 1980s, Lieber and others suggested that the phrase structure rules of a language might describe the types of compounds available in the language. So in English, rules like N → {Adj, N} + N might describe nominal compounds (green house, book store), while the lack of a compound rule V → X + V might account for the lack of productive verbal compounding. It’s not clear that the nonhead constituent in an English compound is categorially constrained (keep off the grass sign, eating place), and in any case the syntactic structure of compounds is probably more complicated than it seems on the surface. Nevertheless, experiments should check whether, say, the general frequency of compounds consisting of noun + noun (yielding a noun) modulates morphological processing independently of the specific nouns involved.

Patterning perhaps with compounds are structures with affixes like –ish. Derivation with –ish is productive, which might seem to put –ish with comparative –er in the extended projection of adjectives (smaller, smallish). However, –ish, like compound heads, is not really restrictive as to the category of its stem (Google-ish, up-ish), and of course also has a use as an independent word (Dating one’s ex’s roommate is so ish, according to Urban Dictionary).

In short, it’s an open and interesting question what the relevant probabilistic structural information is for processing compounds and –ish-type derivatives, but we don’t yet know how general phrase structure knowledge might be relevant.

Finally, let’s return to inflection and the derivation we suggested might appear with inflection in the extended projections of lexical categories (e.g., nominalizing –er for verbs). If we treat the category of a word along its extended projection as remaining stable (e.g., Verb, for all nodes along the “spine” of the extended projection of a Verb), then the phrase structure rules for morphemes along extended projections would look something like: Verb → Verb + Tense. Note again that neither phrase structure rules nor standard selectional features are good tools for deriving the (relatively) fixed sequence of functional heads in an extended projection. But we could ask whether encoding knowledge of extended projections in phrase structure rules like Verb → Verb + Tense could aid in explaining morphological processing in some way. That is, could the processing of a tensed verb depend on the frequency of tensed verbs in the language, independently of any knowledge of the particular verb and tense at hand?

Other than phrase structure-type probabilities, what other probabilistic information about extended projections might modulate the processing of an inflected word independently of the specific morphemes in the word? In an interesting series of papers, Harald Baayen and colleagues have suggested that processing might be modulated by variables associated with probability distributions over paradigms (see, e.g., Milin et al. 2009a,b). In addition to exploring the effects on processing of what we have called transition probability (the probability of the inflected word given the stem, in one direction, or the probability of the inflected word given the affix, in the other), they propose that processing is also affected by the relative frequency of the various inflected forms of a word, computed as “paradigm entropy.” Transition probabilities and paradigm entropy are both variables associated with particular stems. Interestingly, they also employ variables involving probabilities from the language beyond the statistics of particular stems. Milin et al. (2009a) suggest that the relative entropy of the paradigm of a stem also modulates processing. Relative entropy involves a comparison of the paradigm entropy of the stem of a word with the average entropy of all the stems in the same inflectional class. The idea is information theoretic: how much additional information do you gain from identifying a specific stem (with its own paradigm entropy) once you know to which inflectional class the stem belongs? Figure 1 of Milin et al. (2009a) shows the paradigm entropies of three Serbian nouns (knjiga, snaga, pucina) and the frequencies of the inflectional class (feminine a-class) to which they belong.
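
To make the two quantities concrete, here is a sketch with invented counts (not Milin et al.’s data): paradigm entropy is the Shannon entropy of the distribution of a stem’s inflected forms, and relative entropy is the Kullback-Leibler divergence between that distribution and the corresponding distribution summed over the stem’s inflectional class.

```python
import math

def entropy(counts):
    # Shannon entropy (in bits) of a frequency distribution.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def relative_entropy(counts, class_counts):
    # Kullback-Leibler divergence D(stem || class): the extra information
    # carried by the stem's own paradigm distribution once the average
    # distribution of its inflectional class is known.
    p_total, q_total = sum(counts), sum(class_counts)
    return sum((c / p_total) * math.log2((c / p_total) / (q / q_total))
               for c, q in zip(counts, class_counts) if c > 0)

# Invented frequencies over the same ordered paradigm cells (say, the
# case/number forms of a feminine a-class Serbian noun).
stem_paradigm  = [400, 120, 60, 20]        # one noun's inflected forms
class_paradigm = [5000, 2500, 1500, 1000]  # summed over the whole class

print(round(entropy(stem_paradigm), 3))                           # ~1.35 bits
print(round(relative_entropy(stem_paradigm, class_paradigm), 3))  # ~0.1 bits
```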

Relative entropy is a variable like the one explored in Sharpe & Marantz, which involved a comparison of the relationship between the form of a word and its usage as a noun vs. a verb with the average relationship between forms and usage across the language. What’s particularly interesting in the present context about the sort of paradigm variables identified by Milin et al. becomes clear if we recall the connection between paradigms and extended projections, and the identity between extended projections in the context of inflected words and extended projections in the context of sentences. As I suggested before, sentences in an important sense belong to verbal paradigms, which in English consist of a verb and the set of modals and auxiliaries associated with the “spine” of functional elements as summarized in Chomsky’s Tense-Modal-have-be-be-Verb sequence. If Milin et al. are on the right track, the relative entropy of these extended verbal “paradigms” should also be considered as a variable in sentence processing.

 

References

Milin, P., Filipović Đurđević, D., & Moscoso del Prado Martín, F. (2009a). The simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from Serbian. Journal of Memory and Language 60(1): 50-64.

Milin, P., Kuperman, V., Kostić, A., & Baayen, R.H. (2009b). Paradigms bit by bit: An information theoretic approach to the processing of paradigmatic structure in inflection and derivation. In Blevins, J.P., & Blevins, J. (eds.), Analogy in grammar: Form and acquisition, 214-252. Oxford: OUP.

Sharpe, V., & Marantz, A. (2017). Revisiting form typicality of nouns and verbs: a usage-based approach. The Mental Lexicon 12(2): 159-180.