
Back to visual word recognition

In the post on phoneme surprisal, we asked whether the processing of morphemes within words, as indexed by the neural response to a cohort-based phoneme surprisal variable, might be less sensitive to contextual variables than we might have assumed. For example, the recognition of a suffix might involve surprisal computed not over a cohort of morphemes compatible with the stem (with candidate frequency within the cohort modulated by the transition probability from the stem to the candidate suffix), but over a cohort of morphemes compatible simply with the first phoneme of the suffix, independent of morphological context.

Similar considerations might apply to visual word recognition and the M170 response from the Visual Word Form Area (VWFA). Instead of trying to explain the M170 using a single surprisal value computed from the likelihood of the word given the grammar, we should explore the possibility that the M170 is modulated by a number of variables that aren’t fully captured by a single surprisal measure.

The landmark paper exploring morphological variables that modulate the M170 is Solomyak and Marantz (2010), which built on the work of Zweig and Pylkkänen (2009) demonstrating that morphologically complex words yield larger M170 responses than matched morphologically simple words. Solomyak and Marantz examined M170 responses to a diverse set of derived English words with a variety of suffixes. We included both free-stem words (farmer, where farm can occur by itself) and bound-stem words (tolerable, where toler– also occurs in, e.g., tolerate, but never without a suffix). We also included “unique stem” words, like amity, whose stem arguably occurs only in the presented word. This last group did not yield entirely clear results, but a return to this type of word in Gwilliams and Marantz (2018) provided evidence that such words, too, are decomposed at the time of the M170 response.

The oft-cited result from Solomyak and Marantz is that, for free-stem and bound-stem words, the M170 is modulated by the transition probability from stem to suffix but not by the surface frequency of the word, once the variance associated with transition probability is removed. What is less often remembered is that other variables did in fact modulate the M170 independent of transition probability, specifically stem and affix frequencies. That is, in addition to the contextual variable transition probability, a-contextual unigram stem and affix frequencies also showed significant effects at the M170 response. These results have been replicated, in various forms, in subsequent studies.
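To make the key variable concrete: the transition probability here is the kind of quantity one can estimate directly from corpus counts. A minimal sketch, with invented frequencies (not the lexical statistics actually used in the paper):

```python
# Estimating a stem-to-suffix transition probability from corpus counts.
# The counts below are invented for illustration; Solomyak and Marantz
# computed their statistics from a large lexical database.
freq = {"farm": 12000, "farmer": 1800}  # hypothetical corpus frequencies

transition_probability = freq["farmer"] / freq["farm"]  # ~P(-er | farm)
print(transition_probability)  # 0.15 in this toy example
```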

Consider the possibility that in the general M170 brain region and time interval, multiple connected processes are performed. The look-up of the representations of morphological forms proceeds alongside the evaluation of the syntactic structure connecting multiple morphemes, when more than one morpheme is recovered from the input. Here, as suggested for auditory processing, the recognition of individual morphemes from the visual input might be governed by contextual as well as a-contextual variables. If this suggestion is correct, we should be able to distinguish a number of sub-responses to different stimulus variables, perhaps in different sub-regions of the VWFA and neighboring cortex and at different time points within the general M170 interval.

Another possibility is that a single measure of visual word surprisal, computed from the variables discussed here and perhaps others, could account for all the relevant variation in the M170 response, without decomposition of this response. However, if our cognitive understanding of the various variables involved (e.g., stem frequency, affix frequency, transition probability) implicates different processes in a cognitive model, then even if we’re successful in accounting for the M170 response in terms of this single, composite variable, we can’t be said to have explained the M170 – we won’t understand why this variable works. Rather, we would need to re-think our cognitive theory of complex word recognition to make sense of why that single variable would be key.

 

References

Gwilliams, L., & Marantz, A. (2018). Morphological representations are extrapolated from morpho-syntactic rules. Neuropsychologia, 114, 77-87.

Solomyak, O., & Marantz, A. (2010). Evidence for early morphological decomposition in visual word recognition. Journal of Cognitive Neuroscience, 22(9), 2042-2057.

Zweig, E., & Pylkkänen, L. (2009). A visual M170 effect of morphological complexity. Language and Cognitive Processes, 24(3), 412-439.

Phoneme surprisal

What stimulus properties are responsible for driving neural activity during speech comprehension? For quite some time, we’ve known that word frequency modulates brain responses like the ERP N400; higher frequency words elicit smaller amplitude responses. In addition, listeners’ expectancies for particular words, as quantified in Cloze probability ratings, also modulate reaction time and the N400; higher Cloze probabilities elicit faster RTs and smaller amplitude responses.

For variables that modulate brain responses to speech sounds, phoneme surprisal seems to robustly correlate with superior temporal activity around 100 to 140ms after the onset of a phoneme (Brodbeck et al. 2018), although we’ll show below that phoneme surprisal effects are not limited to this time window. As a variable, phoneme surprisal is a product of information theory – how much information a phoneme carries in context, where the usual context to consider is the “prefix” or list of phonemes before the target phoneme starting from the beginning of a word. As we have seen in previous posts, phoneme surprisal is conventionally measured in terms of the probability distribution over the “cohort” of words that are consistent with this “prefix.”
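As a concrete sketch of this conventional computation, assuming a toy lexicon with invented frequencies (real studies use large corpora, and phonemic transcriptions rather than the spellings used here):

```python
import math

# Toy frequency lexicon (hypothetical counts); orthographic strings
# stand in for phoneme sequences.
LEXICON = {"simpers": 2, "sits": 120, "sitter": 15, "sniffed": 30}

def cohort(prefix):
    """The cohort: all words consistent with the input heard so far."""
    return {w: f for w, f in LEXICON.items() if w.startswith(prefix)}

def phoneme_surprisal(prefix, phoneme):
    """-log2 P(phoneme | prefix), with P the frequency-weighted share
    of the current cohort that continues with this phoneme."""
    p = sum(cohort(prefix + phoneme).values()) / sum(cohort(prefix).values())
    return -math.log2(p)

print(phoneme_surprisal("s", "i"))  # bits carried by the second phoneme
```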

In considering the literature on phoneme surprisal, and in planning future experiments, we should distinguish between “phoneme surprisal” as a variable (PSV) and “phoneme surprisal” as a particular neural response (or multiple such responses) modulated by the phoneme surprisal variable (PSN). We should also be clear about the difference between using surprisal as a variable without an account of why linguistic processing should be sensitive to this variable at a particular point in time and in a particular brain region, and using surprisal as a variable in connection with a neurobiological theory of speech processing, as, say, in the “predictive coding” framework (see, e.g., Gagnepain et al. 2012).

On the first distinction, at the NYU MorphLab we have published two studies using the PSV that have discovered neural responses to the variable that differ, at least in time. In Gwilliams and Marantz (2015), a study on Arabic root processing, we found that PSV computed over roots yielded a PSN at the third root consonant in the 100-140ms post-phoneme-onset time period in the superior temporal gyrus (see Figure 3 below). PSV measured at this third root consonant for the whole word, by contrast, did not yield a PSN in the same time frame. (The graph of PSV effects shows additional PSNs after 200ms and after 300ms, which we will set aside.)

In Gaston & Marantz (2018), we examined the effect of prior context on the PSN for English words like clash that can be used either as nouns or as verbs. We computed PSV in various ways when these words were placed after “to” to force the verb use and after “the” to force the noun use. For example, we considered measuring PSV after “the” by removing from the cohort all words with only verb uses while leaving the target word’s full frequency as both noun and verb in the probability distribution over the remaining cohort, vs. also reducing the target’s frequency in the cohort by removing its verb uses as well. We found a set of complicated PSN responses sensitive to the contextual manipulation of PSV (see Figure 3 below). However, the PSN was not in the same time range as the (first) PSN from the Gwilliams and Marantz study but instead came 100ms later.
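The two cohort restrictions we compared can be sketched as follows; the noun/verb frequency splits are invented for illustration, not the values from the study:

```python
# Each word's frequency split into noun and verb uses (hypothetical).
LEXICON = {"clash": (40, 25), "clasp": (10, 5), "claim": (0, 80)}

def cohort_after_the_full_freq():
    """After "the": drop verb-only words, but keep each survivor's
    full noun+verb frequency in the probability distribution."""
    return {w: n + v for w, (n, v) in LEXICON.items() if n > 0}

def cohort_after_the_noun_freq():
    """After "the": drop verb-only words AND count only noun uses,
    reducing the target's own frequency as well."""
    return {w: n for w, (n, v) in LEXICON.items() if n > 0}

print(cohort_after_the_full_freq())  # {'clash': 65, 'clasp': 15}
print(cohort_after_the_noun_freq())  # {'clash': 40, 'clasp': 10}
```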

For these studies from the MorphLab, the fact that the PSV yielded distinct PSNs was not crucial to the interpretation of the results. Of interest in the Arabic study was whether there was evidence for processing the root of the word independently of the whole word, despite the fact that the root and the “pattern” morphology that make up the word are temporally intertwined. For the study on the effects of context, we were interested in whether the preceding context would modulate cohort frequency effects and, if so, which measure of cohort refinement provided the best modulator of brain responses in the superior temporal lobe. Our conclusions were connected to our hypotheses about the relationship between PSV and cognitive models of representation and processing, not to prior assumptions about PSN.

That being said, ultimately our goal is to understand the neurobiology of speech perception – the way that the cognitive models are implemented in the neural hardware. For this goal, we should seek a consistent PSN (or multiple consistent PSNs) and develop an interpretation of this PSN within a neurologically grounded processing theory. Here the literature provides some promising results. In a study examining subjects’ responses to listening to natural speech (well, audiobooks), Brodbeck et al. (2018) identify a strong PSN in the same time range and same neural neighborhood as the PSN in Gwilliams and Marantz’s Arabic study (see Figure 2 below). Brodbeck et al. did not decompose the words in their study, so PSV was computed based solely on whole-word cohorts, and function and content words weren’t distinguished. While context, including word-internal morphological context, may have modulated the effects of PSV on the PSN, this simple whole-word PSV measure nevertheless remained robust and stronger than the other variables they entertained as potentially modulating the brain response. Laura Gwilliams’ ongoing work in the MorphLab has found a similar latency for a PSN from naturalistic speech, using different analysis techniques (and a different data set) from Brodbeck et al.

The timing and location of Brodbeck’s PSN response are broadly compatible with the timing and location of responses associated with the initial identification of phonemes, as measured, e.g., by ECoG recordings in Mesgarani et al. (2014) and subsequent publications (see Figure 1 below). This invites an interpretation of the PSN as a measure of the predictability of a phoneme being identified, rather than in terms of the information content of the phoneme. Such an analysis is part of the “predictive coding” framework as described, e.g., in Gagnepain et al. (2012). In this framework, a response that could be Brodbeck’s PSN in time and space is construed as an error signal proportional to the discrepancy between the predicted phoneme and the incoming phonetic information. It will be of great interest to tease apart the predictions of a processing model based on predictive coding vs. one based on information theory. We note here the prediction made by Gagnepain et al. (2012) that we should not see, in addition to a PSV-related neural response, a response that is modulated by cohort entropy. However, Brodbeck et al. observe a robust entropy response that was close to the PSN both temporally and areally but nonetheless statistically independent (see their Figure 2 above).

Returning to the topic of PSV responses in morphologically complex words, we see that it’s important to understand whether PSV responses are uniquely associated with a PSV computed using considerations like transition probability (the likelihood of an affix given the stem) and other factors that fix the probability of the affix before assessing the likelihood of the phonemes in the affix. One could imagine instead that the PSV that matters most for the Brodbeck/Gwilliams early PSN is computed over cohorts of morphemes, without modulation associated with the contextual statistics of the morphemes. Functional morphemes (prepositions, determiners, complementizers) are highly predicted in syntactic context, but the PSV relevant to the Brodbeck/Gwilliams PSN might ignore syntactic prediction in assigning probability weights to the morphemes in the relevant cohort for the PSV. Consider that the context-modulated PSN we observed in the Gaston & Marantz paper was not this early PSN, but a significantly later response (with respect to phoneme onset), and that the Brodbeck et al. study apparently included functional morphemes without contextualizing their PSV to the predicted contextual frequencies of these morphemes. A contextually unmodulated PSV would not, strictly speaking, be an information-theoretic PSV, since a contextually predicted phoneme is simply not as informative as a contextually unpredicted one; this PSV would thus overestimate the information content of contextually predicted phonemes (say, phonemes in a suffix after a stem that highly predicts the suffix). Still, the field awaits a set of plausible processing theories that make sense of the importance of the non-contextual PSV and make further predictions for MEG experiments (that we can run).
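One way to make the contrast concrete, with an invented suffix inventory and invented weights (the transition probabilities after toler– are made up for illustration):

```python
import math

UNIGRAM = {"able": 200, "er": 500, "ish": 60, "ity": 120}  # a-contextual
TP_AFTER_TOLER = {"able": 0.6, "er": 0.0, "ish": 0.0, "ity": 0.4}  # contextual

def first_phoneme_psv(phoneme, weights):
    """Surprisal of a suffix's first phoneme over a cohort of suffixes."""
    consistent = sum(w for s, w in weights.items() if s.startswith(phoneme))
    return -math.log2(consistent / sum(weights.values()))

print(first_phoneme_psv("a", UNIGRAM))         # ignores the stem: ~2.14 bits
print(first_phoneme_psv("a", TP_AFTER_TOLER))  # conditioned on toler-: ~0.74 bits
```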

 

References

Brodbeck, C., Hong, L. E., & Simon, J. Z. (2018). Rapid transformation from auditory to linguistic representations of continuous speech. Current Biology, 28(24), 3976-3983.

Gagnepain, P., Henson, R. N., & Davis, M. H. (2012). Temporal predictive codes for spoken words in auditory cortex. Current Biology, 22(7), 615-621.

Gaston, P., & Marantz, A. (2018). The time course of contextual cohort effects in auditory processing of category-ambiguous words: MEG evidence for a single “clash” as noun or verb. Language, Cognition and Neuroscience, 33(4), 402-423.

Gwilliams, L., & Marantz, A. (2015). Non-linear processing of a linear speech stream: The influence of morphological structure on the recognition of spoken Arabic words. Brain and Language, 147, 1-13.

Mesgarani, N., Cheung, C., Johnson, K., & Chang, E. F. (2014). Phonetic feature encoding in human superior temporal gyrus. Science, 343(6174), 1006-1010.

 

Probability distributions over infinite lists?

Recall that a grammar provides a representation of the words and sentences of a language. For standard generative grammars, the grammar is a finite set of rules that describes or enumerates an infinite list (of words and sentences). In a previous post, we conceptualized word and sentence recognition as a process of determining from auditory or orthographic input which member of the infinite list of words or sentences we’re hearing or reading. Just as in “cohort” models of auditory word recognition, one could imagine for sentence recognition a cohort of all the sentences compatible with the auditory input at each point in a sentence, and a probability distribution over these sentences. Each subsequent phoneme narrows down the cohort and changes the probability distribution over the remaining candidates in the cohort.

The cat … (the cohort of candidate next words after each successive phoneme):

/s/        /sɪ/       /sɪt/      /sɪts/
simpers    simpers    sits       sits
sits       sits       sitter
sitter     sitter
sniffed
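A minimal sketch of this narrowing, with invented frequencies for the candidates in the table (spellings again standing in for phoneme strings):

```python
FREQ = {"simpers": 2, "sits": 120, "sitter": 15, "sniffed": 30}  # hypothetical

def distribution(prefix):
    """Renormalized probability distribution over the surviving cohort."""
    cohort = {w: f for w, f in FREQ.items() if w.startswith(prefix)}
    total = sum(cohort.values())
    return {w: round(f / total, 3) for w, f in cohort.items()}

for prefix in ["s", "si", "sit", "sits"]:
    print(prefix, distribution(prefix))  # the cohort shrinks at each step
```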

From one point of view, syntactic theory provides an account of the nature of the infinity that explains the “creativity” of language – that we understand and produce sentences that we haven’t heard or said before. The working linguist is generally less concerned with the mathematical nature of the infinity (the subject matter to which the “Chomsky hierarchy” of grammars is related) and more concerned with what the human solution to infinity, as described in Universal Grammar, tells us about the nature of language and about matters like locality restrictions on long-distance dependencies. In my post on phrase structure rules and extended projections, I emphasized aspects of grammar that are finite. The core of a sentence, for example, could be viewed as the extended projection of a verb, and thus the set of sentences described/generated by the grammar could be a finite paradigm of the possibilities allowed in an extended projection.

But of course, while the set of inflected forms of a verb may be finite (as it is in English, for example), the set of extended projections of a verb is obviously not. While there may be a diverse set of “structure building” possibilities to consider here for multiplying the extended projections of verbs, the two most central sources of infinity are usually called “arguments” and “adjuncts.” Argument noun phrases, or technically Determiner Phrases (DPs) in most contemporary theories, may occur as subjects and complements (e.g., objects) of extended projections. DPs (in at least most languages, putting aside the status of Everett’s (2005, et seq.) claims about Pirahã) may contain DPs, which may contain DPs, leading to infinity (the story about children in a country with an army with an unusual set of vehicles…). For adjuncts, consider at least the possibility of repeated prepositional phrase modification of verb phrases as a form of infinity (She played the piano on Thursday in Boston on a Steinway with a large audience in 80-degree heat…).

As noted in a previous post, the linguistically interesting account of these types of infinity involves recursion. DPs may occur in various positions, including within other DPs, but the structure of a DP is the same wherever it occurs. That is, a DP within a DP has the same structure and follows the same rules as the containing DP.

Now it’s not entirely true that the position of a phrase doesn’t determine any aspects of its internal structure. For example, the number of a noun phrase (e.g., singular or plural), which is related to the number of its head noun, determines whether it may appear as the subject of __ is a good thing (Soup is a good thing, yes; *Beans is a good thing, no). So appearing as the subject of an agreeing verb determines aspects of the internal structure of subjects in English. And the verb phrase put on the table is grammatical in the context of What did you put on the table, but not in the context of *We put on the table. So being in the context of, e.g., a wh-question determines aspects of the internal syntax of the VP.

Chomsky’s (1995) Minimalist Program offers one theory of the limits on this contextual determination of the internal structure of phrases. In this theory, a set of important constituents, such as DPs and verb phrases (the part of the extended projection of a verb that includes the subject DP), are “phases.” 


[Figure: phase diagram (Citko 2014: 32)]

A phase presents to the outside world only its “label” and constituents at its “edge” (at the top of the phase, α in the diagram above). The label of a phase is a finite set of features, including those, like number, that are relevant for subject-verb agreement. The edge of the phase would include what in what (you) put on the table, which is associated with the object position between put and on the table. So the verb phrase put on the table is grammatical only when the object of put appears at the edge of the verb phrase, and the appearance of the object at the edge will ensure that the verb phrase is embedded within a larger structure that allows the object to appear in an appropriate position (What did you put on the table?).


[Figure: phase diagram for the verb phrase (Citko 2014: 32)]

The Minimalist Program allows for some information from inside a phase to matter for the grammatical embedding of a phase in a larger structure, but it does not allow the larger structure to mess with the internal structure of the phase beyond “selecting” for features of the label and features of the edge. And phases within phases can only help determine the grammatical position of the containing phase if they contribute features to its label or constituents to its edge. 
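As a data-structure analogy (my gloss, not part of the Minimalist formalism): a phase behaves like an object whose interior is private, exposing only its label and edge to the containing structure:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Phase:
    """A phase as seen from outside: only label and edge are visible."""
    label: frozenset  # finite feature set, e.g. frozenset({"D", "sg"})
    edge: tuple       # constituents at the edge, e.g. ("what",)
    interior: Any = field(default=None, repr=False)  # opaque from outside

dp = Phase(frozenset({"D", "sg"}), (), interior="the story about children...")
print(dp)  # the repr shows label and edge only; the interior stays hidden
```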

Adopting the phase approach as a means to constrain infinity yields grammars that are not problematic to use in “parsing” (assigning a grammatical structure to) sentences as we hear them, phoneme by phoneme or word by word (see, e.g., Stabler (2013) and Fong (2014) for examples of “Minimalist” parsers). However, even phase-based infinity causes important difficulties for assigning a probability distribution over the members of a candidate set of sentences to compare to the linguistic input one hears or sees. How much probability should we assign to each of the infinite number of possible DPs as the subject of a sentence, for example, where the infinity is generated by DPs within DPs?
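For a sense of the purely mathematical side of the question (a textbook-style sketch, not an analysis from this post): a probabilistic grammar can put a proper distribution over infinitely many DPs by multiplying rule probabilities, as in this toy pair of rules:

```python
# Toy probabilistic rules (hypothetical probabilities):
#   DP -> D N         with probability 0.8
#   DP -> D N P DP    with probability 0.2
# A DP containing k embedded DPs then has probability 0.8 * 0.2**k, and
# the infinitely many DPs still sum to 1 (a geometric series):
print(sum(0.8 * 0.2**k for k in range(60)))  # ~1.0
```

So the existence of such a distribution is not in itself the problem; the harder question is whether a recognizer could compute and incrementally update it over a cohort of candidate structures.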

Even without these issues with infinity, the locality of syntactic dependencies, as captured by phase theory, itself puts pressure on any simple cohort style theory of sentence (or word) recognition. Since no information within a phase apart from its label and the constituents at its edge can interact with the syntactic structure above the phase, it’s not clear whether shifting probabilities for the internal structure of the phase should affect the probabilities for the containing structure as well. That is, once one has established the label and edge features of a subject DP, for example, the probability distribution over the cohort of compatible extended projections of the verb for which the DP is subject may be fixed, independent of further elaborations of the subject DP, including possible PP continuations as in the story about children in a country with an army with an unusual set of vehicles…  – as far as the extended projection of the verb is concerned, we may be done computing a probability distribution after the story about children. Given the way phases fit together, this consideration about how the internal structure of a phase may affect processing of a containing phase covers one issue with infinity as well.

Note that cohort-style analyses of information-theoretic variables like phoneme surprisal always assume that the computation of cohorts can reasonably be accomplished while ignoring some possible context effects. The cohort is a set of units, perhaps morphemes or words. In any situation of language processing, there are infinitely many contexts to consider that might affect the probability distribution over the members of the cohort, including larger words, phrases, sentences, and discourse contexts. Any experimental investigation of phoneme surprisal based on probability distributions over units must assume that these computations of cohorts and probabilities are meaningful even without computing the influence of some or all of these possible contextual influences.

Our MorphLab has some data, and an experiment in progress, that are relevant to this discussion. In Fruchter et al. (2015), subjects were visually presented with two-word modifier-noun phrases, one word at a time. For phrases where the second word is highly predicted by the first, like stainless steel, we found evidence that subjects retrieve the representation of the second word before they see it. This response was modulated by the strength of the prediction but also, surprisingly, by the unigram frequency of the word. That is, even when a word is being retrieved solely on the basis of prediction, the relative frequency of that word compared to other words in the language, independent of context, modulates processing. This suggests the possibility that cohort-related phoneme surprisal responses might be driven at least in part by probability distributions over morphemes that are context-independent. Partly to test this possibility, Samantha Wray in the NeLLab is analyzing data from an experiment in which Tagalog speakers listened to a variety of Tagalog words, including compounds and reduplicated forms (involving full reduplication of bisyllabic stems). If frequency-based but context-free cohorts of morphemes are always relevant for phoneme surprisal, then phoneme-by-phoneme surprisal in the first and second copies of a reduplicated stem should be similar. By contrast, if one considers the prediction of the second copy in the reduplicated form from the first, the contextual phoneme surprisal in the second part of the reduplicated word should look nothing like the phoneme surprisal for the same phonemes in the first part of the word. So far, context-free phoneme surprisal in the second copy seems to be winning, although there are numerous complications.
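The contrast between the two hypotheses can be sketched schematically (the surprisal values are invented, in bits, for a hypothetical bisyllabic stem):

```python
# Phoneme-by-phoneme surprisal over the first copy of the stem (invented):
copy1 = [4.1, 2.3, 1.7, 0.9]

# Context-free hypothesis: the morpheme cohort is recomputed from the
# lexicon at the start of the second copy, so copy 2 mirrors copy 1.
context_free_copy2 = list(copy1)

# Contextual hypothesis: having heard copy 1 of a reduplicating word,
# each phoneme of copy 2 is predicted with probability near 1, so its
# surprisal should be near 0 throughout.
contextual_copy2 = [0.0] * len(copy1)

print(context_free_copy2, contextual_copy2)
```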

Returning to the sentence level, the phase may provide a relevant context-free “cohort” domain for assigning probability distributions to an infinite set of syntactic structures. Without abandoning the idea that syntactic processing involves consideration of whole-sentence syntactic structures, we can reduce our cohorts of potential structures to finite sets if we shield phase-internal processing from the processing of the larger structures containing the phase. When we’re processing a structure containing an embedded phase, we consider only the finite set of possibilities for the label and edge properties of this phase. Once we’re processing inside the phase, we define our cohort of possible structures using only the external contextual information that fixes the probability distribution over the phase’s label and its edge properties.

Applying this approach to morphological processing within words involves identifying what the relevant phases might be. Even within a phase, we need to consider the various ways in which a contextually-determined probability distribution over small (say, morpheme-sized) units might be affected by context. Much more on these topics in upcoming posts.

 

References

Chomsky, N. (1995). The Minimalist Program. Cambridge, MA: MIT Press.

Citko, B. (2014). Phase theory: An introduction. Cambridge: Cambridge University Press.

Everett, D. L. (2005). Cultural constraints on grammar and cognition in Pirahã: Another look at the design features of human language. Current Anthropology, 46(4), 621-646.

Fong, S. (2014). Unification and efficient computation in the Minimalist Program. In Lowenthal, F., & Lefebvre, L. (eds.), Language and recursion, 129-138. New York: Springer.

Fruchter, J., Linzen, T., Westerlund, M., & Marantz, A. (2015). Lexical preactivation in basic linguistic phrases. Journal of Cognitive Neuroscience, 27(10), 1912-1935.

Stabler, E. P. (2013). Two models of minimalist, incremental syntactic analysis. Topics in Cognitive Science, 5(3), 611-633.
