In retrospect, we can identify the beginnings of contemporary neurolinguistics with Neville et al. (1991) (Neville, H., Nicol, J. L., Barss, A., Forster, K. I., & Garrett, M. F. (1991). Syntactically based sentence processing classes: Evidence from event-related brain potentials. Journal of Cognitive Neuroscience, 3(2), 151-165). At the time, a dominant approach to language processing envisaged a system in which people predict the next word in a sentence by applying their statistical knowledge of word n-grams, acquired through their experience with language. Grammatical knowledge, as studied by linguists, was somehow an emergent property of string learning and not part of the cognitive model of linguistic performance. On this view, every ungrammatical string involves a point at which one’s string knowledge predicts zero likelihood of the encountered word, and ungrammaticality lies on a continuum with unacceptability and very low Cloze probability. What the Neville et al. paper demonstrated was that different types of ungrammaticality have different neural “signatures,” and that these differ from the signature of an unexpected word (one with low Cloze probability). One can have many quibbles with this paper. The generalization from the specific sentence types studied to whole classes of syntactic violations (as in the title), for example, is suspect. But the paper launched a ton of work examining the connection between detailed aspects of sentence structure and neural responses as measured by EEG. In recent years there has been a move away from violation studies toward work varying grammatical sentences along different dimensions, but experiments have consistently found correlations between interesting linguistic structure and brain responses.

So it was a bit of a surprise to hear Ev Fedorenko claim, at the just-finished Neurobiology of Language meeting in Helsinki, that the “language network” in the brain isn’t particularly interested in the sort of details that the electrophysiologists (ERP and MEG researchers) among us have been studying. In particular, she explicitly equated the type of computation involved in Cloze probability (lexical surprisal) with syntactic computation. Fedorenko’s gold standard for localizing the language network within an individual’s brain is an fMRI paradigm contrasting the brain’s response to listening to grammatical, coherent sentences with its response to listening to lists of pronounceable non-words. The activity in this network, for example, seems equally well predicted by syntactic and by lexical surprisal modulations.
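To make the lexical side of this contrast concrete: lexical surprisal is just -log2 P(word | preceding context), with the probability estimated from n-gram counts over a corpus. The sketch below is a toy illustration only – Fedorenko’s measure used word 5-grams, whereas this uses bigrams for brevity, and the add-alpha smoothing is my own illustrative choice, not anything taken from her work:

```python
import math
from collections import Counter

def train_bigrams(corpus):
    """Count unigram and bigram frequencies over tokenized sentences,
    padding each sentence with a start symbol."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def surprisal(prev, word, unigrams, bigrams, vocab_size, alpha=1.0):
    """Lexical surprisal in bits: -log2 P(word | prev), add-alpha smoothed."""
    p = (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab_size)
    return -math.log2(p)

# A frequent continuation yields lower surprisal than a rare one.
corpus = [["the", "dog", "barks"], ["the", "dog", "sleeps"], ["the", "cat", "sleeps"]]
uni, bi = train_bigrams(corpus)
vocab = len(set(w for s in corpus for w in s) | {"<s>"})
print(surprisal("the", "dog", uni, bi, vocab))  # lower: "dog" follows "the" twice
print(surprisal("the", "cat", uni, bi, vocab))  # higher: "cat" follows "the" once
```

The point of a measure like this is that it is purely string-based: nothing in it knows about syntactic categories or structure, which is why equating it with syntactic computation is the substantive claim at issue.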

Given that the ERP/MEG literature details specific differences between, e.g., lexical prediction and grammatical computation, if Fedorenko’s language network were in fact responsible for language processing, then perhaps the same areas of the brain are performing different tasks – i.e., separation in brain space would not be informative for the neurobiology of language. Fedorenko was asked this question after her talk, but she didn’t understand it. However, Riitta Salmelin’s talk in the same session of the conference did address Fedorenko’s position. Salmelin has been investigating whether ERP/MEG responses in particular experimental paradigms might yield different localizations for the sources of language processing than fMRI activation from identical paradigms does. Her work demonstrates that this is in fact the case, and she presented some ideas about why. She also remarked to me at the conference that Fedorenko’s “language network” does not include areas of interest for the linguistic processing that she studies with MEG.

Of interest for our blog notes is the nature of Fedorenko’s “syntactic surprisal” measure – the one that is supposed to correlate with activity in the same network as lexical surprisal, where lexical surprisal is computed via word 5-grams. Fedorenko’s syntactic surprisal measure comes from a delexicalized probabilistic context-free grammar, i.e., from a consideration of the likelihood of a syntactic structure independent of any lexical items. We asked in a previous post whether this kind of syntactic surprisal is likely to represent speakers’ use of syntactic knowledge for parsing words, given the importance of lexically specific selection in word processing, but the same question could be asked about sentence processing. A recent paper from our lab (Sharpe, V., Reddigari, S., Pylkkänen, L., & Marantz, A. (2018). Automatic access to verb continuations on the lexical and categorical levels: Evidence from MEG. Language, Cognition and Neuroscience, 34(2), 137–150) clearly separates predictions from verbs about the following syntactic category from predictions about the following word. However, these predictions are grounded in the identity of the verb, so this is a mix of lexical and syntactic prediction (the predictor is lexicalized but the predicted category is syntactic, modulo the prediction of particular prepositions). What is clear is that syntactic surprisal as measured by a delexicalized probabilistic context-free grammar is not the be-all and end-all of possible variables that might be used to explore the brain for areas performing syntactic computation. In particular, the status of delexicalized syntactic structure for syntactic parsing is up in the air. Nevertheless, in a proper multiple regression analysis of MEG brain responses to naturalistic speech, I’m willing to go out on a limb and predict that different brain regions will be sensitive to word n-gram surprisal and to syntactic surprisal as measured via a delexicalized probabilistic context-free grammar.
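For readers wondering what “delexicalized” means in practice: the grammar is estimated from rewrite rules over syntactic categories only, with the words themselves discarded, so a structure’s probability is a product of rule probabilities that never mention a lexical item. The sketch below is a hypothetical illustration of that idea on a tiny hand-built treebank – it scores whole parses, whereas an actual syntactic-surprisal predictor would compute word-by-word surprisal with an incremental parser; none of the trees or rules here come from Fedorenko’s materials:

```python
import math
from collections import Counter

# Toy delexicalized treebank: trees are (label, *children) tuples whose
# leaves are bare POS tags - the words have already been thrown away.
t1 = ("S", ("NP", ("DT",), ("NN",)), ("VP", ("VB",)))
t2 = ("S", ("NP", ("DT",), ("NN",)), ("VP", ("VB",), ("NP", ("DT",), ("NN",))))
treebank = [t1, t2]

def rules(tree):
    """Yield (parent, child-labels) rewrite rules from a tree, top-down."""
    label, *children = tree
    if children:
        yield (label, tuple(c[0] for c in children))
        for c in children:
            yield from rules(c)

def estimate_pcfg(trees):
    """Relative-frequency estimate: P(rule) = count(rule) / count(its LHS)."""
    rule_counts = Counter(r for t in trees for r in rules(t))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    return {r: n / lhs_counts[r[0]] for r, n in rule_counts.items()}

def tree_surprisal(tree, pcfg):
    """Total surprisal of a parse in bits: sum of -log2 P(rule) over its rules."""
    return sum(-math.log2(pcfg[r]) for r in rules(tree))

pcfg = estimate_pcfg(treebank)
print(tree_surprisal(t1, pcfg))  # only the VP expansion (prob 0.5) costs bits here
```

Because every probability is conditioned on categories alone, the measure is blind to, e.g., which verb heads the VP – exactly the property that makes its adequacy as a model of human parsing an open question.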

A final note of clarification: Fedorenko in her talk suggested that there are linguistic theories that might predict no clear separation between, I believe, word meaning and syntax. Thus somehow, e.g., Jackendoff’s Parallel Architecture and Construction Grammar would predict a lack of separation between lexical and syntactic surprisal. But for both Jackendoff and Construction Grammar – and for all serious current linguistic frameworks I know of – the ontology of lexical semantics and the ontology of syntactic categories are distinct. Jackendoff has parallel syntactic and semantic structures, not no distinction between syntax and word meaning. Construction Grammar is similar in this respect. The question of whether speakers use delexicalized probabilistic syntactic knowledge in processing is a question for any syntactic theory, and all the theories I can think of would survive either a yes or a no answer.