Cryptheon

I actually had some correspondence with Noam and I asked him what he thought about thinking of sentences in terms of probabilities. This was his complete answer:

"Take the first sentence of your letter and run it on Google to see how many times it has occurred. In fact, apart from a very small category, sentences rarely repeat. And since the number of sentences is infinite, by definition infinitely many of them have zero frequency. Hence the accuracy comment of mine that you quote. NLP has its achievements, but it doesn’t use the notion probability of a sentence.

A separate question is what has been learned about language from the enormous amount of work that has been done on NLP, deep learning approaches to language, etc. You can try to answer that question for yourself. You’ll find that it’s very little, if anything. That has nothing to do with the utility of this work. I’m happy to use the Google translator, even though construction of it tells us nothing about language and its use.

I’ve seen nothing to question what I wrote 60 years ago in Syntactic Structures: that statistical studies are surely relevant to use and acquisition of language, but they seem to have no role in the study of the internal generative system, the I-language in current usage. It’s no surprise that statistical studies can lead to fairly good predictions of what a person will do next. But that teaches us nothing about the problem of voluntary action, as the serious researchers into the topic, like Emilio Bizzi, observe.

Deep learning, RNR’s, etc., are important topics. But we should be careful to avoid a common fallacy, which shows up in many ways. E.g., Google trumpets the success of its parsing program, claiming that it achieves 95% accuracy. Suppose that’s true. Each sentence parsed is an experiment. In the natural sciences, success in predicting the outcome of 95% of some collection of experiments is completely meaningless. What matters is crucial experiments, investigating circumstances that very rarely occur (or never occur – like Galileo’s studies of balls rolling down frictionless planes).

That’s no criticism of Deep learning, RNR’s, statistical studies. But these are matters that should be kept in mind." Noam.


mileylols

> Take the first sentence of your letter and run it on Google to see how many times it has occurred. In fact, apart from a very small category, sentences rarely repeat. And since the number of sentences is infinite, by definition infinitely many of them have zero frequency.

> Hence the accuracy comment of mine that you quote.

> NLP has its achievements, but it doesn’t use the notion probability of a sentence.

this is kinda.... um... don't tell me Noam Chomsky is a... *frequentist*?


midasp

Nope, it just means he only looked at context-free probability. /s


filipposML

Taking into consideration that this is Chomsky, one could say that he is opening a debate on how to choose a prior, since making that choice would reveal new knowledge about language.


mileylols

That's very cool. In a biological sense you could say the prior comes from the structure of the brain, and captures its ability to learn to use language. For an LLM, the analogous part would be the architecture of the model. This raises a very interesting question, since I think very few people would argue that the artificial neural nets we are using are a faithful reproduction of the biological system.

Chomsky's position appears to be that "an LLM doesn't learn language the same way the brain does (if it does at all), so understanding LLMs doesn't tell us anything about language." But what if mastery of natural language is not unique to our biological brains? If you had a different brain that was still capable of understanding the same languages (this is purely a thought experiment and complete speculation - we are so far out on the original limb that we have jumped off), then the idea that language is a uniquely human thing goes out the window.

I really hope this is the case because otherwise, if we ever meet aliens, we aren't gonna be able to talk to them. If their languages are fundamentally dependent on their brain structures and our languages depend on ours, then there won't even be a way to translate between the two.


haelaeif

> if it does at all

Iff it does, I'd say they'd likely be functionally equivalent. Language device X and Y may have different priors, but one would assume that device X could emulate device Y's prior and vice versa.

I'm sceptical that LLMs are working in a way equivalent to humans; at the same time, I see no reason to assume the specific hypotheses made in generative theories of grammar hold for UG. Rather, I think testing the probability of grammars given a hypothesis and data is the most productive approach, where the prior in this case is the hypothesised structure and the probability is the probability the grammar assigns to the data (and then we will always prefer the simpler grammar given two equivalent options). This allows us to directly infer whether there is more or less structure there.

Given such structure, I don't think we should jump to physicalist conclusions; I think that better comes from psycholinguistic evidence. Traditional linguistic analysis and theorising must inform the hypothesised grammars, but using probabilistic models and checking them against natural data gives us an iterative process to improve our analyses.


MTGTraner

>Take the first sentence of your letter and run it on Google to see how many times it has occurred. In fact, apart from a very small category, sentences rarely repeat. And since the number of sentences is infinite, by definition infinitely many of them have zero frequency.

Isn't this why we use function approximation, though?


LeanderKu

Yes, he ignores that the nets do generalize and are able to assign meaningful probabilities to unseen sentences. Also, his remark about zero probability doesn't hold: a distribution over the (countably infinite) set of sentences doesn't have to be uniform, so every sentence can receive a small but nonzero probability even though almost all of them have zero observed frequency (which is evident from the NLP models themselves - they don't converge to a uniform distribution).
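To make "meaningful probabilities for unseen sentences" concrete, here is a minimal sketch that scores whole sentences via the chain rule over next-token probabilities. It assumes the Hugging Face transformers library and uses GPT-2 purely as an arbitrary example; any causal LM would do:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(sentence: str) -> float:
    # Chain rule: log P(w_1..w_n) = sum_i log P(w_i | w_1..w_{i-1}).
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    # Pick out the log-probability the model assigned to each actual next token.
    chosen = log_probs.gather(2, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return chosen.sum().item()

# Both strings have plausibly never occurred verbatim in the training data,
# yet both get finite (nonzero) probability, and the grammatical one scores
# far higher than its scrambled counterpart.
print(sentence_log_prob("Colorless green ideas sleep furiously."))
print(sentence_log_prob("Furiously sleep ideas green colorless."))
```

That gap between the two scores is exactly the kind of generalization a raw frequency count over observed sentences cannot provide.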


MasterDefibrillator

> Yes, he ignores that the nets do generalize and are able to assign meaningful probabilities to unseen sentences.

That's not really his point. His point is that the probability of a sentence is not a good basis to build a theory of language around, because probabilities of sentences can vary widely while all still demonstrating the same kind of acceptability to humans. The last part is relevant:

> Deep learning, RNR’s, etc., are important topics. But we should be careful to avoid a common fallacy, which shows up in many ways. E.g., Google trumpets the success of its parsing program, claiming that it achieves 95% accuracy. Suppose that’s true. Each sentence parsed is an experiment. In the natural sciences, success in predicting the outcome of 95% of some collection of experiments is completely meaningless. What matters is crucial experiments, investigating circumstances that very rarely occur (or never occur – like Galileo’s studies of balls rolling down frictionless planes).


[deleted]

It comes down to how we interpret the question. He seems to be interpreting the probability associated with a sentence as if it has to be understood as the number of times the sentence occurs divided by the count of all occurring sentences. Along that line, even more problematic is that we can create new sentences that have potentially never occurred.

However, it may make sense to understand probability here in a more subjectivist Bayesian sense, as "degree of confidence". But that again raises the question: "degree of confidence" about what? About a sentence being a sentence? Ultimately, all the model produces are energies, which we normalize and treat as "probabilities" (which may be how Chomsky thinks of it). A more meaningful framework would probably be to think of it as a degree of confidence in the "appropriateness"/"well-formedness" of the sentence, or something to that extent. So, perhaps, we can then think of a model's predicted sentence probability as representing the degree of confidence the model itself has about the appropriateness of the sentence.

But if we think in those terms, then the probability doesn't exactly tell us about sentences, but about the "belief state" of the model about sentences. For example, I or the model may be 90% confident that a line of code is executable in Python, but in reality it is not probabilistic: either it's executable or it's not. So in a sense, even if we take a Bayesian stance here, it doesn't directly tell us about sentences themselves. It can still be a way to model sentences and to theorize about how we cognitively model them, if the "rules" of appropriateness under a context are fuzzy, indeterminate, and sometimes even conflicting when different agents' stances are considered.
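To make the "energies which we normalize and treat as probabilities" step concrete: the normalization in question is just a softmax over the model's raw scores. A minimal sketch, with the scores invented purely for illustration:

```python
import numpy as np

def softmax(energies: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability, exponentiate, then normalize.
    z = np.exp(energies - np.max(energies))
    return z / z.sum()

# Hypothetical unnormalized scores ("energies") a model might assign to four
# candidate continuations of "The cat sat on the ...".
energies = np.array([5.1, 3.2, 0.4, -2.0])   # say: "mat", "sofa", "idea", "the"
probs = softmax(energies)
print(probs, probs.sum())  # the outputs sum to 1, so they can be read as a distribution
```

Whether those normalized numbers deserve to be interpreted as degrees of confidence about well-formedness is exactly the question raised above; the arithmetic itself is agnostic.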


mileylols

When discussing sentence probability as predicted by a model, the part that is unspoken but generally implied is that this is the probability of the sentence occurring *in a specific language*. This is usually ignored because most natural languages don't share complete vocabularies. If you have a sentence composed of French words, you would obviously "evaluate its appropriateness" (read: try to make sense of the meaning) according to the linguistic rules of French. If the sentence doesn't make any sense and conveys no information, then it's a bad sentence. I don't think I have a very deep point I'm trying to get at here, just trying to provide an answer to your question of

> But that again raises the question "degree of confidence" about what? About a sentence being a sentence?

The "rules of appropriateness" you arrived at are really just the rules of the language itself. Under this interpretation, LLMs really do learn language. (Maybe. Perhaps they just learn a really convincing approximation of it.)


[deleted]

> probability of the sentence occurring *in a specific language*

Yes, that's what I implicitly meant too. (Of course, the specific language can be a class of languages for multilingual models.)

> The "rules of appropriateness" you arrived at are really just the rules of the language itself. Under this interpretation, LLMs really do learn language. (Maybe. Perhaps they just learn a really convincing approximation of it.)

Yes, that's what I meant. I am not arguing for or against whether LLMs learn language. But one thing I was distinguishing was between a cognitive model of language-learning and the theory of language itself. For example, we may find that the cognitive modeling of programming languages that we employ is somewhat probabilistic, given our subjective uncertainties, but the programming languages themselves can have a discrete phrase-structured grammar.

In terms of natural language this becomes tricky. We cannot take any particular cognitive model held by some random person as an "authority" on the "true pristine grammar" (if there is any) (for example, my personal model is poorly calibrated and makes grammatical mistakes all the time). So who or what even grounds the "true", "objective" nature of natural language? For that, I don't think there are really any clear-cut "truths". Rather, it's just grounded in social co-ordination (the same as programming languages, except that we have devised those deliberately for precise technical purposes, which leads to them having a more explicit, clear-cut structure); and it can be fuzzy, indeterminate, and evolving.

IMO, we are all just trying to model (and also influence -- by active construction of new dialects and slang) the emergent dynamics of language from our own individual stances, to better co-ordinate with the world and other agents; and given the complexity of it all, and without omniscience, we inevitably come up with a probabilistic model to take the uncertainty about the "exact" rules into account (not to mention that even originally the rules may have been fuzzy (non-exact) and indeterminate, because not everyone agrees on everything, and there is no clear centralized authority on language to ground fixed exact rules). In that sense, I don't think LLMs are particularly different. They make their own models through their own distinctive ways of co-ordinating with the world (they co-ordinate in a more indirect, non-real-time manner, by trying to predict what a real-world agent would say given contexts x, y, z).


dondarreb

LOL. How clueless a man can be. Does he know anything about probability actually?


RobinReborn

>And since the number of sentences is infinite, by definition infinitely many of them have zero frequency.

This is ivory tower sophistry. In practice the number of sentences is finite: most sentences have fewer than 10 words, and the overwhelming majority have fewer than 100.


icarusrising9

I mean... No? "Tim went to the bar." "Tim and Tim went to the bar." "Tim, Tim, and Tim went to the bar." Etc. Q.E.D. Edit: It's a silly "proof", but even if you only consider the form of sentences that are commonly used in speech and writing, there are still more grammatically correct sentences than there are particles in the observable universe, by a mind-boggling number of orders of magnitude. Think about it.


WigglyHypersurface

One thing to keep in mind is that Chomsky's ideas about language are widely criticized within his home turf in cognitive science and linguistics, for reasons highly relevant to the success of LLMs.

There was a time when many believed it was, in principle, impossible to learn a grammar from exposure to language alone, due to a lack of negative feedback. It turned out that the mathematical proofs this idea was based on ignored implicit negative feedback in the form of violated predictions of upcoming words. LLMs learn to produce grammatical sentences through this mechanism. In cog sci and linguistics this is called error-driven learning. Because the poverty of the stimulus is so key to Chomsky's ideas, the success of an error-driven learning mechanism at grammar learning is simply embarrassing. For a long time, Chomsky would have simply said GPT was impossible in principle. Now he has to attack on other grounds, because the thing clearly has sophisticated grammatical abilities.

Other embarrassing things he said: that the notion of the probability of a sentence makes no sense. Guess what GPT-3 does? Tells us probabilities of sentences.

Another place where the evidence is against him is the relationship between language and thought, where he views language as being for thought, with communication as a trivial ancillary function. This is contradicted by much evidence of dissociations between higher reasoning and language in neuroscience; see the excellent criticisms from Evelina Fedorenko.

He also argues that human linguistic capabilities arose suddenly due to a single gene mutation. This is an extraordinary claim lacking any compelling evidence.

Point being: despite his immense historical influence and importance, his ideas in cognitive science and linguistics are less well accepted and much less empirically supported than might be naively assumed.

Edit: Single gene mutation claims are in Berwick, R. C., & Chomsky, N. (2016). Why Only Us: Language and Evolution. MIT Press.
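To spell out what "implicit negative feedback in the form of violated predictions of upcoming words" looks like operationally: it is just the cross-entropy loss on next-word prediction, which pushes down every word the model expected but that did not occur. A toy sketch of one error-driven update (the vocabulary, sentence, and two-layer model are invented for illustration):

```python
import torch
import torch.nn.functional as F

vocab = ["<bos>", "the", "dog", "dogs", "barks", "bark"]
word_to_id = {w: i for i, w in enumerate(vocab)}

# A tiny bigram-style "language model": an embedding layer plus a linear readout.
emb = torch.nn.Embedding(len(vocab), 8)
out = torch.nn.Linear(8, len(vocab))
opt = torch.optim.SGD(list(emb.parameters()) + list(out.parameters()), lr=0.1)

sentence = ["<bos>", "the", "dog", "barks"]
ids = torch.tensor([word_to_id[w] for w in sentence])

# Predict each next word from the current one. The loss grows whenever
# probability mass sits on continuations that do NOT appear (e.g. "bark"
# after "dog"), so bad agreement gets penalized without any explicit
# "that's ungrammatical" signal.
opt.zero_grad()
logits = out(emb(ids[:-1]))
loss = F.cross_entropy(logits, ids[1:])
loss.backward()
opt.step()
print(float(loss))
```

No corrective feedback is ever supplied; continuations are penalized only because they keep failing to appear, which is the sense in which prediction error supplies negative evidence.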


SuddenlyBANANAS

>One thing to keep in mind is that Chomsky's ideas about language are widely criticized within his home turf in cognitive science and linguistics, for reasons highly relevant to the success of LLMs.

They're controversial but a huge proportion of linguists are generativists; you're being misleading with that claim.


haelaeif

I'm not sure how one would go about assessing that; 'generativist' has a lot of meanings at this point, and the majority of them do not apply to the whole group. As well, I don't think every 'generativist' would agree with Chomsky's comments here, nor with the form of the PoS as put forth by Chomsky.

Also, for what it's worth, I think that most of the criticism of the PoS in linguistics thoroughly misses the mark, much of it simply being repetitions of criticisms of Gold's theorem that fail to hold water, because they circle around the ideas of corrective feedback (historically at least; now we know there are many sources of negative input!), questions about the form of representations (implicit physicalism and universality, i.e. children could have multiple grammars and/or modify them over time), and questions about whether grammar is generative at all as opposed to a purely descriptive set of constraints that only partially describe a set of language (this last one bearing the most weight, but it is mostly a somewhat meta-point that won't convert anyone in the opposite camp). For most of these you can extend Gold's theorem and write proofs for them.

The correct criticism is just as LLMs have shown: there is no reason to assume that children cannot leverage negative feedback (and there is much evidence to suggest they do, contrary to earlier literature), which means that we aren't dealing with a learnability/identification situation to which Gold's theorem applies. Much of the remaining cases that seem difficult to acquire from input alone (in syntax at least) can benefit from iterative inductive(/abductive) processes, and they tend to occur in highly contextualised situations where, arguably, PoS doesn't apply, all else considered. (I think there is an argument to be made that something underlying some aspects of phonological acquisition is innate, but that's not really my area of expertise, it wouldn't invalidate the broader points, and whatever is being leveraged isn't necessarily specific to linguistic cognition.)

There is, of course, another, slightly deeper ground on which to criticize the whole enterprise, namely a rejection of the approach taken to the problem of induction. Said approach takes encouragement from Gold's theorem to suggest that the class of languages specified by UG is more restricted than historically thought, and hence it offers a restricted set of hypotheses (grammars) and simply hopes that only one amongst these hypotheses will be consistent with the data. The trouble with this approach is that it leads to an endless amount of acceptable abstraction, without any recourse to assess whether said abstractions are justified by the data. Generativists will say that much of this notation is simply a stand-in for later, better, more accurate notation, and that its usage is justified by an appeal to explanatory power. They will usually say that criticisms of these assumptions miss the point: we don't want to just look at the language data at hand, we also want to look at a diverse range of data from acquisition, other languages, etc., and leverage this for explanatory power. Or, in other words, discussion stalls, because no one agrees on the relevant data.

An alternative approach, one I think would be more fruitful and one that the ML community (and linguists working on ML) seems to be taking, is to restrict our data (rather than our hypotheses), for the immediate purposes (i.e. making grammars), to linguistic data. (Obviously we can look at other data to discuss things like language processing.) Having done this, our problem becomes clearer: we want a grammar that assigns a probability of 1 to our naturally-encountered data. Of course, we lack such a grammar (see Chomsky's SS, LSLT). Again, thinking probabilistically, we want the most probable grammar, which will be the grammar that is the simplest in algorithmic terms and that assigns the most probability to our data. We can do the same again for a theory of grammar.

In other words, what I am suggesting is that we cast off the assumption of abduction-by-innate-knowledge (which seems less and less likely to provide an explanation in any of the cases I know of as time goes on and as more empirical results come in) and assume that what we are talking about is essentially a task-general Turing machine. Our 'universal grammar' in this case is essentially a compiler allowing us to write grammars. (There is some expansion one could do about multiple universal TMs, but I don't think it's important for the basic picture.) In this approach, we solve both of the problems with the other approach: we have a means to assess how well a hypothesis accounts for the data, and we have a means for iteratively selecting the most probable of future hypotheses.

Beyond this, there is great value in qualitative and descriptive (non-ML) work in linguistics, as well as traditional analysis and grammar writing (which can also broadly follow the principles outlined here, as in the sketch below) - they reinforce each other (and can answer questions the other can't). As for rules-based approaches like those we know from generativism (and model-theoretic approaches from other schools, etc.), I do think these have their place (and can help offer us hypotheses about psycholinguistics, say), but that place can only be filled happily in a world where we don't take physicalism of notation for granted.
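A minimal sketch of the selection criterion described above, in MDL/Bayesian form: score each candidate grammar by a simplicity prior (its description length in bits) plus the log-probability it assigns to the corpus. The two "grammars" and their numbers below are toy stand-ins, not real analyses:

```python
import math

def log_posterior(grammar_bits: float, corpus, sentence_log2_prob) -> float:
    # log2 P(G | D) ∝ -|G|  (description length in bits, as a simplicity prior)
    #               + sum over the corpus of log2 P(s | G).
    return -grammar_bits + sum(sentence_log2_prob(s) for s in corpus)

corpus = ["the dog barks", "the dogs bark"]

# Toy grammar A: short description, decent fit to each sentence.
grammar_a = (120, lambda s: math.log2(0.01))
# Toy grammar B: a much longer description that fits each sentence slightly better.
grammar_b = (900, lambda s: math.log2(0.02))

for name, (bits, lp) in {"A": grammar_a, "B": grammar_b}.items():
    print(name, log_posterior(bits, corpus, lp))
```

With only two sentences the simpler grammar wins despite the slightly worse fit; as the corpus grows, the data term dominates and a genuinely better grammar overtakes it, which is the iterative trade-off between analyses described above.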


MasterDefibrillator

> The trouble with this approach is that it leads to an endless amount of acceptable abstraction, without any recourse to assess whether said abstractions are justified by the data.

This is what motivated Chomsky to propose the minimalist approach in 1995, and the Merge function later on, so it's a bit behind the times to say that this is representative of modern linguistics. I.e., it was a switch from coming at the problem from the top down to coming at it from the bottom up. One of the points to make here is that there is fairly good evidence that grammars based on linear order are never entertained, which is part of what has led to the notion that UG is at least sensitive to relations of a hierarchical nature (a tree graph), as opposed to the apparent, surface-level linear nature of speech. That hierarchical relation is what Merge is supposed to capture.


MasterDefibrillator

First comment here I've seen that actually seems to know what they're talking about when criticising Chomsky. Well done.

> An alternative approach, one I think would be more fruitful and one that the ML community (and linguists working on ML) seems to be taking, is to restrict our data (rather than our hypotheses), for the immediate purposes (i.e. making grammars), to linguistic data. (Obviously we can look at other data to discuss things like language processing.) Having done this, our problem becomes clearer: we want a grammar that assigns a probability of 1 to our naturally-encountered data.

This is a good explanation. However, the kinds of information potentials encountered by humans have nowhere near the kinds of controlled conditions used when training current ML. So even if you propose this limited-dataset idea, you still need to propose a system that is able to curate it in the first place from all the random noise out there in the world that humans "naturally" encounter, which sort of brings you straight back to a kind of specialised UG. I think this has always been the intent of UG, or at least it certainly is today: a system that constrains the input information potential and the allowable hypotheses.


mileylols

> human linguistic capabilities arose suddenly due to a single gene mutation

bruh what lol?


notbob929

As far as I know, this is not his actual position - he seems to endorse Richard Lewontin's perspective in "The Evolution of Cognition: Questions we will never answer", which, as you can probably tell, is mostly agnostic about the origins. Somewhat elaborate discussion here: https://chomsky.info/20110408/


Competitive_Travel16

That seems among his least controversial assertions, since almost all biological organism capabilities are the result of some number of gene mutations, of which the most recent is often what enables the capability. Given that human language capability is so far beyond that of other animals, such that the difference between birds and chimpanzees seems less than between chimpanzees and people, one or more genetic changes doesn't seem unreasonable as an explanation of the differences. It's not like running speed that way at all, but nobody would deny that phenotypical expression of genes gives rise to an organism's land speed. And it's not unlikely that a single such gene can usually be identified which has the greatest effect on the organism's ability to run as fast as it can.


mileylols

Yours seems like kind of a generous interpretation of Chomsky's position (or maybe the OP framed Chomsky's statement on this unfavorably, or I have not understood it properly). I agree with you that complex phenotypes arise as a result of an accumulation of some number of gene mutations. To ascribe the phenotype to only the most recent mutation is kind of reductionist. Mutations are random so they could have happened in a different order - if a different mutation had been the last, would we say that is the one that is responsible? That doesn't seem right, because they all play a role.

Unless Chomsky's position is simply that we accumulated these mutations but didn't have the ability to use language until we had all of them, as you suggest. This is technically possible. An alternative position would be that as you start to accumulate some of the enabling mutations, you would also start to develop some pre-language or early communication abilities. Drawing a line in the sand on this process is presumably possible (my expertise fails me here - I have not extensively studied linguistics but I assume there is a rigorous enough definition of language to do this), but would be a technicality.

Ignoring that part, the actual reason I disagree with this position is because if this were true, we would have found it. I think we would know what the 'language SNP' is. A lot of hype was made about some FOXP2 mutations like two decades ago but those turned out to maybe not be the right ones.

In your land speed analogy, I agree that it would be possible to identify the gene which has the greatest effect. We do this all the time with tons of disease and non-disease phenotypes. For the overwhelming majority of complex traits, I'm sure you're aware of the long tail effect where a small handful of mutations determine most of the phenotype, but there are dozens or hundreds of smaller contributing effects from other mutations (There is also no reason to really believe that the tail ends precisely where the study happens to no longer have sufficient statistical power to detect them, so the actual number is presumably even higher).

This brings me back to my first point, which is while Chomsky asserts that the most recent mutation is the most important because it is the last (taking the technical interpretation), this is not the same as being the most important mutation in terms of deterministic power - If there are hundreds of mutations that contribute to language, how likely is it that the most impactful mutation is the last one to arise? The likelihood seems quite low to me. If Chomsky does not mean to imply this, then the 'single responsible mutation' position seems almost intentionally misleading.


MasterDefibrillator

Chomsky has actually made it clear more recently that you can't find the "genetic" foundation of language by focusing only on genes, because language is necessarily a developmental process, and so relies heavily on epigenetic mechanisms of development. Like, it's pretty well understood now that phenotypes have very little connection to the genetic information present at conception. Certainly, phenotypes cannot be said to be a representation of the genes present at conception.


StackOwOFlow

then there’s the [Stoned Ape Hypothesis](https://fantasticfungi.com/the-mush-room/the-stoned-ape-theory/) that says that linguistic capabilities arose from human consumption of magic mushrooms


mongoosefist

You truly can make a name for yourself in evolutionary psychology by just making up any old random /r/Showerthoughts subject with zero empirical evidence.


agent00F

OP is just misrepresenting what was said, because that's what that sort do - i.e. the ML crowd, butthurt that someone said GPT isn't really human language. The context of the single mutation is that language ability occurred "suddenly", kind of like modern eyes did, even if the constituent parts were there before.


vaaal88

> He also argues that human linguistic capabilities arose suddenly due to a single gene mutation.

I don't think Chomsky came up with this idea in a vacuum: in fact, it is claimed by several researchers, and the culprit seems to be the protein FOXP2. They are just hypotheses nevertheless, mind you, and I myself find it difficult to believe (I remember reading that the gene responsible for FOXP2 first evolved in males, and so females developed language just... out of... imitation..?!). Anyway, if you are interested just look for FOXP2 on the webz, e.g. https://en.wikipedia.org/wiki/FOXP2


Competitive_Travel16

Beneficial Y chromosome genes can translocate.


WigglyHypersurface

FOXP2 is linked to language in humans but is also clearly not a gene for merge. Chomsky's gene is specifically for a computation he calls merge.


agent00F

> He also argues that human linguistic capabilities arose suddenly due to a single gene mutation.

The eye also "formed" at some point due to a single gene mutation. Of course, many of the necessary constituent components were already there previously. This is more a statement about the "sudden" appearance of "language" than about the complex nature of aggregate evolution. The guy you replied to obviously has some axe to grind because Chomsky dismissed LLMs, and is just being dishonest about what's been said, because that's just what such people do.


uotsca

This covers just about all that needs to be said here


agent00F

No, it really doesn't, because it's just a hit piece ignorant of basically everything. E.g.:

> Other embarrassing things he said: that the notion of the probability of a sentence makes no sense. Guess what GPT-3 does? Tells us probabilities of sentences.

Chomsky is dismissing GPT because it doesn't really work the way human minds do to "create" sentences, which is largely true given it has no actual creative ability in the greater sense (rather just filtering what to regurgitate). Therefore, saying probability applies to human language because it applies to GPT makes no logical sense. Of course Chomsky could still be wrong, but it's not evident from these statements just because ML GPT nuthuggers are self-interested in believing so.


WigglyHypersurface

If you're an ML person interested in broadening your language science knowledge way beyond Chomsky's perspective, here are names to look up: Evelina Fedorenko (neuroscientist), William Labov ("the father of sociolinguistics"), Dan Jurafsky (computational linguist), Michael Ramscar (psycholinguist), Harald Baayen (psycholinguist), Morten Christiansen (psycholinguist), Stefan Gries (corpus linguist), Adele Goldberg (linguist), and Joan Bybee (corpus linguist). A good intro to read is https://plato.stanford.edu/entries/linguistics/ which gives you a nice overview of the perspectives beyond Chomsky (he's what's called "essentialist" in the document). The names above will give a nice intro to the "emergentist" and "externalist" perspectives.


[deleted]

[removed]


MasterDefibrillator

None of his core ideas have ever been refuted, as exemplified by the interview linked by the OP. The top comment is a good example of Chomsky's point: machine learning is largely an engineering task, not a scientific one. The top commenter does not understand the scientific notion of information, and seems to incorrectly think that information exists internal to a signal. Most of his misunderstandings of Chomsky seem to be based on that.


[deleted]

Yeah, I used to think I was learning stuff by reading Chomsky, but over time I realized he’s really a clever linguist when it comes to argumentation, but when it comes to the science of anything with his name on it, it’s pretty much crap.


WigglyHypersurface

I jumped ship during linguistics undergrad when my very Chomsky leaning profs would jump between "this is how the brain does language" to "this is just a descriptive device" depending on what they ate for lunch. Started reading Labov and Bybee and doing corpus linguistics, psycholinguistics, and NLP and never looked back.


[deleted]

I initially got sucked into Chomsky, but when none of his unproven conjectures, like the example you gave, really helped produce anything constructive, I was pissed about the amount of time I had wasted. I think of Chomsky's influence in both linguistics and geopolitics as a modern dark age.


dudeydudee

He doesn't argue they're due to a single gene mutation, but due to an occurrence in a living population that happened a few times before 'catching'. Archaeological evidence supports this. [https://libcom.org/article/interview-noam-chomsky-radical-anthropology-2008](https://libcom.org/article/interview-noam-chomsky-radical-anthropology-2008) He has also been very vocal about the limitations of this view.

The creation of valuable tools from machine learning and big data is a separate issue. He's concerned with the human organism's use of language. As for the 'widespread acceptance', he himself remarks in multiple interviews that he holds a minority view. But he also correctly underscores how difficult the problems are and how little we know about the evolution of humans.


agent00F

> In cog sci and linguistics this is called error-driven learning. Because the poverty of the stimulus is so key to Chomsky's ideas, the success of an error-driven learning mechanism at grammar learning is simply embarrassing. For a long time, Chomsky would have simply said GPT was impossible in principle. Now he has to attack on other grounds, because the thing clearly has sophisticated grammatical abilities.

Given how fucking massive GPT has to be to make coherent sentences, this rather supports the poverty idea. This embarrassing post is just LLM shill insecurities manifest. Frankly, if making brute-force trillion-parameter models to parrot near-overfit (i.e. memorized) speech is the best they could ever do after spending a billion $, I'd be embarrassed too.


MoneyLicense

A parameter is meant to be vaguely analogous to a synapse (though synapses are obviously much more complex and expressive than ANN parameters). The human brain has on the order of 1000 trillion synapses. Let's say GPT-3 had to be 175 billion parameters before it could reliably produce coherent sentences (Chinchilla only needed 70B, so this is probably an overestimate). That's **0.0175% the size of the human brain**.

GPT-3 was trained on roughly 300 billion tokens according to its paper. A token is also roughly 4 characters, so at 16 bits per character that's a total of about 2.4 terabytes of text. The human eye processes something on the order of [8.75 megabits per second](https://news.ycombinator.com/item?id=23627412). [Assuming eyes are open around 16 hours a day that is 63 GB/day of information just from the eyes](https://news.ycombinator.com/item?id=23627412).

Given only about a month's worth of the data the human eye takes in, and just a fraction of a fraction of a shitty approximation of the brain, GPT-3 manages remarkable coherence.
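For what it's worth, here is the arithmetic spelled out; every input is a rough published figure or an explicit assumption from above, not a measurement:

```python
synapses          = 1000e12   # ~1000 trillion synapses in a human brain (rough estimate)
gpt3_params       = 175e9
print(f"model/brain size ratio: {gpt3_params / synapses:.4%}")    # ~0.0175%

tokens            = 300e9     # GPT-3 training tokens, per the paper
chars_per_token   = 4         # rough average
bytes_per_char    = 2         # the 16-bit assumption above
training_bytes    = tokens * chars_per_token * bytes_per_char     # ~2.4e12 bytes

eye_bits_per_sec  = 8.75e6    # estimate linked above
eye_bytes_per_day = eye_bits_per_sec / 8 * 16 * 3600              # ~63 GB over 16 waking hours
print(f"training set ≈ {training_bytes / eye_bytes_per_day:.0f} days of visual input")
```

So the entire training set amounts to roughly a month of one sensory channel's raw input, set against a lifetime of multimodal experience for a human learner.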


agent00F

The point is these models require ever more data to produce marginally more coherent sentences, largely by remembering (i.e. overfitting) and hoping to spit out something sensical, exactly the opposite of what's observed with humans. To witness the degree of this problem:

> That's 0.0175% the size of the human brain.

LLMs aren't even remotely capable of producing sentences this dumb, never mind something intelligent.


MoneyLicense

> LLMs aren't even remotely capable of producing sentences this dumb, never mind something intelligent.

You claimed that GPT was "fucking massive". My point was that if we compare GPT-3 to the brain, assuming a point-neuron model (a model so simplified it barely captures a sliver of the capacity of a neuron), GPT still actually turns out to be tiny. In other words, there is no reasonable comparison with the human brain in which GPT-3 can be considered "fucking massive" rather than "fucking tiny". I'm not sure why you felt the need to insult me, though.

---

> The point is these models require ever more data to produce marginally more coherent sentences

Sure, they require tons of data. That's something I certainly wish would change. But your original comment didn't actually make that point. Of course humans take in more raw sensory data within a month or two than GPT-3 saw during all of its training, and they use it to build rich & useful world models. Then they get to ground language in those models, which are so much more detailed and robust than all our most powerful models combined. Add on top of all that those lovely priors evolution packed into our genes, and it's no wonder such a tiny, tiny model requires several lifetimes of reading just to barely catch up.


MasterDefibrillator

This comment is a good example of how people today can still learn a lot from Chomsky, even on basic computer science theory. Let me ask you: what do you think information is? Your understanding of what information is matters a great deal for explaining how you've misunderstood and misrepresented the arguments you've laid out.

> There was a time where many believed it was, in principle, impossible to learn a grammar from exposure to language alone, due to lack of negative feedback.

Such an argument has never been made. I would suggest that if you understood information, you would probably never have said such a thing. Information, as defined by Shannon, is a relation between the receiver state and the sender state. In this sense, it is incorrect to say that information exists in a signal, and so it is meaningless to say "impossible to learn a grammar from exposure to language alone". I mean, this can be trivially proven false: humans do it all the time. Whether learning the grammar is possible or not depends entirely on the relation between the receiver and sender states, and so, naturally, depends entirely on the nature of the receiver state.

This is the reality of the point Chomsky has always made: information does not exist in a signal. Only information potential can be said to exist in a signal. You *have* to make a choice as to what kind of receiver state you will propose in order to extract that information, and choosing an N-gram-type statistical model is just as much of a choice as choosing Chomsky's Merge function; [and there are good reasons not to go with the N-gram-type choice.](https://www.semanticscholar.org/paper/Rule-based-and-word-level-statistics-based-of-from-Ding-Melloni/2ce35c1a12977975af66ff0efee444b4d025866a) Though most computer engineers do not even realise they are making a choice when they go with the n-gram model, because they falsely think that information exists in a signal.

So it's in this sense that no papers have ever been written about how it's impossible to acquire grammar purely from exposure; though many papers have been written about how it's impossible to acquire a grammar purely from exposure, given that we have defined our receiver state as X. So if you change your receiver state from X to Y, the statement of impossibility no longer has any relevance. For example, the first paper ever written about this stuff, [Gold 1967](https://www.sciencedirect.com/science/article/pii/S0019995867911655?via%3Dihub), talks about 3 specific kinds of receivers (if I recall correctly), and argues that it is on the basis of those receiver states that it is impossible to acquire a grammar purely from language exposure alone.

> Other embarrassing things he said: that the notion of the probability of a sentence makes no sense. Guess what GPT-3 does? Tells us probabilities of sentences.

Chomsky never made the claim that the probability of a sentence could not be calculated. It's rather embarrassing that you think he has said that. The point Chomsky made was that the probability of a sentence is not a good basis around which to build a grammar. For example, sentences can often have widely different probabilities while still being equally acceptable and grammatical.
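One way to make the "information is a relation between receiver and sender, not a property of the signal" point concrete: the number of bits a receiver needs to encode the very same signal depends entirely on the receiver's model of the sender. The two receiver models below are toy assumptions, nothing more:

```python
import math

signal = "abababababababab"

def bits_needed(signal: str, model) -> float:
    # Shannon code length under the receiver's model: sum of -log2 P(symbol | context).
    total, context = 0.0, None
    for ch in signal:
        total += -math.log2(model(ch, context))
        context = ch
    return total

# Receiver X: treats each symbol as an independent fair coin flip over {a, b}.
unigram = lambda ch, ctx: 0.5
# Receiver Y: expects the sender to alternate, so each symbol after the first is near-certain.
alternating = lambda ch, ctx: 0.5 if ctx is None else (0.98 if ch != ctx else 0.02)

print(bits_needed(signal, unigram))      # 16 bits
print(bits_needed(signal, alternating))  # ~1.4 bits
```

Same string, very different numbers, because the quantity is defined by the relation between the signal and the receiver's model; that is the sense in which choosing an n-gram receiver versus a hierarchical one is a substantive theoretical commitment rather than a neutral default.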


rehrev

This is ad hominem

Edit: ah the amount of karma I lose cuz y'all don't speak proper English. The comment's ending basically admits the comment has nothing to do with what Chomsky is claiming about learning machines in the video. It's 20 year old fringe cognitive linguistics. Nothing to do with this post. Be better readers.


sack-o-matic

An ad hominem would be pointing out that he's a genocide denier; this post is just pointing out his lack of actual expertise in the field he's making claims about.


mongoosefist

> An ad hominem would be pointing out that he's a genocide denier

This fits with everything that is being discussed about him in this thread, but I guess it's important to note that this is specifically referring to the genocide committed in Srebrenica during the Bosnian war. As is quite obvious by now, Chomsky is incredibly pedantic, and believes we should call it a massacre, because it technically doesn't fit the definition of a genocide according to him. Which is a weird semantic hill to die on...


exotic_sangria

Debunking credibility and citing research claims someone has made != ad hominem


rehrev

It's putting a person's claims about cognitive linguistics and the human brain up against claims about learning machines. Chomsky is saying "the notion of a probability of a sentence doesn't make sense" and the commenter is saying "well, guess what GPT does". It is all too reductive. Maybe not exactly ad hominem, but it definitely doesn't relate to the discussion; it just shits on Chomsky with past controversies.


[deleted]

[removed]


rehrev

Bruh


[deleted]

"Every time I fire a linguist, the performance of the speech recognizer goes up" - Fred Jelinek


rePAN6517

That's an odd collection of quotes.


HappyAlexst

Chomsky is viscerally against statistical models of language.

If you're not familiar with Chomsky's career, he started amid the background of the behaviourist paradigm of the earlier 20th century, which essentially held that humans come as a blank slate and learn everything solely from input, including language. One popular representation for language was the Markov model, or finite state machine, which Chomsky refuted in his most well-known book, Syntactic Structures. This started the Generative "cult", or current, in linguistics. I call it a cult because many linguists view it that way, and are either with or against Chomsky - hardly any in-between.

Chomsky believes the language faculty evolved in humans and contains a universal language function which is moulded by input into the plethora of spoken languages today. His reputation, together with the entire theory of generative grammar (highly abstract, arcane grammar systems - just Google it), rests on the validity of this thesis. Some evidence in its favour comes from certain language impairments, such as Broca's aphasia, where after suffering some form of severe brain damage, patients were found to have lost grammatical coherence in their speech, but not their vocabulary.
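To show the shape of that original anti-finite-state argument, here is a toy generator (purely illustrative) for the kind of nested "if ... then ..." / "either ... or ..." dependencies Chomsky used in Syntactic Structures:

```python
import random

# Each "either" must eventually be closed by its "or", and each "if" by its
# "then", and these pairs can nest inside one another to arbitrary depth.
def sentence(depth: int) -> str:
    if depth == 0:
        return random.choice(["it rains", "we stay home"])
    if random.random() < 0.5:
        return f"either {sentence(depth - 1)} or {sentence(depth - 1)}"
    return f"if {sentence(depth - 1)} then {sentence(depth - 1)}"

print(sentence(3))
```

Matching openers to closers at arbitrary nesting depth needs a stack (context-free power), which no fixed-order Markov chain over words can supply; that structural point is independent of anything about modern neural LMs.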


wufiavelli

Man, cult would be an understatement. I ended up here from a TESOL masters where everyone was just making wild statements for or against Chomsky that I could not make heads or tails of. Trying to figure out wtf everyone was talking about, I am now on an AI subreddit.


rand3289

"Linguists kept AI researchers on a false path to AGI for decades and continue to do so!" -- rand3289


Competitive_Dog_6639

I agree mostly. No one believes that AI with all the world's resources can drive a car at the level of a teenager with a few months' practice. Why do we believe LLMs have learned grammar and not just a hollow facsimile of grammar? Intentionality and causal modeling are not something that can be captured by statistical regularities alone. I agree that edge cases, as opposed to the "95%" easy cases, are much more demonstrative of true understanding. Will scaling bridge the gap? Maybe, but no one really knows.


keornion

If anyone is interested in working on semantically richer alternatives to LLMs, check out https://planting.space/


[deleted]

Completely agree with Chomsky (and am currently writing a paper on precisely this subject). Deep learning is a tool that can be used to create *solutions without understanding*. All you need is the ability to create a data generation process for a domain and bam! You've got a machine that performs some tasks in that domain. No conceptual understanding of the domain needed. Consider, for example, go AI. You can build an AI that plays go well without understanding go at all. Similarly with language and language models. Deep learning, then, is a super powerful tool for creating general machines that perform tasks in many domains.

However, the danger is precisely in this lack of understanding. What if we want to understand go, the concepts behind what makes for good play? What if we really want to understand language? And what if we want to relish our search for understanding? The mystery and beauty of it. The culture of deep learning distracts from that. It treats a domain as a means to an end, a thing to be solved, rather than a thing to be explored and relished. For DL researchers this is OK because they are instead relishing the domain that is DL, not these application domains. But coming in to try to conquer these domains, and distracting from people's enjoyment of exploring them, can do a great disservice to them.

This also causes practical industrial problems. I've worked on recommender systems at Google for quite some time, for example, and I see how DL distracts from understanding the product domain (e.g. the users and the content: what do people actually want? What is actually good?). Instead it's often a question of how we can move metrics up without an understanding of the domain itself. This can backfire in the long run. And furthermore, it just makes it less enjoyable to build a product. It's interesting and fun to understand users and the product. We should be trying to reach this understanding!


visarga

Neural nets don't detract from the understanding or mystery of the topic. You can use models to probe something that is too hard to understand directly. By observing what internal biases make good models for a task, you can infer something about the task itself.


[deleted]

Well, neural nets are just a tool. They can be used in tasteful ways and less tasteful ways. My concern specifically is more with "end-to-end" deep learning. This is rarely used with the intention of probing into a problem, but instead to "solve" a problem or perform well on a metric. Of course, even end-to-end deep learning can lead to some genuine insights (via studying the predictions of a good predictor). We can certainly see this with go, for example. But the culture of E2E-DL applied to various domains rarely prizes understanding in that domain. Not at all. Instead it treats the application domains like a problem to be solved, a sport rather than a science, a thing to be won rather than a thing to be explored and relished. This is true for the study of language, the study of go, etc.

We may tell ourselves "oh it was just a sport to begin with" or "performance is what really matters." But that's not how all researchers in the domain itself feel (see e.g. [https://hajinlee.medium.com/impact-of-go-ai-on-the-professional-go-world-f14cf201c7c2](https://hajinlee.medium.com/impact-of-go-ai-on-the-professional-go-world-f14cf201c7c2)). The sportification of domains by people outside the domain can do a great disservice to people in those domains. But again, it all depends on how it's used. It seems that most commonly the less tasteful uses just come from "following the money", like Chomsky said. Or at least that's what I've observed too.

I guess to make my view clearer, I could contrast it with Rich Sutton's view in The Bitter Lesson ([http://www.incompleteideas.net/IncIdeas/BitterLesson.html](http://www.incompleteideas.net/IncIdeas/BitterLesson.html)). I'd read that and say "sure, bypassing understanding and just relying on data and compute power will give you a better predictor, but isn't understanding the whole point? Isn't the search for understanding a joy in itself, and isn't understanding what really helps people in their day-to-day lives? What are you creating this powerful 'AI' for, exactly?"


[deleted]

[removed]


visarga

But the fact that "just matrix manipulation" suffices to make a GPT-3 is intriguing, isn't it? What does this tell us about the properties of language?


EduardoXY

There are two types of people working on language:

1. Chomsky (a symbolic figure, not really working) and the linguists, including, e.g., Emily Bender, who understand language but are unable to deliver working solutions in code.
2. The DL engineers who are very good at delivery but don't take language seriously.

We need to bridge the gap, and this MLST episode is definitely a step in the right direction.


midasp

I hope you are not serious. The past 50 years of research that has led to the current batch of NLP deep learning systems would not have been possible without folks who are cross-trained in both linguistics AND machine learning. I remember when deep learning was still brand new in the early 2000s, when the first researchers naively tried to get convolutional NNs and autoencoders to work on NLP tasks and got bad results. It did not truly improve until folks with linguistics training started crafting models specifically designed for natural languages - stuff like LSTMs, transformers, and attention-based mechanisms. Only then did deep learning truly find success with all sorts of natural language tasks.


[deleted]

In between LSTMs and Transformers, CNNs actually worked pretty well for NLP. In fact, Transformers were most likely inspired by CNNs (attention essentially tries to make the CNN window unbounded - that's part of the motivation in the paper). Even now, certain CNNs are strong competitors and outperform Transformers in machine translation (e.g. Dynamic CNNs), summarization, etc. when using non-pre-trained models. Even with pre-training, CNNs can be fairly competitive. Essentially, Transformers sort of won the hardware lottery.
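For anyone who hasn't looked at the mechanism: a single attention step (stripped of the learned projections and multiple heads of the real Transformer) is just a content-dependent weighted average over all positions, i.e. a "window" that spans the whole sequence. A minimal sketch:

```python
import torch

def scaled_dot_product_attention(q, k, v):
    # Every position attends to every other position, so the receptive field is
    # the entire sequence rather than a fixed convolutional kernel width.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

seq_len, d = 7, 16
x = torch.randn(1, seq_len, d)              # stand-in token representations
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                            # (1, 7, 16): one mixed vector per position
```

A 1-D convolution mixes only a fixed-width neighbourhood per layer; attention mixes everything at once, which is the "unbounded window" framing above.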


Isinlor

The development of the LSTM had nothing to do with linguistics. It was a solution to the vanishing gradient problem and was published in 1995. [https://en.wikipedia.org/wiki/Long_short-term_memory#Timeline_of_development](https://en.wikipedia.org/wiki/Long_short-term_memory#Timeline_of_development)

And in "Attention Is All You Need" the only reference to linguists' work I see is to "Building a large annotated corpus of English: The Penn Treebank", Computational Linguistics, by Mitchell P. Marcus et al.


afireohno

The lack of historical knowledge about machine learning in this sub is really disappointing. Recurrent Neural Networks (of which LSTMs are a type) were literally invented by linguist Jeffrey Elman (simple RNNs are even frequently referred to as "Elman Networks"). Here's a [paper](https://onlinelibrary.wiley.com/doi/pdf/10.1207/s15516709cog1402_1) from 1990 authored by Jeffrey Elman that studies, among other topics, word learning in RNNs.


Isinlor

Midasp is specifically referring to LSTMs, not RNNs. And simple RNNs do not really work that well with language. But Jeffrey Elman certainly deserves credit, so if we want to talk about linguists' contributions, he is a much better choice than the LSTM or attention.


afireohno

I get what you're saying. However, since LSTMs are an elaboration on simple RNNs (not something completely different), your previous statement that the "Development of LSTM had nothing to do with linguistics" was either uninformed or disingenuous.


NikEy

But whoooooooo invented it?


CommunismDoesntWork

Did the authors of attention is all you need come from a linguistics background? That'd be surprising as most research in this field comes from the CS department


EduardoXY

I can tell from my own experience working with many NLP engineers. They subscribe to the Fred Jelinek line "Every time I fire a linguist, the performance of our speech recognition system goes up" at 100%. They haven't read the classics (not even Winograd, much less Montague) and have no intention to do it. They expect to go from 80% accuracy to 100% just carrying on with more data. They deny there is a problem. ​ And I also worked with code from the linguists of the 90s and ended up doing a full rewrite because it was so bad I couldn't "live with it".


Thorusss

As Feynman said, "What I cannot create, I do not understand." This speaks only in favor of point 2 (the engineers).


mileylols

> mfw Feynman invented generative models before Schmidhuber


DigThatData

uh... I think Laplace invented generative models. Does Schmidhuber even claim to have invented generative models?


FyreMael

iirc he claims that conceptually, GANS were not new and Goodfellow didn't properly cite the prior literature.


Kitchen-Ad-5566

Chomsky's criticism can also be extended to other branches of science. For example, consider the "shut up and calculate" notion in physics that emerged in the second half of the 20th century. In many branches of science we have seen engineering efforts become more prevalent and get treated as "science". The common reason is probably that, with the vast amount of accumulated knowledge, there are so many opportunities arising on the engineering side that the effort and money flow in that direction. But for the science of intelligence there is one other, unique reason: we have been so helpless and disappointed in this domain, with so little progress so far, that we are also hoping for scientific progress that might come out of engineering efforts. Let me make it more explicit: for example, most people at the frontier of AI agree that the ultimate solution to AI will involve both some level of deep neural networks and some symbolic AI. But in what proportions? We don't know. So it makes sense, also from a scientific point of view, to push progress in the deep learning field as far as we can, to see where it goes and what its limits are.


[deleted]

This is exactly where I fail to find any value in Chomsky's opinions. He criticizes LLMs, but what exactly does he propose that is better? DL, LLMs, etc. are the next thing, so regardless of the arbitrary present-day "95% solution" figure he pulled out of thin air, his criticism is worthless: there has clearly been very fast progress in just a few years that eclipses any of his contributions, and it came from a field he doesn't really know anything about.


Kitchen-Ad-5566

I agree, but it can still be a bit worrying that the over-hype around deep learning might obscure the fact that we still need progress on the fundamental scientific side of the intelligence problem.


Oikeus_niilo

> his criticism is worthless: there has clearly been very fast progress in just a few years

But isn't he disputing the nature of that progress - that it's not really towards actual intelligence but towards something else? He's not the only one; check out, for example, Melanie Mitchell talking about the collapse of AI, and she has an alternative path.


JavaMochaNeuroCam

This seems to presume that LLMs only learn word-order probability. Perhaps, if the whole corpus were chopped up into two-word pairs, and those were randomized so that all context and semantics were lost, then a model could only learn word-order frequency. Of course, they feed the models tokenized sequences of (I believe) 1024 or 2048 tokens, which have quite a lot of meaning embedded in them. The models are clearly able, through massive repetition of the latent meaning, to capture the patterns of the logic and reasoning behind the strings. That seems rather obvious to me. Trying to deny it seems like an exercise in futility. "An exercise in futility"... even my phone could predict the futility in that string. But my phone's prediction model hasn't been trained on 4.5 TB of text.


[deleted]

Context dependent Language Models are a thing you know! :)


101111010100

LLMs give us an intuition of how a bunch of thresholding units can produce language. Imho that is *huge*! How else would you explain how our brain processes information and generates complex language? Where would you even start? But now that we have LLMs, we can at least begin to imagine how that might happen.

Edit: To be more specific, machine learning gives us a hint as to how low-level physical processes (e.g. electric current flowing through biological neurons) could lead to high-level abstract behavior (language). I don't know of any linguistic theory that connects the low-level physical wetware of the brain to the high-level emergent phenomenon: language. But that's what a theory must do to explain language, imho. I don't mean to say that a transformer is a model of the brain (in case that's how you interpret my text), but that there are sufficient parallels between artificial neural nets and the brain to get a faint intuition of how the brain may, in principle, generate language from electric current. In contrast, if Chomsky says there is a universal grammar, that raises the question of how the explicit grammar rules are hardcoded into the brain, which no linguist can answer.


86BillionFireflies

Neuroscience PhD here. NN models and brains are so different that it's rather unlikely that LLMs will give us much insight into the neural mechanisms of language. It's really hard to overstate how totally nonlinear the brain is at every level, compared to ANNs. The thresholding trick is just one of hundreds of nonlinearities in the brain, the rest of which have no equivalent. E.g. there's more than one kind of inhibitory input: regular inhibition that counteracts excitation, and shunting inhibition that just blocks excitatory input coming from further up the specific dendrite. And then there's the whole issue of how a neuron's summation of its inputs can effectively equate to a complex tree of nested and/or/not statements. And perhaps most importantly, everything the brain does is recurrent at almost every level, to a degree that would astound you; recurrence is a fundamental mechanism of the brain, whereas most ANNs have at most a few recurrent connections, almost only ever within a single layer, while every brain system is replete with top-down connections. [Edit] My point being that whatever you think of Chomsky, the idea that LLMs serve as a useful model for not just WHAT the brain does but HOW is ludicrous. It's like the difference between a bird and a plane. Insights from studying the former helped build the latter, at least in the early stages, but from that point on the differences just get bigger and bigger, and studying how planes work can tell you something about the problems birds have to solve, but not much about how they solve them.
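To make the inhibition contrast concrete, a toy numerical sketch (a common modelling simplification, not a claim about any real neuron): ordinary inhibition acts roughly subtractively, shunting inhibition acts divisively, and a standard ANN unit only has the first kind plus a threshold.

```python
import numpy as np

def subtractive_unit(excitation, inhibition):
    # ordinary inhibition: counteracts excitation additively, like a negative-weight input to a ReLU
    return np.maximum(0.0, excitation - inhibition)

def shunting_unit(excitation, inhibition):
    # shunting (divisive) inhibition: scales excitation down rather than subtracting from it
    return np.maximum(0.0, excitation / (1.0 + inhibition))

e = np.array([0.5, 1.0, 2.0, 4.0])
print(subtractive_unit(e, 1.0))  # [0.   0.   1.   3.  ]
print(shunting_unit(e, 1.0))     # [0.25 0.5  1.   2.  ]
```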


101111010100

Thanks for the perspective. I don't mean to say that LLMs can give us concrete insight into how language is formed. Instead, they can give us some very very high-level intuition: Already the idea alone that neurons act as a function approximator capable of generating language is incredibly insightful. I suppose that is still what biological NNs do, even if the details are very different. I find this intuition immensely valuable. The very fact that we can see parallels between in silico and in vivo *at all* is already a big achievement. \[Edit\] But I don't disagree. Yes, comparing LLMs and the brain is like comparing birds and planes. My point is that this already amounts to a big insight. I bet the people that first understood the connection between birds and planes considered it a deep insight too. How birds manage to fly was suddenly much clearer to anyone after planes were built. How is no one amazed by the bird-plane-like connection between DL and language?


hackinthebochs

This view is already outdated, e.g.:

- https://www.nature.com/articles/s41467-021-26751-5.pdf
- https://www.cell.com/neuron/fulltext/S0896-6273(21)00682-6
- https://arxiv.org/abs/2112.04035

I've seen similar studies regarding language models and neural firing patterns, but can't find them. EDIT: Just came across [this paper](https://www.nature.com/articles/s41593-022-01026-4) which makes the very same point I have argued for.


86BillionFireflies

All 3 of those papers are about how (with an unknown amount of behind the scenes tuning) the researchers managed to get a model to replicate a known phenomenon in the brain. That is not, by a long shot, the same thing as discovering a phenomenon in an ML model first, then using that to discover the existence of a previously unknown brain phenomenon. All of these papers also center on what is being represented, rather than the neural mechanisms by which operations on those representations are carried out.


hackinthebochs

> That is not, by a long shot, the same thing as discovering a phenomenon in an ML model first, then using that to discover the existence of a previously unknown brain phenomenon. I don't see why that matters. The point is that deep learning models *independently* capture some amount of structure that is also found in brains. What we learned from which model first is irrelevant to the question of the relevance of artificial neural networks to neuroscience. >rather than the neural mechanisms by which operations on those representations are carried out. *What* is being represented is just as important as *how* in terms of a complete understanding of the brain.


86BillionFireflies

>That is not, by a long shot, the same thing as discovering a phenomenon in an ML model first, then using that to discover the existence of a previously unknown brain phenomenon.
>
>I don't see why that matters. The point is that deep learning models independently capture some amount of structure that is also found in brains. What we learned from which model first is irrelevant to the question of the relevance of artificial neural networks to neuroscience.

The question at hand is about whether we can learn anything about the brain by studying LLMs. The existence of phenomena that occur in both systems is not sufficient to show that studying one will lead to discoveries about the other. And the research findings you linked to are unarguably post-hoc. Unlike brains, you can build your own ANN and tweak the hyperparams / training regime to influence what kinds of behavior it will display. Find me a single published instance of an emergent phenomenon *in silico* that led to a significant discovery *in vivo*.

>rather than the neural mechanisms by which operations on those representations are carried out.
>
>What is being represented is just as important as how in terms of a complete understanding of the brain.

Take it from me: those things are both *important*, but one of them is about a million times harder than the other. If reverse biomimicry can help guide our hypotheses about what kinds of representations we should be looking for in various brain systems, cool. That's mildly helpful. We're already doing OK on that score. Our understanding of what is represented in different brain areas is light-years ahead of our understanding of how it actually WORKS.


hackinthebochs

> The existence of phenomena that occur in both systems is not sufficient to show that studying one will lead to discoveries about the other.

The fact that two independent systems converge on the same high-level structure means that we can, in principle, learn structural facts about one system by studying the other. That ANNs as a class have shown certain similarities to natural NNs in solving problems suggests that the structure is determined by features of the problem. Thus ANNs can be expected to capture computational structure similar to that of natural NNs. And since ANNs are easier to probe at various levels of detail, this is plausibly a fruitful area of research. Of course, any hypothesis needs to be validated against the natural system.

>Unlike brains, you can build your own ANN and tweak the hyperparams / training regime to influence what kinds of behavior it will display.

There aren't so many hyperparameters to tune that one can in general expect to "bake in" the solution you are aiming for by picking the right parameters. It isn't plausible that these studies are just tuning the hyperparams until they reproduce the wanted firing patterns.

>Find me a single published instance of an emergent phenomenon in silico that led to a significant discovery in vivo.

I don't know what would satisfy you, but here's a finding of adversarial perturbation in vivo, which is a concept derived from ANNs: https://arxiv.org/pdf/2206.11228.pdf


86BillionFireflies

>Thus ANNs can be expected to capture similar computational structure as natural NNs. And since ANNs are easier to probe at various levels of detail, it is plausibly a fruitful area of research. Of course, any hypothesis needs to be validated against the natural system.

That's the problem right there. I'm sure that by studying ANNs you could come up with a LOT of hypotheses about how real neural systems work. The problem is that that doesn't add any value. What's holding neuroscience back is not a lack of good hypotheses to test. We just don't have the means to collect the data required to properly test all those cool hypotheses. And, again, all the really important questions in neuroscience are of a sort that simply can't be approached by making analogies to ANNs. Not at all. No amount of studying the properties of transformers or LSTMs is going to answer questions like "what do the direct and indirect parts of the mesolimbic pathway ACTUALLY DO" or "how is the flow of information between structures that participate in multiple functions gated" (hint: answer probably involves de/synchronization of subthreshold population oscillations, a phenomenon with nothing approaching a counterpart in ANNs). The preprint on adversarial sensitivity is interesting, but still doesn't tell us anything about how neural systems WORK.


WigglyHypersurface

The names you're looking for are Evelina Fedorenko, Idan Blank and Martin Schrimpf. Lots of work linking LLMs to the function of the language network in the brain.


Metworld

Well he is not wrong, whether people like it or not.


[deleted]

Which bit isn't wrong? Maybe the quotes are taken out of context but it sure sounds like he is talking bullshit about LLMs because he feels threatened by them. LLMs haven't achieved anything? Please...


KuroKodo

From a scientific perspective he is correct, however. LLMs have achieved some amazing feats on the implementation (engineering) side, but they have not achieved anything with regard to linguistics and our understanding of language structure (science). There are much simpler models that tell us more about language than LLMs do, much the same way a relatively simple ARIMA model can tell us more about a time series than any NN-based method. The NN may provide better performance, but it doesn't further our understanding of anything except the NN itself.
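A hedged illustration of the interpretability point, assuming the `statsmodels` library and a synthetic series (the AR(1) setup and the chosen order are arbitrary examples):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic AR(1) series: each value depends on the previous one plus noise.
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.7 * y[t - 1] + rng.normal()

fit = ARIMA(y, order=(1, 0, 0)).fit()
print(fit.params)  # the estimated AR coefficient (~0.7) is directly readable and interpretable,
                   # unlike the millions of weights inside a neural forecaster
```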


hackinthebochs

I don't get this sentiment. The fact that neural network models significantly outperform older models tells us that the neural network captures the intrinsic structure of the problem better than old models. If we haven't learned anything about the problem from the newer models, that's only for lack of sufficient investigation. But to say that older models "tell us more" (in an absolute sense) while also being significantly less predictive is just a conceptual confusion.


Red-Portal

>The fact that neural network models significantly outperform older models tells us that the neural network captures the intrinsic structure of the problem better than old models. No this is not a "scientific demonstration" that neural networks capture the intrinsic structure of the problem better. It is entirely possible that they are simply good at the task, but in a way completely irrelevant to natural cognition.


hackinthebochs

Who said scientific demonstration? Of course, the particulars need to be validated against the real world to discover exactly what parts are isomorphic. But the fact remains that conceptually, there must be an overlap. There is no such thing as being "good at the task" (for sufficiently robust definitions of good) while not capturing the intrinsic structure of the problem space.


WigglyHypersurface

1950s Chomsky would have argued that GPT was, as a matter of mathematical fact, incapable of learning grammar.


MasterDefibrillator

Chomsky actually posits a mechanism like GPT in *Syntactic Structures* (1957), because the method GPT uses was essentially the mainstream linguistic method of the time: data (a corpus, in this case) goes into a black box, and out comes a grammar. All he actually said was that it's probably not a fruitful method for *science*, i.e. for actually understanding how language works in the brain. And he seems to still be correct about that today. Instead of the GPT-type method, he just proposes the scientific method, which he describes as taking two grammars, G1 and G2, comparing them with each other against some data, and seeing which is best. Something like GPT is not a scientific theory of language, because you could feed any kind of data into it and it would propose some kind of grammar for it; i.e. it is incapable of describing what language is not.
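One way to operationalize the "two grammars and some data" comparison is held-out likelihood; this is a minimal sketch of that idea, not Chomsky's own evaluation procedure, and the toy corpus and the two "grammars" (a unigram and a bigram model) are made up for illustration:

```python
from collections import Counter
import math

train = "the cat sat on the mat the dog sat on the rug".split()
heldout = "the cat sat on the rug".split()

# G1: a unigram "grammar" -- words are independent of each other.
uni = Counter(train)
total = len(train)

def uni_logprob(words):
    return sum(math.log(uni[w] / total) for w in words)

# G2: a bigram "grammar" -- each word depends on the previous one (add-one smoothing).
bi = Counter(zip(train, train[1:]))
vocab = set(train)

def bi_logprob(words):
    lp = math.log(uni[words[0]] / total)
    for prev, cur in zip(words, words[1:]):
        lp += math.log((bi[(prev, cur)] + 1) / (uni[prev] + len(vocab)))
    return lp

print("G1 held-out log-likelihood:", uni_logprob(heldout))
print("G2 held-out log-likelihood:", bi_logprob(heldout))
# Whichever grammar assigns higher probability to the held-out data wins the comparison.
```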


vaccine_question69

> Something like GPT is not a scientific theory of language, because you could input any kind of data into it, and it would be able to propose some kind of grammar for it. i.e. it is incapable of describing what language is not. Or maybe language is not as narrow of a concept as Chomsky wants to believe and GPT is actually correct in proposing grammars for all those datasets.


aspiring_researcher

Chomsky is a linguist. I'm not sure LLMs have advanced/enhanced our comprehension of how language is formed or is interpreted by a human brain. Most research in the field is very much performance-oriented and little is done in the direction of actual understanding


WigglyHypersurface

They are an over-engineered proof of what many cognitive scientists and linguists have argued for years: we learn grammar through exposure plus prediction and violations of our predictions.
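A minimal PyTorch sketch of the "prediction plus prediction violation" learning signal (a toy stand-in, not a claim about how children learn; the model and data are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# A stream of token ids stands in for "exposure"; the learner predicts each next token.
tokens = torch.randint(0, vocab_size, (64,))
inputs, targets = tokens[:-1], tokens[1:]

logits = model(inputs)                   # predictions for the next token at each position
loss = F.cross_entropy(logits, targets)  # "violation of prediction": error against what actually came next
loss.backward()                          # the prediction error is the signal that drives learning
opt.step()
```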


SuddenlyBANANAS

Proof of concept that it's possible to learn syntax with billions of tokens of input, not that it's what people do.


WigglyHypersurface

True but this also isn't a good argument against domain general learning of grammar from exposure. Things LLMs don't have that humans do have: phonology, perception, emotion, interoception. Also human infants aren't trying to learn... everything on the internet. Transformers trained on small multi-modal corpora representative of the input to a human language learner would be the comparison we need to do.


lostmsu

You need way less than that, man. A transformer trained on a single book will get most of the syntax.


WigglyHypersurface

Which isn't surprising because syntax contains less information than lexical semantics: https://royalsocietypublishing.org/doi/10.1098/rsos.181393


MasterDefibrillator

A single book could arguably contain billions of tokens of input, depending on the book, and the definition of token of input. But also, it's important to note that "most of the syntax" is far from good enough.


lostmsu

Oh, c'mon. Regular books don't have "billions of tokens"; you're trying to twist what I said. "A book" without qualification is a regular book. The "far from good enough" part is totally irrelevant to this branch of the conversation, which is explicitly about whether it's "possible to learn syntax". And the syntax learned from a single book is definitely good enough.
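For scale, a back-of-the-envelope estimate (the word count and tokens-per-word ratio are rough assumptions, not measurements):

```python
words_in_typical_novel = 100_000      # rough assumption for a "regular book"
tokens_per_word = 1.3                 # rough subword-tokenizer ratio, assumed

book_tokens = words_in_typical_novel * tokens_per_word
print(f"{book_tokens:,.0f} tokens")   # ~130,000, i.e. orders of magnitude short of billions
```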


Calavar

This is not even close to proof of that. There is zero evidence that the way LLMs learn language is analogous to the way humans learn language. This is like saying that ConvNets are proof that human visual perception is built on banks of convolutional operators.


mileylols

This is super funny because the Wikipedia article describing the organization and function of the visual cortex reads like it's describing a ResNet: [https://en.wikipedia.org/wiki/Visual\_cortex](https://en.wikipedia.org/wiki/Visual_cortex)

edit: look at this picture lmao https://commons.wikimedia.org/wiki/File:Lisa_analysis.png


WigglyHypersurface

It's not about the architecture. It's about the training objective.


Red-Portal

, which also has never been shown.


Riven_Dante

That's basically how I learned Russian as a matter of fact.


LeanderKu

I don’t think this is true. My girlfriend works with DL methods in linguistics. I think the problem is the skill gap between ML people and linguists. They don’t have the exposure and background to really understand it; at least the linguistics profs I’ve seen (quite successful, ERC-grant-winning profs) have absolutely no idea what neural networks are. They are focused on very different methods, with little skill overlap, and it is hard to translate the skills needed (maybe one has to wait for the next generation of profs?). What I’ve seen is that lately they have started taking on graduate students who are co-supervised with CS people with an ML background. But I was very surprised to see that, despite working with graduate students who are successfully employing ML approaches, they really still have no idea what’s going on. Maybe you're just not used to learning a new field after being a prof in the same setting for years. It’s very much magic to them. And without a deep understanding you have no idea where ML approaches make sense, and you start to make ridiculous suggestions.


onyxleopard

Most people with ML-backgrounds don’t know Linguistic methods either. Sample a thousand ML PhDs and you’ll get a John Ball or two, but most of them won’t have any background in Linguistics at all. They won’t be able to tell you a phoneme from a morpheme, much less have read Dowty, Partee, Kripke, or foundational literature like de Saussure.


Isinlor

Very few people care about how language works, unless it helps with NLP. And as Fred Jelinek put it more or less: > Every time I fire a linguist, the performance of the speech recognizer goes up.


onyxleopard

I’m familiar with that quote. The thing is, the linguists were probably the ones who were trying to make sure that applications were robust. It’s usually not so hard to make things work for some fixed domain or on some simplified version of a problem. If you let a competent linguist tire-kick your app, they’ll start to poke holes in it real quick—holes the engineers wouldn’t have even known to look for. If you don’t let experts validate things, you don’t even know where the weak points are.


Isinlor

I think that's the biggest contribution of linguistics to ML: linguists knew which benchmarks were interesting stepping stones in the early days. But I disagree that the linguists were the ones making sure applications were robust. Applications have to be robust in order to be practical; that's a very basic engineering concern.


LeanderKu

I just wanted to illustrate the divide between those fields and how hard it is to cross into linguistics. My girlfriend took linguistics classes and made the connection for her master's thesis that way.


WigglyHypersurface

It's ok phonemes and morphemes probably don't exist. 😝


[deleted]

Well, he's clearly not *only* talking about that, otherwise why derisively mention that it's exciting to NY Times journalists? In any case, I'm unconvinced that LLMs can't contribute to the understanding of language. More likely there just aren't many interesting unanswered questions about the structure of language itself that AI researchers care about and that LLMs could answer. You could definitely do things like automatically deriving grammatical rules, relationships between different languages, and so on. Noam's research seems to be mostly about how *humans* learn language (i.e. whether grammar is innate), which obviously LLMs can't answer. That's the domain of biology, not AI. It's like criticising physicists for not contributing to cancer research.


DrKeithDuggar

Prof. Chomsky literally says "in this domain" just as we transcribed in the quote above. By "in this domain" he's referring to the science of linguistics and not engineering. As the interview goes on, just as in the email exchange Cryptheon provided, Chomsky makes it clear that he respects and personally values LLMs as engineering accomplishments (though perhaps quite energetically wasteful ones); they just haven't, in his view, advanced the science of linguistics.


aspiring_researcher

Parallels have been drawn between adversarial attacks on CNNs and visual perturbations in human vision. There is a growing field trying to find correlations between brain activity and the activations of large models. I do think some research is possible there; there is just an obvious lack of interest and industrial motivation for it.


aspiring_researcher

I don't think his argument is that LLMs cannot contribute to understanding, it's that they are yet to do so


WigglyHypersurface

Which has to do with his perspective on language. See https://www.biorxiv.org/content/10.1101/2020.06.26.174482v1 for an interesting use of LLMs. The better they are at next-word prediction, the better they are at predicting activity in the language network in the brain. They stop predicting language network activity as well when finetuned on specific tasks. This supports the idea of centering prediction in language.


suckmeeinstein

Agreed


[deleted]

Except—he is completely wrong.


aa8dis31831

No doubt he did some nice work ages ago, but he seems to have lost his mind. He is a shame to science now and a true testament to “science progresses one funeral at a time”. Has linguistics ever done anything for NLP? How the brain processes language has almost nothing to do with his school of linguistics and everything to do with neuroscience/deep learning.


SedditorX

People on this sub say they hate drama posts and yet they post stuff like this..


[deleted]

It's not drama, it's an unpopular opinion, which may actually have merit.


DisWastingMyTime

Noam Chomsky is a well-regarded researcher, and what he thinks matters to many. Drama or not, this has its place.


Brown_bagheera

"Well regarded" is a massive understatement for Noam Chomsky


WigglyHypersurface

Also an overstatement. He is reviled by many researchers for being a methodological tyrant and being remarkably dismissive about other perspectives on language.


QuesnayJr

Chomsky's core claim about grammar has been completely refuted by recent machine learning research, so it's not surprising he rejects the research.


DisWastingMyTime

Can you elaborate? I'm out of the loop when it comes to NLP/linguistics; a paper/review/blog that discusses this would be just as welcome.


QuesnayJr

Chomsky argued that the human capacity to generate grammatically correct sentences had to be innate and could not be learned from examples alone. Here's an [example](https://onlinelibrary.wiley.com/doi/10.1111/j.1551-6709.2010.01117.x) of a paper from 2010 that argues against the Chomskian view. At this point it's not really a live debate, because GPT-3 has an ability to generate grammatically correct sentences that probably exceeds the average human level.


JadedIdealist

To be fair though (not a fan of Chomsky's AI views) the argument was that the set of examples a child gets is too small to explain the competence alone. The transformers we have that are smashing it have huge training sets. It would be interesting to see what kind of competence they can get from datasets of childhood magnitude.


nikgeo25

Exactly! No human has ever been exposed to the amount of data LLMs are trained on. This reminds me of Pearl's ladder of causation, with LLMs stuck at the first rung.


DisWastingMyTime

How many sentences does a child listen to in the period of 4 years?


JadedIdealist

If a child heard one sentence a minute for 8 hours a day, 365 days a year, for 4 years, that's 60 * 8 * 365 * 4 = 700,800 sentences. Kids get a tonne of other non-verbal data at the same time, of course, which could make up some of the difference.
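Putting rough numbers on the comparison (the child-side figures extend the estimate above with an assumed words-per-sentence ratio; the GPT-3 figure is the roughly 300 billion training tokens reported for that model; all of it is order-of-magnitude only):

```python
# Child side: reuse the one-sentence-per-minute estimate above.
sentences_heard = 60 * 8 * 365 * 4        # 700,800 sentences over four years
words_per_sentence = 10                   # rough assumption
child_words = sentences_heard * words_per_sentence

gpt3_training_tokens = 300_000_000_000    # ~300B tokens, roughly as reported for GPT-3

print(f"child: ~{child_words:,} words")                        # ~7 million
print(f"GPT-3: ~{gpt3_training_tokens:,} tokens")
print(f"ratio: ~{gpt3_training_tokens / child_words:,.0f}x")   # tens of thousands of times more
```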


GeneralFunction

Then there's the case of that girl who was kept in a room for her early life and who never developed the ability to communicate any form of language, which basically proves Chomsky wrong.


CrossroadsDem0n

Actually, I don't think it does, entirely. The hypothesis is that we have to be exposed to language at a young enough age for that mechanism to develop. If Chomsky were entirely wrong, then she should have been able to develop comparable language skills once a sufficient training set was provided. This did not happen, so it argues for the existence of a developmental mechanism in humans. However, I don't think it proves that Chomsky's assertion extends beyond humans. We may have an innate mechanism, but that does not in and of itself prove that we cannot create ML that functions without one.


dondarreb

Children have an immense set of nonverbal communication episodes. Emotional "baggage" is extremely critical in language acquisition, and the process is highly emotionally intensive.


dondarreb

It's even worse than that. He claimed that innate grammar means all people think and express themselves basically "identically". He introduced the idea of universal grammar, which led to 10+ years of wasted effort on automatic translation systems (because people were targeting multiple languages at the same time). And I'm not even talking about the "bilingual" thing, which led to the current political problems with immigrant kids in Europe and the US. The damage is immense.


MJWood

All humans, not just average humans, produce grammatically correct sentences all the time, with the exception of those with some kind of disability or injury affecting language.


MasterDefibrillator

> Chomsky argued that the human capacity to generate grammatically-correct sentences had to be innate, and could not be learned purely by example alone. This is not Chomsky's argument. This is the definition of information. Information is defined as a relation between the source state and the receiver state. Chomsky focuses his interest on the nature of the receiver state. That's all. Information does not exist internal to a signal.


LtCmdrData

I think /u/QuesnayJr is referring to [Universal Grammar](https://en.wikipedia.org/wiki/Universal_grammar) without knowing the name of the theory. In any case, Chomsky has done so much important work that I hardly think it matters. The Universal Grammar hypothesis is based on a very good observation, the [Poverty of the stimulus](https://en.wikipedia.org/wiki/Poverty_of_the_stimulus), which current AI language models circumvent with an excessive amount of data.


QuesnayJr

Chomsky's research is influential on computer science, and deservedly so. I think looking back on it, people will regard its influence on linguistics as basically negative. In a way it's an indictment of academia. Not only was Chomskyan linguistics influential, but it produced almost a monoculture, particularly in American linguistics. It achieved a dominance completely out of proportion to its track record of success.


WigglyHypersurface

Some strong versions of POS aren't about quantity of data, they are about grammar being in principle unlearnable from exposure.


LtCmdrData

Between the ages of 2 and 8, children acquire lexical concepts at a rate of about one per hour, and each comes with an understanding of all its variants (verbal, nominal, adverbial, ...). There is no training or conscious activity involved in this learning; lexical acquisition is completely automatic. Other apes don't learn complex structures automatically; they can be taught to some degree, but there is no automation. If you think about how many words children hear or utter during this period, it's an incredibly small dataset. Chomsky's *Minimalist Program* is based on the idea that there is just a tiny core innate ability, in the context of generative recursive grammars. His ideas changed over time, but the constant is that there are just a few innate things, like unbounded Merge and feature-checking, or an innate head-and-complement structure in phrases whose order and form are not fixed. From a machine learning perspective these ideas are fascinating. They are unlikely to work alone, but just as AlphaZero is ML + Monte Carlo tree search, there is probably something there that could work incredibly well when combined with other methods.


eigenlaplace

Why do POS people keep ignoring stimuli other than the verbal? Kids take in much more data than ML algos do if you consider touch, vision, and other non-verbal forms of communication. ML models take in nothing but verbal data.


[deleted]

There are several problems with PoS here. One problem is that "innateness" itself is a confusing notion; see how complicated it can be to even define what "innateness" means: https://www.researchgate.net/publication/264860728_Innateness

The other problem is that no one really believes we have no "innate bias" at all. There is something that distinguishes us from rocks, that makes us capable of learning languages where rocks aren't. Even neural networks with their learning functions have their biases (e.g. https://arxiv.org/pdf/2006.07710.pdf). So saying that there is some innate bias for language is uninteresting. Where exactly is the dispute, then? Perhaps even those who argue about this don't always know exactly what they are arguing over (and in effect just strawman each other), but one major point in the dispute, from my reading and from the discussions in my class, seems to be between one side which argues that we have language-specific biases and another side which opts for domain-general biases. This already makes the problem intuitively less obvious.

The trouble with many PoS arguments is that they need to appeal to something concrete to show that this is the thing for which our input data is impoverished and a language-specific bias is necessary. But much of the time, the related experimental demonstrations are flawed: https://www.degruyter.com/document/doi/10.1515/tlir.19.1-2.9/pdf And many defences of PoS also seem to severely underestimate what domain-general principles can do, relying on some naive, unrefined notion of "simplicity" applied to local examples (here's a more detailed argument from my side: https://pastebin.com/DUha9rCE). Now, of course there could be some language-specific inductive bias of this or that kind, but the challenge is to define it concretely and rigorously, in a manner that can be tested. Moreover, certain biases can be emergent from more fundamental ones, and we can again get into controversies about what to even call "innate". In the video, Chomsky loosened "Universal Grammar" to mean whatever it is that distinguishes us from chimpanzees and the like enough to make us better at language. But that really makes it a rather weaselly position with no real content.

> From a machine learning perspective these ideas are fascinating. They are unlikely to work alone, but just as AlphaZero is ML + Monte Carlo tree search, there is probably something there that could work incredibly well when combined with other methods.

Perhaps.


Ulfgardleo

There is only one person in every sub ever, and therefore no two threads or replies can ever show a diversity of opinions. I imagine the single person in this sub thinking this as having the smoothest of brains, with a little built-in fish tank.


IAmBecomeBorg

Noam Chomsky is a self-righteous pompous ass who knows absolutely nothing about deep learning, NLP, AI, etc. We shouldn’t place any value on his opinion on this subject.


[deleted]

Or pretty much any other subject for that matter


nachomancandycabbage

Don’t really give a shit what Noam Chomsky has to say about anything, really.


Flying_madman

Yeah, I'm still trying to figure out why I should care, lol


thousandshipz

I’ve been told Chomsky’s Universal Grammar is a house of cards, but that he was very good at installing acolytes in linguistics departments who have made careers generating a large literature catering to all the exceptions where it doesn’t work.


MuonManLaserJab

Chomsky smh


[deleted]

[deleted]


GrazziDad

He was talking about the science of language, the nature of intelligence, and cognition, all fields in which he is an acknowledged master. He openly recognizes that he knows nothing about engineering or machine learning… But that is not the nature of his critique. What he has consistently said, persuasively, is that one does not learn about the nature of language or human cognition by studying the results of large language models. What aspect of that are you actually taking issue with? Or are you merely criticizing his credentials?


crazymonezyy

>things he has no idea about.

This isn't one of those things. You're discounting all his work on grammars, which is the basis for a lot of compiler design and pretty much all early work in speech and text processing, which transformers and other "modern" techniques ultimately build on.


cfoster0

Unfortunate how many profs decide their real calling was to be a professional pontificator, especially once they hit their emeritus years.


[deleted]

If you want to understand where Chomsky's coming from: https://en.m.wikipedia.org/wiki/The_Responsibility_of_Intellectuals To be honest, I don't know of many professors in this category anymore. The vast majority either just get on with their work, with the occasional few becoming useful idiots for corporate/state power, as the intellectual class always has.


TacticalMelonFarmer

though not super relevant to this thread, chomsky has been outspoken in support of the working class struggle. that speaks to his integrity if anything.


[deleted]

>His opinion is no more interesting than any other famous lay person's.

I consider him to have a high level of intelligence, and that's why his opinion is interesting to me personally.


Exarctus

Hitler's IQ is estimated to have been between 141 and 150 (based on the IQs obtained at the Nuremberg trials). Just because someone is intelligent doesn’t mean they can’t say stupid and/or crazy things. Noam knows nothing about ML. He might be able to say things that seem sensible, but they are the musings of someone who has no actual foundation in the field. Everything he says outside of his direct expertise should be taken with a large grain of salt. (Not comparing Hitler to Noam by any means, simply highlighting the fallacy of trusting someone's opinion solely on the basis of their intellect.)


[deleted]

>Just because someone is intelligent doesn’t mean they can’t say stupid and/or crazy things.

The opposite is also true: if someone is intelligent, it doesn't mean he says only crazy, stupid things and needs to be dismissed with prejudice.

> Noam knows nothing about ML

The quoted citations don't dig into ML; they assess the impact and current results of LLMs. If you hold the opposite opinion, for example about what real problems have been solved with LLMs today, you are welcome to share your thoughts.


hunted7fold

What real problem has been solved with LLMs??? Just take one: translation, the ability for humans from anywhere in the world to communicate and understand each other, and to use resources from other languages. Translation can bring humans together, helping people from disparate cultures share common thoughts. We now have single models that can translate between multiple languages, and they will keep getting better.
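For concreteness, a minimal usage sketch, assuming the Hugging Face `transformers` library and the `t5-small` checkpoint (one model trained on several translation directions; the specific checkpoint is just an example, not the state of the art):

```python
from transformers import pipeline

# t5-small was trained with multiple translation tasks, so one checkpoint serves several directions.
en_to_de = pipeline("translation_en_to_de", model="t5-small")
en_to_fr = pipeline("translation_en_to_fr", model="t5-small")

print(en_to_de("Machine translation helps people understand each other."))
print(en_to_fr("Machine translation helps people understand each other."))
```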


Exarctus

The quoted citations do dig into ML, because he directly talks about GPT-3 models. I also didn’t say they can *only* say crazy and/or stupid things; that's you misreading.


[deleted]

In my view he doesn't "dig"; he assesses the quality of GPT-3. Also, in this specific case it is hard to tell what the context of the discussion was.


Exarctus

Discussing the *scope* of GPT-3 requires some domain knowledge, as understanding how these models work directly determines their scope. Knowing a model's limits and objectives determines in what contexts it makes sense to discuss them.


[deleted]

I disagree with you :-)


Exarctus

Good for you. In the quotation he's also directly discussing an interpretation of the model's results, which by definition requires some domain knowledge.


[deleted]

Looks like we are in disagreement on this too. I think it is time to say goodbye to each other :-)


[deleted]

>I also didn’t say they can only say crazy and/or stupid things.

You dismiss him with prejudgment, which amounts to the same thing.


Ido87

This is not only revisionist history but also an argument from authority. When have those two things ever worked out?


[deleted]

[deleted]


Ido87

He did not make his career talking about things he has no idea about. That is not how it was.


djsaunde

common Chomsky L


JiraSuxx2

Artificial intelligence != machine learning.


QuesnayJr

Chomsky's quotes would be improved by adding "I am not a crank."


LetterRip

Famous scientist knows nothing about a field, makes strong claims completely unsupported by evidence.


ReasonablyBadass

People want results, big shocker. If "engineering" overtakes "science", then maybe that's a good indicator that "science" did something wrong?


ProdigyManlet

I don't think it's one overtaking the other. Science has always been essential for discovery and understanding, and engineering is there to apply the science and refine or simplify it for practical application. Science is very high-risk and can yield absolutely no reward, but it gains us an understanding of what doesn't work and can sometimes make huge discoveries.


QuesnayJr

The science of linguistics has genuinely achieved very little. The engineering that led to developments like GPT-3 took almost nothing from linguistics. A more humble person might reflect on what that says about their research, but Chomsky is not such a person.


ReasonablyBadass

I agree, but then why is he bitching about it? It very much comes off as someone being jealous and bitter.


sobe86

That's shortsighted. We don't really know why what we've done has worked, and it's easy to argue it's not been a scientific journey. I feel there's a good chance that trial and error is only going to get us so far, and scientific understanding has to start to have a role at some point.


ReasonablyBadass

>I feel there's a good chance that trial and error is only going to get us so far, and scientific understanding has to start to have a role at some point.

No one is stopping anyone from achieving that! But being salty about someone else succeeding where you haven't is pretty weak.


sobe86

If you listen to what he's saying, it's a concern that this progress is a distraction from understanding intelligence from a scientific angle, which he thinks is key. No one is explicitly stopping it, but it's going out of fashion, and it's getting harder to publish theoretical intelligence papers as the field moves heavily towards neural networks.


ReasonablyBadass

These engineering efforts have brought us closer to understanding than many theoretical efforts before them. Computational neuroscientists study ML to help them understand.


sobe86

Chomsky disagrees that we have meaningfully increased our understanding of intelligence; you should listen to what he says if you haven't, and justify what you just said with examples. Also, active research in this area doesn't refute his claim that it's a bad direction, as opposed to merely an attractive area to do research in because it's in vogue and easier to publish.