ChatGPT在做什么...为什么它会有效?(十一)

What Really Lets ChatGPT Work?

什么让ChatGPT真正起作用?

Human language—and the processes of thinking involved in generating it—have always seemed to represent a kind of pinnacle of complexity. And indeed it’s seemed somewhat remarkable that human brains—with their network of a “mere” 100 billion or so neurons (and maybe 100 trillion connections) could be responsible for it. Perhaps, one might have imagined, there’s something more to brains than their networks of neurons—like some new layer of undiscovered physics. But now with ChatGPT we’ve got an important new piece of information: we know that a pure, artificial neural network with about as many connections as brains have neurons is capable of doing a surprisingly good job of generating human language.

人类的语言,以及产生语言的思维过程,一直像是代表着某种复杂性的巅峰。事实上,人类大脑"仅仅"靠大约1000亿个神经元(以及大约100万亿个连接)组成的网络就能做到这一点,似乎是相当了不起的事。人们或许会想象,大脑中除了神经元网络之外还有别的东西,比如某种尚未被发现的新物理层面。但现在有了ChatGPT,我们得到了一条重要的新信息:我们知道,一个连接数与大脑神经元数量相当的纯人工神经网络,就能够在生成人类语言方面做得出乎意料地好。

And, yes, that’s still a big and complicated system—with about as many neural net weights as there are words of text currently available out there in the world. But at some level it still seems difficult to believe that all the richness of language and the things it can talk about can be encapsulated in such a finite system. Part of what’s going on is no doubt a reflection of the ubiquitous phenomenon (that first became evident in the example of rule 30) that computational processes can in effect greatly amplify the apparent complexity of systems even when their underlying rules are simple. But, actually, as we discussed above, neural nets of the kind used in ChatGPT tend to be specifically constructed to restrict the effect of this phenomenon—and the computational irreducibility associated with it—in the interest of making their training more accessible.

而且,是的,它仍然是一个庞大而复杂的系统,其神经网络权重的数量,与目前世界上可获得的文本的词数大致相当。但在某种程度上,人们仍然很难相信,语言的全部丰富性以及它所能谈论的事情,都可以被装进这样一个有限的系统里。其中一部分原因,无疑是一种普遍现象的反映(这种现象最早在规则30的例子中显现出来):即使底层规则很简单,计算过程实际上也可以极大地放大系统的表观复杂性。不过,正如我们在前面讨论过的,ChatGPT中使用的这类神经网络,往往被特意构造成会限制这种现象(以及与之相关的计算不可约性)的影响,以便让训练更容易进行。

So how is it, then, that something like ChatGPT can get as far as it does with language? The basic answer, I think, is that language is at a fundamental level somehow simpler than it seems. And this means that ChatGPT—even with its ultimately straightforward neural net structure—is successfully able to “capture the essence” of human language and the thinking behind it. And moreover, in its training, ChatGPT has somehow “implicitly discovered” whatever regularities in language (and thinking) make this possible.

那么,像ChatGPT这样的东西究竟是如何在语言上走到这一步的呢?我认为,基本的答案是:语言在根本层面上比它看起来要简单。这意味着,即便ChatGPT的神经网络结构归根结底相当简单,它也能够成功地"捕捉到"人类语言及其背后思维的本质。而且,在训练过程中,ChatGPT以某种方式"隐含地发现"了使这一切成为可能的语言(和思维)规律。

The success of ChatGPT is, I think, giving us evidence of a fundamental and important piece of science: it’s suggesting that we can expect there to be major new “laws of language”—and effectively “laws of thought”—out there to discover. In ChatGPT—built as it is as a neural net—those laws are at best implicit. But if we could somehow make the laws explicit, there’s the potential to do the kinds of things ChatGPT does in vastly more direct, efficient—and transparent—ways.

我认为,ChatGPT的成功为我们提供了一条基本而重要的科学证据:它表明,我们可以期待有重大的新"语言法则",实际上也就是"思维法则",等待我们去发现。在ChatGPT中,由于它是作为神经网络构建的,这些法则至多是隐含的。但如果我们能以某种方式把这些法则明确地表述出来,就有可能以远为直接、高效且透明的方式,做到ChatGPT所做的那些事情。

But, OK, so what might these laws be like? Ultimately they must give us some kind of prescription for how language—and the things we say with it—are put together. Later we’ll discuss how “looking inside ChatGPT” may be able to give us some hints about this, and how what we know from building computational language suggests a path forward. But first let’s discuss two long-known examples of what amount to “laws of language”—and how they relate to the operation of ChatGPT.

但是,好吧,这些法则可能是什么样子的呢?归根结底,它们必须为我们说明:语言,以及我们用语言表达的东西,是如何组合起来的。稍后我们将讨论"观察ChatGPT内部"如何能给我们一些这方面的提示,以及我们在构建计算语言(computational language)中获得的认识如何指出一条前进的道路。但首先,让我们讨论两个早已为人所知、堪称"语言法则"的例子,以及它们与ChatGPT的运行方式有什么关系。

The first is the syntax of language. Language is not just a random jumble of words. Instead, there are (fairly) definite grammatical rules for how words of different kinds can be put together: in English, for example, nouns can be preceded by adjectives and followed by verbs, but typically two nouns can’t be right next to each other. Such grammatical structure can (at least approximately) be captured by a set of rules that define how what amount to “parse trees” can be put together:

第一个是语言的句法。语言并不是词语的随机堆砌。相反,对于不同种类的词如何组合在一起,存在着(相当)明确的语法规则:例如在英语中,名词前面可以有形容词、后面可以跟动词,但通常两个名词不能直接挨在一起。这样的语法结构可以(至少近似地)用一组规则来刻画,这些规则定义了"解析树"可以如何搭建起来:
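
PS:原文此处有一张用语法规则搭出"解析树"的示意图。下面给出一个极简的Python草图,用一个假设的上下文无关文法(规则和词表都是为说明而编的,并非原文所用)来演示这类规则如何递归地组合出嵌套的解析树。

```python
import random

# 一个假设的极简上下文无关文法:S=句子,NP=名词短语,VP=动词短语。
# 注意 NP 可以递归地包含形容词,这正是"嵌套树状结构"的来源。
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Adj", "NP"], ["N"]],
    "VP":  [["V", "NP"]],
    "Adj": [["blue"], ["inquisitive"]],
    "N":   [["electrons"], ["theories"], ["fish"]],
    "V":   [["eat"], ["like"]],
}

def generate(symbol="S"):
    """随机展开一个文法符号,返回 (符号, 子树列表) 形式的解析树。"""
    if symbol not in GRAMMAR:            # 终结符:直接就是一个单词
        return symbol
    production = random.choice(GRAMMAR[symbol])
    return (symbol, [generate(s) for s in production])

print(generate())   # 例如 ('S', [('NP', [('N', ['fish'])]), ('VP', ...)])
```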

ChatGPT doesn’t have any explicit “knowledge” of such rules. But somehow in its training it implicitly “discovers” them—and then seems to be good at following them. So how does this work? At a “big picture” level it’s not clear. But to get some insight it’s perhaps instructive to look at a much simpler example.

ChatGPT并没有任何关于这些规则的明确"知识"。但在训练过程中,它以某种方式隐含地"发现"了这些规则,而且之后似乎很擅长遵循它们。那么,这是如何做到的呢?在"大局"层面上,目前还不清楚。但为了获得一些洞察,看一个简单得多的例子或许会有启发。

Consider a “language” formed from sequences of (’s and )’s, with a grammar that specifies that parentheses should always be balanced, as represented by a parse tree like:

考虑一个由 ( 和 ) 序列组成的“语言”,其语法规定括号应该始终保持平衡,如下所示的解析树所表示的那样:

PS:你可以想象把一段话拆分成被许多小括号层层包住的样子,类似于Java中的大括号。这里的重点是,每有一个"(",就一定会有一个")"与之匹配。
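
PS:下面是一段示意性的Python代码(函数名和实现都是为说明而写的假设例子):is_balanced 用显式计数来判断括号序列是否平衡,random_balanced 则随机生成平衡的括号序列,后者可以理解为这种"语言"的训练样本。

```python
import random

def is_balanced(seq):
    """显式计数:每个"("加一、每个")"减一,中途不能为负,最后必须归零。"""
    depth = 0
    for ch in seq:
        depth += 1 if ch == "(" else -1
        if depth < 0:            # 出现了没有"("与之匹配的")"
            return False
    return depth == 0

def random_balanced(n_pairs):
    """随机生成一个包含 n_pairs 对括号的平衡序列,可当作训练样本。"""
    seq, remaining, depth = [], n_pairs, 0
    while remaining > 0 or depth > 0:
        if remaining > 0 and (depth == 0 or random.random() < 0.5):
            seq.append("("); remaining -= 1; depth += 1
        else:
            seq.append(")"); depth -= 1
    return "".join(seq)

print(is_balanced("(())()"))   # True
print(is_balanced("())("))     # False
print(random_balanced(4))      # 例如 "(()())()"
```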

Can we train a neural net to produce “grammatically correct” parenthesis sequences? There are various ways to handle sequences in neural nets, but let’s use transformer nets, as ChatGPT does. And given a simple transformer net, we can start feeding it grammatically correct parenthesis sequences as training examples. A subtlety (which actually also appears in ChatGPT’s generation of human language) is that in addition to our “content tokens” (here “(” and “)”) we have to include an “End” token, that’s generated to indicate that the output shouldn’t continue any further (i.e. for ChatGPT, that one’s reached the “end of the story”).

我们能训练一个神经网络生成“符合语法”的小括号序列吗?在神经网络中处理序列有各种不同的方法,但让我们使用transformer网络,就像ChatGPT所做的那样。给定一个简单的transformer网络,我们可以开始将符合语法的括号序列作为训练样本输入。一个微妙之处(实际上也出现在ChatGPT生成人类语言时)是,除了我们的“内容token”(这里是“(”和“)”),我们还必须包含一个“End”token,用于表示输出不应继续进行(即对于ChatGPT,已经到达了“故事的结尾”)。
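
PS:下面用几行假设的Python代码示意"内容token加End token"的编码方式。这里的词表和函数名都是为说明而设的,并不是ChatGPT的实际实现。

```python
# 假设的词表:两个"内容token"加一个表示结束的End token。
VOCAB = {"(": 0, ")": 1, "End": 2}

def encode(seq):
    """把括号序列转成token id,并在末尾追加End token,表示"输出到此为止"。"""
    return [VOCAB[ch] for ch in seq] + [VOCAB["End"]]

print(encode("(())"))   # [0, 0, 1, 1, 2]
```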

If we set up a transformer net with just one attention block with 8 heads and feature vectors of length 128 (ChatGPT also uses feature vectors of length 128, but has 96 attention blocks, each with 96 heads) then it doesn’t seem possible to get it to learn much about parenthesis language. But with 2 attention blocks, the learning process seems to converge—at least after 10 million or so examples have been given (and, as is common with transformer nets, showing yet more examples just seems to degrade its performance).

如果我们搭建一个transformer网络,只用一个注意力块,带8个头、特征向量长度为128(ChatGPT同样使用长度为128的特征向量,但有96个注意力块,每个块有96个头),那么似乎没法让它学到多少关于小括号语言的东西。但如果用2个注意力块,学习过程似乎就会收敛,至少在给出大约1000万个样本之后是这样(而且,和transformer网络常见的情况一样,继续展示更多样本似乎只会让它的性能下降)。
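
PS:下面用PyTorch写一个与正文描述规模相当的示意模型:特征向量长度128、8个注意力头、2个注意力块。这只是一个假设性的草图,用来说明这样的小transformer大致长什么样,并不是Wolfram实验所用的具体实现,超参数也未经调优。

```python
import torch
import torch.nn as nn

class TinyParenTransformer(nn.Module):
    """示意用的小型transformer:d_model=128、8个头、n_blocks个注意力块。"""
    def __init__(self, vocab_size=3, d_model=128, n_heads=8, n_blocks=2, max_len=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token嵌入
        self.pos_emb = nn.Embedding(max_len, d_model)      # 位置嵌入
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_blocks)
        self.head = nn.Linear(d_model, vocab_size)         # 预测下一个token的logits

    def forward(self, ids):                                # ids形状: (batch, 序列长度)
        n = ids.size(1)
        pos = torch.arange(n, device=ids.device)
        x = self.tok_emb(ids) + self.pos_emb(pos)
        # 因果mask:每个位置只能"看到"它左侧的token
        causal = torch.triu(torch.full((n, n), float("-inf"), device=ids.device), diagonal=1)
        return self.head(self.blocks(x, mask=causal))

model = TinyParenTransformer(n_blocks=2)   # 正文的观察:1个注意力块学不好,2个块才开始收敛
```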

So with this network, we can do the analog of what ChatGPT does, and ask for probabilities for what the next token should be—in a parenthesis sequence:

因此,有了这个网络,我们可以做与ChatGPT类似的工作,并询问在括号序列中下一个token的概率:
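
PS:原文此处给出了两个具体括号序列的下一token概率图。下面的代码沿用上面的示意模型,演示如何向这样的模型"询问"下一个token的概率;由于模型并未真正训练,打印出的数值没有意义,这里只是演示接口。

```python
import torch
import torch.nn.functional as F

ids = torch.tensor([[0, 0, 1]])            # 对应序列 "(()" 的token id(沿用上面的VOCAB)
logits = model(ids)                        # 形状: (1, 序列长度, 词表大小)
probs = F.softmax(logits[0, -1], dim=-1)   # 最后一个位置上,下一个token的概率分布
for tok, p in zip(["(", ")", "End"], probs):
    print(tok, round(float(p), 3))
```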

And in the first case, the network is “pretty sure” that the sequence can’t end here—which is good, because if it did, the parentheses would be left unbalanced. In the second case, however, it “correctly recognizes” that the sequence can end here, though it also “points out” that it’s possible to “start again”, putting down a “(”, presumably with a “)” to follow. But, oops, even with its 400,000 or so laboriously trained weights, it says there’s a 15% probability to have “)” as the next token—which isn’t right, because that would necessarily lead to an unbalanced parenthesis.

在第一种情况下,网络"非常确信"序列不能在这里结束,这很好,因为如果在这里结束,括号就会不平衡。而在第二种情况下,它"正确地识别出"序列可以在这里结束,不过它也"指出"可以"重新开始":再放下一个"(",后面想必还会跟一个")"。但糟糕的是,即便拥有40万个左右经过艰苦训练的权重,它仍然认为下一个token有15%的概率是")",而这是不对的,因为那必然会导致括号不平衡。

Here’s what we get if we ask the network for the highest-probability completions for progressively longer sequences of (’s:

如果我们要求网络为越来越长的"("序列给出概率最高的补全,我们会得到下面的结果:
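
PS:原文此处是网络对越来越长的"("前缀所给出的最高概率补全的图。下面沿用上面的示意模型,用一个简单的贪心解码循环演示这种"每步取概率最高的token,直到生成End"的补全方式(模型未训练,结果仅演示流程)。

```python
import torch

def greedy_complete(model, n_open, max_new=30):
    """从 n_open 个"("组成的前缀出发,逐token取最高概率,直到生成End。"""
    ids = [0] * n_open                     # 0代表"(",1代表")",2代表End
    for _ in range(max_new):
        logits = model(torch.tensor([ids]))
        next_id = int(logits[0, -1].argmax())
        if next_id == 2:                   # 模型认为序列应该在这里结束
            break
        ids.append(next_id)
    return "".join("()"[i] for i in ids)

for n in (2, 4, 8, 16):
    print(n, greedy_complete(model, n))
```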

And, yes, up to a certain length the network does just fine. But then it starts failing. It’s a pretty typical kind of thing to see in a “precise” situation like this with a neural net (or with machine learning in general). Cases that a human “can solve in a glance” the neural net can solve too. But cases that require doing something “more algorithmic” (e.g. explicitly counting parentheses to see if they’re closed) the neural net tends to somehow be “too computationally shallow” to reliably do. (By the way, even the full current ChatGPT has a hard time correctly matching parentheses in long sequences.)

是的,在一定长度之内,这个网络表现得很好。但随后它就开始出错了。在这种"精确"的场景中,神经网络(或者说一般的机器学习)出现这种情况是相当典型的。人类"一眼就能解决"的问题,神经网络也能解决。但那些需要做点"更算法化"的事情的问题(例如明确地数一数括号,看它们是否都闭合了),神经网络往往在计算上"太浅",难以可靠地完成。(顺便说一句,即使是当前完整版的ChatGPT,也很难在长序列中正确地匹配括号。)
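
PS:作为对照,下面就是正文所说的"更算法化"的做法:显式数一数有多少个未闭合的"(",再补上同样数量的")"。对这几行代码来说,序列再长也不会出错,而这恰恰是计算上偏"浅"的神经网络难以可靠做到的。

```python
def algorithmic_complete(prefix):
    """显式计数未闭合的"(",然后补上同样数量的")",保证括号平衡。"""
    depth = prefix.count("(") - prefix.count(")")
    return prefix + ")" * depth

print(algorithmic_complete("((((((("))   # 无论前缀多长,结果都正确
```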

So what does this mean for things like ChatGPT and the syntax of a language like English? The parenthesis language is “austere”—and much more of an “algorithmic story”. But in English it’s much more realistic to be able to “guess” what’s grammatically going to fit on the basis of local choices of words and other hints. And, yes, the neural net is much better at this—even though perhaps it might miss some “formally correct” case that, well, humans might miss as well. But the main point is that the fact that there’s an overall syntactic structure to the language—with all the regularity that implies—in a sense limits “how much” the neural net has to learn. And a key “natural-science-like” observation is that the transformer architecture of neural nets like the one in ChatGPT seems to successfully be able to learn the kind of nested-tree-like syntactic structure that seems to exist (at least in some approximation) in all human languages.

那么,这对ChatGPT以及英语这类语言的句法意味着什么呢?小括号语言是"朴素"的,更像是一个"算法性的故事"。而在英语中,根据局部的用词选择和其他提示来"猜测"语法上什么合适,要现实得多。而且,是的,神经网络在这方面要好得多,尽管它也许会漏掉某些"形式上正确"的情况,不过人类多半也会漏掉这些。但关键在于:语言存在一个总体的句法结构,以及由此带来的种种规律性,这在某种意义上限制了神经网络需要学习的"量"。而一个关键的、"类似自然科学"的观察是,像ChatGPT中这样的transformer神经网络架构,似乎确实能够成功地学会那种嵌套的、树状的句法结构,而这种结构(至少在某种近似意义上)似乎存在于所有人类语言之中。

Syntax provides one kind of constraint on language. But there are clearly more. A sentence like “Inquisitive electrons eat blue theories for fish” is grammatically correct but isn’t something one would normally expect to say, and wouldn’t be considered a success if ChatGPT generated it—because, well, with the normal meanings for the words in it, it’s basically meaningless.

语法为语言提供了一种约束。但约束显然不止这一种。像"Inquisitive electrons eat blue theories for fish"这样的句子虽然语法正确,却不是人们通常会说的话;如果ChatGPT生成了它,也不会被认为是成功的,因为按照其中各个词的通常含义,这句话基本上是没有意义的。

But is there a general way to tell if a sentence is meaningful? There’s no traditional overall theory for that. But it’s something that one can think of ChatGPT as having implicitly “developed a theory for” after being trained with billions of (presumably meaningful) sentences from the web, etc.

但是,有没有一种通用的方法来判断一个句子是否有意义呢?对此并没有传统的总体理论。不过,我们可以认为,ChatGPT在用来自网络等处的数十亿个(想必是有意义的)句子训练之后,已经隐含地"发展出了一套这方面的理论"。

What might this theory be like? Well, there’s one tiny corner that’s basically been known for two millennia, and that’s logic. And certainly in the syllogistic form in which Aristotle discovered it, logic is basically a way of saying that sentences that follow certain patterns are reasonable, while others are not. Thus, for example, it’s reasonable to say “All X are Y. This is not Y, so it’s not an X” (as in “All fishes are blue. This is not blue, so it’s not a fish.”). And just as one can somewhat whimsically imagine that Aristotle discovered syllogistic logic by going (“machine-learning-style”) through lots of examples of rhetoric, so too one can imagine that in the training of ChatGPT it will have been able to “discover syllogistic logic” by looking at lots of text on the web, etc. (And, yes, while one can therefore expect ChatGPT to produce text that contains “correct inferences” based on things like syllogistic logic, it’s a quite different story when it comes to more sophisticated formal logic—and I think one can expect it to fail here for the same kind of reasons it fails in parenthesis matching.)

这个理论会是什么样子呢?有一个小小的角落,两千年来基本上一直为人所知,那就是逻辑。当然,在亚里士多德发现它时所用的三段论形式中,逻辑基本上是在说:遵循某些模式的句子是合理的,而其他的则不合理。例如,说"所有的X都是Y。这个不是Y,所以它不是X"是合理的(比如"所有的鱼都是蓝色的。这个不是蓝色的,所以它不是鱼。")。正如人们可以带点异想天开地想象,亚里士多德是("机器学习式"地)浏览了大量修辞实例才发现三段论逻辑的,人们同样可以想象,ChatGPT在训练中通过查看网络上的大量文本等,也能够"发现三段论逻辑"。(而且,是的,虽然因此可以预期ChatGPT会生成包含基于三段论逻辑之类的"正确推断"的文本,但一旦涉及更复杂的形式逻辑,情况就完全不同了;我认为可以预期它会在这方面失败,原因和它在括号匹配上失败的原因是同一类。)
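
PS:下面用几行假设性的Python代码,把"所有X都是Y;这个不是Y;所以它不是X"这一三段论模式写成一条显式规则,只是为了说明这种推理模式本身有多简单、多机械。

```python
# 假设的知识库:记录"所有X都是Y"这类陈述。
all_x_are_y = {("fish", "blue")}    # "所有的鱼都是蓝色的"

def cannot_be_x(x, y, object_is_y):
    """若"所有X都是Y"成立,而某对象不是Y,则可推出它不是X。"""
    return (x, y) in all_x_are_y and not object_is_y

print(cannot_be_x("fish", "blue", object_is_y=False))   # True:不是蓝色的,所以不是鱼
```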

But beyond the narrow example of logic, what can be said about how to systematically construct (or recognize) even plausibly meaningful text? Yes, there are things like Mad Libs that use very specific “phrasal templates”. But somehow ChatGPT implicitly has a much more general way to do it. And perhaps there’s nothing to be said about how it can be done beyond “somehow it happens when you have 175 billion neural net weights”. But I strongly suspect that there’s a much simpler and stronger story.

但在逻辑这个狭窄的例子之外,对于如何系统地构建(或识别)哪怕只是貌似有意义的文本,还能说些什么呢?确实,有像Mad Libs这样使用非常具体的"短语模板"的东西。但不知何故,ChatGPT隐含地拥有一种通用得多的办法。也许除了"当你有1750亿个神经网络权重时,它就会以某种方式发生"之外,再没有什么可说的了。但我强烈怀疑,背后还有一个简单得多、也更有力的故事。
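
PS:下面是一个Mad Libs式"短语模板"的小例子(模板和填入的词都是假设的),可以看到这种做法只能生成非常有限的一类句子,与ChatGPT隐含掌握的那种更通用的能力相去甚远。

```python
# 一个非常具体的"短语模板":往固定句式里填词就能得到语法通顺的句子,
# 但它能覆盖的句子种类极其有限。
template = "The {adj} {noun} {verb} the {obj}."
print(template.format(adj="curious", noun="cat", verb="watches", obj="fish"))
# 输出: The curious cat watches the fish.
```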


这篇文章是我在网上看到的一篇关于ChatGPT工作原理的分析。作者由浅入深地解释了ChatGPT是如何运行的,整个过程并没有深入到具体的模型算法实现,适合非机器学习方向的开发人员阅读学习。

作者Stephen Wolfram是业界知名的科学家、企业家和计算机科学家,也是Wolfram Research公司的创始人和CEO。该公司开发了许多计算机程序和技术,包括Mathematica和Wolfram|Alpha计算知识引擎。

本文先使用ChatGPT翻译,再由我进行二次修改,红字部分是我额外添加的说明。由于原文很长,我将原文按章节拆分成多篇文章。想要看原文的朋友可以点击下方的原文链接。

https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/

