What Is ChatGPT Doing … and Why Does It Work? (Part 12)

Meaning Space and Semantic Laws of Motion


We discussed above that inside ChatGPT any piece of text is effectively represented by an array of numbers that we can think of as coordinates of a point in some kind of “linguistic feature space”. So when ChatGPT continues a piece of text this corresponds to tracing out a trajectory in linguistic feature space. But now we can ask what makes this trajectory correspond to text we consider meaningful. And might there perhaps be some kind of “semantic laws of motion” that define—or at least constrain—how points in linguistic feature space can move around while preserving “meaningfulness”?


So what is this linguistic feature space like? Here’s an example of how single words (here, common nouns) might get laid out if we project such a feature space down to 2D:

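To make this concrete, here is a minimal sketch of how such a 2D layout can be produced. It assumes GPT-2's token-embedding vectors (via the Hugging Face transformers library) as the feature space and a PCA projection; the article doesn't say exactly which embedding or projection its pictures use, so treat both choices as illustrative.

```python
import numpy as np
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
wte = AutoModel.from_pretrained("gpt2").get_input_embeddings().weight.detach()

def word_vector(word: str) -> np.ndarray:
    """Embedding of a word (averaging sub-word tokens if it splits into several)."""
    token_ids = tokenizer(" " + word)["input_ids"]   # leading space: whole-word tokens
    return wte[token_ids].mean(dim=0).numpy()

words = ["cat", "dog", "bird", "apple", "pear", "chair", "table", "car"]
X = np.stack([word_vector(w) for w in words])

# Project the 768-dimensional vectors down to 2D with PCA (via an SVD),
# giving a layout in the spirit of the picture above.
X = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
coords = X @ Vt[:2].T

for w, (x, y) in zip(words, coords):
    print(f"{w:>6}: ({x:+.2f}, {y:+.2f})")
```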

We saw another example above based on words representing plants and animals. But the point in both cases is that “semantically similar words” are placed nearby.


As another example, here’s how words corresponding to different parts of speech get laid out:


Of course, a given word doesn’t in general just have “one meaning” (or necessarily correspond to just one part of speech). And by looking at how sentences containing a word lay out in feature space, one can often “tease apart” different meanings—as in the example here for the word “crane” (bird or machine?):

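Here is one way such a “teasing apart” might be sketched: embed whole sentences that use “crane” in each sense and compare them. Mean-pooled GPT-2 hidden states are used purely as a stand-in for whatever feature space the article's picture is based on, and the sentences are made up for illustration.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

def sentence_embedding(sentence: str) -> torch.Tensor:
    """A point in 'feature space' for a sentence: mean of the final hidden states."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state    # shape: (1, n_tokens, 768)
    return hidden[0].mean(dim=0)

def cosine(u: torch.Tensor, v: torch.Tensor) -> float:
    return float(torch.dot(u, v) / (u.norm() * v.norm()))

sentences = {
    "bird 1":    "The crane stood on one leg at the edge of the marsh.",
    "bird 2":    "A crane flew slowly over the lake at dawn.",
    "machine 1": "The crane lifted the steel beam onto the roof.",
    "machine 2": "The crane swung its load over the construction site.",
}
vecs = {name: sentence_embedding(s) for name, s in sentences.items()}

# Sentences using the same sense of "crane" should land closer together
# than sentences using different senses.
print("bird vs bird:      ", cosine(vecs["bird 1"], vecs["bird 2"]))
print("machine vs machine:", cosine(vecs["machine 1"], vecs["machine 2"]))
print("bird vs machine:   ", cosine(vecs["bird 1"], vecs["machine 1"]))
```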

OK, so it’s at least plausible that we can think of this feature space as placing “words nearby in meaning” close in this space. But what kind of additional structure can we identify in this space? Is there for example some kind of notion of “parallel transport” that would reflect “flatness” in the space? One way to get a handle on that is to look at analogies:

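In vector terms, an analogy like “man is to woman as king is to queen” corresponds to roughly parallel offsets: v(woman) − v(man) ≈ v(queen) − v(king). Below is a small sketch of solving analogies that way, assuming some `embeddings` dictionary of word vectors is already available; this is the classic word-vector-arithmetic trick, not necessarily how the article's pictures were produced.

```python
import numpy as np

def analogy(a: str, b: str, c: str, embeddings: dict[str, np.ndarray]) -> str:
    """Solve 'a is to b as c is to ?' by vector arithmetic: b - a + c."""
    target = embeddings[b] - embeddings[a] + embeddings[c]

    def cosine(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

    # Answer: the word whose vector lies nearest the target point,
    # excluding the three input words themselves.
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

# With a reasonable word-embedding table, one would hope for e.g.
#   analogy("man", "woman", "king", embeddings)     ->  "queen"
#   analogy("France", "Paris", "Italy", embeddings) ->  "Rome"
```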

And, yes, even when we project down to 2D, there’s often at least a “hint of flatness”, though it’s certainly not universally seen.


So what about trajectories? We can look at the trajectory that a prompt for ChatGPT follows in feature space—and then we can see how ChatGPT continues that:

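One concrete way to trace out such a trajectory is sketched below, with GPT-2's final-layer hidden states (via the transformers library) standing in for the feature space; that substitution is an assumption for illustration, since ChatGPT's internals can't be probed this way.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

prompt = "The best thing about AI is its ability to"
ids = tokenizer(prompt, return_tensors="pt")["input_ids"][0]

# One feature-space point per prefix of the prompt: the final-layer hidden
# state of the last token seen so far. The ordered points are the "trajectory".
trajectory = []
for n in range(1, len(ids) + 1):
    with torch.no_grad():
        hidden = model(ids[:n].unsqueeze(0)).last_hidden_state
    trajectory.append(hidden[0, -1])

print(len(trajectory), "points, each with", trajectory[0].shape[0], "coordinates")
# To draw a picture like the one above, project these points down to 2D
# (e.g. with PCA) and join them in order; continuing the text with the model
# (see the next sketches) extends the trajectory.
```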

There’s certainly no “geometrically obvious” law of motion here. And that’s not at all surprising; we fully expect this to be a considerably more complicated story. And, for example, even if there is a “semantic law of motion” to be found, it’s far from obvious what kind of embedding (or, in effect, what “variables”) it would most naturally be stated in.


In the picture above, we’re showing several steps in the “trajectory”—where at each step we’re picking the word that ChatGPT considers the most probable (the “zero temperature” case). But we can also ask what words can “come next” with what probabilities at a given point:

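The “zero temperature” case is just greedy decoding: at each step, append the single highest-probability token. A sketch, again with GPT-2 as an illustrative stand-in for ChatGPT:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The best thing about AI is its ability to"
ids = tokenizer(prompt, return_tensors="pt")["input_ids"]

# "Zero temperature": at every step simply take the single most probable token.
for _ in range(10):
    with torch.no_grad():
        logits = model(ids).logits              # shape: (1, seq_len, vocab_size)
    next_id = logits[0, -1].argmax()            # highest-probability next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```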

And what we see in this case is that there’s a “fan” of high-probability words that seems to go in a more or less definite direction in feature space. What happens if we go further? Here are the successive “fans” that appear as we “move along” the trajectory:

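The “fan” at a given point is the handful of highest-probability next tokens together with their probabilities, obtained by applying a softmax to the model's logits at that point. A sketch, once more using GPT-2 in place of ChatGPT:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The best thing about AI is its ability to"
ids = tokenizer(prompt, return_tensors="pt")["input_ids"]

# The "fan" at this point: the few tokens the model rates as most likely
# to come next, together with their probabilities.
with torch.no_grad():
    logits = model(ids).logits[0, -1]
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, k=8)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(token_id)]):>12}  {float(p):.3f}")
```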

Here’s a 3D representation, going for a total of 40 steps:


And, yes, this seems like a mess—and doesn’t do anything to particularly encourage the idea that one can expect to identify “mathematical-physics-like” “semantic laws of motion” by empirically studying “what ChatGPT is doing inside”. But perhaps we’re just looking at the “wrong variables” (or wrong coordinate system) and if only we looked at the right one, we’d immediately see that ChatGPT is doing something “mathematical-physics-simple” like following geodesics. But as of now, we’re not ready to “empirically decode” from its “internal behavior” what ChatGPT has “discovered” about how human language is “put together”.



This article is an analysis of how ChatGPT works that I came across online. The author explains, step by step, how ChatGPT operates without diving into the concrete implementation of the model and its algorithms, which makes it well suited to developers who don't work in machine learning.

The author, Stephen Wolfram, is a well-known scientist, entrepreneur, and computer scientist. He is the founder and CEO of Wolfram Research, the company behind many software products and technologies, including Mathematica and the Wolfram|Alpha computational knowledge engine.

This post was first translated with ChatGPT and then revised by me; the parts in red are notes I added. Since the original is quite long, I have split it into multiple posts, one per section. If you'd like to read the original, follow the link below.

https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/

If you'd like to try ChatGPT for yourself, you can visit the mini-program shown in the image below, where we have hooked up the ChatGPT 3.5-turbo API for testing.

At present, using ChatGPT through the API still lags behind visiting the official ChatGPT site directly, both in stability and in the quality of its answers. In particular, the API is constrained by its token-length limit and cannot carry on many rounds of continuous conversation.

If you'd prefer to use the official site directly, feel free to scan the QR code in the image below to get in touch, and experience with us the enormous changes ChatGPT brings to learning and work.
