What Is ChatGPT Doing ... and Why Does It Work? (Part 7)
“Surely a Network That’s Big Enough Can Do Anything!”
The capabilities of something like ChatGPT seem so impressive that one might imagine that if one could just “keep going” and train larger and larger neural networks, then they’d eventually be able to “do everything”. And if one’s concerned with things that are readily accessible to immediate human thinking, it’s quite possible that this is the case. But the lesson of the past several hundred years of science is that there are things that can be figured out by formal processes, but aren’t readily accessible to immediate human thinking.
Nontrivial mathematics is one big example. But the general case is really computation. And ultimately the issue is the phenomenon of computational irreducibility. There are some computations which one might think would take many steps to do, but which can in fact be “reduced” to something quite immediate. But the discovery of computational irreducibility implies that this doesn’t always work. And instead there are processes—probably like the one below—where to work out what happens inevitably requires essentially tracing each computational step:

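At this point the original article shows a picture of a simple cellular automaton evolving. A minimal sketch of that kind of process in Python, assuming Rule 30 (the example Wolfram typically uses): to find out what the pattern looks like after n steps, there is no known shortcut; every one of the n intermediate rows has to be computed explicitly.

```python
# Minimal Rule 30 cellular automaton: each cell's next state depends on
# itself and its two neighbors. There is no known way to jump ahead --
# to get row n, every earlier row must be computed.

def rule30_step(row):
    """Apply Rule 30 to one row (cells outside the row are treated as 0)."""
    padded = [0] + row + [0]
    # Rule 30: new cell = left XOR (center OR right)
    return [padded[i - 1] ^ (padded[i] | padded[i + 1])
            for i in range(1, len(padded) - 1)]

def run(steps):
    """Start from a single black cell and evolve `steps` times."""
    row = [0] * steps + [1] + [0] * steps   # room for the pattern to grow
    history = [row]
    for _ in range(steps):
        row = rule30_step(row)
        history.append(row)
    return history

if __name__ == "__main__":
    for row in run(15):
        print("".join("█" if c else " " for c in row))
```

Running this prints the familiar irregular Rule 30 triangle; the point is that the loop in `run` cannot be skipped.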
The kinds of things that we normally do with our brains are presumably specifically chosen to avoid computational irreducibility. It takes special effort to do math in one’s brain. And it’s in practice largely impossible to “think through” the steps in the operation of any nontrivial program just in one’s brain.
PS: My own understanding of the above is that the tasks our brains perform fall into two classes. One class can be completed "intuitively" and unconsciously (seeing, speaking, riding a bicycle); these are the computationally reducible tasks the brain is good at. The other class requires conscious attention and must be worked through step by step (for example, solving a math problem); these are the computationally irreducible tasks the brain is bad at.
But of course for that we have computers. And with computers we can readily do long, computationally irreducible things. And the key point is that there’s in general no shortcut for these.
Yes, we could memorize lots of specific examples of what happens in some particular computational system. And maybe we could even see some (“computationally reducible”) patterns that would allow us to do a little generalization. But the point is that computational irreducibility means that we can never guarantee that the unexpected won’t happen—and it’s only by explicitly doing the computation that you can tell what actually happens in any particular case.
PS: Rules summarized by induction can never guarantee that a black swan will not appear.
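A classic concrete illustration of this point (my example, not Wolfram's): Euler's polynomial n² + n + 41 yields a prime for every n from 0 to 39, so pattern-spotting from examples would happily extrapolate "it is always prime". Only explicit computation reveals the exception:

```python
# Euler's polynomial n^2 + n + 41 is prime for n = 0..39 -- a regularity
# that induction from examples would extrapolate forever. Only explicit
# computation shows where it breaks.

def is_prime(k):
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True

first_failure = next(n for n in range(1000)
                     if not is_prime(n * n + n + 41))
print(first_failure)          # 40
print(40 * 40 + 40 + 41)      # 1681, which is 41 * 41
```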
And in the end there’s just a fundamental tension between learnability and computational irreducibility. Learning involves in effect compressing data by leveraging regularities. But computational irreducibility implies that ultimately there’s a limit to what regularities there may be.
As a practical matter, one can imagine building little computational devices—like cellular automata or Turing machines—into trainable systems like neural nets. And indeed such devices can serve as good “tools” for the neural net—like Wolfram|Alpha can be a good tool for ChatGPT. But computational irreducibility implies that one can’t expect to “get inside” those devices and have them learn.
Or put another way, there’s an ultimate tradeoff between capability and trainability: the more you want a system to make “true use” of its computational capabilities, the more it’s going to show computational irreducibility, and the less it’s going to be trainable. And the more it’s fundamentally trainable, the less it’s going to be able to do sophisticated computation.
(For ChatGPT as it currently is, the situation is actually much more extreme, because the neural net used to generate each token of output is a pure “feed-forward” network, without loops, and therefore has no ability to do any kind of computation with nontrivial “control flow”.)
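The "no loops" point can be made concrete with a toy contrast (the layer sizes below are made up for illustration): a pure feed-forward pass performs the same fixed sequence of operations for every input, while even a trivial iterated computation needs a loop whose length depends on the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# A pure feed-forward pass: the same fixed sequence of operations for
# every input; no loop whose length depends on the data.
W1 = rng.standard_normal((4, 8))   # made-up layer sizes
W2 = rng.standard_normal((8, 4))

def feed_forward(x):
    """Always exactly two matrix multiplies and one ReLU, whatever x is."""
    return np.maximum(W2 @ (W1 @ x), 0.0)

# Contrast: even the trivial Collatz iteration needs a data-dependent
# loop; the number of steps cannot be fixed in advance.
def collatz_steps(n):
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

print(collatz_steps(6))    # 8 iterations
print(collatz_steps(27))   # 111 iterations
```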
Of course, one might wonder whether it’s actually important to be able to do irreducible computations. And indeed for much of human history it wasn’t particularly important. But our modern technological world has been built on engineering that makes use of at least mathematical computations—and increasingly also more general computations. And if we look at the natural world, it’s full of irreducible computation—that we’re slowly understanding how to emulate and use for our technological purposes.
Yes, a neural net can certainly notice the kinds of regularities in the natural world that we might also readily notice with “unaided human thinking”. But if we want to work out things that are in the purview of mathematical or computational science, the neural net isn’t going to be able to do it—unless it effectively “uses as a tool” an “ordinary” computational system.
PS: Judging by the behavior ChatGPT currently exhibits, it already shows some logical-reasoning ability (though it is unclear how this works), so whether it truly cannot solve problems in the domain of mathematical or computational science may deserve a question mark. And remember, ChatGPT is itself a program running on an "ordinary" computational system.
But there’s something potentially confusing about all of this. In the past there were plenty of tasks—including writing essays—that we’ve assumed were somehow “fundamentally too hard” for computers. And now that we see them done by the likes of ChatGPT we tend to suddenly think that computers must have become vastly more powerful—in particular surpassing things they were already basically able to do (like progressively computing the behavior of computational systems like cellular automata).
But this isn’t the right conclusion to draw. Computationally irreducible processes are still computationally irreducible, and are still fundamentally hard for computers—even if computers can readily compute their individual steps. And instead what we should conclude is that tasks—like writing essays—that we humans could do, but we didn’t think computers could do, are actually in some sense computationally easier than we thought.
In other words, the reason a neural net can be successful in writing an essay is because writing an essay turns out to be a “computationally shallower” problem than we thought. And in a sense this takes us closer to “having a theory” of how we humans manage to do things like writing essays, or in general deal with language.
If you had a big enough neural net then, yes, you might be able to do whatever humans can readily do. But you wouldn’t capture what the natural world in general can do—or that the tools that we’ve fashioned from the natural world can do. And it’s the use of those tools—both practical and conceptual—that have allowed us in recent centuries to transcend the boundaries of what’s accessible to “pure unaided human thought”, and capture for human purposes more of what’s out there in the physical and computational universe.
PS: This article and the previous one involve a lot of conceptual material. Although I can roughly follow it, the translated result leaves much to be desired, and I recommend reading the English original. Perhaps rewriting this part as a new article based on my own understanding would make it easier to follow.
This is an analysis of how ChatGPT works that I came across online. The author explains, step by step from the basics, how ChatGPT operates, without diving into the concrete model and algorithm implementations, which makes it suitable for developers outside machine learning.
The author, Stephen Wolfram, is a well-known scientist, entrepreneur, and computer scientist. He is the founder and CEO of Wolfram Research, which has developed many programs and technologies, including Mathematica and the Wolfram|Alpha computational knowledge engine.
This article was first translated with ChatGPT and then revised by me; the passages marked "PS:" are notes I added. Since the original is very long, I have split it into multiple posts by section. If you would like to read the original, follow the link below.
https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
If you would like to try ChatGPT yourself, you can visit the mini-program shown in the image below, where we have connected the gpt-3.5-turbo API for testing.

At present, using ChatGPT through the API still lags behind the official ChatGPT site in stability and answer quality. In particular, the API is constrained by its token limit and cannot sustain many rounds of continuous conversation.
If you would like direct access to the official application, scan the QR code in the image below to get in touch, and experience with us the enormous changes ChatGPT brings to study and work.
