This blog began with the idea of treating snarxiv.org's fake physics papers seriously.
Last year I found a powerful language model on the web (GPT-J) and tried generating whole papers from titles and abstracts produced by snarxiv.org. This was my first attempt; this was a later, repeated exploration of one particular topic.
Five months later, the era of ChatGPT began. (At the same time, the free web version of GPT-J stopped working.) Like millions of other users, I have been conducting many experiments with ChatGPT and Bing.
But I only just thought of returning to the snarxiv challenge, using this new generation of tools. Here's my first attempt: "Quintessence at the Intermediate Scale Extremizes the Strong CP Problem"
The resulting "paper" has a palpably different flavor from those generated by GPT-J. But first, let me describe how the paper was generated. The paper is too big for a single output from ChatGPT, so I first gave it the title and abstract and asked it to generate a table of contents. Then I manually asked it to generate each item listed in the table of contents, one after the other.
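For anyone who wants to automate the procedure described above, here is a minimal sketch in Python. It is not what I actually did (I typed the requests by hand into the web interface), and everything in it is an assumption made for illustration: the function names, the prompt wording, and the `complete` callable, which stands in for whatever chat model you have access to. It also resends the title and abstract with every request, whereas in my session ChatGPT simply remembered the conversation so far.

```python
from typing import Callable, List


def parse_toc(toc_text: str) -> List[str]:
    """Return the non-empty lines of a table of contents, one per section."""
    return [line.strip() for line in toc_text.splitlines() if line.strip()]


def generate_paper(title: str, abstract: str, complete: Callable[[str], str]) -> str:
    """Build a paper section by section from a title and abstract.

    `complete` is a hypothetical callable mapping a prompt to the model's reply.
    """
    header = f"Title: {title}\nAbstract: {abstract}\n"

    # Step 1: ask for a table of contents, one section heading per line.
    toc = parse_toc(
        complete(header + "Write a table of contents for this paper, one section per line.")
    )

    # Step 2: ask for each listed section in turn and collect the results.
    parts = [title, abstract]
    for heading in toc:
        body = complete(header + f"Write the section '{heading}' of this paper.")
        parts.append(f"{heading}\n{body}")

    return "\n\n".join(parts)


def stub_model(prompt: str) -> str:
    """Stand-in for a chat model, so the sketch runs without network access."""
    if "table of contents" in prompt:
        return "1. Introduction\n2. Conclusion"
    return "(section text would go here)"
```

Calling `generate_paper(title, abstract, stub_model)` returns the title, the abstract, and one block per table-of-contents entry, separated by blank lines. Swapping `stub_model` for a function that calls a real model is the only change needed to try it in earnest.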
You will note that it doesn't actually contain any equations or references. The style of the whole paper, in fact, resembles that of an abstract: it merely declares that certain things will be explained or shown, without actually delivering on anything promised.
On the other hand, the texts produced by GPT-J regularly contained both equations and references, but were far less coherent than what ChatGPT has written.
The difference between GPT-J and ChatGPT is that ChatGPT, after being "pre-trained" on a vast corpus of writings, has then been conditioned so as to consistently present itself in the persona of a helpful assistant. GPT-J, on the other hand, was (I assume) a raw language model with pre-training only: presented with an input, it would immediately attempt to continue in the style and structure implied. As a result, GPT-J would directly output a (fictitious, incoherent) arxiv paper, complete with LaTeX markup.
ChatGPT is far more logical and coherent in its output, thanks to intensive fine-tuning. As a result, its paper has a genuinely logical structure, but it does not spontaneously produce equations and references the way that GPT-J did. However, I'm sure it has the capacity to do so, if prompted appropriately.
In June 2022, I wrote:
"I suspect that in less than ten years, you'll be able to input a snarxiv abstract into an AI, and almost instantly get back an essay which really does its best to deliver coherently on the promised content."
It's now nine months later, and I think that a little experimentation with the ChatGPT API would rapidly yield papers combining the logical coherence of ChatGPT with the detailed creativity of GPT-J. How close one could come to the quality of a good arxiv paper is a deep question.
