The Singularity Has Finally Arrived at Home: An AI Scientist Forms Its Own Hypotheses, Runs Experiments, and Writes Papers

August 13, 2024, 19:37

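Sakana AI has published another interesting piece of research, called "AI Scientist."
It is not a scientist who studies AI, but an AI that does the work of a scientist.

What makes it remarkable is that the heavy AI work all runs in the cloud (GPT-4o, LLMs available through OpenRouter, and so on), so you can play with it casually at home. If you want it to run AI experiments you will want at least a GPU, but if the subject of the research is not AI, you do not even need that.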

There are (still) plenty of traps when you actually run it

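Using AI-Scientist is simple... well, not quite, but all you do is write a template, put a few ideas of the form "I have this hypothesis" into seed_ideas.json, and then run it, much like popping something in the microwave.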
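
As a rough illustration of what a seed idea might look like, here is a minimal Python sketch that writes one entry to seed_ideas.json. The field names are copied from the idea dictionaries printed in the run log later in this article; whether the template expects exactly these keys is an assumption. The idea itself and the helper name write_seed_ideas are made up for illustration.

```python
import json
from pathlib import Path


def write_seed_ideas(path, ideas):
    """Write a list of idea dicts to a JSON file and return the path."""
    path = Path(path)
    path.write_text(json.dumps(ideas, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


# Hypothetical seed idea. The keys mirror the dictionaries that appear
# in the idea-generation log; the values are placeholders.
seed_ideas = [
    {
        "Name": "my_first_hypothesis",
        "Title": "A Placeholder Title for the Hypothesis",
        "Experiment": "Describe what to change in the template and how to measure it.",
        "Interestingness": 5,
        "Feasibility": 5,
        "Novelty": 5,
    }
]
```

Calling write_seed_ideas("seed_ideas.json", seed_ideas) inside a template directory would then produce the file; check the templates shipped with the repository for the authoritative format.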

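However, at the time of writing (2024/8/13 18:50 JST), the repository seems to be incomplete, and typing the commands as documented does not work.

First, there is no "data" directory, so you have to create it yourself.
The data directory also needs enwik8 and text8 inside it, and those are missing too, so you have to fetch them. I sent a question to Akiba-san, but I could not wait, so I hunted down copies of enwik8 and text8 lying around the internet and dropped them in. (Then, around 19:00, a data directory was added to the repository, so this step may no longer be necessary.)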
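
Before running anything, a quick check like the following can tell you which dataset directories are still missing. This is only a sketch: the data/<name>/ layout follows the commands used in this article, and the helper name missing_datasets is hypothetical.

```python
from pathlib import Path


def missing_datasets(data_dir, names=("enwik8", "text8")):
    """Return the dataset names whose directory is absent or empty under data_dir."""
    root = Path(data_dir)
    missing = []
    for name in names:
        dataset_dir = root / name
        # A dataset counts as missing if its directory does not exist
        # or exists but contains no files at all.
        if not dataset_dir.is_dir() or not any(dataset_dir.iterdir()):
            missing.append(name)
    return missing
```

Running missing_datasets("data") from the repository root returns an empty list once both directories are populated.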

enwik8/prepare.py

text8

Calling these the way the GitHub README.md describes did not work for me, so I did this instead:

$ cd data/enwik8
$ python prepare.py
$ cd ../text8
$ python prepare.py
$ mv text8/text8/* .

I am not sure this is the right way, but it runs, so I will call it good.
When I ran the three samples (NanoGPT, 2D Diffusion, Grokking) at the same time, 2D Diffusion finished first.

$ cd git/AI-Scientist/template/2d_diffusion
$ python plot.py

This produced the baseline results.

So I will run this one first. What kind of theory will come out of it?

$ cd ../..
$ python launch_scientist.py --model "gpt-4o-2024-05-13" --experiment 2d_diffusion --num-ideas 3

This starts hypothesis generation, and the experiments to check those hypotheses, using the "2d_diffusion" template.

Here is what it looks like while running:

Using GPUs: [0, 1]
Using OpenAI API with model gpt-4o-2024-05-13.

Generating idea 1/3
Iteration 1/3
{'Name': 'conditional_mode_embeddings', 'Title': 'Conditional Mode Embeddings: Enhancing Controllable Generation in Diffusion Models', 'Experiment': 'We introduce a conditional embedding mechanism to the MLPDenoiser to allow the model to focus on different modes of the data distribution. This involves adding a new embedding layer that takes a mode label as input and concatenates this embedding with the existing inputs before passing them through the network. We will train this conditional model on the existing datasets with additional mode labels and evaluate its performance in generating samples biased towards specific modes.', 'Interestingness': 8, 'Feasibility': 7, 'Novelty': 7}
Iteration 2/3
{'Name': 'conditional_mode_embeddings', 'Title': 'Conditional Mode Embeddings: Enhancing Controllable Generation in Diffusion Models', 'Experiment': 'We introduce a conditional embedding mechanism to the MLPDenoiser to allow the model to focus on different modes of the data distribution. This involves adding a new embedding layer that takes a mode label as input and concatenates this embedding with the existing inputs before passing them through the network. Mode labels will be predefined based on the known structure of the datasets. We will train this conditional model on the existing datasets with these mode labels and evaluate its performance in generating samples biased towards specific modes by visualizing the generated samples and computing mode-specific metrics such as mode coverage and quality.', 'Interestingness': 8, 'Feasibility': 7, 'Novelty': 7}
Idea generation converged after 2 iterations.

Generating idea 2/3
Iteration 1/3
{'Name': 'learnable_embeddings', 'Title': 'Learnable Embeddings: Enhancing Diffusion Models with Adaptive Encoding Mechanisms', 'Experiment': 'We replace the fixed SinusoidalEmbedding with a LearnableEmbedding class, where the embeddings are initialized as learnable parameters. The MLPDenoiser is updated to use this new embedding mechanism. We will then train the model on the 2D datasets and evaluate its performance using metrics such as training loss, evaluation loss, and KL divergence to determine the effectiveness of learnable embeddings compared to sinusoidal embeddings.', 'Interestingness': 7, 'Feasibility': 8, 'Novelty': 6}
Iteration 2/3
{'Name': 'learnable_embeddings', 'Title': 'Learnable Embeddings: Enhancing Diffusion Models with Adaptive Encoding Mechanisms', 'Experiment': 'We replace the fixed SinusoidalEmbedding with a LearnableEmbedding class, where the embeddings are initialized as learnable parameters. The LearnableEmbedding class will initialize a set of parameters matching the dimensions of the sinusoidal embeddings. The MLPDenoiser is updated to use this new embedding mechanism. We will then train the model on the 2D datasets and evaluate its performance using metrics such as training loss, evaluation loss, and KL divergence to determine the effectiveness of learnable embeddings compared to sinusoidal embeddings.', 'Interestingness': 7, 'Feasibility': 8, 'Novelty': 6}
Iteration 3/3
{'Name': 'learnable_embeddings', 'Title': 'Learnable Embeddings: Enhancing Diffusion Models with Adaptive Encoding Mechanisms', 'Experiment': 'We replace the fixed SinusoidalEmbedding with a LearnableEmbedding class, where the embeddings are initialized as learnable parameters. The LearnableEmbedding class will initialize a set of parameters matching the dimensions of the sinusoidal embeddings. The MLPDenoiser is updated to use this new embedding mechanism. We will then train the model on the 2D datasets and evaluate its performance using metrics such as training loss, evaluation loss, and KL divergence to determine the effectiveness of learnable embeddings compared to sinusoidal embeddings. The same datasets and training configurations will be used for fair comparisons.', 'Interestingness': 7, 'Feasibility': 8, 'Novelty': 6}

Generating idea 3/3
Iteration 1/3
{'Name': 'semantic_embeddings', 'Title': 'Semantic Embeddings: Ensuring Semantic Consistency in Diffusion Models', 'Experiment': 'We introduce a SemanticEmbedding class that captures specific semantic attributes of the data. This involves creating a new embedding layer that learns to encode these attributes. The MLPDenoiser is updated to incorporate these semantic embeddings alongside the existing sinusoidal time embeddings. We train this modified model on the 2D datasets and evaluate its performance in generating samples that maintain the semantic attributes. Metrics such as structural similarity index (SSI) and visual inspections will be used to assess semantic consistency.', 'Interestingness': 9, 'Feasibility': 7, 'Novelty': 8}
Iteration 2/3
{'Name': 'semantic_embeddings', 'Title': 'Semantic Embeddings: Ensuring Semantic Consistency in Diffusion Models', 'Experiment': 'We introduce a SemanticEmbedding class that captures specific semantic attributes of the data using predefined geometric properties. This involves creating a new embedding layer that encodes these attributes based on simple heuristics. The MLPDenoiser is updated to incorporate these semantic embeddings alongside the existing sinusoidal time embeddings. We train this modified model on the 2D datasets and evaluate its performance in generating samples that maintain the semantic attributes. Metrics such as structural similarity index (SSI) and Fréchet Inception Distance (FID) adapted for 2D datasets will be used to assess semantic consistency.', 'Interestingness': 9, 'Feasibility': 8, 'Novelty': 8}
Iteration 3/3
{'Name': 'semantic_embeddings', 'Title': 'Semantic Embeddings: Ensuring Semantic Consistency in Diffusion Models', 'Experiment': 'We introduce a SemanticEmbedding class that captures specific semantic attributes of the data using predefined geometric properties. This involves creating a new embedding layer that encodes these attributes based on simple heuristics. The MLPDenoiser is updated to incorporate these semantic embeddings alongside the existing sinusoidal time embeddings. We train this modified model on the 2D datasets and evaluate its performance in generating samples that maintain the semantic attributes. Metrics such as structural similarity index (SSI) and Fréchet Inception Distance (FID) adapted for 2D datasets will be used to assess semantic consistency.', 'Interestingness': 9, 'Feasibility': 8, 'Novelty': 8}
Idea generation converged after 3 iterations.

Checking novelty of idea 0: learning_rate_schedule
Response Status Code: 429
Response Content: {"message": "Too Many Requests. Please wait and try again or apply for a key for higher rate limits. https://www.semanticscholar.org/product/api#api-key-form", "code": "429"}
Backing off 0.6 seconds after 1 tries calling function search_for_papers at 19:00:30
Response Status Code: 200
Response Content: {"total": 3572, "offset": 0, "next": 10, "data": [{"paperId": "ea63924b08e31755cafb9bb261fa6b04c17739b9", "title": "Speed-accuracy trade-off for the diffusion models: Wisdom from nonequilibrium thermodynamics and optimal transport", "abstract": "We discuss a connection between a generative model, called the diffusion model, and nonequilibrium thermodynamics for the Fokker-Planck equation, called stochastic thermodynamics. Based on the techniques of stochastic thermodynamics, we derive the speed-
Response Status Code: 429
Response Content: {"message": "Too Many Requests. Please wait and try again or apply for a key for higher rate limits. https://www.semanticscholar.org/product/api#api-key-form", "code": "429"}
Backing off 0.7 seconds after 1 tries calling function search_for_papers at 19:00:37
Response Status Code: 200
Response Content: {"total": 573, "offset": 0, "next": 10, "data": [{"paperId": "52e8102e070dbed745c39fd518f4f6aa3daffb3c", "title": "No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models", "abstract": "Recent research has shown the existence of significant redundancy in large Transformer models. One can prune the redundant parameters without significantly sacrificing the generalization performance. However, we question whether the redundant parameters could hav

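Honestly, I do not really understand what it is doing, but I can just about tell that the Semantic Scholar API is scolding it for sending too many requests (the 429 responses).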
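
The "Backing off ... seconds" lines show the client waiting and retrying after an HTTP 429. Below is a minimal sketch of that retry pattern, exponential backoff with jitter, using only the standard library. The names call_with_backoff and RateLimited and the delay parameters are assumptions for illustration, not what the tool actually uses.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller-supplied function when the server answers 429."""


def call_with_backoff(func, max_tries=5, base_delay=0.5, sleep=time.sleep):
    """Call func(); on RateLimited, wait a jittered, growing delay and retry."""
    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except RateLimited:
            if attempt == max_tries:
                raise  # give up and let the caller see the error
            # Full jitter: a random wait between 0 and base_delay * 2**(attempt - 1).
            delay = random.uniform(0, base_delay * 2 ** (attempt - 1))
            print(f"Backing off {delay:.1f} seconds after {attempt} tries")
            sleep(delay)
```

The jitter spreads retries out so that several clients hitting the same limit do not all retry at the same instant. As the error message itself says, applying for an API key is the other way to raise the limit.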

If I leave it running, will there be some unknown new theory waiting for me tomorrow morning? I am nothing but excited.

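8/14 0:52 The papers it actually wrote

Starting at 19:00, the three experiments finished in just four hours and the papers came back written. That is astonishing. The machine is a single Dospara-built box with two A6000 GPUs.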

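Some other papers are still running their experiments.
Every one of the papers looks reasonably substantial.
In principle, the more machines you add, the more hypotheses you can test in parallel. As long as you pay (the API fees), you can do (or rather, delegate) as much research as you like. And even running it overnight costs less than an intern's hourly wage.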

For readers from the humanities: why this is a big deal

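From the start, my impression of Sakana AI has been that it is a company very good at exploring new ways to use large language models by combining them cleverly, rather than building or fine-tuning them head-on.

This "AI Scientist" is a tool brimming with exactly that kind of Sakana AI ingenuity.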

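The first advantage of this approach is that it needs (almost) no capital investment.
Because most of the important inference is done by an external LLM (GPT-4o and the like), you do not need the large-scale computing environment said to cost tens to hundreds of millions of yen.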

If you are researching AI, the machine that runs this tool also needs a GPU and so on; otherwise it does not.

After that you just sit and wait.

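The bigger impact is this: where research used to need hundreds of human researchers, experiments that can be completed entirely on a computer, at least, will now need only a handful. You simply let the machine researchers think for themselves. Humans only have to hand the machine researchers ideas to investigate, and can sleep through the rest.

What happens next is that most student part-timers and interns will no longer be needed.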

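The irony is that what this "AI Scientist" does internally is practically the same as what I have been doing lately for my own work. I tell Aider what code I want, run the code it produces, and if there is an error, show the error to Aider and have it fix it. That is all.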

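"AI Scientist" also calls Aider internally, so you could say it has simply automated my job. Its goal is to write a paper, so the approach is somewhat more elaborate, but what it does is essentially the same.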
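
The run, read the error, fix, run again loop described above can be sketched in a few lines of Python. Everything here is hypothetical: fix_with_llm stands in for whatever coding assistant you use (it is not Aider's API), and the retry limit is arbitrary.

```python
import subprocess
import sys


def run_until_it_works(script_path, fix_with_llm, max_rounds=4):
    """Run a script; on failure hand its stderr to fix_with_llm and try again.

    fix_with_llm(script_path, error_text) is a caller-supplied, hypothetical
    callback that is expected to edit the script in place.
    Returns the script's stdout on success, or None if every round fails.
    """
    for _ in range(max_rounds):
        result = subprocess.run(
            [sys.executable, script_path], capture_output=True, text=True
        )
        if result.returncode == 0:
            return result.stdout
        # Show the error to the assistant and let it rewrite the file.
        fix_with_llm(script_path, result.stderr)
    return None
```

Capping the number of rounds matters in practice: without a limit, a fix that never converges would keep spending API credits indefinitely.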

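Research is fun while you are coming up with ideas, but the parts where the idea starts to look wrong, or where you have to verify it by actually running experiments, are a real chore. I suspect most scholars do not find experiments all that enjoyable. Some surely do, of course, but generally speaking, running experiments just to confirm a hypothesis is painful.

Just having hypothesis formulation, experimentation, and verification automated will let many researchers concentrate on what they really want to do.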

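In effect, anyone with a machine can have a personal research lab.
A lab's research capacity will come to depend not on the individual researcher's effort but on talent, inspiration, and the breadth of learning that lets one eagerly apply knowledge and ideas from other fields.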

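There is also a risk that the already large number of AI papers will explode. How to filter a volume of papers that is already impossible to peer-review is a very hard problem. It is also worth noting that a paper written by AI is not necessarily nonsense; it may well be making a fundamentally important point.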

At the very least, machines are now more useful as researchers than most humans, so I think it is fair to say that the Singularity has finally begun.

Announcement: 9/14 24-Hour AI Hackathon, Tokyo (first prize 100,000 yen)

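We will hold a 24-hour AI hackathon on 9/14.
As judges we have invited Yu Ito, editor-in-chief of Business Insider Japan, and Yoshiyuki Kobayashi of Sony, known as the developer of nnabla. I will be taking part as MC. Please come along and bring your friends.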