Falcon 180B vs. GPT-4: A Performance Comparison

September 11, 2023, 22:48

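Falcon 180B was released as an open-source model last week by the Technology Innovation Institute (TII), a research institute based in the UAE. With 180 billion parameters and 3.5 trillion training tokens, it is the largest open-source model available at the time of writing. Hugging Face hosts the model, and although it requires more than 400 GB of memory, it is easy to try out. In this post I compare the new Falcon 180B with GPT-4.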
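As a rough sanity check on the memory figure, the sketch below estimates the size of the weights alone. It assumes 16-bit (2-byte) weights, which the article does not state, and the function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

# 180 billion parameters at an assumed 2 bytes per parameter
print(weight_memory_gb(180e9))  # 360.0
```

Under that assumption the weights alone take about 360 GB, and runtime overhead comes on top of that, which fits a requirement of more than 400 GB.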

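Evaluation Criteria

The two models are compared in the following five areas.

Coding ability
Reasoning
Creativity
Knowledge
Multilingual ability

Each section is scored out of 10, and the scores are then summed.
The scores reflect my own subjective judgment.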

1. Coding Ability

Prompt

Write a function that takes a list of integers as an input
and returns the sum of all even numbers in the list.
This function should be implemented in Python.

Falcon 180B

GPT-4

This is a very simple task. Both models handle it well, but GPT-4 offers several solutions and explains each of them.
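For reference, a minimal solution to the prompt might look like the following. This is an illustrative sketch, not a reproduction of either model's output, and the name `sum_even` is an arbitrary choice.

```python
def sum_even(numbers: list[int]) -> int:
    """Return the sum of all even integers in the list."""
    return sum(n for n in numbers if n % 2 == 0)

print(sum_even([1, 2, 3, 4, 5, 6]))  # 12
```

An empty list returns 0, and negative even numbers are included in the sum.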

Scores

Falcon 180B: 8/10
GPT-4: 10/10

2. Reasoning

Prompt

Blueberries cost more than strawberries.
Blueberries cost less than raspberries.
Raspberries cost more than strawberries and blueberries.
If the first two statements are true, the third statement is?

Falcon 180B

GPT-4

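Both models give the correct answer: the first two statements put the prices in the order strawberries < blueberries < raspberries, so the third statement is true.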
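The expected answer can also be checked mechanically. The sketch below is an illustration, not part of either model's response: it tries every strict ordering of three prices and confirms that whenever the first two statements hold, the third holds as well.

```python
from itertools import permutations

def third_statement_follows() -> bool:
    """Check that statements 1 and 2 imply statement 3 for every strict price ordering."""
    # Only the relative order of the prices matters, so the ranks 1 to 3 are enough.
    for strawberry, blueberry, raspberry in permutations([1, 2, 3]):
        first = blueberry > strawberry
        second = blueberry < raspberry
        third = raspberry > strawberry and raspberry > blueberry
        if first and second and not third:
            return False  # found a counterexample
    return True

print(third_statement_follows())  # True
```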

Scores

Falcon 180B: 10/10
GPT-4: 10/10

3. Creativity

Prompt

Write a poem about AI in 32 words.

Falcon 180B

GPT-4

Neither model hits the word count exactly: Falcon goes over and GPT-4 falls short.
Falcon's poem also comes across as slightly ominous.
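Checking a response against the word limit is easy to automate. Here is a minimal sketch, assuming words are separated by whitespace; the `sample` string is a placeholder, not either model's poem, and `word_count_gap` is a made-up helper name.

```python
def word_count_gap(text: str, target: int = 32) -> int:
    """Return how many words the text is over (positive) or under (negative) the target."""
    return len(text.split()) - target

sample = "placeholder text standing in for a poem"  # hypothetical input, 7 words
print(word_count_gap(sample))  # -25
```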

Scores

Falcon 180B: 8/10
GPT-4: 10/10

4. Knowledge

Prompt

When did the world war 2 end?

Falcon 180B

GPT-4

Both models give an accurate answer. GPT-4 provides more detail, but Falcon can be expected to produce similar results with some parameter tuning.

Scores

Falcon 180B: 10/10
GPT-4: 10/10

5. Multilingual Ability

Prompt

I am about to give you some instructions in Hindi,
respond to that in Japanese -
「दुनिया में कितने देश हैं?」

Falcon 180B

GPT-4

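The question means "How many countries are there in the world?" Falcon only translates the question into Japanese without answering it, and it writes both the Hindi and the Japanese in romanized form. GPT-4 gives a correct answer and shows that it understands all of the languages involved.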

Scores

Falcon 180B: 6/10
GPT-4: 10/10
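The introduction says the section scores are summed, so here is the tally. The sketch below simply adds up the scores given in the five sections above; the dictionary layout is just one convenient way to hold them.

```python
scores = {
    "Falcon 180B": {"coding": 8, "reasoning": 10, "creativity": 8, "knowledge": 10, "multilingual": 6},
    "GPT-4": {"coding": 10, "reasoning": 10, "creativity": 10, "knowledge": 10, "multilingual": 10},
}

# Sum the five section scores for each model (maximum is 50).
totals = {model: sum(per_section.values()) for model, per_section in scores.items()}
print(totals)  # {'Falcon 180B': 42, 'GPT-4': 50}
```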

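Conclusion

Open-source models are developing very quickly. Falcon 180B performs very well in every area except multilingual ability. GPT-4 is still the top model for now, but the open-source community is making major strides. I look forward to seeing how Falcon 180B is received and how it evolves.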

Bonus

I fed this entire article to Falcon 180B and GPT-4 and asked each to write a conclusion. Here are the results:

Falcon 180B

Conclusion: Based on the tests conducted above, it’s evident that TII’s Falcon 180B is highly capable of performing at par with GPT 4 with a little room for improvement in coding tasks and providing better answers to the questions. Considering that this model is an open-source alternative for large language models, the results of this comparison are remarkable. The “open-source king” definitely has what it takes to compete with the best. With TII continuing to develop such models, we can expect even better performance from the open-source community in the future.

GPT-4

Conclusion
In a face-off between Falcon 180B, the largest open-source model recently released by TII, and GPT4, the results show that size isn’t the only determinant of capability. Across the five domains of Coding ability, Reasoning, Literature, Knowledge, and Multilingual abilities, GPT4 consistently scored higher.
While Falcon 180B boasts a massive parameter count and has impressive capacities, it doesn’t quite match GPT4 in terms of versatility, in-depth understanding, and multilingual comprehension. Especially notable was the multilingual test, where Falcon showed gaps in multi-language translation and understanding.
Both models exhibit proficiency in coding, showcasing their ability to generate Python code. The reasoning task revealed that both models can understand logical statements, and their knowledge of historical events like the end of World War 2 is commendable.
However, when it comes to creative tasks such as writing poetry, GPT4 seems to have a slight edge. While both didn’t stick to the word limit, GPT4’s rendition was found more appealing.
Overall, while Falcon 180B is a monumental achievement as an open-source model and certainly has its merits, GPT4 remains more versatile across various tasks. It’ll be exciting to see how these models evolve and what further innovations the AI community will bring in the future.

It is interesting to compare the takes of Falcon 180B and GPT-4.
