[SEC Insight: A Financial Analysis Tool Built on OpenAI Technology] Reading the English Explanation in Japanese [October 19, 2023 | @Prompt Engineering]

October 22, 2023, 09:34

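This YouTube video introduces LlamaIndex and SEC Insight, tools for understanding a company's financial situation from its 10K forms. SEC Insight is an open-source project that analyzes the text and images in financial documents and answers questions about them. The demo shows how to select companies and retrieve information about their financial data. The tool is built with React and FastAPI, and its code is published on GitHub. SEC Insight relies on LlamaIndex's capabilities to analyze financial data, and the video recommends trying it out.
Published: October 19, 2023
Note: It is best to start playing the video before reading.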

If you want to learn about a company's financial situation, one of the best places to start is their annual financial statements, such as 10K forms.

企業の財務状況を知りたいなら、10Kフォームなどの年次財務諸表から始めるのが最良の方法の一つです。

These forms hold a lot of information regarding the company's financial situation, where they're spending and making money, and what are different types of risks.

これらのフォームには、企業の財務状況、どこでお金を使い収益を得ているのか、さまざまなリスクの種類に関する多くの情報が含まれています。

But these financial statements are not a fun read.

しかし、これらの財務諸表は読むのが楽しいものではありません。

That's why you can really use the power of LLMs to derive insights from these complex documents.

だからこそ、これらの複雑なドキュメントから洞察を得るためにLLMの力を大いに活用できます。

Recently, I have started looking at LlamaIndex.

最近、私はLlamaIndexを調べています。

They actually have a very interesting open source project called SEC Insight that uses the retrieval augmented generation capabilities of LlamaIndex to answer questions about SEC 10K and 10Q documents.

実際、彼らはSEC Insightという非常に興味深いオープンソースプロジェクトを公開しており、LlamaIndexの検索拡張生成（RAG）機能を利用して、SECの10Kや10Qドキュメントに関する質問に答えます。

This project actually highlights the different capabilities of LlamaIndex, and that's why I have been focusing on LlamaIndex more recently.

このプロジェクトはLlamaIndexのさまざまな機能をよく示しており、それが最近私がLlamaIndexに重点を置いている理由です。

Now, using LLMs for financial documents is a very complex task because financial documents contain text, images, and tables.

金融文書にはテキスト、画像、表が含まれているため、金融文書にLLMを使用することは非常に複雑な作業です。

It's a very hard problem to solve, and I think that's why it's a perfect application for RAG pipelines.

それは非常に難しい問題であり、だからこそRAGパイプラインの完璧な応用例だと思います。

Here is the system architecture of the SEC Insight.

SEC Insightのシステムアーキテクチャはこちらです。

It has a full front end as well as backend implementation.

フルフロントエンドとバックエンドの実装があります。

There is an S3 bucket for storing the PDF files as well as the vector store, but that is in a private bucket.

PDFファイルやベクターストアを保存するためのS3バケットがありますが、それはプライベートバケット内にあります。

And it's making calls to OpenAI Services as well as some other APIs for retrieving financial data.

また、金融データの取得のためにOpenAIサービスやその他のAPIへの呼び出しを行っています。

In this video, I will give you a quick demo of the application on how it works, but in a later video, I'll break it down, and we will specifically focus on how the retrieval is done with a special focus on how the data is being read from tables.

この動画では、アプリケーションの動作方法に関する簡単なデモをお見せしますが、後の動画では詳しく説明し、テーブルからのデータの読み取り方法に特に焦点を当てて情報取得方法について詳しく見ていきます。

Okay, so they are hosting this application on a website called secinsights.ai.

では、このアプリケーションはsecinsights.aiというウェブサイトでホスティングされています。

I'll put a link in the description of the video.

動画の説明欄にリンクを掲載します。

So let's quickly look at how this works.

さて、どのように動作するのか簡単に見てみましょう。

So currently, you cannot upload your own documents, but here is a list of companies that you can use.

現在、独自のドキュメントをアップロードすることはできませんが、使用できる企業のリストがあります。

So for example, we can select something like Tesla.

例えば、Teslaのようなものを選択することができます。

Then what type of form we want?

次に、どの種類のフォームにするかを選びます。

So let's say we were looking at the annual report.

ここでは年次報告書を見ることにしましょう。

Year, so it has data for the last three years.

年については、過去3年分のデータがあります。

So let's select 2022.

2022年を選択しましょう。

We can add this.

これを追加できます。

And let's say another one is NVIDIA, same annual report, 2022, and add this as well.

さらに、NVIDIAも同じ年次報告書、2022年で、これも追加しましょう。

And you can, I think, upload up to or select up to 10 different companies.

最大10社まで選択またはアップロードすることができると思います。

But we are going to just select three different companies.

しかし、私たちは3つの異なる企業だけを選択するつもりです。

So I think I included Amazon, NVIDIA, Tesla, or maybe let's make it four. We're going to do Apple as well.

Amazon、NVIDIA、Teslaを入れたと思いますが、4社にしてみましょう。Appleも追加します。

All right, once you select your company, then simply click on Start your conversation.

企業を選択したら、単純に「会話を開始する」をクリックします。

Okay, so you're going to be presented with something like this.

では、こんな感じで提示されることになります。

On the right-hand side, you can actually see different 10K forms for different companies.

右側には、異なる企業の異なる10Kフォームを実際に見ることができます。

And here are simple links to each one of them.

そして、それぞれにはシンプルなリンクがあります。

Okay, so on the left-hand side, here are some prepopulated questions for you, or you can simply start typing your question in here.

左側には事前に用意された質問がいくつかあります。あるいは、ここに質問を直接入力することもできます。

So let's use the prepopulated question.

事前に用意された質問を利用しましょう。

So for example, here's one: Which company had the highest revenue?

例えば、これが一つの質問です：どの会社が最も高い収益を上げていましたか？

So now, if we click on this, so basically, it has to go to the revenue of each of these four companies.

これをクリックすると、基本的にこれら4社それぞれの収益を調べる必要があります。

And then, figure out which company had the highest revenue.

そして、どの会社が最も高い収益を上げていたのかを判断する必要があります。

So, let's simply ask this question.

この質問を簡単に聞いてみましょう。

We're going to hit enter.

エンターキーを押します。

It basically goes through the whole process.

基本的に、一連の処理がすべて実行されます。

If you look here, it's actually asking this question individually for each of the earnings reports that we have provided.

こちらを見ると、提供した各決算報告書に対してこの質問を個別に行っていることがわかります。

Right?

良いですか？

And it is getting an answer for each of the companies individually.

そして、それぞれの会社に対して個別の答えを得ています。

And then, there is an agent which compares all of these together to give us the final answer.

そして、これらすべてを比較して最終的な答えを与えてくれるエージェントがあります。

Now, you can see based on the individual answers, it's able to deduce that Amazon has the highest revenue among the companies mentioned.

個別の回答に基づいて、上記の会社の中でAmazonが最も高い収益を上げていることがわかります。
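
The comparison step described above can be sketched roughly as follows. This is only an illustration of the idea, not the project's code: in SEC Insight the comparison is done by an LLM-driven agent, whereas this sketch swaps in a plain `max()` over already-extracted values. The `DocAnswer` type, the company labels, and the numbers are all hypothetical placeholders, not real financial data.

```python
from dataclasses import dataclass


@dataclass
class DocAnswer:
    """One per-document answer (hypothetical structure for illustration)."""
    company: str
    revenue_usd_billions: float  # placeholder value, not real data


def pick_highest_revenue(answers: list[DocAnswer]) -> DocAnswer:
    """Compare the individual answers and return the largest one."""
    if not answers:
        raise ValueError("no per-document answers to compare")
    return max(answers, key=lambda a: a.revenue_usd_billions)


# Placeholder numbers chosen only so the example runs.
answers = [
    DocAnswer("Company A", 10.0),
    DocAnswer("Company B", 25.0),
    DocAnswer("Company C", 5.0),
]
best = pick_highest_revenue(answers)
print(f"{best.company} has the highest revenue among the companies mentioned")
```

The point of the sketch is the shape of the flow: one answer per filing first, then a single step that looks across all of them.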

Now, this is a really great application of RAG.

これはRAGの本当に素晴らしい応用例です。

Let's ask it another question.

別の質問をしてみましょう。

So, we're going to ask it what were different risk factors for each of the companies.

それでは、各企業の異なるリスク要因は何でしたかと聞いてみましょう。

And let's see if it can come up with an answer based on the annual reports that we have provided.

提供した年次報告書に基づいて答えが得られるかどうか見てみましょう。

Okay, here is the answer that we got.

はい、私たちが得た答えはこちらです。

It says the risk factors for each company are as follows: risk factors for Amazon, Apple, NVIDIA, and even for Tesla as well.

それによると、各企業のリスク要因は以下の通りです：Amazon、Apple、NVIDIA、そしてTeslaのリスク要因も示されています。

Now, the thing that I'm interested in is this progress section.

私が興味を持っているのはこの進行状況のセクションです。

So, if you look here, what it did was based on a question, it generated a subquery and then it queried the document.

こちらを見ると、質問に基づいてサブクエリを生成し、その後文書に対してクエリを実行したことがわかります。

Let's say it generated a subquery for Amazon and then it basically queried the information from that specific document and it came up with the answer.

例えば、Amazon向けのサブクエリを生成し、その特定のドキュメントから情報を検索して、答えを導き出しています。

Similarly, it did it for Apple, NVIDIA, and Tesla as well.

同様に、Apple、NVIDIA、Teslaに対してもそれを行いました。
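
The subquery flow shown in the progress section can be sketched like this. It is a minimal sketch under stated assumptions: `generate_sub_questions`, `answer_across_documents`, `fake_query_document`, and `fake_combine` are hypothetical names, and the two `fake_` functions stand in for the per-document retrieval and the final LLM synthesis, which the video does not show in code.

```python
from typing import Callable, Dict, List


def generate_sub_questions(question: str, companies: List[str]) -> Dict[str, str]:
    """Hypothetical stand-in for the step that writes one subquery per filing."""
    return {company: f"{question} (for {company} only)" for company in companies}


def answer_across_documents(
    question: str,
    companies: List[str],
    query_document: Callable[[str, str], str],
    combine: Callable[[str, Dict[str, str]], str],
) -> str:
    """Fan the question out to each document, then merge the partial answers."""
    sub_questions = generate_sub_questions(question, companies)
    partial_answers = {
        company: query_document(company, sub_question)
        for company, sub_question in sub_questions.items()
    }
    return combine(question, partial_answers)


def fake_query_document(company: str, sub_question: str) -> str:
    # Assumption: a real system would retrieve chunks from this company's filing.
    return f"[answer retrieved from the {company} filing]"


def fake_combine(question: str, partial_answers: Dict[str, str]) -> str:
    # Assumption: a real system would ask an LLM to synthesize the final answer.
    return "; ".join(f"{c}: {a}" for c, a in partial_answers.items())


final = answer_across_documents(
    "What were the risk factors?",
    ["Amazon", "Apple", "NVIDIA", "Tesla"],
    fake_query_document,
    fake_combine,
)
print(final)
```

Passing the query and combine steps in as functions keeps the orchestration separate from whichever retrieval or model backend is actually used.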

Now, one great feature that they have included is the answers are actually grounded in the documents.

彼らが取り入れた素晴らしい機能の1つは、答えが実際に文書に基づいていることです。

So, for example, if you see here, these are the highlighted sections.

例えば、こちらをご覧になると、これらはハイライトされたセクションです。

So, let's say if you click on this, now here you can see which part or chunk of text from the original document was used to generate the answer.

これをクリックすると、元の文書のどの部分やテキストが答えを生成するために使用されたのかがわかります。

This is very critical when you're creating answers based on documents provided.

提供された文書に基づいて答えを作成する際に、これは非常に重要です。

So, this ensures that the answers are actually grounded in the context rather than the LLM generating something from its own knowledge base.

これにより、LLMが自身の知識ベースから何かを生成するのではなく、答えが実際に文脈に基づいていることが保証されます。
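
One way to picture grounded answers is to return the source chunks together with the answer text, so the UI can highlight them. The sketch below is an assumption-laden toy: `SourceChunk`, `GroundedAnswer`, and `answer_from_chunks` are hypothetical names, the keyword match stands in for real retrieval, and the chunk texts and file name are placeholders rather than content from any filing.

```python
from dataclasses import dataclass


@dataclass
class SourceChunk:
    document: str
    page: int
    text: str


@dataclass
class GroundedAnswer:
    text: str
    sources: list[SourceChunk]


def answer_from_chunks(question: str, chunks: list[SourceChunk]) -> GroundedAnswer:
    """Answer only from matching chunks and keep them as citations."""
    # Toy keyword match; a real pipeline would use embedding-based retrieval.
    terms = {w.strip("?.,").lower() for w in question.split() if len(w) > 3}
    used = [c for c in chunks if terms & set(c.text.lower().split())]
    if not used:
        # Refuse rather than answer from outside the provided documents.
        return GroundedAnswer("Not found in the provided documents.", [])
    return GroundedAnswer(" ".join(c.text for c in used), used)


chunks = [
    SourceChunk("example-10k.pdf", 12, "placeholder sentence about supply chain risk"),
    SourceChunk("example-10k.pdf", 40, "placeholder sentence about office leases"),
]
result = answer_from_chunks("What are the risk factors?", chunks)
for source in result.sources:
    print(f"{source.document}, page {source.page}: {source.text}")
```

Keeping the chunk references on the answer object is what makes the click-to-highlight behavior in the demo possible.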

Here is another example.

もう1つの例をご紹介します。

So, in this case, we are looking at the answer for Tesla.

この場合、Teslaの答えを見ています。

And actually, I wanted to see which portion of the document or like the annual report was used.

そして、実際に、文書や年次報告書のどの部分が使用されたのかを確認したかったのです。

So, you can come back here and this is the part of the document that was used to generate the answer.

こちらに戻ると、これが答えの生成に使用された文書の部分です。

Now, for SEC Insight, the great thing is this whole thing in terms of both the front end as well as backend is open source.

SEC Insightに関して素晴らしいことは、フロントエンドもバックエンドも両方ともオープンソースであることです。

So, you can actually go to their GitHub repo and look at the code.

実際に、彼らのGitHubリポジトリにアクセスしてコードを見ることができます。

Here's the tech stack that they're using.

彼らが使用している技術スタックをご紹介します。

So, the front end is based on React and Tailwind CSS.

フロントエンドはReactとTailwind CSSを基盤としています。

In the back end, you have a FastAPI.

バックエンドにはFastAPIを使用しています。

OpenAI is used for actually both the embedding as well as the LLM part.

OpenAIは、埋め込みとLLMの部分の両方で実際に使用されています。

Right?

そうでしょう？

There's a vector store in there and everything is put together using LlamaIndex.

中にはベクトルストアがあり、LlamaIndexを使用してすべてが組み合わされています。
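
To make the embedding and vector store roles concrete, here is a small self-contained sketch. It does not use OpenAI or LlamaIndex: `embed` is a toy bag-of-words stand-in for an embedding model, and `ToyVectorStore` is a hypothetical in-memory class, both assumptions made purely for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs."""

    def __init__(self) -> None:
        self._items: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        self._items.append((embed(text), text))

    def search(self, query: str, top_k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]


store = ToyVectorStore()
store.add("total revenue for the fiscal year")
store.add("risk factors related to supply chain")
top = store.search("what are the risk factors", top_k=1)
print(top)
```

The same model has to embed both the stored chunks and the query, which is why a single provider can cover the embedding side while a separate LLM call writes the answer.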

And for infrastructure, they're using that.

インフラとして、それを使用しています。

If you want to learn more about it, I'll be adding more videos on it.

もっと詳しく知りたい方のために、もっと多くのビデオを追加する予定です。

I highly recommend checking out the SEC Insight project and playing around with it.

SEC Insightプロジェクトをチェックして、それを試してみることを強くおすすめします。

I hope you found this video useful. If you did, consider liking the video and subscribing to the channel.

この動画が役に立ったと思ったら、動画にいいねをしてチャンネル登録を考えてみてください。

Thanks for watching and as always, see you in the next one.

ご視聴いただきありがとうございます。次回もお会いしましょう。
