[Health-LLM: The Evolution of Medical AI] Reading the English Commentary in Japanese [February 3, 2024 | @David Shapiro]

February 4, 2024, 10:38

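This video digs into the progress of AI and how large language models are being implemented in practice. It focuses on Health-LLM, an architecture aimed at accurately diagnosing health problems. Health-LLM reaches 83.3% diagnostic accuracy, outperforming GPT-3.5 Turbo. The architecture combines several techniques to reason about a patient's symptoms, but it does not yet take in complete patient records or genetic information.
Published: February 3, 2024
Note: we recommend starting the video before you read.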

The AI space is really heating up and picking up speed.

AIの領域は本当に盛り上がってきて、スピードを上げています。

Now, one of the things that happens though, as new technologies kind of get out into the space and get embedded into all the little nooks and crannies, is that some of the big advances really kind of overshadow the little incremental improvements.

新しい技術が出現し、あらゆる小さな隙間に組み込まれていく過程で、しばしば起こることの一つは、大きな進歩が小さな段階的な改善を本当に影に隠れさせてしまうことです。

And this is true for technology all across time and space.

これは、時間と空間を超えた技術全般に当てはまります。

You know, as a former virtualization engineer, I remember when virtualization was the big new shiny, sexy thing, and then it just becomes kind of mundane status quo, everyday sort of thing.

元仮想化エンジニアとして、仮想化が華やかで魅力的な最新技術だった頃を覚えていますが、やがてそれはごく当たり前の日常的なものになりました。

And so AI, Large Language Models, going through the same phase right now, we're actually finding more and more ways to implement them.

そして、AI、大規模言語モデルも同じフェーズを経ています。実際には、それらを実装する方法がますます見つかっています。

And the nuts and bolts of this, the nitty gritty, is not quite as shiny and as exciting.

そして、その実務的な細部は、それほど華やかでも刺激的でもありません。

So that's what I'm here to do today is to kind of talk about how this particular study is really interesting because we are working towards more established best practices in terms of cognitive architectures to do things such as accurately diagnose health issues.

だから、今日は、この特定の研究が非常に興味深い理由について話すためにここにいるのです。それは、正確な診断を行うための認知アーキテクチャの確立されたベストプラクティスに向けて取り組んでいるからです。

So this study is one of many like it, but this one just came out on arXiv.

このような研究は数多くありますが、これはarXivで公開されたばかりのものです。

So it's a preprint, but it looks very compelling.

それはプレプリントですが、非常に説得力があります。

So the TLDR, the very bottom line is that Health-LLM is an architecture that was able to achieve 83.3% accuracy of diagnosis.

要するに、Health-LLMは83.3％の診断精度を達成できたアーキテクチャです。

It was able to beat GPT-3.5 Turbo and GPT-4, even with a variety of techniques.

さまざまな手法を併用したGPT-3.5 TurboやGPT-4でさえも上回ることができました。

So to jump right into it, I won't bore you with all the dirty details, but this is the overall workflow of Health-LLM.

それでは、詳細な説明は省きますが、これがHealth-LLMの全体的なワークフローです。

And so it uses a combination of information retrieval, RAG, so retrieval augmented generation.

情報検索とRAG、つまり検索拡張生成を組み合わせて使用しています。

It also does a bunch of chunking and in-context learning.

また、チャンキングやインコンテキスト学習（文脈内学習）も数多く行っています。

It also does feature extraction and internal kind of question answering in order to really think about the patient's case.

さらに、患者の症例をしっかり検討するために、特徴抽出と内部的な質問応答のようなことも行います。
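
The chunking and retrieval steps mentioned above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: word-count vectors stand in for a learned semantic embedding, and the reference text, function names, and chunk size are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 10) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts: a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = bag_of_words(query)
    return sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Place retrieved chunks ahead of the question, as in RAG."""
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question


# Hypothetical reference text, invented for illustration only.
REFERENCE = (
    "Insomnia is difficulty falling asleep or staying asleep at night. "
    "Migraine is a recurring headache often accompanied by nausea and light sensitivity."
)

chunks = chunk_text(REFERENCE, max_words=10)
top = retrieve("patient reports headache and nausea", chunks, k=1)
prompt = build_prompt("What condition fits these symptoms?", top)
print(prompt)
```

In a real system the prompt would be sent to a language model; here the sketch stops at prompt construction so that it stays self-contained.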

And so this is a series of a lot of stuff.

これは、多くの要素からなるシリーズです。

If you're watching this video and you're in the AI space, you're probably familiar with a lot of these techniques.

このビデオを見ている方で、AIの分野にいる方は、これらの技術の多くに詳しいかもしれません。

But when you see all of these techniques all together, where it generates feature lists, it uses semantic embedding, it uses sequence to sequence transformers, it uses conventional ML.

しかし、これらの技術がすべて組み合わさったときに、特徴リストを生成し、意味埋め込みを使用し、シーケンス・トゥ・シーケンス・トランスフォーマーを使用し、従来の機械学習を使用すると、さまざまなことができます。

So you see we've got XGBoost here.

ここにはXGBoostもあります。

So it does all kinds of stuff.

さまざまなことを行います。
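
To make the "feature list plus conventional ML" idea concrete, here is a small sketch under stated assumptions. The feature names and training cases are invented, and a 1-nearest-neighbor rule is swapped in for XGBoost purely so the example has no dependencies; it is not the classifier the study uses.

```python
# Hypothetical feature vocabulary and labeled cases, invented for illustration.
FEATURES = ["headache", "nausea", "insomnia", "fatigue", "fever"]

TRAINING_CASES = [
    ({"headache", "nausea"}, "migraine"),
    ({"insomnia", "fatigue"}, "sleep disorder"),
    ({"fever", "fatigue", "headache"}, "influenza"),
]


def to_vector(symptoms: set[str]) -> list[int]:
    """Encode a set of extracted features as a fixed-length 0/1 vector."""
    return [1 if name in symptoms else 0 for name in FEATURES]


def hamming(a: list[int], b: list[int]) -> int:
    """Number of positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict(symptoms: set[str]) -> str:
    """Return the label of the closest training case (1-nearest-neighbor).

    This is a stand-in for a trained gradient-boosted model such as XGBoost.
    """
    query = to_vector(symptoms)
    _, label = min(
        ((hamming(query, to_vector(case)), label) for case, label in TRAINING_CASES),
        key=lambda pair: pair[0],
    )
    return label


print(predict({"headache", "nausea", "fatigue"}))  # prints "migraine"
```

The point of the fixed-length vector is that once free-text complaints are reduced to features, any conventional tabular classifier can be trained on them.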

And okay, so quick backstory on me.

では、私の簡単なバックストーリーについて説明します。

I've got a lot of doctors in my family.

私の家族には多くの医師がいます。

My uncle's an anesthesiologist.

私のおじは麻酔科医です。

My in-laws are both physicians.

義理の両親も医師です。

And so like kitchen table conversation is diagnostic process.

そのため、食卓での会話といえば診断プロセスの話になります。

And so one of the things is, and I haven't seen this misconception in a while, but a lot of people, myself included, if you're not familiar with what goes into accurately diagnosing stuff in medicine, it's actually really hard. It's not just a matter of matching symptoms.

そして、最近はあまり見かけない誤解ですが、医学で正確に診断するために何が必要かを知らない人は、かつての私も含めて多いのです。実際には診断は非常に難しく、単に症状を照合すれば済む話ではありません。

Like you probably have been on Health MD and those symptom checker websites and stuff.

たとえば、Health MDやその他の症状チェッカーウェブサイトを利用したことがあるかもしれません。

And the thing is, is even just matching symptoms, that is maybe 20 to 50% of the information that is required.

実のところ、症状を照合するだけでは、必要な情報のおそらく20〜50%にしかなりません。

Because then there's also all kinds of other stuff, such as patient history.

なぜなら、患者の経歴など、さまざまな要素もあるからです。

There's contraindicators as to helping you identify what the actual problem is versus false positives.

実際の問題と偽陽性を区別するのに役立つ、否定的な指標もあります。

And so this is a non-trivial set of things.

これは、簡単な問題ではありません。

And I've demonstrated projects in the past.

私は過去にプロジェクトを実証しました。

A while back, I did the medical intake form, which is able to kind of use the language model to ask counterfactual questions.

以前、医療受付フォームを作成しました。これは言語モデルを使用して反事実的な質問をすることができます。

But this does a little bit more because, and this is where I would have taken it, honestly, if I had the time and energy and focus to follow through with things, but I have ADHD.

しかし、これはもう少し先まで踏み込んでいます。正直なところ、最後までやり遂げる時間とエネルギーと集中力があれば、私もこの方向に進めていたでしょう。ただ、私にはADHDがあります。

So anyways, so they connected medical database, medical questionnaire.

とにかく、彼らは医療データベースと医療アンケートを結びつけました。

So this is like the part that I did months ago.

これは、数ヶ月前に私が行った部分です。

And they did all the rest of this.

そして、彼らはこれ以外のすべてを行いました。

So anyways, sorry, going down a rabbit hole.

話が脱線してしまってすみません。

So this whole thing basically approximates some of what an actual medical professional does.

この全体的なシステムは、実際の医療専門家が行うことの一部を近似しています。

So you talk about going to med school, a lot of it is rote memorization.

医学校に行くことについて話すと、それはほとんど暗記です。

You have to memorize all the different body systems and medications and interactions and how to.

身体のさまざまな器官系、薬、相互作用、方法などをすべて暗記しなければなりません。

And then you have to memorize all the diagnostic processes that are available and all the treatment options that are available.

そして、利用可能な診断プロセスと治療オプションをすべて覚えなければなりません。

And so all of that memorization is why retrieval-augmented generation and information retrieval is critical because human brains, we have the ability to recall stuff very quickly and our recall is mostly automatic based on the context that you find yourself in, which is why it's so important that this takes in context.

こうした暗記のすべてが、検索拡張生成と情報検索が重要である理由です。なぜなら、私たち人間の脳は物事を非常にすばやく思い出す能力を持っており、その想起は主に自分が置かれている状況に基づいて自動的に行われるからです。だからこそ、このシステムが文脈を取り込むことが非常に重要なのです。

It takes in medical questionnaires.

医療アンケートを取り入れます。

I don't know if, I didn't see anything mentioning that this takes in patient charts.

私は患者のチャートを取り込むということについては何も見ていません。

So that would be the next step.

それが次のステップになるでしょう。

I think that's maybe one gap.

そこが一つのギャップかもしれないと思います。

And this study is if you can ingest the entire patient chart such as their full medical history, and I know that there's other studies that have done this, that is gonna be like the next level.

カルテ全体、つまり完全な病歴などを取り込めるようになれば（他の研究でそれを行った例があることは知っています）、それが次のレベルになるでしょう。

So anyways, you can see the architecture here.

とにかく、ここにアーキテクチャがあります。

It's a lot of familiar stuff.

おなじみの要素がたくさんあります。

There's examples of kind of breaking down the patient's chief complaints and their symptoms into features.

患者の主訴や症状を特徴に分解する例があります。

And so if you think in terms of machine learning, you say, okay, a symptom and a diagnosis and another fact like, do they have insomnia?

そして、機械学習の観点で考えると、症状と診断、そして他の事実、例えば、不眠症があるかどうかなどといったことです。

Are they not sleeping well?

彼らは十分に眠れていないのでしょうか？

What are the other conditions?

他の状態はどうなっていますか？

And you can also have symptoms that are not relevant or confounding symptoms, which can make it even harder if you have multiple things wrong with you.

また、関係のない症状や混乱を招く症状もあります。それが複数の問題を抱えている場合、診断がさらに困難になります。

That makes diagnosis even harder because then you can go down false rabbit holes and stuff.

それによって、診断がさらに難しくなることもあります。なぜなら、間違った推測に基づいて行動してしまうことがあるからです。

Okay, so long story short, they did a lot of tests.

さて、長い話を短くすると、彼らは多くのテストを行いました。

And so the data here is, it's kind of understated, but they did their homework, they dotted their i's, they crossed their t's.

ここでのデータは控えめに示されていますが、彼らはきちんと下調べをし、細部まで抜かりなく仕上げています。

And so they did a bunch of different experiments in terms of doing side-by-side comparisons to really show that this framework stands head and shoulders above just using these models.

彼らは、このフレームワークが単にこれらのモデルを使用するだけよりもはるかに優れていることを実際に示すために、横並びの比較を行うさまざまな実験を行いました。

So GPT-3.5 zero-shot had an accuracy of one third.

GPT-3.5のゼロショットでは、精度は3分の1でした。

That's not the worst in the world, especially if you just throw a bunch of stuff at it, like wet spaghetti at the wall, and it can accurately diagnose you one third of the time.

それは世界で最も悪いことではありません、特に、まるで壁に湿ったスパゲッティを投げつけるように、いろいろな情報を投げつけるだけで、それがあなたの病気を正確に1/3の確率で診断できる場合には。

That's not bad for a model that is that cheap and that fast.

それは安価で速いモデルにとっては悪くありません。

But even going up to GPT-4 with few-shot and information retrieval, it was only two thirds accurate.

GPT-4を使用して、少数の例示と情報検索を行ったとしても、精度は3分の2程度に過ぎなかったのです。

However, this system, which includes a few more steps in terms of breaking down the problem and looking at it from different perspectives, they were able to get up to 83.3% accurate diagnosis.

しかし、このシステムでは、問題を分解し、異なる視点から見るといういくつかのステップを追加した結果、正確な診断率は83.3%にまで向上しました。
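
The three figures quoted above can be compared with a little arithmetic. The sketch below treats the reported 83.3% as the fraction 5/6, which is an assumption made for clean numbers rather than something stated in the video, and it compares Health-LLM only against the GPT-4 few-shot plus retrieval figure.

```python
from fractions import Fraction

# Accuracies as quoted in the video. Writing 83.3% as 5/6 is an assumption.
accuracy = {
    "GPT-3.5 zero-shot": Fraction(1, 3),
    "GPT-4 few-shot + retrieval": Fraction(2, 3),
    "Health-LLM": Fraction(5, 6),
}


def as_percent(value: Fraction) -> float:
    """Convert a fraction to a percentage rounded to one decimal place."""
    return round(float(value) * 100, 1)


baseline = accuracy["GPT-4 few-shot + retrieval"]
system = accuracy["Health-LLM"]

# Absolute gain in percentage points.
absolute_gain = as_percent(system - baseline)
# Gain relative to the baseline's own accuracy.
relative_gain = as_percent((system - baseline) / baseline)
# Share of the baseline's errors that the full system removes.
error_reduction = as_percent((system - baseline) / (1 - baseline))

print(absolute_gain, relative_gain, error_reduction)  # prints 16.7 25.0 50.0
```

Under that assumption, the step from two thirds to 83.3% is about 17 percentage points, which amounts to halving the remaining error rate.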

Now this is obviously still not good enough to be reliable as a medical device, but we're getting there.

これは明らかに、医療機器として信頼できる水準にはまだ達していませんが、そこに近づきつつあります。

And this is before we even have LLaMA 3 and GPT-5 and bigger context windows and that sort of thing.

これは、まだLLaMA 3やGPT-5、より大きな文脈ウィンドウなどが存在しない段階での話です。

And so you're seeing these really kind of gravimetric shifts as we approach the ability to get accurate, reliable, fast and cheap medical diagnosis.

そして、私たちは正確で信頼性のある、迅速かつ安価な医学的診断能力に近づくにつれて、これらの重要な変化を目の当たりにしています。

And again, I didn't see anything, I might've missed it, I read this pretty closely, but I didn't see anything about ingesting entire patient charts.

もう一度言いますが、私は何も見ませんでした。見落とした可能性もありますが、これをかなり注意深く読みましたが、患者の全診療記録を取り込むことについては何も見当たりませんでした。

You add that, you add genetic information, you add family history and other stuff about patient history.

そこにカルテ全体、遺伝情報、家族歴、その他の患者の経歴に関する情報を加えていくとします。

And I wouldn't be surprised if within six to 12 months, we see closer to 90, 95 or even 99% accurate diagnosis.

そして、6〜12ヶ月以内に、90%、95%、または99%に近い正確な診断が見られることも驚くことではありません。

Now, also one thing to caution is that their dataset was only 61 different diseases.

ただし、彼らのデータセットは61種類の異なる疾患のみでした。

There are tens of thousands of potential diagnoses out there.

実際には、数万種類の潜在的な診断が存在します。

If you look at the ICD-10, 11, for instance, I think there's 17,000 possible diagnoses and many of which don't even have good tests for.

例えば、ICD-10やICD-11を見ると、可能な診断は約17,000あると思いますが、その多くには十分な検査方法すら存在しません。
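
To put the dataset size in perspective, the sketch below compares the 61 diseases with the speaker's rough figure of 17,000 diagnoses. The 17,000 is his estimate, not a verified count, and the chance-level figure assumes uniform random guessing over equally likely classes, which is an assumption of this example.

```python
diseases_in_dataset = 61
icd_estimate = 17_000  # the speaker's rough figure, not a verified count

# Share of the estimated diagnosis space that the dataset covers.
coverage = diseases_in_dataset / icd_estimate

# Accuracy of uniform random guessing over the dataset's classes
# (assumes every disease is equally likely).
chance_accuracy = 1 / diseases_in_dataset

print(f"coverage: {coverage:.2%}")               # prints "coverage: 0.36%"
print(f"chance accuracy: {chance_accuracy:.2%}")  # prints "chance accuracy: 1.64%"
```

So the reported accuracy is far above chance for a 61-way problem, while the label space is well under one percent of the diagnoses a clinician might have to consider.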

So for instance, as someone who's personally been going through dietary issues, there isn't a blood test for a lot of these things.

例えば、私自身も食事に関する問題を経験していますが、こうした問題の多くには血液検査がありません。

You can only figure it out by virtue of keeping a food diary.

食事記録をつけることでしか問題を診断する方法はありません。

That is literally the only way to diagnose some issues.

それがいくつかの問題を診断する唯一の方法です。

So obviously it's gonna be harder to get some of these things.

したがって、これらのことを実現するのはより困難になるでしょう。

They even, so one thing that I wanna point out though, is that part of their test was they tested this system with and without retrieval.

実際、彼らのテストの一部は、検索の有無でこのシステムをテストしたということです。

And so without retrieval, it was 5% less accurate.

検索なしでは、正確性が5%低下しました。

And so they did a really good job of showing that this architecture, this specific architecture and these methods, really do something different.

そして、彼らはこのアーキテクチャ、具体的なアーキテクチャ、およびこれらの手法が本当に異なる結果をもたらすことを非常によく示しました。

Now, if you remember back in the day, doing optical image recognition and RNNs and CNNs and those sorts of things, convolutional neural networks, like getting a 1% improvement was pretty big, but we're seeing, we're getting 20% improvement, 5% improvement.

さて、昔の話を思い出してみてください。光学画像認識やRNN、CNNなどを行う際、1%の改善はかなり大きなものでしたが、私たちは20%の改善、5%の改善を得ています。

And this is, again, relatively early in the grand scheme of things.

これは、大局的な観点から見れば、まだ比較的早い段階です。

So overall, I find this study to be pretty exciting and also pretty validating, pretty vindicating, because these kinds of cognitive architectures are the things that I've been working on for a few years now.

全体的に、私はこの研究がとても刺激的であり、同時に自分の考えが裏付けられ、正しさが証明されたとも感じています。なぜなら、この種の認知アーキテクチャこそ、私がここ数年取り組んできたものだからです。

But again, I made the choice to focus more on messaging and communication.

ただし、私はメッセージングとコミュニケーションに重点を置くことを選択しました。

But yeah, so this is very close to how I would have approached this problem.

しかし、これは私がこの問題に取り組む方法に非常に近いものです。

And they also came up with some stuff that I wouldn't have thought of, which, hey, that's why I'm here, that's why I'm communicating it, because these are some really great ideas.

彼らはまた、私が思いつかなかったいくつかのアイデアも出しています。それが私がここにいる理由であり、それを伝える理由です。これらは本当に素晴らしいアイデアです。

So thanks for watching.

ご視聴ありがとうございました。

I hope you got a lot out of this.

この動画から多くのことを学んでいただけたら嬉しいです。

Like, subscribe, et cetera, et cetera.

チャンネル登録やいいねをお願いします。

Check in the description.

詳細は説明欄をご確認ください。

I've got a lot of links to my other channels.

他のチャンネルへのリンクがたくさんあります。

Also, if you find my work compelling, come and jump over and join my Patreon.

また、私の活動に共感していただけるなら、ぜひPatreonに参加してください。

Every little bit of support helps.

少しのサポートでも助かります。

Plus, you get access to my exclusive Discord community and my monthly webinar.

さらに、独占的なDiscordコミュニティや月次ウェビナーにアクセスできます。

So, cheers.

それでは、また。
