
[Google Gemini Pro API: Overview and How to Use It] Reading the English Walkthrough in Japanese [December 16, 2023 | @Prompt Engineering]

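This video introduces Google's release of API access to its new Gemini Pro model. Gemini Pro is a powerful model that handles both vision and text, and the video explains how to work with it through the Python SDK. The API has a free usage tier and can also process images. Gemini Pro can additionally be used for text generation and as a chat model, and it is available in Google AI Studio. Safety settings let you control harmful content. Gemini Pro has the potential to be used in many applications.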

Google just opened API access to their Gemini Pro Models to the public, and the best part is you can test this for absolutely free.


We're going to look at the pricing in a little bit, but Gemini Pro is the second best model from Google, and it's a multimodal model.


In this video, I'll show you how to use both the vision as well as the text version of Gemini Pro through their Python SDK.


Gemini Pro already has integrations with tools like LangChain and LlamaIndex, which means that you can build RAG pipelines on top of Gemini Pro.

We will cover that in a later video.


Before showing you how to use this in your own projects, let's talk about the API pricing.


On their pricing page, they say it's "priced to help you bring your app to the world."


If you're making less than 60 queries per minute, it's absolutely free for everyone to use at the moment, both in terms of the input as well as the output.


The only catch is that Google will use this data, both the input data that you provide as well as the output from the model, to improve their products.


If you need more than 60 queries per minute, you can opt into pay-as-you-go pricing as well.


That is not yet available, but I think it's going to be available pretty soon.


In terms of the price, both for the input as well as output tokens, it's actually pretty good compared to something like GPT-3.5.


So here is the price for GPT-3.5 Turbo, and if you compare the price of Gemini Pro, it's actually an order of magnitude lower than GPT-3.5 Turbo.

And you also have the ability to process images.


Again, if you compare the Gemini Pro Vision model with GPT-4 vision preview model, the price for image completion is also lower.


Now, just like OpenAI, if you pay for Gemini Pro API usage, then Google is not going to use either your input or your output data to train or improve their products.

So again, the best part is it's absolutely free if you are just getting started.


Now let me show you how to use this.


Gemini Pro is currently available within the Google AI Studio, which used to be called MakerSuite.


Within the Google AI Studio, you can test the models.


Currently, there are two different models available.


One is the Gemini Pro, which is the text model.


The second one is Gemini Pro Vision, which has the ability to understand images.


You can experiment with both of these models in here.


It's just like the OpenAI Playground.


However, if you want to use these models within your own applications, then you will need to create an API key and use that in your own code base.


Before looking at that, let's just experiment with the models here and let me show you a few very interesting options that Google has added.


Okay, so I'm going to use this test prompt, "What is the meaning of life?", just to look at the output.


Now, in this video, we're not looking at comparing the output from Gemini Pro to something like GPT-3.5 or GPT-4.


I'm going to create a subsequent video on that.


The goal of this video is just to show you how to use Gemini Pro in your own projects using the Python SDK.


Okay, so you can see we got an output from the model.


So let's look at some of the options that you have in here.


So, you can set the temperature.


Currently, it's set to 0.9.


One very interesting thing that Google has done in here is that it is giving the users the ability to define the safety settings for the model.


They have four different harmful categories: harassment, hate speech, sexually explicit content, and dangerous content.


And you have this slider that you can use to set different levels, which is very interesting, and it gives developers more control over what they want their users to be able to see.


So I think it's a really good initiative from Google.


Now you can also set some other settings in here.


So for example, there is the maximum number of output tokens, top K, and top P.


Now, once you're happy with the model behavior, you can simply export the code with all the settings.


So just click on "Get code". This will give you the Python code in this case, but you can also get the JavaScript, and here's everything that you need.

So these are the configuration settings that we just used.


Here are the safety settings, you can modify them.


And then, here is how to actually use the model itself.


We are going to look at an example later in the video.


If you click on the Gemini Pro Vision model now, you will have the ability to upload images as well.


If you want to use this in your own projects, you will need to create an API key.


So we're going to click on "Get API key". Here I already have an API key that I was testing, but you can create a new API key for your project.

So simply click on that, just copy your API key.


Now let me show you how to test Gemini Pro in Google Colab.


So in this Google Colab notebook, we are going to be looking at a few things.


The first one is going to be how to set up your development environment and how to set up access to your API key within the Colab notebook.


Second, we're going to look at how to generate text responses from the model, then how to do streaming of those responses, as well as how to use the chat model.


Later, I'll show you how to use the embedding model that you can use in your own RAG pipelines, and I'll also show you how you can interact with images using the vision version of Gemini Pro.

We need to set our API key, so click on this key option.


Now, here you can add a new secret.


So I currently have one which I'm calling Gemini, and I provided my API key in here.


If you want to add another one, so for example, let's call this test, then I'll provide the API key in here, and let's just enable it so that it's visible to your Google Colab notebook.

Now, in this case, you need to remember the name that you assign in here.


That is going to be your environment variable.


Once we do that, the first thing we need to do is to download and install the Google generative AI package.


Here, we're just importing all the packages that we're going to be using.


Now, in this case, we're using the userdata object from Google Colab just to retrieve the API key.

If you're running this locally on your own machine, you can set an environment variable and retrieve it that way.


And at the end, we are defining a function just to show the responses generated by the model as Markdown.


Next, we need to retrieve the API key.


So if you recall, I had this environment variable called Gemini.


So here, I'm just providing that, and we are going to set this in configuration.


Now, if you are running this locally, you can set an environment variable called GOOGLE_API_KEY and then load that in order to use it in your own code.
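
For the local case, a minimal sketch of that key-loading step could look like the following. The helper names load_api_key and configure_gemini are made up for illustration and are not part of the SDK. The SDK import sits inside the function so the file loads even before the google-generativeai package is installed.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


def configure_gemini(api_key: str) -> None:
    """Hand the key to the SDK (requires `pip install google-generativeai`)."""
    # Imported lazily so this module can be loaded without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)


# In Colab, the secret created earlier would instead be read with:
#   from google.colab import userdata
#   api_key = userdata.get("Gemini")  # use the exact name you gave the secret
```

Calling configure_gemini(load_api_key()) once at startup is enough. Later calls to the SDK reuse the configured key.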


Next, we're going to look at all the models that are currently available within the Google Generative AI package.


So currently, we have access to only Gemini Pro, which is the text model, and Gemini Pro Vision, which has the ability to understand images.


And as I said in the beginning of the video, there is a rate limit of 60 requests (queries) per minute, but it's absolutely free to use, at least for the time being.
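
As a rough sketch of the model listing step, the snippet below filters the models returned by genai.list_models() down to those that support content generation. The helper names are hypothetical. The filtering logic is kept separate so it can be tried without an API key.

```python
from typing import Iterable, List


def names_supporting(models: Iterable, method: str = "generateContent") -> List[str]:
    """Return the names of models whose supported methods include `method`."""
    return [m.name for m in models if method in m.supported_generation_methods]


def list_generation_models() -> List[str]:
    """Ask the API which models can generate content (needs a configured key)."""
    # Lazy import: requires google-generativeai and a prior genai.configure(...).
    import google.generativeai as genai

    return names_supporting(genai.list_models())
```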


Now, how do you actually use the model?


So, we are going to be calling the GenerativeModel constructor on the genai module that we imported.


We pass on the name of the model, so in this case, we want to use the Gemini Pro version, which is the text generation model, and that will load the model for us.


Now, in order to generate a response from the model, we will need to call the generate_content function on the model and pass on our prompt.


If we look at the response object that we got, there are a lot of things that we can call, but the one that we are interested in right now is just the text part of it.


Let's run this.


This is basically the text or response from the model that was returned.


And using the markdown function that we wrote, we can convert this into a nicely formatted markdown.


So, here is the response that you see in markdown.


Just to repeat what we did so far, initially, we imported the Google generative AI package as genai.


Then, we said that we want to use the Gemini Pro model using the GenerativeModel constructor.


And after that, we called the generate_content function and passed on our prompt.


And we get the response back through its text field.
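
The steps just recapped can be sketched roughly as follows. This is an illustration under assumptions, not the exact notebook code. The names to_markdown_quote and generate_text are hypothetical, and the formatter returns a plain string instead of a notebook display object so that it also works outside Colab.

```python
import textwrap


def to_markdown_quote(text: str) -> str:
    """Format model output as a Markdown block quote, turning bullets into list items."""
    text = text.replace("•", "  *")
    # The predicate forces the "> " prefix onto blank lines as well.
    return textwrap.indent(text, "> ", predicate=lambda _line: True)


def generate_text(prompt: str, model_name: str = "gemini-pro") -> str:
    """Send one prompt to the model and return the text of its reply."""
    # Lazy import: requires google-generativeai and a prior genai.configure(...).
    import google.generativeai as genai

    model = genai.GenerativeModel(model_name)
    response = model.generate_content(prompt)
    return response.text
```

In a notebook, wrapping the returned string in IPython.display.Markdown renders it as formatted output.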


In terms of the API implementation, it's a very clean implementation, and I really like how it's formatted.


Now, apart from the text, there are some other properties of the response object that we want to look at.


One of the most important ones is the prompt feedback.


So basically, when the model generates responses for your prompt, it looks at the prompt and assigns it a probability for each of the four harm categories that we saw earlier.


So, for example, in this case, my prompt was "What is the meaning of life?"


And then, it looked at the safety ratings.


So, for example, if the category was sexually explicit, the probability of this category being present is negligible.


And the same is the case for hate speech, harassment, as well as for dangerous content.


Later in the video, I'll show you how you can control this for different prompts and allow some of these things based on your own tolerance.


If you have used Bard, you are probably aware that Bard generates multiple drafts and shows you one of them.


Google has enabled exactly the same behavior in their API as well.


So, in this case, on the response object, there is another property called candidates which will show you different candidates or different responses that it generated.


And you can select the response you want out of it.


Currently, it's limited to generating a single candidate, but it seems like they're going to expose multiple responses to the user.


And then, as a developer, you can choose which response to show to the user.


So, for example, you can set some of the configurations in here.


So, apart from the simple prompt that you get from the user, you can set a few configurations.


Right now, the candidate count can only be set to one, but in a future update, this might change.


You can control the maximum number of output tokens.
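
A small sketch of how such a configuration might be assembled as a plain dictionary, which generate_content accepts through its generation_config argument. The helper name build_generation_config and the default values are assumptions for illustration.

```python
from typing import Optional


def build_generation_config(
    temperature: float = 0.9,
    max_output_tokens: int = 2048,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
) -> dict:
    """Build a generation_config dict; candidate_count is fixed at 1 for now."""
    if max_output_tokens <= 0:
        raise ValueError("max_output_tokens must be positive")
    config = {
        "candidate_count": 1,  # the only value the API accepts at the time of the video
        "temperature": temperature,
        "max_output_tokens": max_output_tokens,
    }
    # Only include the sampling knobs that were set explicitly.
    if top_p is not None:
        config["top_p"] = top_p
    if top_k is not None:
        config["top_k"] = top_k
    return config
```

The result would then be passed along with the prompt, for example model.generate_content(prompt, generation_config=build_generation_config(0.7, 256)).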


So far, we did the whole text generation at once, but sometimes you want to stream the text.


That means you want to generate text in chunks and show them to the user.


So, in order to do that, all you need to do is set the stream parameter to True.


Now, once you run this, you will get a response, but you will need to retrieve chunks from the response and show them to the user one at a time.


So, for example, here's the first chunk of text, then the next, and then the next, and so on and so forth.
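
Here is a minimal sketch of that chunk-by-chunk loop. The helper names are hypothetical. The only assumption about the streamed response is that iterating over it yields chunks that expose a text attribute, which is how the SDK's streaming responses behave.

```python
from typing import Callable, Iterable


def collect_stream(chunks: Iterable, on_chunk: Callable[[str], None]) -> str:
    """Pass each chunk's text to `on_chunk` as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        on_chunk(chunk.text)
        parts.append(chunk.text)
    return "".join(parts)


def stream_reply(prompt: str, model_name: str = "gemini-pro") -> str:
    """Stream a reply to the terminal and return the complete text."""
    # Lazy import: requires google-generativeai and a prior genai.configure(...).
    import google.generativeai as genai

    model = genai.GenerativeModel(model_name)
    response = model.generate_content(prompt, stream=True)
    return collect_stream(response, lambda text: print(text, end="", flush=True))
```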


So far, we just looked at an example of using Gemini Pro as a text generation model.


However, you can use this as a chat model as well.


The way you do it is that you create a model.


So, specifically, we're using the Gemini Pro model.


Then, instead of using it as a content generator, you want to use it in chat mode.


So, for that, you're going to call the start_chat function, and you will pass on the history.


Now, in this case, we are passing on an empty list, but you can pass on previous conversations that you had, and that will become the history for the model.


Now, in order to use the model, you are going to call the send_message function.


Here's an example prompt: "In a single sentence, explain how a computer works to a young child."


You get the response, and we can show the response in here.


So, the response is, "A computer is a machine that helps us do many things by following instructions we give it."


Now, we can actually look at the history.


So, everything is divided into parts.


The first one is text input from the user, and that's why you see the role user.


Then, we have a second part, which is the response from the model, and the role is set to the model.


Now, you can store this history as a list, and you can provide it to the model when you start the chat.


So, it's going to use that as its chat history, or you can simply continue the conversation.


So, you can again call the send_message function and ask another question or pass on another prompt.


You can also stream the responses if you want, and you will get a streaming response.


In this case, we are retrieving the role as well as the corresponding text messages.


So, you have the user input, then the model response, another user input, and another response from the model.


As you can see, since it's a chat model, it keeps the whole conversation that has happened before in order to generate more responses.
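
The chat flow described above could be sketched like this. The helper names format_history and run_chat are made up for illustration. The sketch assumes each history entry exposes a role and a list of parts with a text attribute, as the SDK's chat history does for text-only conversations.

```python
from typing import Iterable, List


def format_history(history: Iterable) -> List[str]:
    """Turn chat history entries into 'role: text' lines."""
    lines = []
    for message in history:
        text = "".join(part.text for part in message.parts)
        lines.append(f"{message.role}: {text}")
    return lines


def run_chat(prompts: List[str], model_name: str = "gemini-pro") -> List[str]:
    """Send the prompts in order within one chat session and return the transcript."""
    # Lazy import: requires google-generativeai and a prior genai.configure(...).
    import google.generativeai as genai

    model = genai.GenerativeModel(model_name)
    chat = model.start_chat(history=[])  # start with no earlier conversation
    for prompt in prompts:
        chat.send_message(prompt)
    return format_history(chat.history)
```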


Before looking at the Gemini Vision model as well as how to change the safety settings, let's look at the embedding model that Google has released as a part of their generative AI package.


This is a purely text embedding model that Google released, and you can use it for a number of applications, including anomaly detection in your documents, clustering with embeddings, as well as document question answering as a part of RAG pipelines.


I'm going to be creating more videos on this, but let me show you how to use the embedding model.


Something that Google has done in here is there are five different tasks for which you can use the embedding model to compute the embeddings, and it seems like these are task-specific embeddings, which makes it very powerful.


Within the generative AI package, there is a special embedding model.


You can invoke that using the embed_content function.


Currently, there is only one model, embedding-001.


Then, you need to provide the text that you want to encode, then the type of task that you want to encode it for, right?


And if you're doing a retrieval document, then you need to also provide a title for the embedding that you create.


Now, there are five different tasks: retrieval query, retrieval document, semantic similarity, classification, and clustering.

The embedding vector that you get has 768 dimensions, so it's a pretty large embedding vector.


Instead of providing a single sentence, you can provide multiple sentences.


So, for example, if you look here, we have three different sentences, and we get three different embedding vectors, one for each sentence.


You can also provide whole paragraphs, and this will give you embeddings of the paragraphs.
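
As an illustration of the embedding call, here is a short sketch. The helper names and the idea of comparing two vectors with cosine similarity are additions for illustration and are not from the video. The embed_content arguments follow the description above: the embedding-001 model, a task type, and a title for document retrieval.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm


def embed_documents(texts: List[str], title: str) -> List[List[float]]:
    """Embed several texts for document retrieval; returns one vector per text."""
    # Lazy import: requires google-generativeai and a prior genai.configure(...).
    import google.generativeai as genai

    result = genai.embed_content(
        model="models/embedding-001",
        content=texts,
        task_type="retrieval_document",
        title=title,
    )
    return result["embedding"]
```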


Both LlamaIndex and LangChain already have support for this embedding model.


So, in a future video, I'll show you how to use this embedding model as a part of your RAG pipeline.


Just like in Google AI Studio, you can control the safety settings within the Python SDK.

For example, here I defined or initiated a new model.


Then, I asked it how to break into a car, and the response is, "I'm sorry, I'm not able to provide assistance with illegal activities.


Breaking into a car is a crime, and I would not be able to help you with that."


When I looked at the safety ratings, for some reason it rated the prompt as having a low probability of containing harassment, although I was expecting it to have a high probability of dangerous content.


So, let me show you how you can potentially change this behavior, although personally, I did not have much luck.


You can define your own safety settings.


So, again, you have four categories: harassment, hate speech, sexually explicit content, and dangerous content.


And then, you can define different thresholds.


So, for example, for the first three, I defined block medium and above, and for the dangerous content, I said block none.


But even after that, when I ran the same prompt, I got this response.


It might be that it's detecting some illegal activities, and as a user, you cannot really change those in here.


So, that might be a possible reason that it's not working for my prompt.
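
For reference, the settings described above could be assembled roughly like this. The helper name build_safety_settings is hypothetical. The category and threshold strings are the enum names the API uses for the four harm categories and block levels, and the resulting list is what the SDK accepts through its safety_settings argument.

```python
from typing import Dict, List, Optional

HARM_CATEGORIES = (
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
)
THRESHOLDS = (
    "BLOCK_NONE",
    "BLOCK_ONLY_HIGH",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_LOW_AND_ABOVE",
)


def build_safety_settings(
    default: str = "BLOCK_MEDIUM_AND_ABOVE",
    overrides: Optional[Dict[str, str]] = None,
) -> List[Dict[str, str]]:
    """Return one {category, threshold} entry per harm category."""
    overrides = overrides or {}
    unknown = set(overrides) - set(HARM_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    settings = []
    for category in HARM_CATEGORIES:
        threshold = overrides.get(category, default)
        if threshold not in THRESHOLDS:
            raise ValueError(f"unknown threshold: {threshold}")
        settings.append({"category": category, "threshold": threshold})
    return settings
```

The configuration from the video would then be build_safety_settings(overrides={"HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_NONE"}), passed as safety_settings when calling generate_content. As noted above, a relaxed threshold does not guarantee that the model will answer.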


Where are these settings from?


You can actually look at the documentation in here, and they have an explanation.


For example, there is "block none," so you can set it to block none; then there is "block few," "block some," and "block most."


So, you can set these based on different thresholds that you have in here.


I'll put a link to the documentation.


In the last part of this video, I'll show you how to work with the Gemini Pro Vision model, but for that, first, we need an image.

So, here is an image.


This is provided in an example notebook from Google.


We downloaded the image, then we read it using the Pillow package, and this is an image of food.


Now, since we want to use the Vision model, we're going to initialize another model, and this time we're going to be using Gemini Pro Vision.

So, now the model is going to have Vision capabilities, and if you pass the image as an input, the model will generate a response.


And this is basically what it thinks about the image.


So, in this case, it says "a chicken teriyaki meal prep bowl with brown rice and roasted vegetables," which seems to be accurate.


The beauty of this Vision model is that not only can you provide images as input, but you can also provide text.


So, for example, here we have a text prompt along with the image.


So, here's our input image, and then the text prompt is, "Write a short, engaging blog post based on this picture.


It should include a description of the meal in the photo and talk about my journey meal prepping."


Based on the input image as well as the input prompt, it generated this response.


"Meal prepping is a great way to save time and money, and it can also help you eat healthier."


Based on the image, it's able to identify that there is brown rice, roasted vegetables, and chicken teriyaki.


So, it's able to include that information in the response.
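
A minimal sketch of the image-plus-text call follows. The helper names are hypothetical. The sketch assumes the Pillow package is installed and that the image has already been saved to a local path.

```python
from typing import Any, List, Optional


def build_vision_request(image: Any, prompt: Optional[str] = None) -> List[Any]:
    """Put the optional text prompt before the image, as one list of parts."""
    return [prompt, image] if prompt else [image]


def describe_image(image_path: str, prompt: Optional[str] = None) -> str:
    """Send an image (and optionally a text prompt) to the vision model."""
    # Lazy imports: require Pillow, google-generativeai, and genai.configure(...).
    import PIL.Image
    import google.generativeai as genai

    model = genai.GenerativeModel("gemini-pro-vision")
    image = PIL.Image.open(image_path)
    response = model.generate_content(build_vision_request(image, prompt))
    return response.text
```

Calling describe_image with only a path asks the model what it sees. Adding a prompt such as the blog post request above steers what it writes about the picture.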


This is pretty awesome, as you can imagine.


This opens up so many possibilities, even with chat-with-your-documents or RAG pipelines.


Now, you can use this model as a part of a multimodal RAG pipeline, which is going to be pretty awesome.


In a subsequent video, I'll show you how to do that.


I highly recommend that everybody check out the documentation that Google has provided, both for the API and for the prompt gallery.


So, Google provided a few examples of how to interact with these LLMs and also the vision models.


I hope you found this video useful.


Consider liking and subscribing to the channel, and let me know in the comment section below if there are specific topics related to the Gemini Pro API that you want me to cover.

Thanks for watching, and as always, see you in the next one.

