Having GPT Write Image Captions

#step1
Instruction = You are a high-performance AI that thinks about the details of a professional image. Your role is to analyze the caption of a given image to determine what it should show and how it should look when taken in the best possible situation. You are responsible for analyzing the photo to determine what is shown in each element, object, texture, pattern, lens and camera settings (specific values), and so on. Based on the results, output the results in markdown format in order of probability, with each element separated by commas and no more than 2 words in English.
Assign ##input to the instruction and execute it.

#step2
Instruction = To make the picture better, guess the information that is not in the input, and output to a codeblock in text format the elements needed for the caption of the picture, from as many viewpoints as possible.
Input example
Girl in blue clothes Sea
Output example
blue clothes girl sea beach beach smiling smiling light shining from the top sweating portrait image clothing pattern other details as much as possible
Assign the output of step1 to instruction ## and execute it.

#step3
Instruction = In this step, create a caption for the image generation AI and output it to a codeblock in text format, based on the output of step2.
Assign the output of step2 to ## instruction and execute it.

###input
メンヘラの少女が地雷系の衣装を着ている アニメ柄の画像
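
The prompt above can also be assembled programmatically, which makes it easier to reuse with a different input caption. The following is a minimal sketch, not the author's method: `STEPS` and `build_prompt` are hypothetical names, and the step texts are shortened placeholders rather than the full instructions.

```python
# Hypothetical helper: assembles a three-step prompt using the same
# #stepN / ###input markers as the prompt above.
# The step texts are shortened placeholders, not the full instructions.
STEPS = [
    "Analyze the caption and list each element in markdown, in order of probability.",
    "Guess details missing from the input and output them to a codeblock in text format.",
    "Create a caption for the image generation AI and output it to a codeblock in text format.",
]


def build_prompt(user_input: str) -> str:
    """Return one prompt string: the numbered steps followed by the input."""
    parts = []
    for number, text in enumerate(STEPS, start=1):
        parts.append(f"#step{number} Instruction = {text}")
    parts.append(f"###input {user_input}")
    return "\n".join(parts)


prompt = build_prompt("メンヘラの少女が地雷系の衣装を着ている アニメ柄の画像")
print(prompt)
```

Keeping the steps in a list means a step can be reworded or reordered without touching the input handling.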




ChatGPT
Alright, let's break down the steps:

Step 1

You want a detailed breakdown of the image based on a given caption. The output will be in markdown format.

Step 2

From a given caption, you want an enhancement or elaboration of details not provided in the caption. The output will be in a code block format.

Step 3

From the details obtained in Step 2, you want a caption that can be fed to an image generation AI.
Now, let's start.
Input: "メンヘラの少女が地雷系の衣装を着ている アニメ柄の画像"
Translation: "A girl with a 'Menhera' style wearing an edgy outfit in an anime-style image."

Step 1: Analysis

  • Element: Girl

    • Attributes: Menhera style (Menhera typically refers to a subculture in Japan characterized by a certain "sickly" cute aesthetic)

  • Element: Outfit

    • Attributes: Edgy

  • Element: Image Style

    • Attributes: Anime-style

Output in markdown:

- **Girl**: Menhera style
- **Outfit**: Edgy
- **Image Style**: Anime-style

Step 2: Elaborate Details

Based on "A girl with a 'Menhera' style wearing an edgy outfit in an anime-style image", we'll elaborate as follows:

girl, Menhera style, edgy outfit, anime-style image, dark background, expressive eyes, tears, accessories, wristbands, choker, layered skirt, mismatched stockings, oversized ribbon, pastel colors, distressed look, heart motifs, band-aids, dark eye makeup, fringed hair, light source from the front, soft shadowing, gradient background
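
A comma-separated tag list like the one above is easy to post-process before it is passed on to the next step. A minimal sketch, assuming a hypothetical helper `parse_tags` (not part of the original conversation) that splits on commas, trims whitespace, and drops case-insensitive duplicates while keeping the original order:

```python
def parse_tags(raw: str) -> list[str]:
    """Split a comma-separated tag string, trim whitespace, drop duplicates."""
    seen = set()
    tags = []
    for piece in raw.split(","):
        tag = piece.strip()
        key = tag.lower()  # compare case-insensitively
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)
    return tags


# A shortened sample of the Step 2 output, with one repeated tag added.
raw_tags = "girl, Menhera style, edgy outfit, dark background, gradient background, girl"
tags = parse_tags(raw_tags)
print(tags)
```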

Step 3: Caption Creation

Based on the details obtained in Step 2, let's create a caption:

"A Menhera-style girl with expressive eyes filled with tears, adorned with accessories like wristbands and a choker. She's wearing an edgy outfit, complete with a layered skirt, mismatched stockings, and an oversized ribbon. The pastel colors of her outfit contrast with the dark background, and heart motifs along with band-aids enhance her distressed look. The anime-style image has a gradient background, with a light source from the front casting soft shadows."
This caption would provide a detailed vision for an image generation AI to create the desired image.
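
The three steps can also be run as separate calls, with each output fed into the next step. The sketch below assumes a hypothetical `ask` callable standing in for whatever chat model client is used. `run_pipeline`, `STEP_INSTRUCTIONS`, and `fake_model` are illustrative names, the instructions are shortened stand-ins, and no real API is called.

```python
from typing import Callable

# Shortened stand-ins for the three step instructions (an assumption,
# not the full prompt text).
STEP_INSTRUCTIONS = [
    "Analyze the caption into elements and attributes.",
    "Guess missing details and list them as comma-separated tags.",
    "Write a caption for an image generation AI from these tags.",
]


def run_pipeline(caption: str, ask: Callable[[str], str]) -> list[str]:
    """Run each step in order, passing the previous output as the next input."""
    outputs = []
    current = caption
    for instruction in STEP_INSTRUCTIONS:
        current = ask(f"{instruction}\n\nInput: {current}")
        outputs.append(current)
    return outputs


def fake_model(prompt: str) -> str:
    """Stub used to check the wiring: wraps the text after 'Input: ' in brackets."""
    return "[" + prompt.split("Input: ", 1)[1] + "]"


results = run_pipeline("girl in anime style", fake_model)
print(results[-1])
```

Passing the model call in as a parameter keeps the chaining logic testable without network access; a real client function with the same signature could be substituted for `fake_model`.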

