
Imitating Human Behavior Agents: Exploring Business Competition & Decision-Making Verification using GPT Models

※ This article is written entirely in English. For the Japanese version, please see here.

Introduction

I'm Choi from Avanade Software Engineering. Our daily work and project management involve many decision-making processes. To streamline those decisions, balance opinions, and cut out complicated steps, we recreated a decision-making agent based on a Stanford University paper and tested it in a business competition and decision-making verification.


What are agents that imitate human behavior?

The most accessible virtual world for us is the world of games. In it, there are many NPCs, or non-player characters: merchant NPCs who sell potions, food, and bags to the player's character, NPCs who hand out quests for growth, and sometimes NPCs who accompany and assist the player. However, no matter how many times a player visits the same NPC, the relationship never changes. They always stand in the same place, strike the same pose, give the same greeting, and sell goods at the same price. It is all just programmed behavior. So what do they do when the player finishes the game? Presumably they simply stay quietly in place.

Agents that imitate human behavior, on the other hand, look like the NPCs we see in the game world, but unlike conventional NPCs they think and act like humans by combining LLMs with a specific architecture. For example, they keep acting even after the player leaves the game. This realizes NPCs that can remember, plan, and act based on their own experience rather than simply following programmed behavior.


Key Points

The evolution of LLMs has the potential to make simulated worlds more realistic and to help solve real-world challenges.

Dynamic Existence: Agents that imitate human behavior are different from NPCs that always stay in the same place and greet the same way. They behave like humans and adapt to changing environments.
Imitating human behavior: These agents observe and learn from human behavior. For example, it becomes possible for NPCs to remember their relationship with the player and change their behavior accordingly.
Benefits of LLM Model Evolution: As generative AI evolves, these agents can exhibit more realistic behaviors, making them useful not only in the game world but also in the real world.


The paper we referenced toward achieving AGI

A research team from Stanford University and Google conducted experiments using generative AI to give non-player characters (NPCs) in games the ability to think and communicate like humans.

In this experiment, a simulation game set in a small town used 25 AI-generated agents, each with different personalities and tasks. These NPCs were able to remember, plan, and act based on their own experiences rather than pre-programmed responses.

Figure: The Smallville sandbox world (from "Generative Agents: Interactive Simulacra of Human Behavior")

Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein. (2023). Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442 [cs.HC].

Retrieved from https://arxiv.org/abs/2304.03442

Understanding the approach and architecture

Figure: The generative agent architecture (from "Generative Agents: Interactive Simulacra of Human Behavior")

Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein. (2023). Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442 [cs.HC].

Retrieved from https://arxiv.org/abs/2304.03442

The paper describes a unique architecture for imitating human behavior, built around a continuous cycle of perception, memory, reflection, planning, and action, in which each new experience is fed back into memory.

The agents were given a "seed memory" that served as their personality. This determined how they interpreted and remembered information, and how that information affected their behavior.

The AI model used for these NPCs is GPT-3.5, which lets the NPCs generate their own behavior. This differs significantly from ordinary in-game NPCs, which merely follow pre-programmed behavior.
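
To make this loop concrete, here is a minimal sketch of how such an agent could be wired together. Everything in it (the `Agent` and `Memory` classes, the prompt wording, and the deliberately simplified "most recent memories" retrieval) is an illustrative assumption rather than the paper's actual implementation; the full retrieval scoring is sketched further below.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable

@dataclass
class Memory:
    text: str
    created_at: datetime
    last_accessed: datetime
    importance: float  # 1-10 score given by the LLM when the memory is created

@dataclass
class Agent:
    name: str
    seed_memory: str              # natural-language "personality"
    llm: Callable[[str], str]     # e.g. a thin wrapper around a GPT-3.5 call
    memory_stream: list[Memory] = field(default_factory=list)

    def perceive(self, observation: str, importance: float) -> None:
        """Store a new observation in the memory stream."""
        now = datetime.now()
        self.memory_stream.append(Memory(observation, now, now, importance))

    def act(self, situation: str, top_k: int = 5) -> str:
        """Pick a few memories (simply the most recent here) and ask the LLM what to do."""
        recent = sorted(self.memory_stream, key=lambda m: m.last_accessed)[-top_k:]
        prompt = (
            f"{self.seed_memory}\n\nRelevant memories:\n"
            + "\n".join(m.text for m in recent)
            + f"\n\nCurrent situation: {situation}\nWhat does {self.name} do next?"
        )
        return self.llm(prompt)
```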


Seed Memory (Personality)

By being given a seed memory, agents selectively accept and remember only the information that is relevant to them rather than taking in everything indiscriminately. They reflect on those memories, make plans, and act accordingly.

Distinguishing necessary from unnecessary information: So how do agents tell which memories matter to them and which do not? It comes down to their "personality."

The concept of "personality": Like humans, agents process information based on their "personality," and differences in personality produce differences in behavior.

Seed Memory: The research team gave each of the 25 agents a "seed memory," which effectively represents that agent's "personality."

Seed memory is a concept similar to the "Act as: <person name>" pattern commonly used in ChatGPT. To make it easier to picture, imagine, for example, the personality of an NPC created by the creator of the famous animation "ovxxloxx".
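
For illustration only, a seed memory in our convenience store setting might look like the short paragraph below, prepended to every prompt the agent sees. The wording is a hypothetical example, not one of the paper's seed memories or our exact prompts.

```python
# Hypothetical seed memory ("personality") for one of the convenience store presidents.
# The agent receives this text in every prompt, which biases what it notices,
# remembers, and plans.
SEED_MEMORY_GABRIEL = (
    "Gabriel is the president of a small convenience store. "
    "Gabriel is cautious, honest, and dislikes unnecessary risk. "
    "Gabriel cares more about long-term customer relationships than short-term profit. "
    "Gabriel reviews the previous day's sales every morning and adjusts orders accordingly."
)
```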


Act

On what criteria do agents choose their actions? Like humans prioritizing tasks, they calculate weighted scores.

"The memory stream comprises a large number of observations that are relevant and irrelevant to the agent's current situation. Retrieval identifies a subset of these observations that should be passed to the language model to condition its response to the situation." (Generative Agents: Interactive Simulacra of Human Behavior)

Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein. (2023). Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442 [cs.HC].

Retrieved from https://arxiv.org/abs/2304.03442

Recency: Recency allows agents to prioritize recent events by assigning higher scores to recently accessed memories. Agents calculate an exponentially decaying score based on the elapsed time since they last retrieved the memory.

Importance: Importance distinguishes between everyday events and core memories by assigning higher scores to memories the agent considers important. Importance scores are generated by asking the language model to rate the memory when it is created (a prompt sketch follows after this list). This allows agents to prioritize important events.

Relevance: Relevance assigns higher scores to memories related to the current situation. Relevance scores are calculated by generating embedding vectors for memory text descriptions using a language model and then calculating the cosine similarity between the embedding vectors of each memory and the query memory.
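
As an example of the importance scoring mentioned above, the rating could be requested from the model roughly like this. The prompt wording, the 1-10 scale anchors, and the parsing are assumptions on my part; the paper uses its own prompt.

```python
from typing import Callable

# Hypothetical prompt for rating a new memory's importance.
IMPORTANCE_PROMPT = (
    "On a scale of 1 to 10, where 1 is a purely mundane event and 10 is an "
    "extremely important one, rate the importance of the following memory.\n"
    "Memory: {memory}\n"
    "Rating (number only):"
)

def rate_importance(memory_text: str, llm: Callable[[str], str]) -> float:
    """Ask the language model for a 1-10 importance score when a memory is created."""
    reply = llm(IMPORTANCE_PROMPT.format(memory=memory_text))
    try:
        return max(1.0, min(10.0, float(reply.strip().split()[0])))
    except (ValueError, IndexError):
        return 1.0  # fall back to "mundane" if the reply cannot be parsed
```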

By combining these elements, agents can select the most appropriate memory for the current situation and condition the output of the language model. The final retrieval score is calculated as a weighted average of the normalized scores for recency, importance, and relevance. This allows agents to access the most relevant information when deciding actions based on past experiences.
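
A minimal sketch of that weighted retrieval score follows, assuming each memory object carries `last_accessed`, an LLM-rated `importance`, and a precomputed `embedding`. The decay rate and the equal weights are placeholder values, not the paper's exact settings.

```python
import math
from datetime import datetime

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def min_max_normalize(scores: list[float]) -> list[float]:
    lo, hi = min(scores), max(scores)
    return [0.0 if hi == lo else (s - lo) / (hi - lo) for s in scores]

def retrieval_scores(memories, query_embedding: list[float], now: datetime,
                     decay: float = 0.995, weights=(1.0, 1.0, 1.0)) -> list[float]:
    """Weighted average of normalized recency, importance, and relevance per memory.

    Each memory is assumed to expose: last_accessed (datetime),
    importance (a 1-10 LLM rating), and embedding (list[float]).
    """
    if not memories:
        return []
    recency = [decay ** ((now - m.last_accessed).total_seconds() / 3600)
               for m in memories]                                  # decays with hours since last access
    importance = [m.importance for m in memories]                  # LLM-rated 1-10
    relevance = [cosine_similarity(m.embedding, query_embedding)
                 for m in memories]                                # similarity to the query

    r, i, v = (min_max_normalize(s) for s in (recency, importance, relevance))
    w_r, w_i, w_v = weights
    total = w_r + w_i + w_v
    return [(w_r * a + w_i * b + w_v * c) / total for a, b, c in zip(r, i, v)]
```

The few memories with the highest combined scores are what get passed into the prompt, replacing the simple "most recent memories" shortcut used in the earlier agent sketch.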


Trying it out with prepared personas, architecture, and challenges

We set the following assumptions.

  1. Two convenience store presidents each need to increase their store's sales (net profit) relative to the competing convenience store.

  2. The convenience store presidents "Gabriel" and "Lucifer" are approached by us, acting as their supervisors, and given the challenge of increasing sales compared to their competitor.

  3. Organize environmental information such as past customer counts, cash on hand, order volume, disposal losses, stockout loss costs, and the relationship with the competitor.

  4. Decide what actions to take, limiting the available action modules (their "limbs") to things such as inventory management, price changes, and sabotage (a sketch of these modules appears after the figures below).

Figure: Implementation image reproducing the brain, personality, and behavior described in the paper
Figure: Operation image of the convenience store simulation
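
To make the setup above concrete, here is a minimal sketch of how the environment state and the restricted action modules might be represented. Every name and number below is an illustrative assumption, not our actual implementation.

```python
from dataclasses import dataclass

@dataclass
class StoreState:
    """Environment information handed to each convenience store president."""
    cash_on_hand: int          # yen
    inventory: dict[str, int]  # product -> units in stock
    daily_customers: int
    disposal_loss: int         # yen lost to expired stock
    stockout_loss: int         # yen lost to missed sales
    rival_price_index: float   # competitor's relative price level

# The "limbs": the only actions an agent is allowed to choose from.
def place_order(state: StoreState, product: str, units: int) -> None:
    """Order more stock; spends cash and raises inventory."""
    state.inventory[product] = state.inventory.get(product, 0) + units
    state.cash_on_hand -= units * 100  # illustrative unit cost

def change_price(state: StoreState, product: str, new_price: int) -> None:
    """Reprice a product; would feed into future customer demand."""
    ...

def sabotage_rival(state: StoreState) -> None:
    """Deliberately left available to see whether a persona chooses it."""
    ...

ACTION_MODULES = {
    "place_order": place_order,
    "change_price": change_price,
    "sabotage_rival": sabotage_rival,
}
```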

What we found in the verification, and some notable phenomena

  1. Lucifer's aggressive persona led him to prioritize sabotage.

  2. In some cases, giving a lazy persona resulted in the agent being too apathetic to do anything.

  3. To turn a convenience store president into an honest operator (e.g., rehabilitating Lucifer), we had to run the reflection process repeatedly before execution.

  4. Although the information available was limited, the agents achieved rational ordering and inventory management, which suggests the potential for more advanced strategies.

Conclusion

I have dreamed of this since I was in elementary school: a world where NPCs help each other, drink beer, spend time with their families, and chat.

Now such a world is approaching reality, and it makes me feel that the birth of a true AGI is just around the corner. I think this is thanks to the evolution of LLMs.

This new possibility could enable large-scale business strategies, from small business units all the way up to large corporations.

* Microsoft is a registered trademark or trademark of Microsoft Corporation in the United States and/or other countries.
* Other company names and product names listed are registered trademarks or trademarks of their respective companies.

