Building a Poker Bot in Python

hiokiryuji

May 16, 2019 21:22

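Hi everyone, it's been a while.

I recently took up poker and worked out a way to build bot players (robots) in Python, so I'd like to share what I learned!

・You know a little Python

・You think poker is fun

・You want to try reinforcement learning

If any of these sounds like you, read on!

And yes, of course, the whole thing is free to read.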

Living the good life with a GUI, thanks to pypokergui

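The main tool is pypokergui, a package you can install with pip.

So go ahead and install it with pip install pypokergui.

Unfortunately, this package does not seem to support python3 (?)

When you launch it under python3,

errors start appearing as soon as you press the start button.

"What?!" I thought, and spent some MP on a search, which turned up: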

PyPokerGUI not work with Python 3 · Issue #6 · ishikota/PyPokerGUI
https://github.com/ishikota/PyPokerGUI/issues/6

The same error! (It's an easy game.)

Apparently the contents of id=seats-upper and id=seats-lower are the cause.

The package was originally written for Python2, so it has not been updated for Python3.

So let's patch it ourselves.

Look for pypokergui in the folder where your python packages are installed.

In my case, the file I needed was under /site-packages/pypokergui/server/templates/.
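If you are not sure where your environment keeps the package, a small lookup like the one below may help. This is only a sketch using the standard library: the helper name package_subdir is hypothetical (it is not part of pypokergui), and the sub-directory names are simply the ones from the path quoted above. It returns None when the package is not installed.

```python
import importlib.util
from pathlib import Path


def package_subdir(package, *parts):
    """Return the path of a sub-directory inside an installed package, or None.

    Hypothetical helper for this article, not a pypokergui API.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a single-file module without a folder
    root = Path(next(iter(spec.submodule_search_locations)))
    target = root.joinpath(*parts)
    return target if target.exists() else None


if __name__ == "__main__":
    # Directory names taken from the path mentioned in the article.
    print(package_subdir("pypokergui", "server", "templates"))
```

Run it with the same python interpreter you use to start pypokergui, since each environment has its own site-packages.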

Found it!

After rewriting it and launching again...

Oh, it works! 👏👏👏👏👏👏👏👏👏👏👏

How to launch & settings

First, here is the launch command I have been using so far. Ta-da.

pypokergui serve ./poker_conf.yml --port 8000 --speed moderate

This starts the server on port 8000.

By the way, speed can also be set to fast.

The settings for this game are loaded from a YAML file.

poker_conf.yml

ante: 0
blind_structure: null
initial_stack: 1000
max_round: 100
small_blind: 10
ai_players:
- name: databot1
  path: players/databloggerbot.py
- name: databot2
  path: players/databloggerbot.py
- name: databot3
  path: players/databloggerbot.py
- name: databot4
  path: players/databloggerbot.py
- name: databot5
  path: players/databloggerbot.py
- name: random_player
  path: players/randombot.py

The upper part holds the detailed poker settings (blinds, starting stack size, and so on),

and under ai_players you list the scripts that contain each robot's action-selection logic.
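With many bots the ai_players list gets repetitive, so the file could also be generated from Python. The following is a rough sketch, not something pypokergui provides: build_conf and its default values are hypothetical, only the keys that appear in the file above are emitted, and the output is plain text so no YAML library is needed.

```python
def build_conf(players, initial_stack=1000, max_round=100, small_blind=10, ante=0):
    """Render a poker_conf.yml-style text from (name, path) pairs.

    Hypothetical helper: it only writes the keys shown in the article's config.
    """
    lines = [
        f"ante: {ante}",
        "blind_structure: null",
        f"initial_stack: {initial_stack}",
        f"max_round: {max_round}",
        f"small_blind: {small_blind}",
        "ai_players:",
    ]
    for name, path in players:
        lines.append(f"- name: {name}")
        lines.append(f"  path: {path}")  # two spaces: aligned with "name"
    return "\n".join(lines) + "\n"


# Five copies of one bot plus the random bot, as in the article's setup.
players = [(f"databot{i}", "players/databloggerbot.py") for i in range(1, 6)]
players.append(("random_player", "players/randombot.py"))
conf_text = build_conf(players)
```

Writing conf_text to ./poker_conf.yml gives a file you can pass to the launch command.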

This time the only participants are randombot and databloggerbot.

By the way, I borrowed databloggerbot from this article:

https://www.data-blogger.com/2017/11/01/pokerbot-create-your-poker-ai-bot-in-python/

For example, here is what randombot looks like inside.

from pypokerengine.engine.hand_evaluator import HandEvaluator
from pypokerengine.players import BasePokerPlayer
from pypokerengine.utils.card_utils import _pick_unused_card, _fill_community_card, gen_cards

from random import randrange
import math
MAX_RAISE_THRESHOLD = 0.85
MIN_RAISE_THRESHOLD = 0
BET_SPLIT_NUM = 6

from logging import getLogger, StreamHandler, DEBUG
logger_level = DEBUG
logger = getLogger(__name__)
handler = StreamHandler()
handler.setLevel(logger_level)
logger.setLevel(logger_level)
logger.addHandler(handler)
logger.propagate = False


# Estimate the ratio of winning games given the current state of the game
def estimate_win_rate(nb_simulation, nb_player, hole_card, community_card=None):
   if not community_card: community_card = []
   # Make lists of Card objects out of the list of cards
   community_card = gen_cards(community_card)
   hole_card = gen_cards(hole_card)
   # Estimate the win count by doing a Monte Carlo simulation
   win_count = sum([montecarlo_simulation(nb_player, hole_card, community_card) for _ in range(nb_simulation)])
   return 1.0 * win_count / nb_simulation

def montecarlo_simulation(nb_player, hole_card, community_card):
   # Do a Monte Carlo simulation given the current state of the game by evaluating the hands
   community_card = _fill_community_card(community_card, used_card=hole_card + community_card)
   unused_cards = _pick_unused_card((nb_player - 1) * 2, hole_card + community_card)
   opponents_hole = [unused_cards[2 * i:2 * i + 2] for i in range(nb_player - 1)]
   opponents_score = [HandEvaluator.eval_hand(hole, community_card) for hole in opponents_hole]
   my_score = HandEvaluator.eval_hand(hole_card, community_card)
   return 1 if my_score >= max(opponents_score) else 0


class RandomBot(BasePokerPlayer):
   def __init__(self):
       super().__init__()
       self.wins = 0
       self.losses = 0
       self.player_name = 'random_player'

   def declare_action(self, valid_actions, hole_card, round_state):
       # Estimate the win rate
       win_rate = estimate_win_rate(100, self.num_players, hole_card, round_state['community_card'])

       # Check whether it is possible to call
       can_call = len([item for item in valid_actions if item['action'] == 'call']) > 0
       if can_call:
           # If so, compute the amount that needs to be called
           call_amount = [item for item in valid_actions if item['action'] == 'call'][0]['amount']
       else:
           call_amount = 0

       amount = None

       # If the win rate is large enough, then raise
       if win_rate > 0:
           if win_rate > 0.5:
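               # Split the win-rate range (0.5, 1.0] into BET_SPLIT_NUM equal buckets.
               # (math.floor(0.5 / BET_SPLIT_NUM) was always 0, so no bucket ever matched.)
               win_div = 0.5 / BET_SPLIT_NUM
               # Fallback for float rounding at the very top of the range
               win_num = BET_SPLIT_NUM

               for i in range(BET_SPLIT_NUM):
                   if 0.5 + win_div*i < win_rate <= 0.5 + win_div*(i+1):
                       # Assumed intent: a higher bucket means a bigger bet
                       win_num = i + 1
                       break

               action = 'raise'
               # Bet win_num/BET_SPLIT_NUM of this player's own stack
               seats = round_state['seats']
               amount = int([d['stack'] for d in seats if d['name'] == self.player_name][0]/BET_SPLIT_NUM)
               amount = amount * win_num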
           else:
               # If there is a chance to win, then call
               action = 'call'
       else:
           action = 'call' if can_call and call_amount != 0 else 'fold'

       # Set the amount
       if amount is None:
           items = [item for item in valid_actions if item['action'] == action]
           amount = items[0]['amount']
       return action, amount

   def receive_game_start_message(self, game_info):
       self.num_players = game_info['player_num']
   def receive_round_start_message(self, round_count, hole_card, seats):
       pass
   def receive_street_start_message(self, street, round_state):
       pass
   def receive_game_update_message(self, action, round_state):
       pass
   def receive_round_result_message(self, winners, hand_info, round_state):
       is_winner = self.uuid in [item['uuid'] for item in winners]
       self.wins += int(is_winner)
       self.losses += int(not is_winner)

def setup_ai():
   return RandomBot()

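Basically, declare_action receives the state information for each action, and you decide the best action from that information.

This random robot uses the estimate_win_rate function (which estimates the hand's win rate by Monte Carlo simulation) to work out its win rate.

If the win rate is above the specified value (0.5 in this code) it raises; otherwise it calls whenever a call is possible.

In the end, all you return is action and amount. You can choose call, fold, or raise, and amount does not always have to be a meaningful value.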
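As a minimal sketch of how the returned pair could be kept legal, the helper below looks up the desired action in valid_actions and falls back when it is not offered. Note the assumptions: legalize is a hypothetical name (not a PyPokerEngine function), and the code assumes that the raise entry carries its amount as a dict with 'min' and 'max' keys, with -1 meaning that raising is unavailable. Neither detail is shown in this article, so please check them against the valid_actions your bot actually receives.

```python
def legalize(valid_actions, action, amount=None):
    """Return an (action, amount) pair that is offered in valid_actions.

    Hypothetical helper. Assumes a raise entry looks like
    {'action': 'raise', 'amount': {'min': 40, 'max': 1000}}.
    """
    offered = {item["action"]: item["amount"] for item in valid_actions}

    if action == "raise" and isinstance(offered.get("raise"), dict):
        low, high = offered["raise"]["min"], offered["raise"]["max"]
        if low >= 0 and amount is not None:
            # Clamp the requested raise into the allowed range.
            return "raise", max(low, min(high, amount))
        action = "call"  # no usable raise: fall back to calling

    if action in offered and not isinstance(offered[action], dict):
        return action, offered[action]
    if "call" in offered:
        return "call", offered["call"]
    return "fold", 0


# Example input in the assumed format.
sample = [
    {"action": "fold", "amount": 0},
    {"action": "call", "amount": 20},
    {"action": "raise", "amount": {"min": 40, "max": 1000}},
]
```

With this sample, a requested raise of 5000 is clamped to the maximum of 1000, and a raise request without an amount turns into a call of 20.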

When you train with reinforcement learning or a similar method, you can use the values in round_state.

It holds values like these.

'community_card': ['CQ', 'H5', 'ST', 'H7']

'seats': [
    {'name': 'fishplayer', 'uuid': 'julzvurokxjxsneeihttoq', 'stack': 260, 'state': 'participating'},
    {'name': 'player1', 'uuid': 'ozsbecszosnytmbsiwyuvx', 'stack': 260, 'state': 'participating'},
    {'name': 'player2', 'uuid': 'orzajrqbdlhvfsxqtomyct', 'stack': 140, 'state': 'participating'},
    {'name': 'player3', 'uuid': 'qxthrkqnegrtlcyyrmkqic', 'stack': 140, 'state': 'participating'},
    {'name': 'player4', 'uuid': 'zicgphpzokbkrncygxtpza', 'stack': 140, 'state': 'participating'},
    {'name': 'player5', 'uuid': 'fpkuqnhihgajtugzvehrzs', 'stack': 140, 'state': 'participating'},
]

'action_histories': {
    'preflop': [
        {'action': 'SMALLBLIND', 'amount': 10, 'add_amount': 10, 'uuid': 'qxthrkqnegrtlcyyrmkqic'},
        {'action': 'BIGBLIND', 'amount': 20, 'add_amount': 10, 'uuid': 'zicgphpzokbkrncygxtpza'},
        {'action': 'CALL', 'amount': 20, 'paid': 20, 'uuid': 'fpkuqnhihgajtugzvehrzs'},
        {'action': 'CALL', 'amount': 20, 'paid': 20, 'uuid': 'julzvurokxjxsneeihttoq'},
        {'action': 'CALL', 'amount': 20, 'paid': 20, 'uuid': 'ozsbecszosnytmbsiwyuvx'},
        {'action': 'CALL', 'amount': 20, 'paid': 20, 'uuid': 'orzajrqbdlhvfsxqtomyct'},
        {'action': 'CALL', 'amount': 20, 'paid': 10, 'uuid': 'qxthrkqnegrtlcyyrmkqic'},
        {'action': 'CALL', 'amount': 20, 'paid': 0, 'uuid': 'zicgphpzokbkrncygxtpza'},
    ],
    'flop': [
        {'action': 'CALL', 'amount': 0, 'paid': 0, 'uuid': 'qxthrkqnegrtlcyyrmkqic'},
        {'action': 'CALL', 'amount': 0, 'paid': 0, 'uuid': 'zicgphpzokbkrncygxtpza'},
        {'action': 'CALL', 'amount': 0, 'paid': 0, 'uuid': 'fpkuqnhihgajtugzvehrzs'},
        {'action': 'CALL', 'amount': 0, 'paid': 0, 'uuid': 'julzvurokxjxsneeihttoq'},
        {'action': 'CALL', 'amount': 0, 'paid': 0, 'uuid': 'ozsbecszosnytmbsiwyuvx'},
    ],
    'turn': [
        {'action': 'CALL', 'amount': 0, 'paid': 0, 'uuid': 'qxthrkqnegrtlcyyrmkqic'},
        {'action': 'FOLD', 'uuid': 'zicgphpzokbkrncygxtpza'},
    ],
}

'turn': []

'street': 'turn'

'pot': {'main': {'amount': 120}, 'side': []}

・The cards on the board

・Pot size

・Opponents' positions

・Stack sizes

・Action history

These are the kinds of information you get.
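To feed the items listed above to a learning algorithm, they first need to become numbers. Here is one way that could look, as a sketch only: state_features and the feature names are hypothetical, the key names follow the round_state sample in this article, "river" is assumed as the fourth street name, and side pots are ignored for brevity.

```python
STREETS = ["preflop", "flop", "turn", "river"]  # "river" is an assumption


def state_features(round_state, my_uuid):
    """Flatten a round_state dict into a small numeric feature dict."""
    seats = round_state["seats"]
    my_index = next(i for i, s in enumerate(seats) if s["uuid"] == my_uuid)
    return {
        "street": STREETS.index(round_state["street"]),
        "n_board_cards": len(round_state["community_card"]),
        "pot": round_state["pot"]["main"]["amount"],  # side pots ignored
        "my_stack": seats[my_index]["stack"],
        "seat_index": my_index,
        "n_active": sum(s["state"] == "participating" for s in seats),
    }


# Values copied from the sample above (first two seats only).
example_state = {
    "street": "turn",
    "community_card": ["CQ", "H5", "ST", "H7"],
    "pot": {"main": {"amount": 120}, "side": []},
    "seats": [
        {"name": "fishplayer", "uuid": "julzvurokxjxsneeihttoq",
         "stack": 260, "state": "participating"},
        {"name": "player1", "uuid": "ozsbecszosnytmbsiwyuvx",
         "stack": 260, "state": "participating"},
    ],
}
features = state_features(example_state, "ozsbecszosnytmbsiwyuvx")
```

Inside a bot, my_uuid would be self.uuid, the same attribute randombot uses in receive_round_result_message.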

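The action history is a small time series and also important information, so it needs to be engineered into features and properly folded into the current state.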
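One simple way to do this is to turn every past action into a fixed-length vector and keep the vectors in order. The sketch below is only an illustration under a few assumptions: encode_history and amount_scale are hypothetical names, 'RAISE' and 'river' do not appear in the sample above and are assumed labels, and any action not in the list is encoded as all zeros.

```python
ACTIONS = ["FOLD", "CALL", "RAISE", "SMALLBLIND", "BIGBLIND"]  # "RAISE" is assumed
STREETS = ["preflop", "flop", "turn", "river"]  # "river" is assumed


def encode_history(action_histories, my_uuid, amount_scale=100.0):
    """Turn action_histories into a chronological list of 8-element vectors.

    Each vector is [street index, is_me, one-hot action (5 values), scaled amount].
    """
    rows = []
    for street_idx, street in enumerate(STREETS):
        for entry in action_histories.get(street, []):
            one_hot = [1.0 if entry["action"] == a else 0.0 for a in ACTIONS]
            amount = entry.get("amount", 0) / amount_scale  # FOLD has no amount
            is_me = 1.0 if entry["uuid"] == my_uuid else 0.0
            rows.append([float(street_idx), is_me, *one_hot, amount])
    return rows


# The 'turn' entries from the sample above.
example_history = {
    "turn": [
        {"action": "CALL", "amount": 0, "paid": 0, "uuid": "qxthrkqnegrtlcyyrmkqic"},
        {"action": "FOLD", "uuid": "zicgphpzokbkrncygxtpza"},
    ],
}
rows = encode_history(example_history, "qxthrkqnegrtlcyyrmkqic")
```

The resulting list can be padded or truncated to a fixed length and concatenated with the other state features, or passed to a sequence model as-is.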

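Put this information to good use, build your own robot, pit it against others, and test your strategies!!!

So that was the story of how I worked out a way to build poker bots.

That's all from the field. 👮‍♀️

See you again. 🤘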

Tip jar for writing costs (the paid section contains nothing beyond a one-line message)
