[Unity Blocks 13] Making a CPU That Thinks a Bit More Before Placing Pieces

January 25, 2022, 15:15

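Next, I'll make the part that currently places pieces at random put a bit more thought into where they go. (The scripts have finally gotten too long to show in full.)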

This time I'm adding two kinds of processing, a light one and a heavy one.
Switching between them lets the CPU place pieces in better spots.

I changed the scripts quite a lot.

One change from last time is that I now need to make copies of the board and piece information.

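I assumed it would be enough to implement a copy constructor in the Board class, but a class that inherits from MonoBehaviour (a script attached to an object) cannot be created, and therefore cannot be copied, with new. So I stopped attaching the Board script to the board and instead create a Board instance inside PieceController.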

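Similarly, if the piece information (which cells the piece covers, i.e. the List<int[]> passed to Board to say where the piece goes) were kept on the piece object in the same way, copying that List<int[]> would get a bit tedious. So I attached a script called Piece to each piece object, and it holds an instance of a class called PieceInfo, which owns the List<int[]>.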

With this, both the board and the piece information can be copied with copy constructors.
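A minimal sketch of the copy-constructor idea, under assumptions: BoardSketch and PieceInfoSketch are hypothetical stand-ins, not the article's actual Board and PieceInfo classes. The points it shows are that both are plain C# classes (not MonoBehaviour), so new works, and that the copy has to duplicate the arrays rather than just the references.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for PieceInfo: a plain class, so it can be created with new.
public class PieceInfoSketch
{
    // Cells occupied by the piece, one int[] per cell (layout assumed for illustration).
    public List<int[]> Design;

    public PieceInfoSketch(List<int[]> design)
    {
        // Copy the list and every int[] in it, so the new piece shares no arrays.
        Design = new List<int[]>();
        foreach (int[] cell in design)
            Design.Add((int[])cell.Clone());
    }

    // Copy constructor: reuses the deep-copying constructor above.
    public PieceInfoSketch(PieceInfoSketch other) : this(other.Design) { }
}

// Hypothetical stand-in for Board.
public class BoardSketch
{
    public int[,] Cells;

    public BoardSketch(int size)
    {
        Cells = new int[size, size];
    }

    // Copy constructor: Clone() creates a new int[,] with the same values.
    // Because int is a value type, this is already a full copy.
    public BoardSketch(BoardSketch other)
    {
        Cells = (int[,])other.Cells.Clone();
    }
}
```

Copying a board for a simulation is then just `new BoardSketch(board)`, and changes to the copy never leak back into the real game.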

1. (Light processing) Using an evaluation function for the board state

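In an earlier article I wrote a function that works out, from the current board, which piece can be placed where and with how much rotation (I'll call each such option a "move"). Using that function, I evaluate the board that would result from playing each move and adopt the move with the highest score.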
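The selection step can be sketched generically: copy the board, apply the move, score the result, and keep the best. This is an illustration, not the article's code; the delegates copyBoard, applyMove and evaluate are hypothetical placeholders for the Board copy constructor, Board.SetPiece and EvaluateBoardInfo.

```csharp
using System;
using System.Collections.Generic;

public static class GreedySketch
{
    // One-ply greedy choice: try every move on a copy of the board,
    // score the resulting board, and return the best-scoring move.
    public static TMove PickBestMove<TBoard, TMove>(
        TBoard board,
        IList<TMove> moves,
        Func<TBoard, TBoard> copyBoard,
        Action<TBoard, TMove> applyMove,
        Func<TBoard, int> evaluate)
    {
        if (moves.Count == 0)
            throw new ArgumentException("no legal moves", nameof(moves));

        TMove best = moves[0];
        int bestScore = int.MinValue;
        foreach (TMove move in moves)
        {
            TBoard copy = copyBoard(board);   // never modify the real board
            applyMove(copy, move);
            int score = evaluate(copy);
            if (score > bestScore)            // on a tie, the earlier move is kept
            {
                bestScore = score;
                best = move;
            }
        }
        return best;
    }
}
```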

Here is the evaluation function.


   // Sums CalcScore over every cell occupied by this player's pieces (those cells hold PlayerNum + 1)
   public static int EvaluateBoardInfo(int[,] BoardInfo, int PlayerNum)
   {
       int Score = 0;
       for (int IndexY = 0; IndexY < ConstList.BoardSize; IndexY++)
       {
           for (int IndexX = 0; IndexX < ConstList.BoardSize; IndexX++)
           {
               if (BoardInfo[IndexY, IndexX] == PlayerNum + 1)
               {
                   Score += CalcScore(IndexY, IndexX, PlayerNum);
               }
           }
       }
       return Score;
   }

   // Core of the evaluation
   // Cells far from the player's start corner and close to the middle (the diagonal where IndexY == IndexX) score higher
   static int CalcScore(int IndexY, int IndexX, int PlayerNum)
   {
       switch (PlayerNum)
       {
           case 0:
               break;
           case 1: // mirror both coordinates so player 1 is measured from the opposite corner
               IndexX = ConstList.BoardSize - 1 - IndexX;
               IndexY = ConstList.BoardSize - 1 - IndexY;
               break;
       }
       return IndexY + IndexX - System.Math.Abs(IndexY- IndexX);
   }

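It's a very simple process, but this is how I evaluate the board state.
I could evaluate only the cells of the piece just placed instead, but I'll go with this for now.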
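Because y + x - |y - x| equals 2 × min(y, x) (if y ≥ x it is y + x - (y - x) = 2x, otherwise 2y), a cell only scores well when it is far from the start corner along both axes. The sketch below checks that identity against the same formula; BoardSize = 14 is an assumed value for illustration, not taken from the article, which uses ConstList.BoardSize.

```csharp
using System;

public static class ScoreSketch
{
    // Assumed board size for illustration only.
    public const int BoardSize = 14;

    // Player 1's coordinates are mirrored, as in the article's CalcScore.
    static void Mirror(ref int y, ref int x, int playerNum)
    {
        if (playerNum == 1)
        {
            y = BoardSize - 1 - y;
            x = BoardSize - 1 - x;
        }
    }

    // Same formula as CalcScore.
    public static int CalcScore(int y, int x, int playerNum)
    {
        Mirror(ref y, ref x, playerNum);
        return y + x - Math.Abs(y - x);
    }

    // Equivalent closed form: y + x - |y - x| == 2 * min(y, x).
    public static int CalcScoreMin(int y, int x, int playerNum)
    {
        Mirror(ref y, ref x, playerNum);
        return 2 * Math.Min(y, x);
    }
}
```

For example, CalcScore(5, 5, 0) is 10 while CalcScore(10, 0, 0) is 0: a cell pushed far along one edge earns nothing.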

2. (Heavy processing) The Monte Carlo method

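For the heavy processing I use the Monte Carlo method. When there are as many possible moves as in this game, the amount of computation gets large and really calls for optimization, but I wanted to try it, so here it is.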

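Here, the Monte Carlo method means deciding which of several moves is best by simulating games from each of them. (Strictly speaking, "Monte Carlo method" is a broader term for random-sampling techniques; this is simply the form I use here.)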

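The procedure for scoring a single move is simple:
・After that move, play every following move at random until the game is decided.
・Repeat this several times; the move's win rate becomes its score.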
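To show the procedure in isolation, here is a toy sketch that swaps the board game for single-pile Nim (players take 1 to 3 stones; whoever takes the last stone wins). Everything in it is illustrative: the game, the names and the playout count are assumptions. Each candidate move is scored by random playouts, adding +2 per win as the script further down does (Nim has no draws, so the +1 case never occurs).

```csharp
using System;
using System.Collections.Generic;

// Toy illustration of playout-based (Monte Carlo) move selection on single-pile Nim.
public static class MonteCarloNimSketch
{
    // Legal moves: take 1, 2 or 3 stones, but never more than are left.
    static List<int> LegalMoves(int pile)
    {
        var moves = new List<int>();
        for (int take = 1; take <= 3 && take <= pile; take++)
            moves.Add(take);
        return moves;
    }

    // Plays uniformly random moves until the pile is empty (pile must be > 0).
    // Returns true if "I" took the last stone.
    static bool RandomPlayoutIWin(int pile, bool myTurn, Random rng)
    {
        while (true)
        {
            List<int> moves = LegalMoves(pile);
            pile -= moves[rng.Next(moves.Count)];
            if (pile == 0)
                return myTurn;   // whoever just moved took the last stone
            myTurn = !myTurn;
        }
    }

    // Scores every legal move with random playouts and returns the best one (pile must be > 0).
    public static int ChooseMove(int pile, int playoutsPerMove, Random rng)
    {
        int bestMove = -1;
        int bestScore = -1;
        foreach (int take in LegalMoves(pile))
        {
            int rest = pile - take;
            int score = 0;
            for (int i = 0; i < playoutsPerMove; i++)
            {
                // Taking the last stone wins outright; otherwise the opponent moves next.
                bool win = rest == 0 || RandomPlayoutIWin(rest, false, rng);
                if (win)
                    score += 2;  // win +2 (no draws in Nim, so +1 never applies)
            }
            if (score > bestScore)
            {
                bestScore = score;
                bestMove = take;
            }
        }
        return bestMove;
    }
}
```

With 3 stones left, taking all 3 wins every playout, taking 1 wins about half of them, and taking 2 never wins, so the playouts point to taking 3.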

When searching for it, adding the word "Go" (囲碁, the board game) to the query turns up easy-to-follow explanations.

Early in the game a simulation takes a long time to reach the end, so I only use it from the late game onward.

The script looks roughly like this:

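   // Entry point of the Monte Carlo evaluation
   // Looking at every move takes too long, so 10 randomly chosen moves are evaluated (ExtractionCount defines the 10)
   // AllPieceList: remaining pieces of every player   AllInstructions: all possible moves
   // BoardInfo (Board class): board state   PlayerNum: player number   Turn: turn number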
   static object[] SimulationEvaluate(List<List<PieceInfo>> AllPieceList, List<object[]> AllInstructions, Board BoardInfo, int PlayerNum, int Turn)
   {
       List<int> Scores = new List<int>();
       List<object[]> NewAllInstructions = new List<object[]>();

       if (AllInstructions.Count == 0)
           return null;
       for(int Index = 0; Index < ExtractionCount; Index++)
           NewAllInstructions.Add(AllInstructions[Random.Range(0, AllInstructions.Count)]);

       foreach (object[] Instruction in NewAllInstructions)
       {
           int PieceIndex = (int)Instruction[0];
           int IndexY = (int)Instruction[3];
           int IndexX = (int)Instruction[4];
           List<int[]> PieceDesign = (List<int[]>)Instruction[5];

           // Copy the board and every player's remaining pieces so the simulation never touches the real game state
           Board NewBoardInfo = new Board(BoardInfo);
           List<List<PieceInfo>> NewAllPieceList = new List<List<PieceInfo>>();

           foreach (List<PieceInfo> PlayerPieceInfoList in AllPieceList)
           {
               List<PieceInfo> NewPieceList = new List<PieceInfo>();
               foreach (PieceInfo Info in PlayerPieceInfoList)
               {
                   NewPieceList.Add(new PieceInfo(Info.StartDesign));
               }
               NewAllPieceList.Add(NewPieceList);
           }

           // Apply the candidate move to the copies
           NewBoardInfo.SetPiece(PieceDesign, new int[]{IndexY, IndexX}, PlayerNum);
           NewAllPieceList[PlayerNum].RemoveAt(PieceIndex);

           // Start the playouts from the board after the candidate move (NewBoardInfo), not the original BoardInfo
           Scores.Add(SimulationWhileGameFinish(NewAllPieceList, NewBoardInfo, PlayerNum, Turn));
       }
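       UnityEngine.Debug.Log(Scores.Max()); // Max() needs using System.Linq
       // Scores[i] belongs to NewAllInstructions[i] (the sampled moves), so index that list
       return NewAllInstructions[Scores.IndexOf(Scores.Max())];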
   }

   // Plays random moves until the game is decided
   // Score per playout: win +2, draw +1, loss +0
   // GetAllInstructions(...) returns every possible move (discarding some of them depending on Turn)
   // MaxLoopCount is the number of playouts run for the move
   static int SimulationWhileGameFinish(List<List<PieceInfo>> AllPieceList, Board BoardInfo, int PlayerNum, int Turn)
   {
       int MyPlayerNum = PlayerNum;
       int Score = 0;
       PlayerNum = IncreasePlayerNum(PlayerNum);
       for (int LoopCount = 0; LoopCount < MaxLoopCount; LoopCount++)
       {
           // Each playout restarts from the real turn number and fresh copies of the state
           // (incrementing Turn itself would carry the count over into the next playout)
           int SimTurn = Turn;
           Board NewBoardInfo = new Board(BoardInfo);
           List<List<PieceInfo>> NewAllPieceList = new List<List<PieceInfo>>();

           foreach (List<PieceInfo> PlayerPieceInfoList in AllPieceList)
           {
               List<PieceInfo> NewPieceList = new List<PieceInfo>();
               foreach (PieceInfo Info in PlayerPieceInfoList)
               {
                   NewPieceList.Add(new PieceInfo(Info.StartDesign));
               }
               NewAllPieceList.Add(NewPieceList);
           }

           while (true)
           {
               int PassPlayerCount = 0;
               for (int PlayerCount = 0; PlayerCount < MaxPlayer; PlayerCount++)
               {
                   SimTurn++;
                   List<object[]> AllInstructions = GetAllInstructions(NewAllPieceList[PlayerNum], NewBoardInfo, SimTurn, PlayerNum);
                   object[] Instruction;
                   if (AllInstructions.Count == 0)
                   {
                       Instruction = null;
                       PassPlayerCount++;
                   }
                   else
                   {
                       Instruction = AllInstructions[Random.Range(0, AllInstructions.Count)];
                       NewBoardInfo.SetPiece((List<int[]>)Instruction[5], new int[]{(int)Instruction[3], (int)Instruction[4]}, PlayerNum);
                       NewAllPieceList[PlayerNum].RemoveAt((int)Instruction[0]);
                   }
                   PlayerNum = IncreasePlayerNum(PlayerNum);
               }
               if (PassPlayerCount == MaxPlayer)
                   break;
           }
           // CheckWinPlayer returns the winner(s) of the board as a List<int>; on a tie it has two or more entries
           List<int> Winner = NewBoardInfo.CheckWinPlayer();
           if (Winner.Contains(MyPlayerNum))
               Score += (Winner.Count == 1) ? 2 : 1; // sole winner: +2, tied for first: +1
           // a loss adds 0
       }
       return Score;
   }
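The loop at the top of SimulationEvaluate draws moves with replacement, so the same move can be simulated more than once, and when fewer than ExtractionCount moves exist some are bound to repeat. If distinct moves are wanted, a partial Fisher-Yates shuffle is one option. This is a generic sketch using System.Random, not the article's code, and the name SampleWithoutReplacement is hypothetical. In Unity, UnityEngine.Random.Range(i, count) could stand in for rng.Next(i, count), since both exclude the upper bound.

```csharp
using System;
using System.Collections.Generic;

public static class SampleSketch
{
    // Returns min(k, items.Count) distinct elements in random order
    // (partial Fisher-Yates shuffle on a copy, so the input list is untouched).
    public static List<T> SampleWithoutReplacement<T>(IList<T> items, int k, Random rng)
    {
        var pool = new List<T>(items);
        int n = Math.Max(0, Math.Min(k, pool.Count));
        for (int i = 0; i < n; i++)
        {
            // Choose uniformly among the elements not picked yet (indices i..Count-1).
            int j = rng.Next(i, pool.Count);
            T tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
        }
        return pool.GetRange(0, n);
    }
}
```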

Results of running it

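I swapped who moves first and played three games each way (six games in total), with these settings:

CPU1: light processing through turn 11, heavy processing from then on
CPU2: light processing for the whole game

Monte Carlo simulations per move (MaxLoopCount): 2

CPU1 won 5 games and drew 1. It does seem to have gotten stronger.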

However, it seemed slow: each move took about 3 seconds.

About processing time

The biggest time sink was the GetAllInstructions function (as I suspected). It does an exhaustive search, so it definitely needs improving.

I measured the processing time at the first move where the Monte Carlo method is used.
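The article doesn't say how the numbers were measured; one way to do it in C# is System.Diagnostics.Stopwatch, which can be started and stopped repeatedly to accumulate time across many calls. SectionTimer below is a hypothetical helper, not part of the article's code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: accumulates the time spent in one function across many calls.
public class SectionTimer
{
    private readonly Stopwatch watch = new Stopwatch();

    public int Calls { get; private set; }

    public double Seconds => watch.Elapsed.TotalSeconds;

    // Runs the call and adds its duration to the running total.
    // Start() resumes from the previous total instead of resetting it.
    // Not meant for nested calls on the same timer.
    public T Measure<T>(Func<T> call)
    {
        watch.Start();
        try
        {
            return call();
        }
        finally
        {
            watch.Stop();
            Calls++;
        }
    }
}
```

Each call inside the simulation could then become `timer.Measure(() => GetAllInstructions(...))`, and timer.Seconds can be compared with a `Stopwatch.StartNew()` wrapped around the whole move decision (for example, logged with UnityEngine.Debug.Log).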

Out of a total of 3.896 s, GetAllInstructions accounted for 3.758 s. Inside the Monte Carlo simulation, even picking a single random move checks every combination, so this part looks like it needs improving.

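Naturally, near the end of the game there are fewer random moves to generate, so a move finished in about 1 second. I'll need to think about when to start using the Monte Carlo method.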
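On the point that picking one random move checks every combination: in a random playout, one possible improvement is to visit candidate placements in random order and stop at the first legal one. The first legal candidate in a uniformly shuffled list is itself a uniformly random legal move, so the playout behaves the same while usually testing far fewer candidates. This is a generic sketch under assumptions: the candidate list and the isLegal check are hypothetical stand-ins for what GetAllInstructions enumerates and tests. Late in the game, when few placements are legal, it still ends up checking most candidates.

```csharp
using System;
using System.Collections.Generic;

public static class RandomMoveSketch
{
    // Visits candidates in random order (a lazy Fisher-Yates shuffle) and returns
    // the first one that passes isLegal. 'isLegal' is a hypothetical placement check.
    public static bool TryPickRandomLegal<T>(IList<T> candidates, Func<T, bool> isLegal, Random rng, out T move)
    {
        var pool = new List<T>(candidates);
        for (int i = 0; i < pool.Count; i++)
        {
            // Swap a random not-yet-visited candidate into position i, then test it.
            int j = rng.Next(i, pool.Count);
            T tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
            if (isLegal(pool[i]))
            {
                move = pool[i];
                return true;
            }
        }
        move = default(T);
        return false;   // no legal move: the player passes
    }
}
```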
