How I, a Marketer with a Liberal Arts Background, Extracted Sales Incentive Data with Python

July 23, 2023, 12:35

As a marketer, I often have to go and pull performance data from the sales side. This time I want to write about extracting "sales incentive" performance data, which I found a bit tricky.

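I work at a cosmetics manufacturer, so "sales" here means our beauty consultants, and the data in question is "how much sales incentive each beauty consultant received." I plan to use this data as a variable in modeling.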

The incentive amount differs for each beauty consultant, so the annoying part is that the aggregation needs many conditional branches. For example:

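The amount paid is determined by checking sales results against an evaluation table
There is a cap on the amount paid to each individual
There is a cap on the total amount paid to everyone (it cannot be handed out without limit)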

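Because of conditions like these, I had to loop with a for statement and check several conditions inside it. The amount also depends on other factors, such as each beauty consultant's past sales record, but I have left those out here.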
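As a rough illustration of the first two rules, here is a minimal sketch that looks up how many tiers a sales total has reached and then applies a per-person cap. The threshold values are the ones used later in this post; the function names and the `PER_PERSON_CAP` constant are hypothetical, and the real evaluation table may well look different.

```python
from bisect import bisect_right

# Evaluation table: cumulative sales needed to reach each tier (ascending order).
THRESHOLDS = [20000, 40000, 60000, 80000, 100000]
PER_PERSON_CAP = 5  # assumed cap on the number of awards per beauty consultant


def tiers_reached(total_sales, thresholds=THRESHOLDS):
    """Return how many thresholds total_sales has reached (>= threshold)."""
    return bisect_right(thresholds, total_sales)


def awards_for(total_sales, cap=PER_PERSON_CAP):
    """Number of awards for one consultant after applying the per-person cap."""
    return min(tiers_reached(total_sales), cap)


print(awards_for(45000))          # two tiers reached
print(awards_for(250000, cap=3))  # all tiers reached, but capped at 3
```

This only covers totals per person. Flagging the individual transaction where each tier is reached needs the row-by-row loop described in the rest of the post.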

Below, I walk through what I wrote, step by step.

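import datetime as dt

# Extract the rows needed for this analysis from the DataFrame that holds the transaction log
# (yyyy, mm, dd are placeholders for the actual start and end dates)
train_df = train_df.loc[(train_df['ymd'] > dt.datetime(yyyy,mm,dd)) & (train_df['ymd'] < dt.datetime(yyyy,mm,dd))]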

# Sort by transaction date, oldest first
train_df = train_df.sort_values(by='ymd')

# Set up the evaluation table as sales thresholds (threshold 1 through threshold 5)
threshold1 = 20000
threshold2 = 40000
threshold3 = 60000
threshold4 = 80000
threshold5 = 100000

# Initialize the flags
is_threshold1_exceeded = False  # whether threshold1 has been reached
is_threshold2_exceeded = False  # whether threshold2 has been reached
is_threshold3_exceeded = False  # whether threshold3 has been reached
is_threshold4_exceeded = False  # whether threshold4 has been reached
is_threshold5_exceeded = False  # whether threshold5 has been reached

# Dictionary that holds each beauty consultant's list of 0/1 award flags (the variable output)
output_dict = {}

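# Initialize the per-consultant award counter before its first use in the loop
counter = 0

# Process each beauty consultant in turn
for salesman, group in train_df.groupby('salesman_id'):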
    cumulative_amounts = group['ordered_amount'].cumsum()
    output = []
    for amount in cumulative_amounts:
        if amount >= threshold1 and not is_threshold1_exceeded and counter <= 5:
            output.append(1)
            counter += 1
            is_threshold1_exceeded = True
        elif amount >= threshold2 and not is_threshold2_exceeded and counter <= 5:
            output.append(1)
            counter += 1
            is_threshold2_exceeded = True
        elif amount >= threshold3 and not is_threshold3_exceeded and counter <= 5:
            output.append(1)
            counter += 1
            is_threshold3_exceeded = True
        elif amount >= threshold4 and not is_threshold4_exceeded and counter <= 5:
            output.append(1)
            counter += 1
            is_threshold4_exceeded = True
        elif amount >= threshold5 and not is_threshold5_exceeded and counter <= 5:
            output.append(1)
            counter += 1
            is_threshold5_exceeded = True
        else:
            output.append(0)

    # Add the beauty consultant ID and its output list to the dictionary
    output_dict[salesman] = output

    # Reset the counter and flags (before moving on to the next beauty consultant)
    counter = 0
    is_threshold1_exceeded = False
    is_threshold2_exceeded = False
    is_threshold3_exceeded = False
    is_threshold4_exceeded = False
    is_threshold5_exceeded = False

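# Add output to the DataFrame as a new column
# (stable sort, so rows stay in date order within each beauty consultant and line up with output)
train_df = train_df.sort_values(by='salesman_id', kind='mergesort')
train_df['output'] = 0
for key, value in output_dict.items():
    train_df.loc[train_df['salesman_id'] == key, 'output'] = value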

print(train_df)

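# Compute the number of awards given across all beauty consultants (the sum of the counters)
# Strictly speaking, the for loop above should have stopped before the total reached the overall cap,
# but I knew the cap was unlikely to be reached, so this time I compute the total outside the loop.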

output_sum_dict = {}

for key, value in output_dict.items():
    output_sum_dict[key] = sum(value)
    
print(sum(output_sum_dict.values()))
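For reference, the overall cap could in principle be enforced inside the loop by walking the transactions in date order across all beauty consultants and stopping awards once the total reaches the limit. The following is only a sketch under assumptions: `flag_awards`, `global_cap`, and `per_person_cap` are hypothetical names, the input is a plain list of `(salesman_id, amount)` pairs already sorted by date rather than the DataFrame above, and, as in the loop above, at most one tier is awarded per transaction.

```python
from collections import defaultdict

THRESHOLDS = [20000, 40000, 60000, 80000, 100000]


def flag_awards(transactions, global_cap, per_person_cap=5):
    """Return one 0/1 flag per transaction.

    transactions: iterable of (salesman_id, amount), already sorted by date.
    global_cap: assumed limit on the total number of awards across everyone.
    """
    cumulative = defaultdict(int)  # running sales per consultant
    awarded = defaultdict(int)     # tiers already awarded per consultant
    total_awarded = 0
    flags = []
    for salesman_id, amount in transactions:
        cumulative[salesman_id] += amount
        n = awarded[salesman_id]
        can_award = (
            total_awarded < global_cap
            and n < min(per_person_cap, len(THRESHOLDS))
            and cumulative[salesman_id] >= THRESHOLDS[n]
        )
        if can_award:
            awarded[salesman_id] += 1
            total_awarded += 1
            flags.append(1)
        else:
            flags.append(0)
    return flags


log = [("A", 25000), ("B", 20000), ("A", 20000), ("B", 5000)]
print(flag_awards(log, global_cap=2))
print(flag_awards(log, global_cap=10))
```

With a cap of 2, the third transaction no longer earns an award even though consultant A has passed the second threshold, which is the behavior the comment above says the original loop skipped.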

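I went with the approach above this time, but holding the table used to determine the incentive (that is, the multiple thresholds) in separate variables is probably not a great way to do it. Writing a function, for example, would likely make it smarter. This is the limit of my ability for now...
That said, the current code does make it easy to see what is going on when I look back at it later, so I think it has some merit. If I ever need to build this into a product or otherwise scale it up, the code cannot stay as it is, so I will keep working on being able to implement better approaches quickly 💪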
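One possible shape for the "function" idea is to keep the thresholds in a list and track only the index of the next tier, instead of five variables and five flags. This is a sketch, not a tested replacement: `award_flags` and its parameters are hypothetical names, and it assumes the thresholds are listed in ascending order.

```python
from itertools import accumulate


def award_flags(amounts, thresholds, cap):
    """Return a 0/1 flag per transaction for one beauty consultant.

    A flag is 1 on a transaction where cumulative sales have reached the
    next tier not yet awarded. At most one tier is awarded per transaction,
    and no more than cap tiers in total.
    """
    flags = []
    next_tier = 0
    limit = min(cap, len(thresholds))
    for cumulative in accumulate(amounts):
        if next_tier < limit and cumulative >= thresholds[next_tier]:
            flags.append(1)
            next_tier += 1
        else:
            flags.append(0)
    return flags


thresholds = [20000, 40000, 60000, 80000, 100000]
print(award_flags([15000, 10000, 30000, 5000], thresholds, cap=5))
```

Inside the per-consultant loop, something like `output_dict[salesman] = award_flags(group['ordered_amount'], thresholds, cap=5)` should then replace the long if/elif chain, and adding a sixth tier would only mean appending one value to the list.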
