Pandas3 DataFrame1

Toshitaka

May 3, 2021, 00:05

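This is the third post in my Pandas series.

I am taking Aidemy's courses and updating this blog to practice putting what I learn into my own words.

Aidemy website → https://aidemy.net

Last time I covered the Pandas Series, so from this post on I move to the DataFrame.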

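【DataFrame Overview】

A DataFrame can be thought of as a two-dimensional array.

A Series, by comparison, is like a one-dimensional array.

You can create a DataFrame by passing Series objects to pd.DataFrame().

Think of it as stacking several Series together; each Series becomes one row.

Unless you specify otherwise, rows are labeled automatically with integers in ascending order starting from 0.

pd.DataFrame([Series, Series, ...])

You can also create a DataFrame from a dictionary whose values are lists. Each key becomes a column name, and all of the lists must have the same length (number of items).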

import pandas as pd

data = {"fruits": ["apple", "orange", "banana", "strawberry", "kiwifruit"],
       "year": [2001, 2002, 2001, 2008, 2006],
       "time": [1, 4, 5, 6, 3]}
df = pd.DataFrame(data)

print(df)


>>> Output
       fruits  year  time
0       apple  2001     1
1      orange  2002     4
2      banana  2001     5
3  strawberry  2008     6
4   kiwifruit  2006     3
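
To see why the list lengths have to match, here is a small sketch of what I would expect to happen when they do not. The dictionary bad_data is a made-up example of mine, not part of the course material; as far as I know, pandas raises a ValueError in this case.

```python
import pandas as pd

# Hypothetical data: "year" has one fewer element than "fruits".
bad_data = {"fruits": ["apple", "orange", "banana"],
            "year": [2001, 2002]}

try:
    pd.DataFrame(bad_data)
    error_message = None
except ValueError as e:
    # pandas refuses to build a DataFrame from lists of different lengths
    error_message = str(e)

print(error_message)
```

The exact wording of the message differs between pandas versions, so the sketch only relies on the exception type.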

Example: passing Series

import pandas as pd

index = ["apple", "orange", "banana", "strawberry", "kiwifruit"]
data1 = [10, 5, 8, 12, 3]
data2 = [30, 25, 12, 10, 8]

series1 = pd.Series(data1, index=index)
series2 = pd.Series(data2, index=index)

# Create a DataFrame from series1 and series2 and assign it to df
df = pd.DataFrame([series1, series2])
print(df)

>>> Output
   apple  orange  banana  strawberry  kiwifruit
0     10       5       8          12          3
1     30      25      12          10          8
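
As a quick check of the result above, the following sketch inspects the shape and the labels of the DataFrame. It uses the same data as the example, and only standard attributes (df.shape, df.index, df.columns).

```python
import pandas as pd

index = ["apple", "orange", "banana", "strawberry", "kiwifruit"]
series1 = pd.Series([10, 5, 8, 12, 3], index=index)
series2 = pd.Series([30, 25, 12, 10, 8], index=index)

df = pd.DataFrame([series1, series2])

print(df.shape)          # (number of rows, number of columns)
print(list(df.index))    # row labels assigned automatically
print(list(df.columns))  # column labels taken from the Series index
```

Each Series ends up as one row, and the index of the Series becomes the column labels.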

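【Setting labels: df.index】

In a DataFrame, the row labels are called the index and the column labels are called the columns.
By default, the index is assigned integers in ascending order starting from 0.
The columns are taken from the index of the source Series or from the keys of the dictionary.

For a DataFrame variable df, you can set the index by assigning a list with the same length as the number of rows to df.index. Likewise, you can set the columns by assigning a list with the same length as the number of columns to df.columns.

df.index = ["name1", "name2"]

Example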

import pandas as pd

index = ["apple", "orange", "banana", "strawberry", "kiwifruit"]
data1 = [10, 5, 8, 12, 3]
data2 = [30, 25, 12, 10, 8]

series1 = pd.Series(data1, index=index)
series2 = pd.Series(data2, index=index)

df = pd.DataFrame([series1, series2])

# Set the index of df so that it starts from 1
df.index = [1, 2]
print(df)


>>> Output
   apple  orange  banana  strawberry  kiwifruit
1     10       5       8          12          3
2     30      25      12          10          8
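
The example above only changes the index, so here is a minimal sketch of the df.columns side as well. The new column names "a" to "e" are placeholders I made up. The second half shows what I would expect when the list length does not match: pandas raises a ValueError.

```python
import pandas as pd

index = ["apple", "orange", "banana", "strawberry", "kiwifruit"]
series1 = pd.Series([10, 5, 8, 12, 3], index=index)
series2 = pd.Series([30, 25, 12, 10, 8], index=index)
df = pd.DataFrame([series1, series2])

# Rename all five columns at once; the list length must equal the column count.
df.columns = ["a", "b", "c", "d", "e"]
print(df)

# A list of the wrong length is rejected with a ValueError.
try:
    df.index = [1, 2, 3]   # three labels for only two rows
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(mismatch_rejected)
```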

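【Adding a row: append】

To add a single row of data, use append().
Here, df is a DataFrame and series is a Series.
First, prepare a Series whose index matches the columns of the DataFrame. Then call append() as in the sample code below, and one row is added to df.
Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the sample code in this section runs only on earlier versions of pandas.

Example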

import pandas as pd

index = ["apple", "orange", "banana", "strawberry", "kiwifruit"]
data1 = [10, 5, 8, 12, 3]
data2 = [30, 25, 12, 10, 8]
data3 = [30, 12, 10, 8, 25, 3]   # one more item than the other lists (the final 3)

series1 = pd.Series(data1, index=index)
series2 = pd.Series(data2, index=index)

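# Add series3 to df and reassign the result to df
index.append("pineapple")    # "pineapple" is added to the index list
series3 = pd.Series(data3, index=index)    # create series3 using the extended index
df = pd.DataFrame([series1, series2])

# Reassign the result to df
# ignore_index=True is required when appending an unnamed Series; values missing from the existing rows become NaN
df = df.append(series3, ignore_index=True)

# Check what happens when the index of the appended Series does not match the columns of df
print(df)


>>> Output
   apple  orange  banana  strawberry  kiwifruit  pineapple
0     10       5       8          12          3        NaN
1     30      25      12          10          8        NaN
2     30      12      10           8         25        3.0  # the value supplied by data3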
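
On pandas 2.0 and later, where DataFrame.append() is no longer available, one way to get the same result is pd.concat(). The following is a minimal sketch under that assumption, not the method taught in the course: it converts the Series into a one-row DataFrame with to_frame().T and concatenates it below df. The variable name new_row is my own.

```python
import pandas as pd

index = ["apple", "orange", "banana", "strawberry", "kiwifruit"]
series1 = pd.Series([10, 5, 8, 12, 3], index=index)
series2 = pd.Series([30, 25, 12, 10, 8], index=index)
series3 = pd.Series([30, 12, 10, 8, 25, 3], index=index + ["pineapple"])

df = pd.DataFrame([series1, series2])

# Turn the Series into a one-row DataFrame, then concatenate it below df.
new_row = series3.to_frame().T
df = pd.concat([df, new_row], ignore_index=True)

# Rows that have no "pineapple" value are filled with NaN.
print(df)
```

As with append(), ignore_index=True renumbers the rows from 0.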

【Adding a column: df[" "]】

Sometimes you want to add a new item (column) to an existing DataFrame.
For a DataFrame variable df, you can add a new column by assigning a list or a Series to df["new column name"].

import pandas as pd

data = {"fruits": ["apple", "orange", "banana", "strawberry", "kiwifruit"],
       "year": [2001, 2002, 2001, 2008, 2006],
       "time": [1, 4, 5, 6, 3]}

df = pd.DataFrame(data)
df["price"] = [150, 120, 100, 300, 150]   # add a new column

print(df)


>>> Output
       fruits  year  time  price
0       apple  2001     1    150
1      orange  2002     4    120
2      banana  2001     5    100
3  strawberry  2008     6    300
4   kiwifruit  2006     3    150

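In the next example, new_column is a Series with the index [0, 1]. Assigning it to the new column "mango" places each value in the row whose index label matches.

import pandas as pd

index = ["apple", "orange", "banana", "strawberry", "kiwifruit"]
data1 = [10, 5, 8, 12, 3]
data2 = [30, 25, 12, 10, 8]
series1 = pd.Series(data1, index=index)
series2 = pd.Series(data2, index=index)
new_column = pd.Series([15, 7], index=[0, 1])

# Create a DataFrame from series1 and series2
df = pd.DataFrame([series1, series2])

# Add the data in new_column to df as a new column "mango"
df["mango"] = new_column  # the values [15, 7] from new_column are inserted
print(df)

>>> Output
   apple  orange  banana  strawberry  kiwifruit  mango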
0     10       5       8          12          3     15
1     30      25      12          10          8      7
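
To make the role of the index clearer, here is a sketch with a Series whose index only partly matches the rows of df. The Series partial_column and its values are made up for illustration. My understanding is that pandas aligns the values by index label, fills rows without a match with NaN, and drops values whose label does not exist in df.

```python
import pandas as pd

index = ["apple", "orange", "banana", "strawberry", "kiwifruit"]
series1 = pd.Series([10, 5, 8, 12, 3], index=index)
series2 = pd.Series([30, 25, 12, 10, 8], index=index)
df = pd.DataFrame([series1, series2])

# Hypothetical column: only label 1 matches a row of df; label 5 does not exist.
partial_column = pd.Series([7, 99], index=[1, 5])
df["mango"] = partial_column

# Row 0 has no matching label, so its "mango" value is NaN.
print(df)
```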

That's all for this post.

Thank you for reading.
