Spotify Announces Its 2023 Top 50! Analyzing and Visualizing the Data Behind the Trending Songs
In what has become an annual tradition, Spotify has announced its top tracks of 2023. The ranking is based on total play counts over the course of 2023. In this post I use the Spotify API to fetch the track data and run a few simple analyses on it. Which songs did the most people listen to in 2023? Let's take a quick visual look at the attributes and characteristics of the music that defined the year.
By the way, I have also updated the top-track data in the Spotify music data app I'm currently building to this latest version.
Getting the data
First, I fetch the official data with the Spotify API.
# Extracted attributes
extracted_attributes = {
    "artist": artists,
    "track": track_info['name'],
    "danceability": audio_features_data["danceability"],
    "valence": audio_features_data["valence"],
    "energy": audio_features_data["energy"],
    "acousticness": audio_features_data["acousticness"],
    "instrumentalness": audio_features_data["instrumentalness"],
    "liveness": audio_features_data["liveness"],
    "speechiness": audio_features_data["speechiness"],
    "key": audio_features_data["key"],
    "tempo": audio_features_data["tempo"],
    "popularity": track_info.get('popularity', None),
    "id": track_id
}
The attributes I fetched are the same ones I used in an earlier post (see the Spotify magazine for details), plus popularity. The popularity score changes daily, so it is not very useful for looking at a full year of data in aggregate, but I fetched it anyway to check how popular each track is at the moment.
Saved to a CSV file and opened, the data looks like this.
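For readers who want to reproduce this step, here is a minimal sketch of how one such row could be assembled and written out as CSV. The article does not show its fetching code, so the helper name `build_record`, the way artist names are joined, and all sample values are hypothetical placeholders rather than real Spotify data; the two input dictionaries only mimic the shape of the responses the snippet above reads from.

```python
import csv
import io

FEATURE_NAMES = ["danceability", "valence", "energy", "acousticness",
                 "instrumentalness", "liveness", "speechiness", "key", "tempo"]


def build_record(track_info, audio_features_data):
    """Flatten one track's metadata and audio features into a single row (hypothetical helper)."""
    record = {
        # Assumption: several artists are joined into one string
        "artist": ", ".join(artist["name"] for artist in track_info["artists"]),
        "track": track_info["name"],
    }
    for name in FEATURE_NAMES:
        record[name] = audio_features_data[name]
    record["popularity"] = track_info.get("popularity", None)
    record["id"] = track_info["id"]
    return record


# Made-up placeholder values, not real Spotify data
sample_track = {"id": "track-001", "name": "Example Song", "popularity": 90,
                "artists": [{"name": "Artist A"}, {"name": "Artist B"}]}
sample_features = {"danceability": 0.7, "valence": 0.5, "energy": 0.8,
                   "acousticness": 0.1, "instrumentalness": 0.0,
                   "liveness": 0.12, "speechiness": 0.05, "key": 1, "tempo": 120.0}

rows = [build_record(sample_track, sample_features)]

# Write to an in-memory buffer; pass an open file instead to save to disk
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Collecting one dictionary per track in a list keeps the column order identical for every row, which makes the resulting file easy to load with `pd.read_csv`.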
Environment setup
Since today's analysis and visualization are simple, I do everything inside Jupyter Notebook rather than using visualization tools such as Power BI or Tableau. As usual, I'm using Python.
Looking inside the data
First, import the required libraries.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
Let's look at the data as a DataFrame.
file_path = r'..\top_50_2023.csv'  # raw string, so that \t is not read as a tab character
df = pd.read_csv(file_path)
df.head()
I store the data in a DataFrame named `df` and print the first rows. This serves as an initial inspection of the data, to check for any odd patterns.
df.info()
The pandas `info()` method gives a concise overview of the data's structure. `info()` alone tells you quite a lot about the data: it shows each column's data type, the number of non-null values, and details such as memory usage.
There are 50 entries, one for each of the 50 songs, so that checks out. The index runs from 0 to 49.
💡 Counting starts at 0 in most programming languages. In R, a statistical language, counting starts at 1.
There are 13 columns in total: the artist name, the track name, and the fields describing the music's attributes.
The output also has a Non-Null Count column. Null is a term you often see in programming languages and data representation; it means that no data is present. Non-null therefore means that some data is present. Every column shows 50 non-null, so every track has data in every field. Checking for nulls is a very important step in data analysis.
Dtype shows the data type of each column. For example, float (floating-point number) covers values like 3.5, int covers integers, and object covers other types such as strings. Dtype is handy for spotting problems: if a year column should be int but has come in as object, this table shows it right away.
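The null check and the dtype check described above can also be done directly, without reading them off the `info()` output. The sketch below uses a small made-up frame (the `toy` name, its columns, and its values are illustrative only, not the article's data) with one missing tempo and a year column that was read as text.

```python
import pandas as pd

# Made-up example data: one missing tempo, and years stored as strings
toy = pd.DataFrame({
    "track": ["Song A", "Song B", "Song C"],
    "tempo": [120.0, None, 98.5],
    "year": ["2021", "2022", "2023"],
})

# Number of missing values per column
null_counts = toy.isnull().sum()
print(null_counts)

# Convert the text column to numbers so it can be used in calculations
toy["year"] = pd.to_numeric(toy["year"])
print(toy.dtypes)
```

On the Top 50 data, `df.isnull().sum()` should report 0 for every column, matching the 50 non-null counts shown by `info()`.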
df.describe()
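Next, `describe()` gives summary statistics for the columns of the DataFrame. For numeric columns these are the count, mean, standard deviation, minimum, 25th percentile, median, 75th percentile, and maximum. For non-numeric columns it shows the count, the number of unique values, the most frequent value, and that value's frequency. Note that on a DataFrame with mixed column types, a plain `df.describe()` summarizes only the numeric columns.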
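To see both kinds of summary side by side, here is a small sketch on made-up data (the `toy` frame and its values are illustrative assumptions). Selecting the text column explicitly produces the count/unique/top/freq summary; `describe(include="all")` is another way to bring non-numeric columns into the output.

```python
import pandas as pd

# Made-up example data
toy = pd.DataFrame({
    "artist": ["Artist A", "Artist A", "Artist B"],
    "tempo": [100.0, 120.0, 140.0],
})

# Default call on a mixed frame: only the numeric column is summarized
numeric_summary = toy.describe()
print(numeric_summary)

# Summary of the non-numeric column: count, unique, top, freq
text_summary = toy[["artist"]].describe()
print(text_summary)
```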
# Check duplicates
df.duplicated().value_counts()
`duplicated()` checks whether the data contains any duplicate rows.
False 50
Name: count, dtype: int64
The output is False 50, so there are no duplicate rows.
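One caveat: `duplicated()` with no arguments only flags rows that match in every column. As a hedged variation (not something the article does), the check can be restricted to the track `id`, which would also catch the same track appearing twice with slightly different metadata. The `toy` frame below is made up for illustration.

```python
import pandas as pd

# Made-up example: the same id appears twice under different track names
toy = pd.DataFrame({
    "id": ["a1", "b2", "a1"],
    "track": ["Song A", "Song B", "Song A (Remastered)"],
})

full_row_duplicates = int(toy.duplicated().sum())        # rows identical in every column
id_duplicates = int(toy.duplicated(subset="id").sum())   # rows repeating an earlier id

# keep=False marks every row involved in a duplicate, which is handy for inspection
rows_sharing_an_id = toy[toy.duplicated(subset="id", keep=False)]
print(full_row_duplicates, id_duplicates)
print(rows_sharing_an_id)
```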
Visualizing the data
Number of songs per artist
First, I counted which artists placed the most songs in the top 50 and turned the result into a bar chart.
import pandas as pd
artist_counts = df['artist'].value_counts()
df_artist_counts = pd.DataFrame({'Artist': artist_counts.index, 'Count': artist_counts.values})
df_sorted_artists = df_artist_counts.sort_values(by='Count', ascending=False)
df_top_5 = df_sorted_artists.head(5)
sns.barplot(x='Count', y='Artist', data=df_top_5)
plt.title('Number of Songs by Artist')
plt.show()
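As announced on Spotify's official page, Taylor Swift, who was named top global artist, came in first.
Heatmap of correlation coefficients between track attributes
Next, I plot the correlations between the attributes that describe the music as a heatmap.
A heatmap here is a chart that visualizes the correlation matrix, which quantifies how strongly each pair of variables is correlated. In the heatmap below, the deeper the red, the stronger the positive correlation; deep blue marks a strong negative correlation.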
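import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Assumption: track_attributes is not defined in the article; the numeric
# attribute columns fetched above are used here.
track_attributes = ['danceability', 'valence', 'energy', 'acousticness',
                    'instrumentalness', 'liveness', 'speechiness', 'tempo']
correlation_matrix = df[track_attributes].corr()

plt.figure(figsize=(10, 8))
heatmap = sns.heatmap(correlation_matrix,
                      vmin=-1,
                      vmax=1,
                      cmap='RdBu_r')

# Loop to add correlation values in each cell
for i in range(len(track_attributes)):
    for j in range(len(track_attributes)):
        text = "{:.2f}".format(correlation_matrix.iloc[i, j])
        heatmap.text(j + 0.5, i + 0.5, text,
                     ha='center', va='center', color='black')

# Cells are centered at half-integer positions, so put the tick labels there too
tick_positions = np.arange(len(track_attributes)) + 0.5
plt.xticks(tick_positions, track_attributes, rotation=45)
plt.yticks(tick_positions, track_attributes, rotation=0)
plt.title("Correlation Heatmap")
plt.show()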
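Each cell of the heatmap is a Pearson correlation coefficient, which is what `DataFrame.corr()` computes by default. As a quick sanity check of what that number means, the sketch below computes it by hand on made-up values and compares it with pandas; the `toy` frame and the helper `pearson_r` are illustrative names, not part of the article's notebook.

```python
import numpy as np
import pandas as pd

# Made-up example values
toy = pd.DataFrame({
    "valence": [0.2, 0.4, 0.5, 0.7, 0.9],
    "danceability": [0.50, 0.55, 0.70, 0.65, 0.80],
})


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


manual = pearson_r(toy["valence"], toy["danceability"])
from_pandas = float(toy.corr().loc["valence", "danceability"])
print(manual, from_pandas)
```

The value always lies between -1 and 1, which is why the heatmap's color scale is fixed with `vmin=-1` and `vmax=1`.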
The correlation between danceability and valence is on the high side, so to be safe I visualize the relationship between the two.
plt.figure(figsize=(10,6))
sns.regplot(x='valence',y='danceability',data=df)
plt.title("Regression: Danceability, Valence")
plt.show()
Both danceability and valence measure how active or positive a track feels, so some correlation between them is not surprising. Here I check how the data points are scattered and whether there are any odd ones.
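The straight line that `regplot` draws by default is an ordinary least-squares fit, and the points far from that line are the candidates for "odd" data points. As a rough numerical counterpart to eyeballing the plot, this sketch fits the line with `np.polyfit` and finds the point with the largest residual. The values are made up, and the approach is one possible check rather than the method used in the article.

```python
import numpy as np

# Made-up example: points on the line y = 0.5x + 0.3, with the middle one pushed upward
valence = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
danceability = np.array([0.35, 0.45, 0.95, 0.65, 0.75])

# Least-squares straight line (degree 1): returns slope, then intercept
slope, intercept = np.polyfit(valence, danceability, 1)

# Residual = observed value minus the value predicted by the line
residuals = danceability - (slope * valence + intercept)
most_unusual_index = int(np.argmax(np.abs(residuals)))

print(slope, intercept)
print(most_unusual_index, residuals[most_unusual_index])
```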
Bar chart of keys
The key, the tonal center of a song, is essential for identifying its character. I made a bar chart of which keys the most-streamed songs worldwide this year are in.
plt.figure(figsize=(10,6))
sns.countplot(x='key',
              data=df,
              palette='viridis',
              order=df['key'].value_counts().index)
plt.title('Distribution of Keys')
plt.show()
The C family of keys is timeless and used in every era. 2023 was no exception: C# was used often, followed by D.
Here are the C# songs that were popular in 2023.
df[df['key'] == 'C#']
The Weeknd seems to release a lot of songs centered on C#.
💡 Creepin', a hit this year, is a remake of I Don't Wanna Know by Mario Winans, released in the 2000s.
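A note on the `key` column: the filter above compares against note names such as 'C#', whereas the Spotify API reports `key` as an integer in pitch class notation (0 = C, 1 = C#/D♭, 2 = D, and so on, with -1 when no key was detected). The article does not show how the conversion was done, so the following is only a sketch of one way to do it; `PITCH_CLASSES` and `key_name` are hypothetical names.

```python
# Note names indexed by pitch class (sharps are used for the black keys)
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def key_name(pitch_class):
    """Return the note name for a pitch class, or None when no key was detected."""
    if 0 <= pitch_class < len(PITCH_CLASSES):
        return PITCH_CLASSES[pitch_class]
    return None


print(key_name(1))   # C#
print(key_name(2))   # D
print(key_name(-1))  # None
```

With a DataFrame that still holds the integers, something like `df['key'] = df['key'].map(key_name)` would produce the note names used in the chart.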
Distribution of tempo (BPM)
Next, I plot the distribution of tempo to get an overall picture.
plt.figure(figsize=(10,6))
sns.histplot(df['tempo'], bins=50, kde=True)
plt.axvline(x=df['tempo'].mean(), color='red', linestyle='dashed', linewidth=2, label="Mean BPM")
plt.title('Distribution of Beats Per Minute (BPM)')
plt.xlabel('BPM')
plt.ylabel('Frequency')
plt.legend()
plt.show()
The average tempo of the top 50 was 124.06 BPM.
Radar chart of the attribute data
Finally, I visualize the attributes as a radar chart. First, here is a brief summary of what each attribute measures.
danceability
How suitable a track is for dancing, based on a combination of elements such as overall tempo, rhythm stability, and beat strength. 0 is the least danceable and 1 the most danceable.
valence
The overall brightness or positivity of a track. The closer to 1, the brighter the song.
energy
How energetic or intense a track is. The closer to 1, the more intense.
acousticness
How acoustic a track is, that is, how much it relies on live instruments. The closer to 1, the more acoustic the song.
instrumentalness
The degree to which a track is instrumental only, with no vocals. The closer to 1, the smaller the share of vocals. Typical pop music usually has an instrumentalness close to 0.
liveness
How much a track feels like a live performance. The closer to 1, the stronger the live feel.
speechiness
How much of a track is speech or talking. Rap-based tracks score higher, while typical sung tracks are mostly close to 0.
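# Assumption: att_list is not defined in the article; the seven attributes
# summarized above are used here.
att_list = ['danceability', 'valence', 'energy', 'acousticness',
            'instrumentalness', 'liveness', 'speechiness']
mean_values_top_50 = df[att_list].mean()

fig = go.Figure()
fig.add_trace(go.Scatterpolar(
    r=mean_values_top_50,
    theta=att_list,
    fill='toself',
    name='Top 50',
    line=dict(color='blue'),
))
fig.update_layout(
    # The radial range assumes the attribute values are stored as percentages (0-100);
    # raw 0-1 values from the API would need to be multiplied by 100 first
    polar=dict(radialaxis=dict(visible=True, range=[0, 100])),
    showlegend=True,
    legend=dict(x=1, y=1, traceorder='normal', orientation='v'),
)
fig.show()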
The mean values (%) of each attribute are as follows.
mean_values_top_50 = df[att_list].mean()
mean_values_top_50
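Since the means are quoted as percentages and the radar chart's radial axis runs from 0 to 100, the values have to be on a 0 to 100 scale at this point. If the CSV holds Spotify's raw 0 to 1 values (the article does not say either way, so this is an assumption), the conversion could look like the sketch below. The `toy` frame, `example_attributes`, and all numbers are made up for illustration.

```python
import pandas as pd

# Made-up example values on Spotify's raw 0-1 scale
toy = pd.DataFrame({
    "danceability": [0.6, 0.8],
    "energy": [0.5, 0.9],
    "speechiness": [0.04, 0.06],
})
example_attributes = ["danceability", "energy", "speechiness"]

# Mean per attribute, converted to a percentage and rounded to one decimal place
mean_percent = (toy[example_attributes].mean() * 100).round(1)
print(mean_percent)
```

Keeping every attribute on the same 0 to 100 scale is also why tempo and key are left out of the radar chart: their units differ, so they would distort the shape.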
Summary: characteristics of the 2023 trending songs
Looking at the visualizations, the most-played songs were on the whole energetic, easy to dance to, and somewhat positive.
Acoustic, instrumental, or live-sounding songs were not particularly favored, and speechiness was also quite low. Rap songs are said to have higher speechiness than other songs, so it is fair to say the Top 50 contained few rap tracks.
As for tempo, songs at a comfortable pace, neither too slow nor too energetic, seem to have been played the most.
Taylor Swift, The Weeknd, and SZA each had multiple songs in the ranking.
And this is my personal impression from actually listening to the Top 50: many of the songs had a nostalgic feel reminiscent of the 90s through the Y2K era. Besides the aforementioned Creepin', the ranking included Starboy by The Weeknd and Daft Punk, a long-running hit; Flowers by Miley Cyrus, the top song of the year; and Mockingbird by Eminem, a signature song of the 2000s. It struck me as a ranking that reflects a revival of the sound of 20 years ago.
If you notice anything or have any thoughts, please leave a comment. I'd also appreciate a follow.
When I have a question, I want to turn it into data and look for my own answer. I hope those answers are useful to someone out there.