【Python】Performing Web Scraping

March 29, 2021, 22:30
Web scraping can be done with the code below.
※Some of the explanations may be hard to follow.
■■Preparation

・Open Command Prompt (PowerShell) and set up the environment: selenium
C:\Users\*****\AppData\Local\Programs\Python\Python37\Scripts\pip install selenium
※Install selenium to prepare for web scraping.

C:\Users\*****\AppData\Local\Programs\Python\Python37\Scripts\pip install chromedriver-binary
※Install chromedriver. Selenium controls the Chrome browser through this driver, so the driver version has to match the installed Chrome version.

C:\Users\*****\AppData\Local\Programs\Python\Python37\Lib\site-packages\chromedriver_binary\chromedriver
※Run chromedriver directly. If it prints its startup information, the driver is confirmed to be usable.

C:\Users\*****\AppData\Local\Programs\Python\Python37\Scripts\pip install pyautogui
※pyautogui makes it possible to automate mouse operations and similar input.

・Open Command Prompt (PowerShell) and set up the environment: beautifulsoup4
C:\Users\*****\AppData\Local\Programs\Python\Python37\Scripts\pip install beautifulsoup4
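※A minimal sketch for checking that the installs above succeeded, using only the standard library. The function name missing_modules is made up for this example. Note that two import names differ from the pip package names (chromedriver-binary is imported as chromedriver_binary, beautifulsoup4 as bs4).

```python
import importlib.util

# Import names for the four packages installed above.
REQUIRED_MODULES = ["selenium", "chromedriver_binary", "pyautogui", "bs4"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All required modules are installed.")
```

If a name is reported as missing, rerunning the corresponding pip install command is the first thing to try.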

◆Command to run the automation script (PowerShell)
C:\Users\*****\AppData\Local\Programs\Python\Python37\python C:\Users\tkikuci\Desktop\デスクトップ\odennwa.py
※Type the absolute path to Python, a single half-width space, and then the absolute path to the target file to run it.
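※If the absolute path to Python is not known, the interpreter can report it itself. The sketch below builds the same "interpreter path + script path" pair from inside Python. build_run_command is a hypothetical helper name, and the script is assumed to sit in the current folder.

```python
import os
import sys


def build_run_command(script):
    """Return [absolute interpreter path, absolute script path]."""
    return [sys.executable, os.path.abspath(script)]


# "odennwa.py" is the script name used in the command above.
print(build_run_command("odennwa.py"))
```

The two printed values are what would be typed into PowerShell, separated by a space.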

■■Script contents

・Open an editor for writing Python
from selenium import webdriver
※Import webdriver from selenium so that its features can be used.

from selenium.webdriver.support.ui import Select
※Imported for operating drop-down menus. The class name is Select, with an uppercase "S".

from bs4 import BeautifulSoup
※Makes it possible to parse the HTML of a web page in detail.

driver = webdriver.Chrome()
or
driver = webdriver.Chrome(executable_path=r"C:\Users\******\AppData\Local\Programs\Python\Python37\Lib\site-packages\chromedriver_binary\chromedriver.exe")
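※Launches Chrome (the "C" in Chrome must be uppercase). When relying on the chromedriver-binary package, the first form also needs import chromedriver_binary beforehand so that the driver is added to PATH. In Selenium 4 the executable_path argument is deprecated in favor of a Service object.
※Put r immediately before the opening quote of the path, right after the "=". It makes the path a raw string, so the backslashes are not treated as escape characters.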
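※A small illustration of why the r prefix matters for Windows paths. The path C:\temp\new is a made-up example chosen because it contains \t and \n. A path such as "C:\Users\..." without the r fails outright with a SyntaxError, because \U starts a Unicode escape.

```python
# In a normal string literal, "\t" is a tab and "\n" is a newline.
plain = "C:\temp\new"
# The r prefix keeps every backslash as a literal character.
raw = r"C:\temp\new"

print(len(plain))  # 9: "\t" and "\n" each collapsed into one character
print(len(raw))    # 11: all characters kept as typed
print(raw == "C:\\temp\\new")  # True: doubling the backslashes is equivalent
```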

driver.get("----")
※Replace the "----" in get("----") with the URL to open the target site.

◆Performing operations on the web page
element = driver.find_element_by_class_name("c-call-to-action")
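※Gets the target element on the page by its class name. From Selenium 4.3 onward the find_element_by_* methods are removed; the equivalent call is driver.find_element(By.CLASS_NAME, "c-call-to-action"), with By imported from selenium.webdriver.common.by.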

element.click()
※Clicks the element that was retrieved.
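※The open, find, click steps can be sketched as one function in the Selenium 4 style. This is an illustration, not the original script: click_by_class_name is a hypothetical name, and the imports are placed inside the function only so that the file can be loaded before Selenium is installed. Running it requires Selenium 4 and Chrome.

```python
def click_by_class_name(url, class_name):
    """Open `url` in Chrome and click the first element with `class_name`."""
    # Normally these imports go at the top of the file.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        element = driver.find_element(By.CLASS_NAME, class_name)
        element.click()
        return driver.current_url  # URL shown after the click
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling it with the target URL and "c-call-to-action" would reproduce the steps above, and the try/finally keeps Chrome windows from piling up when an element is not found.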

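element = driver.find_element_by_xpath("//ul[@class='l-nav__list']").find_element_by_xpath(".//a[@href='https://idotdesign.net/price']").text
※How to get a child element under a parent element. The second XPath starts with "." so that the search is limited to the parent; a path starting with "//" searches the whole page.
※The .text property at the end returns the text of the child element.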
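※The parent-then-child lookup can be tried without a browser. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for Selenium (it supports a subset of XPath), and the markup is hypothetical: the same link appears once outside and once inside the list.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two links with the same href, only one inside the list.
PAGE = """
<body>
  <a href="https://idotdesign.net/price">Header link</a>
  <ul class="l-nav__list">
    <li><a href="https://idotdesign.net/price">Price</a></li>
  </ul>
</body>
"""

root = ET.fromstring(PAGE)
parent = root.find(".//ul[@class='l-nav__list']")
# The leading "." limits the search to descendants of `parent`.
child = parent.find(".//a[@href='https://idotdesign.net/price']")
print(child.text)  # Price
```

Because the search starts from the parent, the header link with the same href is never considered.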

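element = driver.find_element_by_xpath("//div[@id='jump-to-nav']")
webelement = element.find_element_by_xpath(".//a[2]")
webelement.text
'検索に移動'
※Gets the second a element under the parent element div id=jump-to-nav. The path starts with "." so that only the parent is searched. The returned text '検索に移動' means "Jump to search".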
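※A browser-free sketch of the positional index, again with xml.etree.ElementTree standing in for Selenium. The markup is a hypothetical reconstruction of the parent/child structure described above, with English link text.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modeled on the div id=jump-to-nav structure.
NAV = """
<div id="jump-to-nav">
  <a href="#nav">Jump to navigation</a>
  <a href="#search">Jump to search</a>
</div>
"""

nav = ET.fromstring(NAV)
# XPath positions start at 1, so a[2] is the second <a> under its parent.
second = nav.find(".//a[2]")
print(second.text)  # Jump to search
```

The point to remember is that XPath counts from 1, unlike Python lists, which count from 0.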

i = i + 1
driver.save_screenshot("C:/Users/******/Desktop/デスクトップ/screen" + str(i) + ".png")
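※Takes a screenshot and saves it. Incrementing i gives the files sequential numbers; i must be initialized (for example i = 0) before this code runs.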
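※The sequential numbering can be sketched on its own, without a browser. screenshot_path and the folder name "screenshots" are assumptions for this example; the actual save call is shown only as a comment.

```python
from pathlib import Path


def screenshot_path(directory, index):
    """Return the path for the screenshot with the given number."""
    return Path(directory) / f"screen{index}.png"


i = 0  # the counter must exist before "i = i + 1" runs
for _ in range(3):
    i = i + 1
    path = screenshot_path("screenshots", i)  # hypothetical folder name
    # With a live browser: driver.save_screenshot(str(path))
    print(path.name)  # screen1.png, then screen2.png, then screen3.png
```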

■Below is the HTML of the drop-down on the source page. The code between the dashed lines that follow is the drop-down handling.
<select name="color">
<option value="red">赤</option>
<option value="blue">青</option>
<option value="green">緑</option>
</select>
------------------------------------------------------------
element_selection = driver.find_element_by_name('color')
※Gets the element in the usual way.

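element_selection_color = Select(element_selection)
※Wraps the select element in a Select object (the class from selenium.webdriver.support.ui) so that its options can be chosen.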

element_selection_color.select_by_value('blue')
※Selects the option whose value attribute is blue from the drop-down.
------------------------------------------------------------
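※select_by_value matches the value attribute ("blue"), not the text shown to the user ("青"); Selenium's Select also offers select_by_visible_text for the latter. To make the difference concrete, the sketch below lists both for each option using only the standard library's html.parser. OptionCollector is a class written for this example.

```python
from html.parser import HTMLParser

SELECT_HTML = """
<select name="color">
<option value="red">赤</option>
<option value="blue">青</option>
<option value="green">緑</option>
</select>
"""


class OptionCollector(HTMLParser):
    """Collect (value attribute, visible text) pairs from <option> tags."""

    def __init__(self):
        super().__init__()
        self.options = []
        self._value = None  # value of the <option> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            self._value = dict(attrs).get("value")

    def handle_data(self, data):
        if self._value is not None and data.strip():
            self.options.append((self._value, data.strip()))

    def handle_endtag(self, tag):
        if tag == "option":
            self._value = None


collector = OptionCollector()
collector.feed(SELECT_HTML)
print(collector.options)
# [('red', '赤'), ('blue', '青'), ('green', '緑')]
```

The first item of each pair is what select_by_value expects, and the second is what select_by_visible_text expects.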