A Little Python Every Day ~ Day 16 ~

July 14, 2020, 20:20

For now, I'm giving this the next 20 minutes. Last time I covered the caret and the dollar sign.

The wildcard

In a regular expression, the . (dot) is called the wildcard. It matches any single character except a newline.

>>> import re
>>> at_regex = re.compile(r'.at')
>>> at_regex.findall('The cat in the hat sat on the flat mat.')
['cat', 'hat', 'sat', 'lat', 'mat']
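Here is a small sketch to check two details of the wildcard for myself: the dot takes exactly one character (which is why 'flat' only gives 'lat' above), and it refuses to match a newline. The sample strings are made up for illustration.

```python
import re

# Same pattern as above: one arbitrary character followed by 'at'.
at_regex = re.compile(r'.at')

# 'flat' yields only 'lat' because the dot consumes exactly one character.
# The 'at' right after the newline is skipped: '\n' cannot fill the dot.
one_char_matches = at_regex.findall('flat at\nat')
print(one_char_matches)

# To match a literal period instead of "any character", escape the dot.
literal_dots = re.findall(r'\.', 'a.b.c')
print(literal_dots)
```

Note that the second match includes the leading space, since a space is also "any character".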

Combining the dot with the asterisk (.*) matches any string at all.

>>> name_regex = re.compile(r'First Name: (.*) Last Name: (.*)')
>>> mo = name_regex.search('First Name: Al Last Name: Sweigart')
>>> mo.group(1)
'Al'
>>> mo.group(2)
'Sweigart'
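One thing to watch for: search() returns None when nothing matches, so calling group() directly would raise an AttributeError. Below is a minimal sketch that guards against this. The helper name parse_name is my own invention, not something from the re module.

```python
import re

name_regex = re.compile(r'First Name: (.*) Last Name: (.*)')

def parse_name(text):
    """Return (first, last) as a tuple, or None if the text does not match."""
    mo = name_regex.search(text)
    if mo is None:
        return None
    # groups() returns every captured group at once as a tuple.
    return mo.groups()

print(parse_name('First Name: Al Last Name: Sweigart'))
print(parse_name('no names here'))
```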

Note that .* is greedy, so it matches the longest string it can. To get a non-greedy match, write .*? instead.

>>> nongreedy_regex = re.compile(r'<.*?>')
>>> mo = nongreedy_regex.search('<To serve man> for dinner.>')
>>> mo.group()
'<To serve man>'
>>> greedy_regex = re.compile(r'<.*>')
>>> mo = greedy_regex.search('<To serve man> for dinner.>')
>>> mo.group()
'<To serve man> for dinner.>'
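The difference shows up even more clearly with findall(). This is just a sketch with a made-up HTML-like string, assuming the goal is to pull out each tag separately.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: runs from the first '<' all the way to the last '>'.
greedy_tags = re.findall(r'<.*>', html)

# Non-greedy: stops at the first '>' it can, so each tag is found on its own.
nongreedy_tags = re.findall(r'<.*?>', html)

print(greedy_tags)
print(nongreedy_tags)
```

The greedy version returns the whole string as a single match, while the non-greedy version returns the four tags one by one.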

Matching newlines with the dot character

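By default, .* matches any run of characters up to, but not including, a newline. Passing re.DOTALL as the second argument to re.compile() makes the dot match every character, newlines included.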

>>> no_newline_regex = re.compile('.*')
>>> no_newline_regex.search('Serve the public trust.\nProtect the innocent.\nUphold the law').group()
'Serve the public trust.'
>>> newline_regex = re.compile('.*', re.DOTALL)
>>> newline_regex.search('Serve the public trust.\nProtect the innocent.\nUphold the law').group()
'Serve the public trust.\nProtect the innocent.\nUphold the law'
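Where this matters in practice is a pattern that has to span lines. The sketch below uses a made-up two-line string. As far as I know, the flag can also be passed straight to re.search(), or written inline as (?s) at the start of the pattern.

```python
import re

text = 'line one\nline two'

# Without DOTALL the dot stops at '\n', so 'one' and 'two' cannot be bridged.
default_match = re.search(r'one.*two', text)

# With DOTALL the dot crosses the newline.
dotall_match = re.search(r'one.*two', text, re.DOTALL)

# The inline flag (?s) has the same effect as passing re.DOTALL.
inline_match = re.search(r'(?s)one.*two', text)

print(default_match)
print(repr(dotall_match.group()))
print(repr(inline_match.group()))
```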

Summary of regular expression symbols

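? - matches zero or one occurrence of the preceding group.
* - matches zero or more occurrences of the preceding group.
+ - matches one or more occurrences of the preceding group.
{n} - matches exactly n occurrences of the preceding group.
{n,} - matches n or more occurrences of the preceding group.
{,m} - matches 0 to m occurrences of the preceding group.
{n,m} - matches n to m occurrences of the preceding group.
{n,m}?, *?, +? - perform a non-greedy match of the preceding group.
^spam - matches a string that begins with "spam".
spam$ - matches a string that ends with "spam".
. - matches any single character except a newline.
\d, \w, \s - match a digit, a word character, and a whitespace character, respectively.
\D, \W, \S - match any character that is not a digit, not a word character, and not a whitespace character, respectively.
[abc] - matches any single character inside the brackets.
[^abc] - matches any single character that is not inside the brackets.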
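To convince myself the list above is right, here is a quick sketch that tries a handful of the symbols. The helper fully_matches and all the sample strings are my own, chosen only for illustration.

```python
import re

def fully_matches(pattern, text):
    """True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# ? : zero or one occurrence of the preceding item
optional_ok = fully_matches(r'ab?c', 'ac') and not fully_matches(r'ab?c', 'abbc')

# {n} and {,m} : exact count, and 0 to m occurrences
exact_ok = fully_matches(r'(ha){3}', 'hahaha')
upto_ok = fully_matches(r'(ha){,2}', '') and not fully_matches(r'(ha){,2}', 'hahaha')

# {n,m}? : non-greedy version takes as few repetitions as allowed
lazy = re.search(r'(ha){1,3}?', 'hahaha').group()
greedy = re.search(r'(ha){1,3}', 'hahaha').group()

# ^ and $ : anchors at the start and end of the string
starts = re.search(r'^spam', 'spam and eggs') is not None
ends = re.search(r'spam$', 'eggs and spam') is not None

# \d and [^abc] : character classes
digits_ok = fully_matches(r'\d+', '2020')
negated_ok = fully_matches(r'[^abc]', 'd') and not fully_matches(r'[^abc]', 'a')

print(optional_ok, exact_ok, upto_ok, starts, ends, digits_ok, negated_ok)
print(lazy, greedy)
```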

In closing

Before starting each session, maybe I should set aside about 5 minutes to quickly skim through what I wrote before.

20 minutes is just about right.
