#!/usr/bin/env python
# coding: utf-8

# # Python Regular Expressions

# ### 1. Ordinary characters match themselves. The special characters are:
#
#     \       Escape a special character (or start a special sequence)
#     .       Any character except a newline (see re.DOTALL)
#     ^       Start of the string (see re.MULTILINE)
#     $       End of the string (see re.MULTILINE)
#     []      Character set
#     |       Alternation ("or")
#     ()      Capturing group (also sets precedence)

# ### 2. Special characters that can be used inside a set (after '``[``'):
#
#     ]       End of the set
#     -       Range (e.g. a-c means a, b, or c)
#     ^       Negation, when it is the first character of the set (e.g. [^a-c])

# Quantifiers (append '``?``' for non-greedy):
#
#     {m}     Exactly m repetitions
#     {m,n}   From m (default 0) to n (default infinity) repetitions
#     *       0 or more. Same as {,}
#     +       1 or more. Same as {1,}
#     ?       0 or 1. Same as {,1}

# Special sequences::
#
#     \A      Start of string
#     \b      Match empty string at word (\w+) boundary
#     \B      Match empty string not at word boundary
#     \d      Digit
#     \D      Non-digit
#     \s      Whitespace; same as [ \t\n\r\f\v] in ASCII mode (see LOCALE, UNICODE)
#     \S      Non-whitespace
#     \w      Alphanumeric or underscore: [0-9a-zA-Z_] in ASCII mode (see LOCALE, UNICODE)
#     \W      Non-alphanumeric
#     \Z      End of string
#     \g<id>  In a sub() replacement string (not in a pattern), insert a previous
#             named or numbered group; '<' & '>' are literal, e.g. \g<0> or
#             \g<name> (not \g0 or \gname)

# Special character escapes are much like those already escaped in Python string
# literals. Hence regex '``\n``' is the same as regex '``\\n``'::
#
#     \a      ASCII Bell (BEL)
#     \f      ASCII Formfeed
#     \n      ASCII Linefeed
#     \r      ASCII Carriage return
#     \t      ASCII Tab
#     \v      ASCII Vertical tab
#     \\      A single backslash
#     \xHH    Character with two-digit hexadecimal code HH
#     \OOO    Character with three-digit octal code OOO (or use an initial
#             zero, e.g. \0, \012)
#     \DD     Decimal number 1 to 99: match the previous numbered group

# Extensions. Do not cause grouping, except '``P<name>``'::
#
#     (?iLmsux)      Match empty string; sets the corresponding re.I, re.L,
#                    re.M, re.S, re.U, re.X flags
#     (?:...)        Non-capturing version of regular parens
#     (?P<name>...)  Create a named capturing group
#     (?P=name)      Match whatever matched the previous named group
#     (?#...)        A comment; ignored
#     (?=...)        Lookahead assertion, match without consuming
#     (?!...)        Negative lookahead assertion
#     (?<=...)       Lookbehind assertion, match if preceded
#     (?<!...)       Negative lookbehind assertion, match if not preceded
# Module-level functions::
#
#     compile(pattern[, flags]) -> RegexObject
#     match(pattern, string[, flags]) -> MatchObject or None
#     search(pattern, string[, flags]) -> MatchObject or None
#     findall(pattern, string[, flags]) -> list of strings (tuples if 2+ groups)
#     finditer(pattern, string[, flags]) -> iter of MatchObjects
#     split(pattern, string[, maxsplit, flags]) -> list of strings
#     sub(pattern, repl, string[, count, flags]) -> string
#     subn(pattern, repl, string[, count, flags]) -> (string, int)
#     escape(string) -> string
#     purge()                                  # Clear the regular expression cache
#
# RegexObjects (returned from ``compile()``)::
#
#     .match(string[, pos, endpos]) -> MatchObject or None
#     .search(string[, pos, endpos]) -> MatchObject or None
#     .findall(string[, pos, endpos]) -> list of strings
#     .finditer(string[, pos, endpos]) -> iter of MatchObjects
#     .split(string[, maxsplit]) -> list of strings
#     .sub(repl, string[, count]) -> string
#     .subn(repl, string[, count]) -> (string, int)
#     .flags          # int, Passed to compile()
#     .groups         # int, Number of capturing groups
#     .groupindex     # {}, Maps group names to ints
#     .pattern        # string, Passed to compile()
#
# MatchObjects (returned from ``match()`` and ``search()``)::
#
#     .expand(template) -> string, Backslash & group expansion
#     .group([group1...]) -> string or tuple of strings, 1 per arg
#     .groups([default]) -> tuple of all groups, non-matching=default
#     .groupdict([default]) -> {}, Named groups, non-matching=default
#     .start([group]) -> int, Start/end of substring match by group
#     .end([group]) -> int, Group defaults to 0, the whole match
#     .span([group]) -> tuple (match.start(group), match.end(group))
#     .pos            int, Passed to search() or match()
#     .endpos         int, "
#     .lastindex      int, Index of last matched capturing group
#     .lastgroup      string, Name of last matched capturing group
#     .re             regex, As passed to search() or match()
#     .string         string, "
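# Example for sections 1 and 2 and the quantifier table. This is a minimal sketch
# that assumes Python 3 and the standard-library `re` module. The sample strings
# and variable names are made up for illustration. Each `# ->` comment shows the
# value the name holds after that line runs.

```python
import re

# '.' matches any character except a newline, unless re.DOTALL is set.
dot_default = re.findall(r"a.b", "a\nb axb")                  # -> ['axb']
dot_all = re.findall(r"a.b", "a\nb axb", flags=re.DOTALL)     # -> ['a\nb', 'axb']

# '^' and '$' anchor the whole string, or each line with re.MULTILINE.
lines = "first\nsecond\nthird"
first_word = re.findall(r"^\w+", lines)                       # -> ['first']
line_starts = re.findall(r"^\w+", lines, flags=re.MULTILINE)  # -> ['first', 'second', 'third']
last_word = re.findall(r"\w+$", lines)                        # -> ['third']

# '\' escapes a special character so it matches literally.
literal_dot = re.findall(r"\.", "a.b.c")                      # -> ['.', '.']

# '|' is alternation; '()' limits its scope and captures the chosen branch.
color = re.fullmatch(r"gr(a|e)y", "grey")                     # color.group(1) -> 'e'

# Inside [...]: '-' makes a range and a leading '^' negates the set.
in_range = re.findall(r"[a-c]", "abcdef")                     # -> ['a', 'b', 'c']
not_in_range = re.findall(r"[^a-c]", "abcdef")                # -> ['d', 'e', 'f']

# Quantifiers are greedy; appending '?' makes them match as little as possible.
tag_text = "<b>bold</b>"
greedy = re.match(r"<.*>", tag_text).group()                  # -> '<b>bold</b>'
lazy = re.match(r"<.*?>", tag_text).group()                   # -> '<b>'
triples = re.findall(r"\d{3}", "12345678")                    # -> ['123', '456']
runs = re.findall(r"a{2,3}", "aaaaaaa")                       # -> ['aaa', 'aaa']
spelling = re.findall(r"colou?r", "color colour")             # -> ['color', 'colour']
```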
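# Example for the special-sequence and escape tables. This is a minimal sketch
# assuming Python 3 and the standard `re` module, with made-up sample strings.
# The last part shows that `\g<...>` works in a sub() replacement string, while
# a pattern containing `\g` is rejected with re.error.

```python
import re

# \d and \D: digits and non-digits.
numbers = re.findall(r"\d+", "Order #42 shipped 2024-05-01")   # -> ['42', '2024', '05', '01']
digits_only = re.sub(r"\D", "", "tel: 010-1234")              # -> '0101234'

# \b and \B: at a word boundary vs. inside a word (start offsets shown).
whole_word = [m.start() for m in re.finditer(r"\bcat\b", "cat concat cats")]  # -> [0]
inside_word = [m.start() for m in re.finditer(r"\Bcat", "cat concat cats")]   # -> [7]

# \s: split on runs of spaces, tabs, or newlines.
tokens = re.split(r"\s+", "a \t b\nc")                         # -> ['a', 'b', 'c']

# \A ignores re.MULTILINE; '$' also matches before a final newline, \Z does not.
only_first = re.findall(r"\A\w+", "one\ntwo", flags=re.MULTILINE)  # -> ['one']
dollar_hit = re.search(r"\d$", "abc1\n") is not None           # -> True
big_z_hit = re.search(r"\d\Z", "abc1\n") is not None           # -> False

# Escapes: '\n' and '\\n' both match a newline; \xHH and \OOO name characters.
newline_both = bool(re.fullmatch("\n", "\n")) and bool(re.fullmatch("\\n", "\n"))  # -> True
hex_a = re.fullmatch(r"\x41", "A") is not None                 # -> True
octal_a = re.fullmatch(r"\101", "A") is not None               # -> True

# \DD: match the text captured earlier by group 1.
double_letter = re.search(r"(\w)\1", "hello").group()          # -> 'll'

# \g<id> inserts a group in a replacement string.
swapped = re.sub(r"(\w+)@(\w+)", r"\g<2> at \g<1>", "user@example")  # -> 'example at user'
bracketed = re.sub(r"\d+", r"[\g<0>]", "a1b22")                # -> 'a[1]b[22]'

# Inside a pattern, \g is a "bad escape" error.
try:
    re.compile(r"(a)\g<1>")
    g_allowed_in_pattern = True
except re.error:
    g_allowed_in_pattern = False                               # -> False
```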
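# Example for the Extensions table. This is a minimal sketch assuming Python 3
# and the standard `re` module; the sample strings are illustrative only. It
# exercises named groups and back-references, non-capturing groups, an inline
# flag, a comment group, and the four lookaround assertions.

```python
import re

# (?P<name>...) names a group; (?P=name) matches the same text again.
repeat = re.search(r"\b(?P<word>\w+) (?P=word)\b", "this is is a test")
# repeat.group("word") -> 'is', repeat.span() -> (5, 10)

# (?:...) groups without capturing, so only (c) shows up in groups().
non_capturing = re.match(r"(?:ab)+(c)", "ababc").groups()      # -> ('c',)

# (?i) turns on re.IGNORECASE from inside the pattern.
any_case = re.findall(r"(?i)hello", "Hello HELLO hello")       # -> ['Hello', 'HELLO', 'hello']

# (?#...) is a comment and is ignored by the matcher.
commented = re.fullmatch(r"\d+(?#one or more digits)", "123") is not None  # -> True

# Lookahead: test what follows without consuming it.
text = "hi! bye? ok!"
before_bang = re.findall(r"\w+(?=!)", text)                    # -> ['hi', 'ok']
not_before_bang = re.findall(r"\b\w+\b(?!!)", text)            # -> ['bye']

# Lookbehind: test what precedes the match position.
prices = "cost $30, tax 5, tip $7"
after_dollar = re.findall(r"(?<=\$)\d+", prices)               # -> ['30', '7']
not_after_dollar = re.findall(r"(?<!\$)\b\d+", prices)         # -> ['5']
```

# The `\b` in the last pattern stops the negative lookbehind from matching the
# "0" inside "$30", because the position between "3" and "0" is not preceded by "$".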
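# Example for the module-level functions and RegexObject tables. This is a
# minimal sketch assuming Python 3 and the standard `re` module; all inputs
# are invented. It contrasts match() with search(), shows the return shapes
# listed above, and passes pos/endpos to a compiled pattern.

```python
import re

text = "one two  three"

# match() only tries position 0; search() scans the whole string.
at_start = re.match(r"two", text)                        # -> None
anywhere = re.search(r"two", text)                       # anywhere.span() -> (4, 7)

# findall() returns tuples when the pattern has two or more groups.
pairs = re.findall(r"(\w)(\d)", "a1 b2")                 # -> [('a', '1'), ('b', '2')]
starts = [m.start() for m in re.finditer(r"\d", "a1b22")]  # -> [1, 3, 4]

parts = re.split(r"\s+", text)                           # -> ['one', 'two', 'three']
first_split = re.split(r"\s+", text, maxsplit=1)         # -> ['one', 'two  three']

masked = re.sub(r"\d", "#", "a1b22")                     # -> 'a#b##'
masked_once = re.sub(r"\d", "#", "a1b22", count=1)       # -> 'a#b22'
masked_count = re.subn(r"\d", "#", "a1b22")              # -> ('a#b##', 3)

# escape() makes arbitrary text safe to embed in a pattern.
is_literal = re.fullmatch(re.escape("1+1=2?"), "1+1=2?") is not None  # -> True

# compile() returns a reusable pattern object; pos/endpos limit the search window.
number = re.compile(r"\d+")
from_pos = number.search("a1b22", 2).group()             # -> '22'
anchored = number.match("a1b22", 1).group()              # -> '1'
window = number.findall("a1b22", 0, 4)                   # -> ['1', '2']

date = re.compile(r"(?P<y>\d{4})-(?P<m>\d\d)")
group_count = date.groups                                # -> 2
name_map = dict(date.groupindex)                         # -> {'y': 1, 'm': 2}

re.purge()  # clear the module's cache of compiled patterns
```

# `maxsplit` and `count` are passed by keyword here. Passing them positionally
# to the module-level split()/sub()/subn() is deprecated as of Python 3.13.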
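# Example for the MatchObjects table. This is a minimal sketch assuming Python 3
# and the standard `re` module; the key=value inputs are made up. The second
# search leaves the optional `value` group unmatched, which shows how the
# `default` argument of groups() and groupdict() behaves.

```python
import re

m = re.search(r"(?P<key>\w+)=(?P<value>\d+)", "x size=10 y")

whole = m.group()                          # -> 'size=10' (same as m.group(0))
both = m.group("key", "value")             # -> ('size', '10')
all_groups = m.groups()                    # -> ('size', '10')
named = m.groupdict()                      # -> {'key': 'size', 'value': '10'}
bounds = (m.start(), m.end())              # -> (2, 9)
value_span = m.span("value")               # -> (7, 9)
last = (m.lastindex, m.lastgroup)          # -> (2, 'value')
swapped = m.expand(r"\g<value>:\g<key>")   # -> '10:size'
context = (m.pos, m.endpos, m.string)      # -> (0, 11, 'x size=10 y')

# Groups that did not participate are None unless a default is given.
partial = re.search(r"(?P<key>\w+)=(?P<value>\w+)?", "flag= size=10")
missing = partial.groups()                       # -> ('flag', None)
filled = partial.groups(default="")              # -> ('flag', '')
filled_dict = partial.groupdict(default="-")     # -> {'key': 'flag', 'value': '-'}
```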