https://www.youtube.com/watch?v=7DG3kCDx53c&t=1s
This is a really nice video series for learning RegExp.
Introduction to regular expressions (RegExp)
i. Why should we use Regex? After all, we can already use Command + F to find anything we want.
Here is a text:
My number is 999-9999-9999 and his number is 888-8888-8888.
It is easy to find his number or my number separately (a literal search), but it is a big problem to find all the numbers at the same time (a more generalized pattern).
ii. We should use meta-characters to match the numbers we want.
Remember that '\d' means any digit 0-9, so we can use Command + F (or Control + F) to find the numbers this way: \d\d\d-\d\d\d\d-\d\d\d\d
You should click the Regex radio button in the search box to use a Regex like this.
It's really magical.
iii. We should also remember the meaning of '.' and '*'.
'.' means any character
'*' means zero or more
So '.*' means everything, and 'rainbow.*' means: match rainbow followed by zero or more of any character.
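The two ideas above can be tried out in JavaScript. This is only a small sketch: the first string is the sample text from these notes, while the `line` string is a made-up example.

```javascript
// Sample text from the notes above.
const text = "My number is 999-9999-9999 and his number is 888-8888-8888.";

// \d matches one digit, so this pattern matches any number shaped 3-4-4.
const numberPattern = /\d\d\d-\d\d\d\d-\d\d\d\d/g;
const numbers = text.match(numberPattern);
console.log(numbers); // [ '999-9999-9999', '888-8888-8888' ]

// ".*" matches zero or more of any character (line breaks excluded),
// so "rainbow.*" matches "rainbow" plus the rest of the line.
const line = "I saw a rainbow over the hill"; // hypothetical sample line
const rest = line.match(/rainbow.*/)[0];
console.log(rest); // rainbow over the hill
```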
Meta-characters
There is a whole bunch of specific meta-characters:
single char | quantifiers | position |
---|---|---|
\d -> 0-9 | * -> 0 or more | ^ -> beginning of the line |
\w -> A-Z, a-z, 0-9, _ (in essence part of a word: any letter, digit, or underscore) [\W means anything that is not] | + -> 1 or more | $ -> end of the line |
\s -> whitespace: space, tab, newline [\S means anything that is not] | ? -> 0 or 1 [colou?rs?] | \b -> word boundary [\b\w{4}\b] |
. -> any character | {min,max} -> between min and max repetitions | |
\D -> anything that is not 0-9 | {n} -> exactly n repetitions | |
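A quick way to check a few entries of the table is to run them in JavaScript. The sketch below uses the two bracketed examples from the table; the sample sentence is made up for illustration.

```javascript
// ? makes the preceding character optional: 0 or 1 occurrence.
const spelling = /^colou?rs?$/;
const accepted = ["color", "colour", "colors", "colours"].every((w) => spelling.test(w));
console.log(accepted); // true

// \b\w{4}\b matches whole words made of exactly four word characters.
const sentence = "This text has some long and short words"; // hypothetical sample
const fourLetterWords = sentence.match(/\b\w{4}\b/g);
console.log(fourLetterWords); // [ 'This', 'text', 'some', 'long' ]

// ^ and $ anchor the pattern to the start and the end of the line.
const onlyDigits = /^\d+$/;
console.log(onlyDigits.test("12345"));  // true
console.log(onlyDigits.test("123 45")); // false
```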
Character Classes
[abc]: a or b or c.
[-]: dash at the beginning. It means the literal dash.
[a-c]: dash not at the beginning. It means a range: any character from a to c.
[^abc]: caret symbol at the beginning. It means any character that is not a, b, or c.
[a^bc]: caret symbol not at the beginning. It means the literal caret: a, ^, b, or c.
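To see how the position of the dash and the caret changes the meaning, here is a small sketch that runs each class against one made-up sample string.

```javascript
const sample = "a-b^c d"; // hypothetical sample string

const abc = sample.match(/[abc]/g);           // a, b, or c
const dashOrA = sample.match(/[-a]/g);        // leading dash is a literal dash
const range = sample.match(/[a-c]/g);         // dash in the middle is a range
const notAbc = sample.match(/[^abc]/g);       // leading caret negates the class
const caretLiteral = sample.match(/[a^bc]/g); // caret elsewhere is a literal caret

console.log(abc);          // [ 'a', 'b', 'c' ]
console.log(dashOrA);      // [ 'a', '-' ]
console.log(range);        // [ 'a', 'b', 'c' ]
console.log(notAbc);       // [ '-', '^', ' ', 'd' ]
console.log(caretLiteral); // [ 'a', 'b', '^', 'c' ]
```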
Alternation
(net|com|edu)
e.g. If you want to find all the email addresses, you can use this (the dot before the domain ending is escaped so that it matches a literal dot):
[\w.]+@\w+\.(net|com|edu)
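Here is a rough sketch of the email pattern in JavaScript. The addresses are invented for the example, and the dot before the domain ending is written as `\.` so that it only matches a literal dot. This simple pattern is just for practice and does not cover every valid email address.

```javascript
// Alternation: the address must end in .net, .com, or .edu.
const emailPattern = /[\w.]+@\w+\.(net|com|edu)/g;

// Hypothetical sample text with made-up addresses.
const text = "Write to peter.chen@example.com or admin@school.edu, not to bob@site.org.";

const emails = text.match(emailPattern);
console.log(emails); // [ 'peter.chen@example.com', 'admin@school.edu' ]
```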
Capturing Groups
999-999-9999 is the number
\d{3}-\d{3}-\d{4} is the RegExp
(\d{3})-\d{3}-\d{4} captures the first 3 digits. This is called Group 1. The whole match is called Group 0.
We can replace it with $1-XXX-XXXX to keep Group 1.
e.g. Change the name order from first name + last name:
Peter, Chen => Chen Peter
RegExp: (\w+),\s+(\w+) and replace it with $2 $1
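Both replacements can be sketched with `String.prototype.replace`, where `$1` and `$2` in the replacement string refer to the captured groups. Note that the last part of the phone pattern needs `\d{4}` to cover all four final digits of 999-999-9999.

```javascript
const phone = "999-999-9999";
const phonePattern = /(\d{3})-\d{3}-\d{4}/;

// Group 0 is the whole match, Group 1 is the first pair of parentheses.
const groups = phone.match(phonePattern);
console.log(groups[0]); // 999-999-9999
console.log(groups[1]); // 999

// Keep Group 1 and mask the rest.
const masked = phone.replace(phonePattern, "$1-XXX-XXXX");
console.log(masked); // 999-XXX-XXXX

// Swap the two captured names.
const swapped = "Peter, Chen".replace(/(\w+),\s+(\w+)/, "$2 $1");
console.log(swapped); // Chen Peter
```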
How to avoid ".*" being greedy?
Use .*? instead. The extra ? makes the quantifier lazy, so it matches as few characters as possible.
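The difference is easiest to see on a string with several closing delimiters. A small sketch with a made-up HTML snippet:

```javascript
const html = "<b>bold</b> and <i>italic</i>"; // hypothetical sample

// Greedy: .* grabs as much as it can, up to the LAST ">".
const greedy = html.match(/<.*>/)[0];
console.log(greedy); // <b>bold</b> and <i>italic</i>

// Lazy: .*? stops at the FIRST ">" that lets the pattern succeed.
const lazy = html.match(/<.*?>/)[0];
console.log(lazy); // <b>
```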
Back References
If you want to match the same word appearing twice in a row in the text, you can use:
\b(\w+)\s\1\b
\1 means Group 1
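As a sketch, the pattern can find doubled words and also remove them. Inside the pattern the back reference is written `\1`, while in the replacement string the same group is written `$1`. The sample sentence is made up.

```javascript
const text = "This is is a test of the the pattern"; // hypothetical sample
const doubledWord = /\b(\w+)\s\1\b/g;

// Find every word that is immediately repeated.
const doubled = text.match(doubledWord);
console.log(doubled); // [ 'is is', 'the the' ]

// Keep only one copy of each repeated word.
const cleaned = text.replace(doubledWord, "$1");
console.log(cleaned); // This is a test of the pattern
```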
In JavaScript
test() and match()
string.match() returns the first match together with its groups (or null if nothing matches)
regExp.test() returns true or false
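A minimal sketch of both methods, using the sample text from the introduction:

```javascript
const s = "My number is 999-9999-9999 and his number is 888-8888-8888.";
const re = /(\d{3})-\d{4}-\d{4}/; // no flags

// test() only answers yes or no.
const hasNumber = re.test(s);
console.log(hasNumber); // true

// match() without /g returns the first match plus its groups.
const first = s.match(re);
console.log(first[0]); // 999-9999-9999 (Group 0)
console.log(first[1]); // 999 (Group 1)

// When nothing matches, match() returns null.
const none = "no digits here".match(re);
console.log(none); // null
```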
/g and /i flags
/g means global (find all matches, not only the first one)
/i means case insensitive
When you use /g, match() returns only the whole matches, so you lose the capture groups in ()
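The effect of the flags on `match()` can be sketched like this. The `"Cat, cat, CAT"` string is a made-up sample.

```javascript
const s = "Cat, cat, CAT"; // hypothetical sample

const firstLower = s.match(/cat/)[0]; // no flags: first exact-case match
const allAnyCase = s.match(/cat/gi);  // g + i: every match, any case
console.log(firstLower); // cat
console.log(allAnyCase); // [ 'Cat', 'cat', 'CAT' ]

const numbers = "My number is 999-9999-9999 and his number is 888-8888-8888.";

// Without /g: whole match plus capture groups.
const withGroups = numbers.match(/(\d{3})-\d{4}-\d{4}/);
console.log(withGroups[1]); // 999

// With /g: all whole matches, but the groups are gone.
const wholeOnly = numbers.match(/(\d{3})-\d{4}-\d{4}/g);
console.log(wholeOnly); // [ '999-9999-9999', '888-8888-8888' ]
```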
exec()
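regExp.exec(string) returns one match as an array that contains Group 0, Group 1, and so on
With the /g flag you can loop with while ((result = r.exec(s)) !== null) and read result[0] to get Group 0 of each match. The result must be assigned inside the condition, otherwise there is nothing to read in the loop body.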
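A sketch of the loop, assuming the pattern has the /g flag so that each call to `exec()` continues from where the previous match ended. Without /g the loop would find the first match forever.

```javascript
const s = "My number is 999-9999-9999 and his number is 888-8888-8888.";
const r = /(\d{3})-\d{4}-\d{4}/g; // /g is required for the loop to advance

const found = [];
let result;
// exec() returns one match per call and null when there are no more.
while ((result = r.exec(s)) !== null) {
  found.push({ whole: result[0], firstGroup: result[1] });
}

console.log(found);
// [ { whole: '999-9999-9999', firstGroup: '999' },
//   { whole: '888-8888-8888', firstGroup: '888' } ]
```

Unlike `match()` with /g, this keeps the capture groups for every match.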
split(): match a delimiter
s.split(/\s/) returns an array of words. The RegExp must not be wrapped in quotes, otherwise split() looks for that literal text.
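A short sketch of `split()` with a RegExp delimiter. The sample string is made up, and the `\s+` variant is an extra that these notes do not cover: it treats a run of whitespace as one delimiter.

```javascript
const s = "the quick  brown\tfox"; // hypothetical sample: double space and a tab

// Every single whitespace character is a delimiter,
// so the double space produces an empty string.
const words = s.split(/\s/);
console.log(words); // [ 'the', 'quick', '', 'brown', 'fox' ]

// \s+ treats a run of whitespace as one delimiter.
const cleanWords = s.split(/\s+/);
console.log(cleanWords); // [ 'the', 'quick', 'brown', 'fox' ]

// A quoted pattern is just a string: nothing is split here.
const notSplit = s.split("/\\s/");
console.log(notSplit.length); // 1
```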
replace()
s.replace(/\b\w{6,8}\b/g, "kittens") replaces every word whose length is between 6 and 8 with kittens. The range is written {6,8}, and the \b on both sides keeps the match to whole words.
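The sketch below shows the replacement on a made-up sentence. The range quantifier is written `{6,8}`. The `\b` word boundaries are an addition here: without them, the first 8 characters of a longer word would also be replaced.

```javascript
const s = "My puppies chase rabbits and butterflies"; // hypothetical sample

// Whole words of 6 to 8 characters only.
const wholeWords = s.replace(/\b\w{6,8}\b/g, "kittens");
console.log(wholeWords); // My kittens chase kittens and butterflies

// Without \b, the first 8 letters of "butterflies" are replaced too.
const partial = s.replace(/\w{6,8}/g, "kittens");
console.log(partial); // My kittens chase kittens and kittensies
```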
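In ES6
/u (Unicode mode) correctly handles characters that take four bytes (two UTF-16 code units), such as some rare Chinese characters
/u can also handle look-alike characters (e.g. with /iu, the Kelvin sign \u212A matches [a-z])
/y (sticky) flag: the match must start right at the head of the remaining text, i.e. at lastIndex (an implied ^)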
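A small sketch of the two ES6 flags. The character U+20BB7 (a rare Chinese character) is written with the ES6 `\u{...}` escape to avoid encoding problems, and the `"aaa_aa"` string is a made-up sample.

```javascript
// U+20BB7 takes two UTF-16 code units.
const rare = "\u{20BB7}";
console.log(/^.$/.test(rare));  // false: "." sees two separate units
console.log(/^.$/u.test(rare)); // true: /u treats it as one character

// Look-alike characters: U+212A (Kelvin sign) looks like a capital K.
console.log(/[a-z]/i.test("\u212A"));  // false
console.log(/[a-z]/iu.test("\u212A")); // true

// /y (sticky): the match must start exactly at lastIndex.
const sticky = /a+/y;
const firstTry = sticky.test("aaa_aa");  // matches "aaa" at index 0
const indexAfter = sticky.lastIndex;     // 3
const secondTry = sticky.test("aaa_aa"); // index 3 is "_", so it fails
console.log(firstTry, indexAfter, secondTry); // true 3 false

// /g would skip ahead and still find the second run of "a".
const global = /a+/g;
global.exec("aaa_aa");
const skipped = global.exec("aaa_aa")[0];
console.log(skipped); // aa
```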