2017-09-04

Regular Expression Notes

https://www.youtube.com/watch?v=7DG3kCDx53c&t=1s
This is a really nice video series for learning RegExp.

Introduction to Regular Expressions (RegExp)

i. Why should we use Regex when we can already use Command + F to find anything we want?

Here is a text:

My number is 999-9999-9999 and his number is 888-8888-8888.

It is easy to find his number or my number separately (a literal search), but it is a big problem to find all the numbers at the same time (a more generalized pattern).

ii. We should use meta-characters to match the numbers we want.

Remember that '\d' means any digit from 0 to 9, so we can use Command + F (or Control + F) to find all the numbers with the pattern \d\d\d-\d\d\d\d-\d\d\d\d.

You need to click the Regex radio button in the find box to search with a Regex like this.

It’s really magical.
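The same search can be tried in JavaScript. This is only a small sketch: the variable names are mine, and the g flag (explained later in these notes) is added so that every number is returned instead of just the first one.

```javascript
// Sample text from the notes above.
const text = "My number is 999-9999-9999 and his number is 888-8888-8888.";

// \d matches one digit; the g flag asks for every match, not just the first.
const numberPattern = /\d\d\d-\d\d\d\d-\d\d\d\d/g;

const numbers = text.match(numberPattern);
console.log(numbers); // [ '999-9999-9999', '888-8888-8888' ]
```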

iii. We should also remember the meaning of '.' and '*'

'.' means any char (except a line break)
'*' means zero or more
So, '.*' means everything, and 'rainbow.*' means I want to match rainbow followed by any number of any characters.
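A quick way to see this in JavaScript. The sample sentence is made up for illustration.

```javascript
const line = "I saw a rainbow over the hills today";

// "." is any character (except a line break), "*" repeats it zero or more times.
const rest = line.match(/rainbow.*/);
console.log(rest[0]); // "rainbow over the hills today"

// Zero characters after "rainbow" is still a match.
const bareMatch = /rainbow.*/.test("rainbow");
console.log(bareMatch); // true
```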

Meta-characters

There is a whole bunch of specific meta-characters:

single char	quantifiers	position
\d -> 0-9	* -> 0 or more	^ -> beginning of the line
\w -> A-Z, a-z, 0-9, _ (a word character: any letter, digit, or underscore) [\W means anything that is not]	+ -> 1 or more	$ -> end of the line
\s -> whitespace: space, tab, newline [\S means anything that is not]	? -> 0 or 1 [colou?rs?]	\b -> word boundary [\b\w{4}\b]
. -> any character	{min,max} -> between min and max times	—
—	{n} -> exactly n times	—
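The two bracketed examples from the table, colou?rs? and \b\w{4}\b, can be checked like this. The sample sentence is my own, so treat it as a sketch rather than anything from the video.

```javascript
const sample = "The colors and colours of this rainbow look nice.";

// "?" makes the preceding character optional: color, colors, colour, colours.
const spellings = sample.match(/colou?rs?/g);
console.log(spellings); // [ 'colors', 'colours' ]

// \b is a word boundary and {4} repeats \w exactly four times,
// so this finds whole four-letter words only.
const fourLetterWords = sample.match(/\b\w{4}\b/g);
console.log(fourLetterWords); // [ 'this', 'look', 'nice' ]

// ^ anchors the pattern to the beginning, $ to the end.
const startsWithThe = /^The/.test(sample);
const endsWithNice = /nice\.$/.test(sample);
console.log(startsWithThe, endsWithNice); // true true
```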

Character Classes

[abc]: a or b or c.

[-]: a dash at the beginning. It means the literal dash.

[a-c]: a dash not at the beginning. It means a range: any character from a to c.

[^abc]: a caret symbol at the beginning. It means any character that is not a, b, or c.

[a^bc]: a caret symbol not at the beginning. It means the literal caret: a, ^, b, or c.
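To compare the classes side by side, here is a small sketch run against one made-up string that contains a dash and a caret.

```javascript
const chars = "a-b^c d";

const plain = chars.match(/[abc]/g);     // a or b or c
const withDash = chars.match(/[-abc]/g); // leading dash is a literal dash
const range = chars.match(/[a-c]/g);     // dash in the middle is a range
const negated = chars.match(/[^abc]/g);  // leading caret negates the class
const caret = chars.match(/[a^bc]/g);    // caret elsewhere is a literal caret

console.log(plain);    // [ 'a', 'b', 'c' ]
console.log(withDash); // [ 'a', '-', 'b', 'c' ]
console.log(range);    // [ 'a', 'b', 'c' ]
console.log(negated);  // [ '-', '^', ' ', 'd' ]
console.log(caret);    // [ 'a', 'b', '^', 'c' ]
```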

Alternation

(net|com|edu)

e.g.: If you want to find all the email addresses, you can use this (the dot before the group is escaped so that it matches a literal dot).

[\w.]+@\w+\.(net|com|edu)
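Here is a sketch of the email pattern in use. The addresses are invented for the example, and the dot before the alternation is written as \. so that it only matches a real dot.

```javascript
// Made-up addresses; only .net, .com and .edu should be picked up.
const contacts = "Write to peter.chen@example.com, admin@school.edu or bob@shop.org.";

const emailPattern = /[\w.]+@\w+\.(net|com|edu)/g;

const emails = contacts.match(emailPattern);
console.log(emails); // [ 'peter.chen@example.com', 'admin@school.edu' ]
```

This pattern is far too simple for real email validation; it is only meant to show alternation.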

Capturing Groups

999-999-9999 is the number

\d{3}-\d{3}-\d{4} is the RegExp

(\d{3})-\d{3}-\d{4} captures the first 3 digits. This is called Group 1. The whole match is called Group 0.

We can replace it with $1-XXX-XXXX to keep Group 1 and mask the rest.

e.g.: Swap the first name and the last name
Peter, Chen => Chen Peter
RegExp: (\w+),\s+(\w+) and replace it with $2 $1
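Both replacements can be tried with String.prototype.replace, where $1 and $2 in the replacement string stand for the captured groups. A minimal sketch, assuming the phone number has the 3-3-4 digit shape shown above:

```javascript
// Keep Group 1 (the first three digits) and mask the rest.
const phone = "999-999-9999";
const masked = phone.replace(/(\d{3})-\d{3}-\d{4}/, "$1-XXX-XXXX");
console.log(masked); // "999-XXX-XXXX"

// Swap the two captured words.
const name = "Peter, Chen";
const swapped = name.replace(/(\w+),\s+(\w+)/, "$2 $1");
console.log(swapped); // "Chen Peter"
```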

How to stop ".*" from being greedy?

Use .*? instead. The extra ? makes the quantifier lazy, so it matches as little as possible.
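The difference is easiest to see on a string with more than one possible ending. The HTML snippet here is just a made-up sample.

```javascript
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .* grabs as much as possible, so the match runs to the last ">".
const greedy = html.match(/<.*>/)[0];
console.log(greedy); // "<b>bold</b> and <i>italic</i>"

// Lazy: .*? grabs as little as possible, so the match stops at the first ">".
const lazy = html.match(/<.*?>/)[0];
console.log(lazy); // "<b>"
```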

Back References

If you want to match the same word appearing twice in a row in the text, you can use:

\b(\w+)\s\1\b
\1 means Group 1
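A short sketch of the doubled-word pattern, both for finding the repeats and for removing them. The sentence is invented for the example.

```javascript
const sentence = "This is is a test of the the pattern.";

// \1 must repeat exactly what Group 1 captured.
const doubledWord = /\b(\w+)\s\1\b/g;

const doubled = sentence.match(doubledWord);
console.log(doubled); // [ 'is is', 'the the' ]

// Replacing each pair with Group 1 alone removes the duplicate.
const cleaned = sentence.replace(doubledWord, "$1");
console.log(cleaned); // "This is a test of the pattern."
```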

In JavaScript

test() and match()

string.match() returns the first match (or null if nothing matches)
regExp.test() returns true or false

/g and /i flags

/g means global
/i means case insensitive

When you use /g with match(), you get every match, but you lose the capture groups from ().
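Here is a small sketch that puts test(), match() and the two flags next to each other. The string and variable names are only for illustration.

```javascript
const greeting = "Hello hello HELLO";

// test() only answers yes or no.
const found = /hello/.test(greeting);
const missing = /goodbye/.test(greeting);
console.log(found, missing); // true false

// Without g, match() returns the first match plus its capture groups.
const first = greeting.match(/(h)ello/);
console.log(first[0], first[1], first.index); // hello h 6

// With g and i, match() returns every match, but the capture groups are gone.
const all = greeting.match(/(h)ello/gi);
console.log(all); // [ 'Hello', 'hello', 'HELLO' ]
```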

exec()

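regExp.exec(string) returns an array holding all the groups: Group 0, Group 1…

With the /g flag, you can loop with while ((result = r.exec(s)) !== null) and read result[0] to get Group 0 of each match. The result has to be assigned inside the condition, otherwise there is nothing to read in the loop body.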
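The loop can be sketched like this. The input string and the names idPattern and ids are hypothetical; the important parts are the g flag and the assignment inside the while condition.

```javascript
const log = "id=12, id=345, id=6";

// g is required: it makes exec() continue from lastIndex on each call.
// Without it, exec() would return the first match forever.
const idPattern = /id=(\d+)/g;

const ids = [];
let result;
while ((result = idPattern.exec(log)) !== null) {
  // result[0] is Group 0 (the whole match), result[1] is Group 1.
  ids.push(result[1]);
}
console.log(ids); // [ '12', '345', '6' ]
```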

split(): Match a delimiter

s.split(/\s/) returns an array of words. Note that the RegExp is passed without quotes; with quotes it would be treated as a plain string.
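One thing worth trying: splitting on a single \s leaves empty strings when two whitespace characters sit next to each other, while \s+ treats the whole run as one delimiter. A sketch with a made-up phrase:

```javascript
// Two spaces after "regular" and a tab before "are".
const phrase = "regular  expressions\tare fun";

const single = phrase.split(/\s/);
console.log(single); // [ 'regular', '', 'expressions', 'are', 'fun' ]

const runs = phrase.split(/\s+/);
console.log(runs); // [ 'regular', 'expressions', 'are', 'fun' ]
```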

replace()

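s.replace(/\b\w{6,8}\b/g, "kittens") replaces every word that is 6 to 8 characters long with kittens. The range is written {6,8}, and the \b on both sides keeps the pattern from matching just part of a longer word.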
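A sketch of this replacement, assuming the quantifier is written {6,8}. It also shows what happens without word boundaries: the first eight letters of a longer word get replaced. The sentence is made up.

```javascript
const story = "The children adopted seventeen puppies";

// With \b on both sides only whole words of 6 to 8 characters are replaced.
const wholeWords = story.replace(/\b\w{6,8}\b/g, "kittens");
console.log(wholeWords); // "The kittens kittens seventeen kittens"

// Without \b, "seventeen" (9 letters) loses its first 8 letters.
const partial = story.replace(/\w{6,8}/g, "kittens");
console.log(partial); // "The kittens kittens kittensn kittens"
```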

In ES6

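/u correctly handles characters that take four bytes in UTF-16 (code points above U+FFFF), which includes some Chinese characters

/u can handle look-alike characters: with /iu, characters that look alike but have different code points (such as K and the Kelvin sign U+212A) can match each other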
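Both points about /u can be checked with a short sketch. The character U+20BB7 is just one example of a Chinese character stored as two UTF-16 code units; it is written as an escape so the code does not depend on the file encoding.

```javascript
// U+20BB7 is stored as two UTF-16 code units.
const han = "\u{20BB7}";
console.log(han.length); // 2

// Without u, "." sees only one code unit, so one "." cannot cover the character.
const withoutU = /^.$/.test(han);
// With u, "." matches the whole code point.
const withU = /^.$/u.test(han);
console.log(withoutU, withU); // false true

// Look-alike characters: U+212A (Kelvin sign) looks like a capital K.
const kelvinPlain = /[a-z]/i.test("\u212A");
const kelvinUnicode = /[a-z]/iu.test("\u212A");
console.log(kelvinPlain, kelvinUnicode); // false true
```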

With the /y (sticky) flag, each match must start exactly where the previous one ended (an implied ^)

var s = "aaa_aa_a";
var r1 = /a+/y;
r1.exec(s); // ["aaa"], lastIndex is now 3
r1.exec(s); // null, because the remaining string "_aa_a" does not start with "a"