Who knows how to parse JavaScript regular expressions?
Regular expressions are objects that describe character patterns.

JavaScript's RegExp and String objects define methods that use regular expressions to perform powerful pattern matching and text search-and-replace operations. In JavaScript, a regular expression is represented by a RegExp object. You can create one with the RegExp() constructor, or with the special literal syntax added in JavaScript 1.2. Just as a string literal is written as characters enclosed in quotation marks, a regular expression literal is written as characters enclosed in a pair of slashes (/). So JavaScript code may contain a line like this:

var pattern = /s$/;

This line creates a new RegExp object and assigns it to the variable pattern. This particular RegExp object matches any string that ends with the letter "s". You can define an equivalent regular expression with the RegExp() constructor:

var pattern = new RegExp("s$");

Whether you use a regular expression literal or the RegExp() constructor, creating a RegExp object is the easy part. The harder task is describing the desired pattern of characters with regular expression syntax. JavaScript uses a fairly complete subset of Perl's regular expression syntax.

A regular expression pattern consists of a series of characters. Most characters, including all alphanumeric characters, simply describe characters to be matched literally. Thus the regular expression /java/ matches any string that contains the substring "java". Other characters are not matched literally but have special meanings. The regular expression /s$/ contains two characters. The first, "s", matches itself literally. The second, "$", is a special character that matches the end of the string.
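The two ways of creating a pattern can be compared in a short sketch. The variable names here are illustrative only, and the sample strings are assumptions chosen for the demonstration; test() is the standard RegExp method that reports whether a pattern matches.

```javascript
// Two equivalent ways to build the same pattern: a literal and the constructor.
const literalPattern = /s$/;
const constructedPattern = new RegExp("s$");

// test() returns true when the pattern matches somewhere in the string.
console.log(literalPattern.test("regular expressions"));     // true
console.log(constructedPattern.test("regular expressions")); // true
console.log(literalPattern.test("JavaScript"));              // false: ends in "t"

// /java/ matches any string containing the substring "java" (case-sensitive).
const containsJava = /java/;
console.log(containsJava.test("I like javascript")); // true
console.log(containsJava.test("I like JavaScript")); // false: "Java" is not "java"
```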

1. Literal characters

As noted above, all alphabetic characters and digits match themselves literally in a regular expression. JavaScript's regular expression syntax also supports certain non-alphabetic characters through escape sequences that begin with a backslash (\). For example, the sequence \n matches a literal newline character in a string. Many punctuation characters have special meanings in regular expressions.

Regular expression literal characters

Character                 Matches
Alphanumeric character    Itself
\f                        Form feed
\n                        Newline
\r                        Carriage return
\t                        Tab
\v                        Vertical tab
\/                        A literal /
\\                        A literal \
\.                        A literal .
\*                        A literal *
\+                        A literal +
\?                        A literal ?
\|                        A literal |
\(                        A literal (
\)                        A literal )
\[                        A literal [
\]                        A literal ]
\{                        A literal {
\}                        A literal }
\XXX                      The ASCII character specified by the octal number XXX
\xnn                      The ASCII character specified by the hexadecimal number nn
\cX                       The control character ^X; for example, \cI is equivalent to \t and \cJ is equivalent to \n

To use any of these special punctuation characters literally in a regular expression, you must precede it with a \.

2. Character classes

Individual literal characters can be combined into a character class by placing them within square brackets. A character class matches any one character that it contains. So the regular expression /[abc]/ matches any one of the letters "a", "b", or "c". You can also define a negated character class, which matches any character except those contained within the brackets. To define one, use the ^ symbol as the first character inside the left bracket. A hyphen inside the brackets denotes a range of characters, so /[a-zA-Z0-9]/ matches any one Latin letter or digit.

Because certain character classes are so common, JavaScript's regular expression syntax includes special characters and escape sequences to represent them. For example, \s matches a space, a tab, or any other whitespace character, while \S matches any character that is not whitespace.
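As a rough illustration of the character classes just described, the following sketch tries a few of them against sample strings. The variable names and inputs are assumptions made for the example, and the ^ and $ anchors in alphanumericOnly are added only so that the whole string must consist of class members.

```javascript
// A class matches exactly one character drawn from the set in brackets.
const anyOfAbc = /[abc]/;
// A leading ^ inside the brackets negates the class.
const noneOfAbc = /[^abc]/;
// Hyphens denote ranges; the outer ^ and $ tie the match to the whole string.
const alphanumericOnly = /^[a-zA-Z0-9]+$/;

console.log(anyOfAbc.test("xyzb"));            // true: contains "b"
console.log(noneOfAbc.test("abcabc"));         // false: every character is a, b, or c
console.log(alphanumericOnly.test("abc123"));  // true
console.log(alphanumericOnly.test("abc 123")); // false: the space is not in the class

// \s matches one whitespace character, \S matches anything else.
const pieces = "a b\tc".split(/\s/);
console.log(pieces);                 // ["a", "b", "c"]
console.log(/\S/.test(" \t\n"));     // false: the string is whitespace only
```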

Regular expression character classes

Character    Matches
[...]        Any one character between the brackets
[^...]       Any one character not between the brackets
.            Any character except a newline; roughly equivalent to [^\n]
\w           Any word character; equivalent to [a-zA-Z0-9_]
\W           Any non-word character; equivalent to [^a-zA-Z0-9_]
\s           Any whitespace character, including [ \t\n\r\f\v]
\S           Any non-whitespace character; the opposite of \s
\d           Any digit; equivalent to [0-9]
\D           Any character other than a digit; equivalent to [^0-9]
[\b]         A literal backspace (special case)

3. Repetition

With the syntax described so far, you can describe a two-digit number as /\d\d/. But you cannot yet describe, for example, a string of three letters followed by an optional digit. Such patterns use regular expression syntax that specifies how many times an element of the expression may be repeated.

The characters that specify repetition always follow the pattern to which they apply. Because certain kinds of repetition are so common, there are special characters to represent them. For example, + matches one or more occurrences of the preceding pattern. Here are some examples:

/\d{2,4}/       Matches between two and four digits.
/\w{3}\d?/      Matches exactly three word characters and an optional digit.
/\s+java\s+/    Matches the string "java" with one or more whitespace characters before and after it.
/[^"]*/         Matches zero or more characters that are not quotation marks.
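The repetition examples above can be tried directly. This is a small sketch with made-up input strings; the ^ and $ around the second pattern are an added assumption so that the whole input has to fit the pattern.

```javascript
// Between two and four digits; repetition is greedy, so four are taken if present.
const twoToFourDigits = /\d{2,4}/;
const year = "year 2024".match(twoToFourDigits)[0];
console.log(year);                                // "2024"
console.log(twoToFourDigits.test("7"));           // false: only one digit

// Exactly three word characters followed by an optional digit.
const threeWordCharsOptionalDigit = /^\w{3}\d?$/;
console.log(threeWordCharsOptionalDigit.test("abc"));  // true
console.log(threeWordCharsOptionalDigit.test("abc7")); // true
console.log(threeWordCharsOptionalDigit.test("ab"));   // false: too short

// "java" with one or more whitespace characters on each side.
const spacedJava = /\s+java\s+/;
console.log(spacedJava.test("I like java a lot")); // true
console.log(spacedJava.test("javascript"));        // false

// Zero or more characters that are not double quotes.
const nonQuotes = /[^"]*/;
const beforeQuote = 'abc"def'.match(nonQuotes)[0];
console.log(beforeQuote);                          // "abc"
```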

Regular expression repetition characters

Character    Meaning
{n,m}        Match the previous item at least n times but no more than m times.
{n,}         Match the previous item n or more times.
{n}          Match exactly n occurrences of the previous item.
?            Match zero or one occurrence of the previous item; that is, the previous item is optional. Equivalent to {0,1}.
+            Match one or more occurrences of the previous item. Equivalent to {1,}.
*            Match zero or more occurrences of the previous item. Equivalent to {0,}.

4. Alternation, grouping, and references

Regular expression syntax also includes special characters for specifying alternatives, grouping subexpressions, and referring to previous subexpressions. The | character separates alternatives. For example, /ab|cd|ef/ matches the string "ab", the string "cd", or the string "ef", and /\d{3}|[a-z]{4}/ matches either three digits or four lowercase letters.

Parentheses have several purposes in regular expressions. One is to group separate items into a single subexpression so that the items can be treated as a unit by *, +, ?, and |. For example, /java(script)?/ matches "java" optionally followed by "script", and /(ab|cd)+|ef/ matches either the string "ef" or one or more repetitions of the strings "ab" or "cd".

A second purpose of parentheses is to define subpatterns within the complete pattern. When a regular expression is successfully matched against a target string, you can extract the portions of the target string that matched any particular parenthesized subpattern. For example, suppose you are looking for one or more lowercase letters followed by one or more digits. You might use the pattern /[a-z]+\d+/. But suppose you only really care about the digits at the end of each match. If you put that part of the pattern in parentheses, /[a-z]+(\d+)/, you can extract the digits from any match you find.

A related use of parenthesized subexpressions is to let you refer back to a subexpression later in the same regular expression. This is done by following a \ character with one or more digits. The digits refer to the position of the parenthesized subexpression within the regular expression. For example, \1 refers to the first subexpression and \3 refers to the third. Note that because subexpressions can be nested within others, it is the position of the left parenthesis that counts. In the following regular expression, for example, the nested subexpression ([Ss]cript) is referred to as \2:

/([Jj]ava([Ss]cript))\sis\s(fun\w*)/
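A brief sketch of alternation, grouping, and capturing follows. It assumes the standard String match() method, whose result array holds the whole match at index 0 and the parenthesized subexpressions from index 1 onward; the input strings and variable names are invented for the example.

```javascript
// Alternation: any one of three two-letter strings.
console.log(/ab|cd|ef/.test("xxcdxx"));           // true

// Grouping: the ? applies to the whole group "script".
const javaOrJavascript = /^java(script)?$/;
console.log(javaOrJavascript.test("java"));       // true
console.log(javaOrJavascript.test("javascript")); // true
console.log(javaOrJavascript.test("javascr"));    // false

// Capturing: index 0 is the whole match, index 1 is the first group.
const digitsMatch = "order abc123 shipped".match(/[a-z]+(\d+)/);
console.log(digitsMatch[0]); // "abc123"
console.log(digitsMatch[1]); // "123"

// Nested groups are numbered by the position of their left parenthesis.
const nested = "JavaScript is fun".match(/([Jj]ava([Ss]cript))\sis\s(fun\w*)/);
console.log(nested[1]); // "JavaScript"
console.log(nested[2]); // "Script"
console.log(nested[3]); // "fun"
```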

A reference to a previous subexpression of a regular expression does not refer to the pattern for that subexpression but to the text that matched the pattern. Thus a reference is more than a shortcut for retyping a repeated part of the expression: it enforces the constraint that separate portions of a string contain exactly the same characters. For example, the following regular expression matches zero or more characters within single or double quotes. However, it does not require the opening and closing quotes to match (that is, both double quotes or both single quotes):

/['"][^'"]*['"]/

To require the quotes to match, use a reference:

/(['"])[^'"]*\1/

The \1 matches whatever the first parenthesized subexpression matched. In this example, it enforces the constraint that the closing quote match the opening quote. Note that if a backslash is followed by a number greater than the number of parenthesized subexpressions, it is interpreted as an octal escape sequence rather than as a reference. You can avoid the ambiguity by always writing escape sequences with three full digits, for example \044 instead of \44.
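The difference between the two quote patterns shows up when the quotes are mismatched. In this sketch the ^ and $ anchors are an added assumption so that the whole string must be a single quoted run, and the variable names are illustrative.

```javascript
// Without a reference, the opening and closing quotes may differ.
const anyQuotes = /^['"][^'"]*['"]$/;
// With \1, the closing quote must repeat whatever group 1 matched.
const matchingQuotes = /^(['"])[^'"]*\1$/;

console.log(anyQuotes.test(`"hello'`));      // true: mismatched quotes are accepted
console.log(matchingQuotes.test(`"hello'`)); // false: the closing quote differs
console.log(matchingQuotes.test(`"hello"`)); // true
console.log(matchingQuotes.test(`'hello'`)); // true
```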

Regular expression alternation, grouping, and reference characters

Character    Meaning
|            Alternation. Match either the subexpression to the left or the subexpression to the right.
(...)        Grouping. Group items into a single unit that can be used with *, +, ?, and |. Also remember the characters that match this group for use with later references.
\n           Match the same characters that were matched by group number n. Groups are subexpressions within (possibly nested) parentheses. Group numbers are assigned by counting left parentheses from left to right.

5. Specifying match position

As described earlier, many elements of a regular expression match a single character in a string. For example, \s matches a single whitespace character. Other elements match the zero-width positions between characters rather than actual characters. \b, for example, matches a word boundary: the boundary between a \w (word) character and a \W (non-word) character. Elements such as \b do not specify any characters to be used in a matched string; they specify legal positions at which a match can occur. These elements are sometimes called regular expression anchors because they anchor the pattern to a specific position in the search string. The most commonly used anchors are ^, which ties the pattern to the beginning of the string, and $, which ties the pattern to the end of the string.

For example, to match the word "JavaScript" on a line by itself, you can use the regular expression /^JavaScript$/. If you want to search for "java" as a word by itself (not as a prefix, as it is in "javascript"), you could try the pattern /\sjava\s/, which requires whitespace before and after the word. But there are two problems with this. First, if "java" appears at the beginning or end of the string, the pattern does not match it, because there is no whitespace on that side. Second, when the pattern does find a match, the matched string it returns includes the leading and trailing whitespace, which is not what is wanted. So instead of matching actual whitespace characters with \s, match word boundaries with \b. The resulting expression is /\bjava\b/.
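The two problems with /\sjava\s/ and the effect of \b can be seen in a few lines. This is only a sketch; the sample sentences and variable names are invented for illustration.

```javascript
// ^ and $ together require the whole string to be exactly "JavaScript".
console.log(/^JavaScript$/.test("JavaScript"));        // true
console.log(/^JavaScript$/.test("JavaScript is fun")); // false

// \s consumes real characters: it fails at the string edges, and the
// matched text includes the surrounding whitespace.
const spacedJava = /\sjava\s/;
console.log(spacedJava.test("java is fun"));           // false: nothing before "java"
const withSpaces = "I like java a lot".match(spacedJava)[0];
console.log(JSON.stringify(withSpaces));               // " java "

// \b is zero-width: it works at the string edges and adds nothing to the match.
const wholeWordJava = /\bjava\b/;
console.log(wholeWordJava.test("java is fun"));        // true
const wordOnly = "I like java a lot".match(wholeWordJava)[0];
console.log(JSON.stringify(wordOnly));                 // "java"
console.log(wholeWordJava.test("javascript"));         // false: no boundary after "java"
```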

Regular expression anchor characters

Character    Meaning
^            Match the beginning of the string and, in multiline searches, the beginning of a line.
$            Match the end of the string and, in multiline searches, the end of a line.
\b           Match a word boundary, that is, the position between a \w character and a \W character. (Note, however, that [\b] matches backspace.)
\B           Match a position that is not a word boundary.

6. Attributes (flags)

There is one final element of regular expression syntax: the attributes, or flags, of a regular expression, which specify high-level pattern-matching rules. Unlike the rest of regular expression syntax, flags are specified outside the / characters. They do not appear between the two slashes but after the second one. JavaScript 1.2 supports two flags. The i flag specifies that pattern matching should be case-insensitive. The g flag specifies that pattern matching should be global, that is, all matches within the searched string should be found. The two flags can be combined to perform a global, case-insensitive match.

For example, to do a case-insensitive search for the first occurrence of the word "java" (or "Java", "JAVA", and so on), you can use the case-insensitive regular expression /\bjava\b/i. To find all occurrences of the word in a string, add the g flag as well: /\bjava\b/gi.
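The effect of each flag is easiest to see on a single input. The sample sentence and variable names below are made up for the demonstration; with the g flag, the standard String match() method returns an array of every match instead of just the first.

```javascript
const text = "Java, JAVA and java are all spelled the same.";

// No flags: case-sensitive, first match only.
const exactFirst = text.match(/\bjava\b/)[0];
console.log(exactFirst);        // "java"

// i: case-insensitive, still only the first match.
const insensitiveFirst = text.match(/\bjava\b/i)[0];
console.log(insensitiveFirst);  // "Java"

// g and i together: every occurrence, regardless of case.
const allInsensitive = text.match(/\bjava\b/gi);
console.log(allInsensitive);    // ["Java", "JAVA", "java"]

// g without i: every occurrence with exactly this capitalization.
const allExact = text.match(/\bjava\b/g);
console.log(allExact);          // ["java"]
```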

Regular expression flags

Character    Meaning
i            Perform case-insensitive matching.
g            Perform a global match, that is, find all matches rather than stopping after the first.

In JavaScript 1.2, regular expressions have no flag-like attributes other than g and i. However, if the static property multiline of the RegExp constructor is set to true, pattern matching is performed in multiline mode. In this mode, the anchors ^ and $ match not only the beginning and end of the searched string but also the beginning and end of each line within it. For example, the pattern /Java$/ matches "Java" but does not match "Java\nis fun". With the multiline property set, it matches the latter as well:

RegExp.multiline = true;

Later versions of JavaScript added an m flag for the same purpose, as in /Java$/m.

A RegExp object contains the pattern of a regular expression. It has properties and methods for using that pattern to find or replace a particular character (or set of characters) in a string. In addition to the properties of an individual regular expression object created with the RegExp constructor, the predefined RegExp object has static properties that are set whenever any regular expression is used.
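Multiline matching can be sketched with the m flag, which later versions of the language provide for this purpose. The example uses the flag rather than the static RegExp.multiline property, on the assumption that the property is a legacy feature that current engines may not honor; the variable names are illustrative.

```javascript
const twoLines = "Java\nis fun";

// Without m, $ matches only at the very end of the string.
console.log(/Java$/.test("Java"));     // true
console.log(/Java$/.test(twoLines));   // false

// With m, $ also matches just before each line break.
console.log(/Java$/m.test(twoLines));  // true

// Likewise, ^ matches at the start of every line in multiline mode.
const lineStarts = twoLines.match(/^\w+/gm);
console.log(lineStarts);               // ["Java", "is"]
```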

Created by: the literal format or the RegExp constructor function.

Literal format: /pattern/flags

Constructor function: new RegExp("pattern"[, "flags"]);

Parameters:

pattern    The text of the regular expression.

flags      If specified, one of the following values:

g     Global match
i     Ignore case
gi    Both global match and ignore case

Note: the parameters of the literal format do not use quotation marks, while the parameters of the constructor function do. For example, /ab+c/i and new RegExp("ab+c", "i") create the same regular expression. In the constructor, special characters must be escaped by putting a \ before them, because the pattern is written as a string. For example: re = new RegExp("\\w+")
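The equivalence of the two formats, and the need to double the backslash inside a string, can be checked with a short sketch. It relies on the standard source and flags properties of RegExp objects; the variable names are illustrative only.

```javascript
// The literal and the constructor call describe the same expression.
const fromLiteral = /ab+c/i;
const fromConstructor = new RegExp("ab+c", "i");
console.log(fromLiteral.source === fromConstructor.source); // true
console.log(fromLiteral.flags === fromConstructor.flags);   // true
console.log(fromConstructor.test("xABBBCx"));               // true

// In a string the backslash must be doubled: "\\w+" is the
// three-character pattern \w+ once the string has been parsed.
const wordChars = new RegExp("\\w+");
console.log(wordChars.source);            // \w+
console.log("$5.98".match(wordChars)[0]); // "5"

// A single backslash is dropped by the string parser, leaving the pattern w+.
const missingEscape = new RegExp("\w+");
console.log(missingEscape.source);        // w+
```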

Special characters in regular expressions

Character    Meaning
\            One of the following. For characters that are usually treated literally, indicates that the next character is special and is not to be interpreted literally. For example, /b/ matches the character "b", but with a backslash in front of the b, /\b/ matches a word boundary. -or- For characters that are usually treated specially, indicates that the next character is not special and is to be interpreted literally. For example, * means match the preceding character 0 or more times, so /a*/ matches "a", "aa", and "aaa". With the backslash, /a\*/ matches only "a*".
^            Matches the beginning of input or of a line. For example, /^a/ matches the "a" in "abc" but not the "a" in "cba".
$            Matches the end of input or of a line. For example, /a$/ matches the "a" in "cba" but not the "a" in "abc".
*            Matches the preceding character 0 or more times. /ba*/ matches "b", "ba", "baa", "baaa".
+            Matches the preceding character 1 or more times. /ba+/ matches "ba", "baa", "baaa".
?            Matches the preceding character 0 or 1 time. /ba?/ matches "b" and "ba".
(x)          Matches x and remembers the match in the variables $1 ... $9.
x|y          Matches either x or y.
{n}          Matches exactly n occurrences of the preceding character.
{n,}         Matches at least n occurrences of the preceding character.
{n,m}        Matches at least n and at most m occurrences of the preceding character.
[xyz]        A character set. Matches any one of the enclosed characters.
[^xyz]       A negated character set. Matches any character that is not enclosed.
[\b]         Matches a backspace.
\b           Matches a word boundary.
\B           Matches a non-word boundary.
\cX          Where X is a control character, matches that control character. /\cM/ matches control-M.
\d           Matches a digit character. Equivalent to [0-9].
\D           Matches any non-digit character. Equivalent to [^0-9].
\n           Matches a newline.
\r           Matches a carriage return.
\s           Matches a single whitespace character, including space, \n, \r, \f, \t, and \v.
\S           Matches a single non-whitespace character. Equivalent to [^ \n\f\r\t\v].
\t           Matches a tab.
\v           Matches a vertical tab.
\w           Matches any character that can form a word (a letter, a digit, or the underscore). Equivalent to [a-zA-Z0-9_]. For example, /\w/ matches the "5" in "$5.98".
\W           Matches any character that cannot form a word. Equivalent to [^a-zA-Z0-9_]. For example, /\W/ matches the "$" in "$5.98".
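Several entries from the table can be verified side by side. The following is a minimal sketch with invented inputs and names. The last line assumes the standard String replace() method, where $1 and $2 in the replacement string stand for the remembered parenthesized matches.

```javascript
// Escaping: * is a quantifier, \* is a literal asterisk.
const runOfAs = "aaa".match(/a*/)[0];
const literalStar = "a*b".match(/a\*/)[0];
console.log(runOfAs);                 // "aaa"
console.log(literalStar);             // "a*"
console.log(/a\*/.test("aaa"));       // false: no literal asterisk present

// The basic quantifiers applied to the same input.
const zeroOrMore = "baaa".match(/ba*/)[0];
const zeroOrOne = "baaa".match(/ba?/)[0];
console.log(zeroOrMore);              // "baaa"
console.log(zeroOrOne);               // "ba"
console.log(/ba+/.test("b"));         // false: + needs at least one "a"

// \w and \W against the price example from the table.
const firstWordChar = "$5.98".match(/\w/)[0];
const firstNonWordChar = "$5.98".match(/\W/)[0];
console.log(firstWordChar);           // "5"
console.log(firstNonWordChar);        // "$"

// Remembered matches can be reused as $1, $2, ... in a replacement string.
const swapped = "John Smith".replace(/(\w+)\s(\w+)/, "$2, $1");
console.log(swapped);                 // "Smith, John"
```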