
IT 3203

Introduction to Web Development

Regular Expressions

October 12

Copyright © 2007 by Bob Brown

Notice: This session is being recorded.

University Convocation

• Tuesday, October 13, 11:00 AM – 12:15 PM

• Student Center Theatre

• Convocation Speaker: Dr. John Palfrey

• Speaking on “Born Digital in a Network Society”

• Professor at Harvard Law School

• Vice Dean for Library and Information Resources

• Co-author of Born Digital: Understanding the First Generation of Digital Natives and also Access Denied: The Practice and Politics of Internet Filtering

Pattern Matching

Pattern matching in JavaScript is based on regular expressions. Regular expressions are patterns that are compared with strings or substrings.

In reality, regular expressions are a small formal language. Two approaches in JavaScript:

• The RegExp object

• Methods of the String object


Why Match Patterns?

• Most data validation that can be done on the client side consists of testing data for conformance to a pattern.

• Telephone numbers

• Email addresses

• Dates

• Money amounts

• … what else?

The Search Method

search is a method of the String object. It searches for the pattern in the string and returns the position of the first match.

Returns -1 if there is no match.

var my_string = "Abernathy";
var my_pos = my_string.search(/er/);

/er/ is a pattern. my_pos becomes 2.
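A minimal sketch of both outcomes of search, using the slide's string. The variable no_pos and the pattern /xyz/ are introduced here only for illustration.

```javascript
var my_string = "Abernathy";

// "er" starts at index 2 (A=0, b=1, e=2), so search returns 2.
var my_pos = my_string.search(/er/);

// "xyz" does not occur anywhere in the string, so search returns -1.
var no_pos = my_string.search(/xyz/);

console.log(my_pos, no_pos);
```

Because a match at the very start of the string returns 0, compare the result with -1 rather than treating it as true or false.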

The Replace Method

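var bobs = "Bob, Bobbie";
bobs = bobs.replace(/Bob/g, "Bill");

The string bobs now contains “Bill, Billbie”. (replace returns a new string and leaves the original unchanged, so the result must be assigned back to bobs.)

/Bob/ is a pattern, but "Bill" is just a string.

The “g” means “global”: every match is replaced, not just the first.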
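The following sketch shows what the “g” modifier changes and that replace does not alter the original string. The names all_bills and first_bill are assumptions made for this example.

```javascript
var bobs = "Bob, Bobbie";

// With g, every occurrence of the pattern is replaced.
var all_bills = bobs.replace(/Bob/g, "Bill");

// Without g, only the first occurrence is replaced.
var first_bill = bobs.replace(/Bob/, "Bill");

// replace returns a new string; bobs itself is unchanged.
console.log(all_bills);   // Bill, Billbie
console.log(first_bill);  // Bill, Bobbie
console.log(bobs);        // Bob, Bobbie
```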


The Match Method

Match is the most general of the methods.

var fruit = "4 apples 3 oranges";
var my_nbrs = fruit.match(/\d/g);

my_nbrs contains ["4", "3"] (it’s an array of strings)

With g: all matches

Without g: first match, plus parenthesized subpatterns

\d matches digits (and \D matches non-digits).
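A short sketch contrasting match with and without g. The second pattern, /(\d) (\w+)/, and the variable names are illustrative assumptions, not part of the slides.

```javascript
var fruit = "4 apples 3 oranges";

// With g: an array holding every match, each one a string.
var all_nbrs = fruit.match(/\d/g);

// Without g: the first match, followed by the parenthesized subpatterns.
var first = fruit.match(/(\d) (\w+)/);

// When nothing matches, match returns null rather than an empty array.
var none = fruit.match(/\d{5}/);

console.log(all_nbrs);            // [ '4', '3' ]
console.log(first[0]);            // 4 apples
console.log(first[1], first[2]);  // 4 apples
console.log(none);                // null
```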

Forming Regular Expressions

/ / (slashes) enclose patterns

“Normal” characters match themselves (e.g. “rabbit”)

Metacharacters have special meanings: \ | ( ) [ ] { } ^ $ * + ? .

Metacharacters can be included in patterns by escaping with a backslash, like \$ (a “real” dollar sign)
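To see why escaping matters, this sketch searches for a literal dollar sign with and without the backslash. The string price is a made-up example.

```javascript
var price = "Cost: $5";

// Unescaped, $ means "end of string", so /$5/ asks for a 5 after the end
// of the string and can never match.
var unescaped = price.search(/$5/);

// Escaped, \$ is a "real" dollar sign, found at index 6.
var escaped = price.search(/\$5/);

console.log(unescaped, escaped);  // -1 6
```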

Wildcard Matching

. (period) matches any character except newline

/snow./ matches snows, snowy

/snow./ matches “snowi” in “snowing”

Classes

[ ] (brackets) define classes

/[abc]/ matches a or b or c

/[a-h]/ matches lower-case a through h

^ (circumflex) inverts a class

/[^aeiou]/ matches any character except a, e, i, o, u
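A small sketch of the three class patterns above applied to one word. The test word "queue" is an arbitrary choice for illustration.

```javascript
var word = "queue";

// No a, b or c in "queue", so there is no match.
var pos_abc = word.search(/[abc]/);

// The first letter in the range a through h is the "e" at index 2.
var pos_a_to_h = word.search(/[a-h]/);

// The first character that is not a lower-case vowel is "q" at index 0.
var pos_not_vowel = word.search(/[^aeiou]/);

console.log(pos_abc, pos_a_to_h, pos_not_vowel);  // -1 2 0
```

Note that an inverted class matches any other character at all, including digits, spaces and punctuation, not just consonants.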

Predefined Classes

\x: a backslash and a class abbreviation (x stands for the abbreviation letter)

See your textbook or a JavaScript reference

\d matches a digit: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9

/\d+\.\d*/ matches one or more digits, a period, then zero or more digits
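The pattern /\d+\.\d*/ can be tried against a few inputs, as in this sketch. The sample strings are assumptions chosen to exercise each part of the pattern.

```javascript
var pattern = /\d+\.\d*/;

// Digits, a period, more digits: the whole string matches.
var with_fraction = "3.14".match(pattern)[0];

// \d* may match nothing, so a trailing period is accepted.
var trailing_period = "42.".match(pattern)[0];

// The period is required, so a bare integer does not match.
var no_period = "42".search(pattern);

// \d+ needs at least one digit before the period.
var leading_period = ".5".search(pattern);

console.log(with_fraction, trailing_period, no_period, leading_period);
```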

Word and Space Characters

Word characters: [a-zA-Z0-9_] \w

Non-word characters: [^a-zA-Z0-9_] \W

Space characters (space, tab, new line): \s

Non-space characters: \S

Capitalization reverses the sense of the predefined class names.


Boundary Matches

\b matches the boundary between a word character and a non-word character. It is a zero-length match.

/Fred\b/ matches “Fred is” but not “Frederick is…”

/Fred\B/ matches “Frederick is” but not “Fred is…”

/\bis\b/ matches only the standalone “is” in: This island is beautiful

This allows a whole-words-only search.
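The boundary examples can be checked with a sketch like this one. It uses the test method of the RegExp object, which returns true or false; the variable names are assumptions for illustration.

```javascript
// \b requires a word boundary right after "Fred".
var fred_word = /Fred\b/.test("Fred is here");            // true
var fred_prefix = /Fred\b/.test("Frederick is here");     // false

// \B requires that there is NOT a boundary after "Fred".
var fred_inside = /Fred\B/.test("Frederick is here");     // true
var fred_alone = /Fred\B/.test("Fred is here");           // false

// Only the standalone "is" at index 12 matches, not the "is" inside
// "This" or "island".
var sentence = "This island is beautiful";
var is_pos = sentence.search(/\bis\b/);
var is_count = sentence.match(/\bis\b/g).length;

console.log(fred_word, fred_prefix, fred_inside, fred_alone, is_pos, is_count);
```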

Repetition

* zero or more

+ one or more

? one or none

{ } a count (applies to pattern character on left)

/xy{4}z/ == /xyyyyz/

/X*y+z?/ is zero or more X, one or more y, then an optional z
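A brief sketch of each repetition operator. The patterns /ab*c/, /ab+c/ and /ab?c/ are made-up examples; only /xy{4}z/ comes from the slide.

```javascript
// {4} demands exactly four y characters between x and z.
var four_ok = /xy{4}z/.test("xyyyyz");    // true
var three_bad = /xy{4}z/.test("xyyyz");   // false

// * allows zero or more b characters.
var star_zero = /ab*c/.test("ac");        // true
var star_many = /ab*c/.test("abbbc");     // true

// + needs at least one b.
var plus_zero = /ab+c/.test("ac");        // false

// ? allows one b or none, but not two.
var opt_one = /ab?c/.test("abc");         // true
var opt_two = /ab?c/.test("abbc");        // false

console.log(four_ok, three_bad, star_zero, star_many, plus_zero, opt_one, opt_two);
```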

Repetition Examples

* zero or more

+ one or more

? one or none

/\d*\.\d+/

/\d*\.?\d*/

Repetition Exercise

/\d*\.\d+/

1. 0.0
2. .25
3. 137
4. 137.
5. 4.5678
6. xyz.123

Can We Fix The Pattern?

/\d+\.?\d*/

1. 0.0
2. .25
3. 137
4. 137.
5. 4.5678
6. xyz.123

Assume we are trying to match “valid” numbers in various combinations with decimal point. Is this any better? (Not much!)

Repetition Exercise: Case 2

/\d+\.?\d*/

2. .25

This expression does match test case 2 at position 1, the digit 2. But…

The leading decimal point is skipped, because \d+ cannot match it

\d+ then matches 25

\.? makes (another) decimal point optional

\d* matches nothing

It also matches within .25.67! Why?

What about .25.67.89?
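One way to explore these questions is to look at what the pattern actually matched, not just where. This is a sketch; the variable names are assumptions.

```javascript
var pattern = /\d+\.?\d*/;

// Test case 2: the match starts at index 1 and covers only "25".
var case2_pos = ".25".search(pattern);
var case2_text = ".25".match(pattern)[0];

// In ".25.67" the match also starts at index 1: \d+ takes "25",
// \.? takes the second period, and \d* takes "67".
var two_points = ".25.67".match(pattern)[0];

// In ".25.67.89" the first match is the same; ".89" is left over.
var three_points = ".25.67.89".match(pattern)[0];

console.log(case2_pos, case2_text, two_points, three_points);
```

The unanchored pattern is satisfied by any substring that fits, which is why every one of these malformed inputs still produces a match.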


Repetition Exercise: Case 6

/\d+\.?\d*/

6. xyz.123

This expression does match test case 6 at position 4, the digit 1. But…

The letters and the decimal point are skipped, because \d+ cannot match them

\d+ then matches 123

\.? makes (another) decimal point optional

\d* matches nothing


Another Repetition Exercise

/X*y+z?/

1. Xyyyz
2. Xzzy
3. yyyyz
4. yyyy
5. wxyzz
6. zzzXyzz

Anchors

Specify where to start matching

/^pearl/ anchors to the beginning of the string: matches “pearls are...” but not “my pearls...”

^ is the same character as class inversion, but different context, different meaning.

/gold$/ anchors to the end of the string: matches “I like gold” but not “sunset is golden”
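The anchor examples can be verified with a sketch like the following, using the RegExp test method. The full sentences are assumptions that extend the slide's fragments.

```javascript
// ^ ties the pattern to the start of the string.
var starts_ok = /^pearl/.test("pearls are white");   // true
var starts_bad = /^pearl/.test("my pearls");         // false

// $ ties the pattern to the end of the string.
var ends_ok = /gold$/.test("I like gold");           // true
var ends_bad = /gold$/.test("sunset is golden");     // false

console.log(starts_ok, starts_bad, ends_ok, ends_bad);
```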

Grouping and Alternatives

Parentheses group items.

The pipe or vertical bar matches one of two or more alternatives.

/abc(def|xyz)/ matches abcdef or abcxyz
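A sketch of grouping with alternatives. It also shows that the parenthesized group is stored in the match result; the variable names are illustrative.

```javascript
var pattern = /abc(def|xyz)/;

// Either alternative inside the group satisfies the pattern.
var has_def = pattern.test("abcdef");     // true
var has_xyz = pattern.test("abcxyz");     // true

// "abcd" followed by "xyz" fits neither alternative right after "abc".
var mixed = pattern.test("abcdxyz");      // false

// Element 0 is the whole match; element 1 is what the group matched.
var parts = "abcxyz".match(pattern);

console.log(has_def, has_xyz, mixed, parts[0], parts[1]);
```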

Now We Can Fix The Pattern

/^\d*(|\.\d*)?$/

Almost! We are trying to match either a digit or a decimal point:

If a decimal point, then one or more digits

Otherwise, an optional decimal point followed by zero or more digits.

Problem: This matches a decimal point all by itself (and also the empty string). Conditional expressions would be one way to fix it, but they are beyond the scope of the course because conditionals are not supported in JavaScript. An alternation that spells out both cases, such as /^(\.\d+|\d+\.?\d*)$/, avoids the problem without them.

A Closer Look

• Anchored at the beginning of the string

• Zero or more digits

• A group containing either nothing, or a decimal point and zero or more digits,

• Repeated zero or one times.

• Anchored at the end of the string

/^\d*(|\.\d*)?$/


Did That Work?

/^\d*(|\.\d*)?$/

1. 0.0
2. .25
3. 137
4. 137.
5. 4.5678
6. xyz.123
7. .
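The seven test cases can be run against the pattern in a loop, as sketched here. The second pattern, alt_pattern, is an assumption added for comparison and is not from the slides: it uses alternation to require either a decimal point followed by digits, or digits with an optional fraction.

```javascript
var cases = ["0.0", ".25", "137", "137.", "4.5678", "xyz.123", "."];

// The pattern from the slide.
var slide_pattern = /^\d*(|\.\d*)?$/;

// Hypothetical alternative: ".digits" OR "digits[.digits]".
var alt_pattern = /^(\.\d+|\d+\.?\d*)$/;

var slide_results = cases.map(function (s) { return slide_pattern.test(s); });
var alt_results = cases.map(function (s) { return alt_pattern.test(s); });

// The slide pattern rejects case 6 but still accepts the lone period.
console.log(slide_results.join(","));  // true,true,true,true,true,false,true

// The alternation rejects both case 6 and the lone period.
console.log(alt_results.join(","));    // true,true,true,true,true,false,false
```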

Modifiers

Modifiers follow the pattern:

g global

i case-insensitive

/buffalo/i matches “Buffalo” and “buffalo”

The Split Method

Splits a string into substrings

Returns an array of substrings

var my_str = "grapes:apples:oranges";
var fruit = my_str.split(":");

fruit is ["grapes", "apples", "oranges"]

Split with a Regular Expression

Splitting a comma-delimited string:

var my_nbrs = "12,34,56";
var nbr_array = my_nbrs.split(",");

Split can take a regular expression as a delimiter. What about this?

var my_nbrs = "12, 3,4, 56";
var nbr_array = my_nbrs.split(/\s*,\s*/);
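This sketch compares a plain string delimiter with a regular expression delimiter on input that has irregular spacing. The name plain is an assumption for the example.

```javascript
var my_nbrs = "12, 3,4, 56";

// A plain "," delimiter leaves the stray spaces in the pieces.
var plain = my_nbrs.split(",");

// /\s*,\s*/ treats the comma plus any surrounding spaces as the delimiter.
var nbr_array = my_nbrs.split(/\s*,\s*/);

console.log(plain);      // [ '12', ' 3', '4', ' 56' ]
console.log(nbr_array);  // [ '12', '3', '4', '56' ]
```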

A 7-Digit Phone Number

How does this work?

var ok = phNum.search(/\d{3}-\d{4}/);

What does the search method return for this? 555-1212

What about this? 444555-12123456

“Anchoring” the beginning and end gives an expression that works:

var ok = phNum.search(/^\d{3}-\d{4}$/);

No match here!
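The two questions above can be answered by running both patterns on both inputs. This is a sketch; the names loose and strict are assumptions.

```javascript
var loose = /\d{3}-\d{4}/;      // unanchored
var strict = /^\d{3}-\d{4}$/;   // anchored at both ends

// A well-formed number matches at position 0 either way.
var good_loose = "555-1212".search(loose);
var good_strict = "555-1212".search(strict);

// The unanchored pattern finds "555-1212" buried inside the long string,
// starting at index 3.
var bad_loose = "444555-12123456".search(loose);

// The anchored pattern must account for the whole string, so it fails.
var bad_strict = "444555-12123456".search(strict);

console.log(good_loose, good_strict, bad_loose, bad_strict);  // 0 0 3 -1
```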

10-Digit Phone Number

Can it be extended for Atlanta-style phone numbers?

var ok = phNum.search(/^\d{3}-\d{3}-\d{4}$/);

Can the format be made less rigid? (Yes!)

/^\(?\d{3}\D*\d{3}\D*\d{4}$/

• Anchored at the beginning of the string

• Optional left parenthesis

• Three digits

• Optional non-digits

• Three digits

• Optional non-digits

• Four digits

• Anchored at the end of the string.

Accepting Free-Form Phone Numbers

Parentheses act as grouping and storage operators.

var ok = datum.search(/^\(?\d{3}\D*\d{3}\D*\d{4}$/);
if (ok == 0) {
  var parts = datum.match(/^\(?(\d{3})\D*(\d{3})\D*(\d{4})$/);
  output.value = '(' + parts[1] + ') ' + parts[2] + '-' + parts[3];
}

Accepts: 404-555-1234, 4045551234, (404) 555-1234, etc.

Produces: (404) 555-1234
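The same idea can be sketched as a standalone function that runs without a form. The name format_phone is hypothetical, and returning null for invalid input is a choice made for this example; the slide writes to output.value instead.

```javascript
// Hypothetical helper: returns the normalized number, or null if the
// input does not look like a 10-digit phone number.
function format_phone(datum) {
  // The three parenthesized groups store the area code, exchange and line.
  var parts = datum.match(/^\(?(\d{3})\D*(\d{3})\D*(\d{4})$/);
  if (parts === null) {
    return null;
  }
  return "(" + parts[1] + ") " + parts[2] + "-" + parts[3];
}

console.log(format_phone("404-555-1234"));    // (404) 555-1234
console.log(format_phone("4045551234"));      // (404) 555-1234
console.log(format_phone("(404) 555-1234"));  // (404) 555-1234
console.log(format_phone("555-1234"));        // null
```

Since match already returns null on failure, this version needs only one pattern instead of a search followed by a match.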

Regular Expressions as NFAs

• “Nondeterministic Finite Automata”

• Nondeterministic is not the same as “random”

• Each part of a regular expression will match as much as it can: .* matches to end of string!

• The regular expression engine backtracks when necessary, i.e. when a match would otherwise fail.

Regular Expressions are Greedy

A regular expression will match as much of the target string as possible

19202122232425252627282930313233

/2.*2/
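A sketch of what /2.*2/ matches in the string above. The variable names are assumptions.

```javascript
var digits = "19202122232425252627282930313233";

// .* first runs to the end of the string, then the engine backtracks
// until the final 2 in the pattern can match. The result runs from the
// first 2 (index 2) through the last 2 (index 29).
var greedy = digits.match(/2.*2/)[0];

console.log(greedy);         // 2021222324252526272829303132
console.log(greedy.length);  // 28
```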

Regular Expressions are Greedy

Consider parsing HTML with a regular expression.

Stars by the <b>billions</b> and <b>billions</b>.

/<b>.*<\/b>/ is greedy: it matches from the first <b> through the last </b>.

The ? is also the “lazy” modifier: /<b>.*?<\/b>/ stops at the first </b>.

Source: Friedl, J., Mastering Regular Expressions
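The greedy and lazy forms can be compared directly on the example sentence, as in this sketch. The variable names are assumptions.

```javascript
var html = "Stars by the <b>billions</b> and <b>billions</b>.";

// Greedy: .* grabs as much as it can, so one match spans both elements.
var greedy = html.match(/<b>.*<\/b>/)[0];

// Lazy: .*? grabs as little as it can, so the match ends at the first </b>.
var lazy = html.match(/<b>.*?<\/b>/)[0];

// With g, the lazy pattern finds each bold element separately.
var each = html.match(/<b>.*?<\/b>/g);

console.log(greedy);       // <b>billions</b> and <b>billions</b>
console.log(lazy);         // <b>billions</b>
console.log(each.length);  // 2
```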

Questions

IP Addresses

4.56.123.156

var octets = ip.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);

Then check each octet for being ≤ 255
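The two steps above, matching the shape and then checking each octet, might be combined as in this sketch. The function name is_valid_ip is hypothetical.

```javascript
// Hypothetical helper: true if ip has four dot-separated numbers,
// each no greater than 255.
function is_valid_ip(ip) {
  var octets = ip.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (octets === null) {
    return false;  // wrong shape: not four groups of digits
  }
  // octets[0] is the whole match; the stored groups are octets[1]..octets[4].
  for (var i = 1; i <= 4; i++) {
    if (parseInt(octets[i], 10) > 255) {
      return false;
    }
  }
  return true;
}

console.log(is_valid_ip("4.56.123.156"));  // true
console.log(is_valid_ip("4.56.123.256"));  // false
console.log(is_valid_ip("4.56.123"));      // false
```

The pattern only checks the shape; the range test has to be done in code because \d+ accepts any number of digits.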
