
Regular Expressions in Java

2010-09-22

Birgit Grohe


Content of this lecture

• A very small Java program

• Regular expressions in Java

• Metacharacters

• Character classes and boundaries

• Quantifiers

• Backreferences

• Flag Expressions and Modifiers

• Summary


Programming in Java

• Object-oriented programming language

• In some languages, the first step is to write small programs from scratch (e.g. Perl).

• Learning Java is about learning how to use objects, classes and packages, often before you write your own.

• A Java program is first compiled into a .class file; then you can run the program (remember lab 1!).

• This differs from Perl, where an interpreter takes care of both compilation and execution.


"Hello, world!" in Java

// Class definition
public class Hello {

    // Method
    public static void main(String[] args) {
        // Comment: printing to a terminal window
        System.out.println("Hello, world!");
    }
}

>javac Hello.java
>java Hello
Hello, world!


Regular Expressions in Java

• The package java.util.regex consists of the classes Pattern, Matcher and PatternSyntaxException.

• A Pattern object is a compiled representation of a regular expression.

• A Matcher object is the engine that interprets the pattern and performs match operations against an input string.

• Syntax errors in a regular expression are reported as a PatternSyntaxException.
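The three classes can be tried out together in a few lines. The following is only a minimal sketch: the class name RegexBasics and the helper methods findAll and isValid are made up for illustration, while Pattern.compile, Pattern.matcher, Matcher.find, Matcher.group, Matcher.start and Matcher.end are the standard library calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, not part of java.util.regex.
class RegexBasics {

    // Returns every match of regex in input, formatted as "text@start-end".
    static List<String> findAll(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // compiled representation of the regex
        Matcher matcher = pattern.matcher(input);  // engine working on one input string
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return matches;
    }

    // Returns true if the regex compiles, false if it contains a syntax error.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(findAll("cat.", "cats and cat5")); // [cats@0-4, cat5@9-13]
        System.out.println(isValid("(unclosed"));             // false
    }
}
```

Note that the end index is exclusive: "cats" occupies indices 0 to 3, and end() returns 4.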


Example

• The next slide shows Java code for a class for regular expression processing.

• It reads an input string and a regular expression from the user.

• The output is the list of matches, if any.

• The class is taken from a Java regular expression tutorial:
http://download.oracle.com/javase/tutorial/essential/regex/index.html

The class will be used in lab 5!

import java.io.Console;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTestHarness {

    public static void main(String[] args) {
        Console console = System.console();
        if (console == null) {
            System.err.println("No console.");
            System.exit(1);
        }
        while (true) {
            // Read the regex and the input string from the user
            Pattern pattern = Pattern.compile(console.readLine("%nEnter your regex: "));
            Matcher matcher = pattern.matcher(console.readLine("Enter input string to search: "));
            boolean found = false;
            // Report every match with its start and end index
            while (matcher.find()) {
                console.format("I found the text \"%s\" starting at "
                        + "index %d and ending at index %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
                found = true;
            }
            if (!found) {
                console.format("No match found.%n");
            }
        }
    }
}

From a Java regexp tutorial, see references.

Pattern pattern = Pattern.compile(console.readLine("%nEnter your regex: "));
Matcher matcher = pattern.matcher(console.readLine("Enter input string to search: "));
boolean found = false;
while (matcher.find()) {
    console.format("I found the text \"%s\" starting at "
            + "index %d and ending at index %d.%n",
            matcher.group(), matcher.start(), matcher.end());
}

Enter your regex: foo
Enter input string to search: foo
I found the text "foo" starting at index 0 and ending at index 3.

Enter your regex: cat.
Enter input string to search: cats
I found the text "cats" starting at index 0 and ending at index 4.

Format specifiers: %n newline, %s string, %d number


Metacharacters

There are characters with a special meaning within regular expressions in Java:

. * ? + [ ] ( ) { } ^ $ | \ -

To use their literal meanings:

• use the escape symbol \

• or the escape sequence \Q <text> \E
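Both ways of escaping can be checked with a small sketch. The class name EscapeDemo and the helper matches are hypothetical; Pattern.matches is the standard call that tests whether the whole input matches. Keep in mind that inside a Java string literal the backslash itself must be doubled, so the regex \. is written "\\.".

```java
import java.util.regex.Pattern;

// Hypothetical demo class for escaping metacharacters.
class EscapeDemo {

    // True if the entire input matches the regex.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("a.c", "abc"));            // true: . matches any character
        System.out.println(matches("a\\.c", "abc"));          // false: \. matches only a literal dot
        System.out.println(matches("a\\.c", "a.c"));          // true
        System.out.println(matches("1+1=2", "1+1=2"));        // false: + is read as a quantifier
        System.out.println(matches("\\Q1+1=2\\E", "1+1=2"));  // true: everything between \Q and \E is literal
    }
}
```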

Character Classes

• Simple character classes: [abc]

• Negation: [^abc]

• Ranges: [a-d]

• Union: [a-d[m-p]], equivalent to [a-dm-p]

• Intersection: [a-z&&[def]], i.e. d, e or f

• Subtraction: [a-z&&[^bc]], equivalent to [ad-z] (the ^ is a negation)
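The set operations are easy to verify on single characters. A minimal sketch, assuming a made-up class CharClassDemo with a helper in; String.matches is the standard method that tests the whole string against a regex.

```java
// Hypothetical demo class for character class operations.
class CharClassDemo {

    // True if the one-character string c belongs to the given character class.
    static boolean in(String charClass, String c) {
        return c.matches(charClass);
    }

    public static void main(String[] args) {
        System.out.println(in("[a-d[m-p]]", "n"));    // true: union of a-d and m-p
        System.out.println(in("[a-d[m-p]]", "e"));    // false
        System.out.println(in("[a-z&&[def]]", "e"));  // true: intersection leaves d, e, f
        System.out.println(in("[a-z&&[def]]", "a"));  // false
        System.out.println(in("[a-z&&[^bc]]", "b"));  // false: b and c are subtracted
        System.out.println(in("[a-z&&[^bc]]", "d"));  // true
    }
}
```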

Predefined Character Classes

• Digit: [0-9] or \d

• Non-digit: [^0-9] or \D

• Whitespace character: [ \t\n\x0B\f\r] or \s

• Word character: [a-zA-Z_0-9] or \w

• Other negations: \S \W
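A short sketch of the predefined classes in use. The class name PredefinedDemo and its three helpers are invented for this example; String.replaceAll, String.split and String.matches are standard methods that accept a regular expression.

```java
// Hypothetical demo class for the predefined character classes.
class PredefinedDemo {

    // Removes every non-digit (\D), keeping only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D", "");
    }

    // Splits on runs of whitespace (\s+) after trimming the ends.
    static String[] words(String s) {
        return s.trim().split("\\s+");
    }

    // True if s consists of word characters (\w) only.
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Room 42, floor 3"));             // 423
        System.out.println(String.join("|", words("  lab 5\tnotes "))); // lab|5|notes
        System.out.println(isWord("lab_5"));                            // true
        System.out.println(isWord("lab 5"));                            // false
    }
}
```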


Boundary Matchers

• The beginning of a line: ^

• The end of a line: $

• Word boundary: \b

• The beginning of the input: \A

• The end of the previous match: \G

• The end of the input: \z

• For more matchers see literature!
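The effect of some of these matchers can be seen by counting matches. This is a sketch only: the class BoundaryDemo and the helper count are hypothetical names, and the input strings are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for boundary matchers.
class BoundaryDemo {

    // Counts how many times regex is found in input.
    static int count(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("dog", "dog doggie"));        // 2: also found inside "doggie"
        System.out.println(count("\\bdog\\b", "dog doggie"));  // 1: whole word only
        System.out.println(count("^dog", "dog dog"));          // 1: only at the beginning
        System.out.println(count("dog$", "dog dog"));          // 1: only at the end
        System.out.println(count("\\Gdog", "dogdog dog"));     // 2: each match must start where the previous one ended
    }
}
```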

Interesting since quantifiers in Java work slightly differently compared to Perl.

Quantifiers

Greedy    Reluctant    Possessive    Meaning
X?        X??          X?+           X, once or not at all
X*        X*?          X*+           X, zero or more times
X+        X+?          X++           X, one or more times
X{n}      X{n}?        X{n}+         X, exactly n times

More alternatives: X{n,} and X{n,m}

Greedy Quantifiers

Enter your regex: a?
Enter input string to search: aaaa
I found the text "a" starting at index 0 and ending at index 1.
I found the text "a" starting at index 1 and ending at index 2.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "a" starting at index 3 and ending at index 4.
I found the text "" starting at index 4 and ending at index 4.

Enter your regex: a*
Enter input string to search: aaaa
I found the text "aaaa" starting at index 0 and ending at index 4.
I found the text "" starting at index 4 and ending at index 4.

Enter your regex: a+
Enter input string to search: aaaa
I found the text "aaaa" starting at index 0 and ending at index 4.

• Multiple matches!

• Greedy!

• ? and * match "" (the empty string)

Greedy Quantifiers

Enter your regex: (cat){3}
Enter input string to search: catcatcatcatcatcat
I found the text "catcatcat" starting at index 0 and ending at index 9.
I found the text "catcatcat" starting at index 9 and ending at index 18.

Enter your regex: cat{3}
Enter input string to search: catcatcatcatcatcat
No match found.

Enter your regex: a{3,5}
Enter input string to search: aaaaaaaa
I found the text "aaaaa" starting at index 0 and ending at index 5.
I found the text "aaa" starting at index 5 and ending at index 8.

• Greedy!

• Grouping strings for quantifiers with ( )
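The difference between (cat){3} and cat{3} can be reproduced without the interactive harness. A minimal sketch, with GroupingDemo and findAll as made-up names: without parentheses the quantifier applies only to the last character, so cat{3} means "ca" followed by three t's.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for grouping with quantifiers.
class GroupingDemo {

    // Returns the text of every match of regex in input.
    static List<String> findAll(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "catcatcatcatcatcat";
        System.out.println(findAll("(cat){3}", input));      // [catcatcat, catcatcat]
        System.out.println(findAll("cat{3}", input));        // []
        System.out.println(findAll("cat{3}", "cattt"));      // [cattt]
        System.out.println(findAll("a{3,5}", "aaaaaaaa"));   // [aaaaa, aaa]
    }
}
```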

Reluctant and Possessive Quantifiers

Enter your regex: .*foo // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.

• Reluctant: tries to finish as early as possible

Summary Quantifiers

• The greedy quantifier tries to match as much as it can until the end of the string is reached. If it fails, it goes back one letter at a time and tries again until a match is found or the start of the input is reached (= no match).

• The reluctant quantifier tries to match as early as possible, adding one letter at a time until a match is found or the end of the input string is reached (= no match).

• The possessive quantifier consumes the entire string once and, if it does not succeed, just stops without looking back. Fast performance!
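The three behaviours can be compared side by side on the same input. This is a sketch under the assumption that only the first match is of interest; the class QuantifierModes and the helper firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing greedy, reluctant and possessive quantifiers.
class QuantifierModes {

    // Returns the text of the first match, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo (greedy: backs off from the end)
        System.out.println(firstMatch(".*?foo", input));  // xfoo (reluctant: stops at the first foo)
        System.out.println(firstMatch(".*+foo", input));  // null (possessive: .*+ keeps everything, nothing is left for foo)
    }
}
```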

Backreferences

Backreferences work approximately the same as in Perl, i.e. those parts of the regular expression that are placed in ( ) can be accessed with \1, \2, ...
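As an illustration, a backreference can detect a word that is typed twice in a row. The sketch below uses invented names (BackrefDemo, firstRepeatedWord, dedupe). Inside the pattern the group is referenced as \1; in a Java replacement string the same group is written $1, and from code it is available via Matcher.group(1).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for backreferences.
class BackrefDemo {

    // A word, one space, and the same word again (\1 refers to the first group).
    private static final Pattern DOUBLED = Pattern.compile("\\b(\\w+) \\1\\b");

    // Returns the first word that occurs twice in a row, or null if there is none.
    static String firstRepeatedWord(String text) {
        Matcher matcher = DOUBLED.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    // Replaces each doubled word by a single copy ($1 in the replacement string).
    static String dedupe(String text) {
        return DOUBLED.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("This is is a test"));  // is
        System.out.println(firstRepeatedWord("that thatch"));        // null
        System.out.println(dedupe("This is is a test"));             // This is a test
    }
}
```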


Modifiers

In Java there are features similar to the modifiers in Perl. There are two ways to implement and use them:

• Embedded flag expressions (the flag is given inside the regular expression)

• Flags and methods from the Pattern class (extra code and function calls required)

More modifiers can be found in the Java Regexp tutorial.


Embedded Flag Expressions

Example: Case insensitivity:

Enter your regex: (?i)foo
Enter input string to search: FOOfooFoO
I found the text "FOO" starting at index 0 and ending at index 3.
I found the text "foo" starting at index 3 and ending at index 6.
I found the text "FoO" starting at index 6 and ending at index 9.

Methods from the Pattern Class

Example: Case insensitivity

Pattern pattern = Pattern.compile(
        console.readLine("%nEnter your regex: "),
        Pattern.CASE_INSENSITIVE);

Enter your regex: dog
Enter input string to search: DoGDOg
I found the text "DoG" starting at index 0 and ending at index 3.
I found the text "DOg" starting at index 3 and ending at index 6.

Modify the code!
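Both approaches to case insensitivity should give the same result, which the following sketch checks. The class CaseFlagDemo and the helper countMatches are hypothetical; (?i) and Pattern.CASE_INSENSITIVE are the standard embedded flag and compile flag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the embedded flag with the compile flag.
class CaseFlagDemo {

    // Counts how many times the compiled pattern is found in input.
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "FOOfooFoO";
        Pattern plain = Pattern.compile("foo");                               // case sensitive
        Pattern embedded = Pattern.compile("(?i)foo");                        // flag inside the regex
        Pattern flagged = Pattern.compile("foo", Pattern.CASE_INSENSITIVE);   // flag as an argument
        System.out.println(countMatches(plain, input));     // 1
        System.out.println(countMatches(embedded, input));  // 3
        System.out.println(countMatches(flagged, input));   // 3
    }
}
```

The embedded flag is convenient when the regex comes from the user, since no code has to change; the compile flag keeps the regex itself untouched.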


Other Modifiers and Flags

The Pattern and Matcher classes support features similar to those in Perl, e.g. split, several different substitution methods (called 'replacement' in Java), comments, line versus file mode, etc.

Please read the Java Regexp tutorial for more details!
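Two of these features, split and replacement, are sketched below. The class SplitReplaceDemo and its helper names are made up for this example; Pattern.split, Matcher.replaceAll and Matcher.replaceFirst are the standard methods.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for split and replacement.
class SplitReplaceDemo {

    // Splits on commas, ignoring whitespace around each comma.
    static String[] splitOnCommas(String s) {
        return Pattern.compile("\\s*,\\s*").split(s);
    }

    // Replaces every digit by '#'.
    static String maskDigits(String s) {
        return Pattern.compile("\\d").matcher(s).replaceAll("#");
    }

    // Replaces only the first digit by '#'.
    static String maskFirstDigit(String s) {
        return Pattern.compile("\\d").matcher(s).replaceFirst("#");
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", splitOnCommas("a , b,c")));  // a|b|c
        System.out.println(maskDigits("tel 0123"));                      // tel ####
        System.out.println(maskFirstDigit("tel 0123"));                  // tel #123
    }
}
```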

Summary

• Java provides a package for regular expressions: java.util.regex

• The syntax and usage of regular expressions in Perl and Java are similar.

• There are minor differences in the regular expression engine, e.g. in how the quantifiers are implemented.

• Both Java and Perl provide similar features, e.g. classes and functions, and you will explore some differences in lab 5.
