Java RegEx

 

Test Harness

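import java.io.Console;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexTestHarness {

    public static void main(String[] args) {
        // System.console() returns null when no interactive console is attached,
        // e.g. when run from many IDEs or with redirected input.
        Console console = System.console();
        if (console == null) {
            System.err.println("No console.");
            System.exit(1);
        }
        while (true) {
            // readLine() returns null once the end of input is reached (e.g. Ctrl-D).
            String regex = console.readLine("%nEnter your regex: ");
            if (regex == null) {
                break;
            }

            // A malformed regex throws PatternSyntaxException; report it and re-prompt.
            Pattern pattern;
            try {
                pattern = Pattern.compile(regex);
            } catch (PatternSyntaxException e) {
                console.format("Invalid regex: %s%n", e.getDescription());
                continue;
            }

            String input = console.readLine("Enter input string to search: ");
            if (input == null) {
                break;
            }
            Matcher matcher = pattern.matcher(input);

            // find() advances to the next match; group(), start() and end() describe
            // that match (end() is the index one past the last matched character).
            boolean found = false;
            while (matcher.find()) {
                console.format("I found the text" +
                    " \"%s\" starting at " +
                    "index %d and ending at index %d.%n",
                    matcher.group(),
                    matcher.start(),
                    matcher.end());
                found = true;
            }
            if (!found) {
                console.format("No match found.%n");
            }
        }
    }
}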
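System.console() returns null in many IDEs and whenever input is redirected, so the harness above then exits with "No console." A minimal non-interactive sketch of the same find() loop follows: it formats each match the way the harness does but returns the lines instead of printing them. The class name RegexBatchHarness, the method describeMatches and the sample inputs are hypothetical, not part of the tutorial.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexBatchHarness {

    // Returns one line per match, in the same format as the console harness,
    // or a single "No match found." line when find() never succeeds.
    static List<String> describeMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> lines = new ArrayList<>();
        while (matcher.find()) {
            lines.add(String.format(
                "I found the text \"%s\" starting at index %d and ending at index %d.",
                matcher.group(), matcher.start(), matcher.end()));
        }
        if (lines.isEmpty()) {
            lines.add("No match found.");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample regex/input pairs; substitute your own.
        String[][] cases = { {"foo", "foofoofoo"}, {"cat.", "cats"}, {"dog", "cats"} };
        for (String[] c : cases) {
            System.out.printf("Regex: %s  Input: %s%n", c[0], c[1]);
            describeMatches(c[0], c[1]).forEach(System.out::println);
        }
    }
}
```

Returning a list rather than printing makes the loop easy to call from a unit test or a batch script.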

String Literals

* Exact match:

Enter your regex: foo
Enter input string to search: foo
I found the text "foo" starting at index 0 and ending at index 3.

* Match multiple times:

Enter your regex: foo
Enter input string to search: foofoofoo
I found the text "foo" starting at index 0 and ending at index 3.
I found the text "foo" starting at index 3 and ending at index 6.
I found the text "foo" starting at index 6 and ending at index 9.

* Match a metacharacter (in the regex cat., the dot means any character):

Enter your regex: cat.
Enter input string to search: cats
I found the text "cats" starting at index 0 and ending at index 4.
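The examples above can be reproduced without a console. This sketch uses a hypothetical helper, findAll, built on Matcher.find() to collect every match. It also shows two details the harness output only hints at: successive matches never overlap, and unless Pattern.DOTALL is set the dot does not match a line terminator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StringLiteralDemo {

    // Returns every match find() reports, left to right. Matches never
    // overlap: each search resumes at the previous match's end index.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("foo", "foo"));        // [foo]
        System.out.println(findAll("foo", "foofoofoo"));  // [foo, foo, foo]
        System.out.println(findAll("aa", "aaaa"));        // [aa, aa], not three overlapping hits
        System.out.println(findAll("cat.", "cats"));      // [cats]
        System.out.println(findAll("cat.", "cat"));       // []: '.' needs one more character
        System.out.println(findAll("cat.", "cat\n"));     // []: '.' skips line terminators by default
        // Pattern.DOTALL lets '.' match line terminators as well.
        System.out.println(Pattern.compile("cat.", Pattern.DOTALL).matcher("cat\n").find()); // true
    }
}
```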

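Metacharacters

* A metacharacter has a special meaning to the regex engine instead of matching itself. The metacharacters supported by java.util.regex are:

<([{\^-=$!|]})?*+.>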

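* Escaping metacharacters: to make a metacharacter match itself literally, either
– precede it with a backslash, e.g. \. matches a literal dot, or
– enclose it between \Q (start of quote) and \E (end of quote), e.g. \Q.\E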
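A short sketch of both escaping styles, plus Pattern.quote(), which wraps an arbitrary string in \Q...\E for you (handy when the text comes from a user). The doubled backslashes are needed only inside Java string literals; at the harness prompt you type a single backslash. The helper wholeMatch is a hypothetical wrapper around Pattern.matches, which requires the entire input to match.

```java
import java.util.regex.Pattern;

class EscapeDemo {

    // True only if the whole input matches the regex (Pattern.matches semantics).
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // In Java source a regex backslash is itself escaped, so \. is written "\\.".
        System.out.println(wholeMatch("cat\\.", "cat."));       // true
        System.out.println(wholeMatch("cat\\.", "cats"));       // false: the dot is now literal
        // \Q ... \E quotes everything between the two markers.
        System.out.println(wholeMatch("\\Qcat.\\E", "cat."));   // true
        // Unescaped, "1+1=2" means "one or more 1s, then 1=2".
        System.out.println(wholeMatch("1+1=2", "1+1=2"));       // false
        System.out.println(wholeMatch("1+1=2", "11=2"));        // true
        // Pattern.quote escapes every metacharacter in its argument.
        System.out.println(wholeMatch(Pattern.quote("1+1=2"), "1+1=2")); // true
    }
}
```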

Character Classes

* A character class is a set of characters enclosed in square brackets; it matches exactly one character of the input.

[abc]    a, b, or c (simple class)
[^abc]    any character except a, b, or c (negation)
[a-zA-Z]    a through z or A through Z, inclusive, i.e. any ASCII letter (range)
[a-d[m-p]]    a to d, or m to p, same as [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a to z, except for b and c, same as [ad-z] (subtraction)
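Because each class in the table matches exactly one character, running it through find() and concatenating the hits shows which characters it admits. The helper keep and the sample strings are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {

    // Concatenates, in order, every character of input matched by the class.
    static String keep(String charClass, String input) {
        Matcher m = Pattern.compile(charClass).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "abcdefmnopxyz";
        System.out.println(keep("[abc]", s));         // abc
        System.out.println(keep("[^abc]", s));        // defmnopxyz
        System.out.println(keep("[a-d[m-p]]", s));    // abcdmnop
        System.out.println(keep("[a-z&&[def]]", s));  // def
        System.out.println(keep("[a-z&&[^bc]]", s));  // adefmnopxyz
        System.out.println(keep("[a-zA-Z]", "R2-D2 and C-3PO")); // RDandCPO
    }
}
```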

Predefined Character Classes

.    any character (by default, line terminators are not matched)
\d    a digit, same as [0-9]
\D    a non-digit, same as [^0-9]
\s    a whitespace character, same as [ \t\n\x0B\f\r]
\S    a non-whitespace character, same as [^\s]
\w    a word character, same as [a-zA-Z_0-9]
\W    a non-word character, same as [^\w]
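The same one-character-at-a-time approach works for the predefined classes. In this sketch the helper keep and the sample string "Room 101_B!" are hypothetical; the brackets around each result only make kept spaces visible.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassDemo {

    // Concatenates every character of input matched by a one-character regex.
    static String keep(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "Room 101_B!";
        System.out.println("[" + keep("\\d", s) + "]");  // [101]
        System.out.println("[" + keep("\\D", s) + "]");  // [Room _B!]
        System.out.println("[" + keep("\\s", s) + "]");  // [ ]
        System.out.println("[" + keep("\\S", s) + "]");  // [Room101_B!]
        System.out.println("[" + keep("\\w", s) + "]");  // [Room101_B]
        System.out.println("[" + keep("\\W", s) + "]");  // [ !]
        // \d and [0-9] select the same characters.
        System.out.println(keep("\\d", s).equals(keep("[0-9]", s))); // true
    }
}
```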

Quantifiers

* Quantifiers specify how many times the preceding element must occur for a match.
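A sketch exercising the greedy quantifiers of java.util.regex: X? (zero or one), X* (zero or more), X+ (one or more), X{n} (exactly n), X{n,} (at least n) and X{n,m} (n to m). The whole-string checks use Pattern.matches; the findAll helper is hypothetical and shows that a greedy quantifier takes as many characters as it is allowed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {

    // Returns every non-overlapping match that find() reports.
    static List<String> findAll(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> hits = new ArrayList<>();
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("colou?r", "color"));  // true:  ? allows zero 'u'
        System.out.println(Pattern.matches("ab*c", "ac"));        // true:  * allows zero 'b'
        System.out.println(Pattern.matches("ab+c", "ac"));        // false: + needs at least one 'b'
        System.out.println(Pattern.matches("\\d{3}", "123"));     // true:  exactly three digits
        System.out.println(Pattern.matches("\\d{2,}", "1"));      // false: at least two digits
        System.out.println(Pattern.matches("a{2,3}", "aaaa"));    // false: at most three 'a'
        // Greedy: each match consumes as much as the quantifier permits.
        System.out.println(findAll("a+", "aaa b aa"));            // [aaa, aa]
        System.out.println(findAll("\\d{2,3}", "1 22 3333"));     // [22, 333]
    }
}
```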

References

* Official Java Tutorial: Regular Expressions
