Test Harness
    import java.io.Console;
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;

    public class RegexTestHarness {

        public static void main(String[] args) {
            Console console = System.console();
            if (console == null) {
                System.err.println("No console.");
                System.exit(1);
            }
            // Loop forever: read a regex and an input string, then report every match.
            while (true) {
                Pattern pattern =
                    Pattern.compile(console.readLine("%nEnter your regex: "));
                Matcher matcher =
                    pattern.matcher(console.readLine("Enter input string to search: "));
                boolean found = false;
                while (matcher.find()) {
                    console.format("I found the text" +
                        " \"%s\" starting at " +
                        "index %d and ending at index %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
                    found = true;
                }
                if (!found) {
                    console.format("No match found.%n");
                }
            }
        }
    }
String Literals
* Exact match:
    Enter your regex: foo
    Enter input string to search: foo
    I found the text "foo" starting at index 0 and ending at index 3.
* Match multiple times:
    Enter your regex: foo
    Enter input string to search: foofoofoo
    I found the text "foo" starting at index 0 and ending at index 3.
    I found the text "foo" starting at index 3 and ending at index 6.
    I found the text "foo" starting at index 6 and ending at index 9.
* Match with a metacharacter (in the regex cat., the dot matches any character):
    Enter your regex: cat.
    Enter input string to search: cats
    I found the text "cats" starting at index 0 and ending at index 4.
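The harness needs an interactive console, so the three sessions above can also be reproduced non-interactively. The following is a minimal sketch; the class name LiteralMatchDemo and the helper describeMatches are hypothetical names introduced only for illustration, and the message format is copied from the harness.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; not part of the original harness.
class LiteralMatchDemo {

    // Returns one line per match, using the same message format as the harness.
    static List<String> describeMatches(String regex, String input) {
        List<String> lines = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            lines.add(String.format(
                "I found the text \"%s\" starting at index %d and ending at index %d.",
                matcher.group(), matcher.start(), matcher.end()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Exact match, repeated match, and a match that uses the dot metacharacter.
        describeMatches("foo", "foo").forEach(System.out::println);
        describeMatches("foo", "foofoofoo").forEach(System.out::println);
        describeMatches("cat.", "cats").forEach(System.out::println);
    }
}
```

Note that the end index is exclusive: "foo" at indices 0 to 3 covers the characters at positions 0, 1, and 2, which is why the next match in "foofoofoo" starts at index 3.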
Metacharacters
<([{\^-=$!|]})?*+.>
* To treat a metacharacter as an ordinary character, escape it:
– precede it with a backslash \
– enclose the literal text between \Q and \E
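Both escaping styles can be tried in a short program. This is a sketch under a few assumptions: the class name EscapeDemo and the helper found are hypothetical, and Pattern.quote is a standard java.util.regex method that these notes do not mention but that gives the same effect as \Q...\E. Inside a Java string literal the backslash itself has to be doubled, so the regex cat\. is written "cat\\.".

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration.
class EscapeDemo {

    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Unescaped dot: matches any character, so "cats" is found.
        System.out.println(found("cat.", "cats"));
        // Escaped dot: only a literal '.' is accepted after "cat".
        System.out.println(found("cat\\.", "cats"));
        System.out.println(found("cat\\.", "cat."));
        // Unquoted, '+' is a quantifier, so the literal text "1+1=2" is not found.
        System.out.println(found("1+1=2", "1+1=2"));
        // \Q...\E turns everything between the markers into literal text.
        System.out.println(found("\\Q1+1=2\\E", "1+1=2"));
        // Pattern.quote builds an equivalent literal pattern from a plain string.
        System.out.println(found(Pattern.quote("1+1=2"), "1+1=2"));
    }
}
```

Quoting the whole string is usually the easier choice when the text to match comes from user input and may contain any of the metacharacters listed above.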
Character Classes
* A character class is a set of characters enclosed in square brackets.
    [abc]           a, b, or c (simple class)
    [^abc]          any character except a, b, or c (negation)
    [a-zA-Z]        a to z or A to Z inclusive, i.e. any ASCII letter (range)
    [a-d[m-p]]      a to d, or m to p, same as [a-dm-p] (union)
    [a-z&&[def]]    d, e, or f (intersection)
    [a-z&&[^bc]]    a to z, except for b and c, same as [ad-z] (subtraction)
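The rows above can be checked one character at a time. The sketch below assumes a hypothetical class CharClassDemo with a helper fullMatch that wraps Pattern.matches, which succeeds only when the whole input matches the regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration.
class CharClassDemo {

    // True only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[abc]", "b"));         // simple class
        System.out.println(fullMatch("[^abc]", "d"));        // negation
        System.out.println(fullMatch("[a-zA-Z]", "Q"));      // range
        System.out.println(fullMatch("[a-d[m-p]]", "n"));    // union
        System.out.println(fullMatch("[a-z&&[def]]", "e"));  // intersection
        System.out.println(fullMatch("[a-z&&[^bc]]", "b"));  // subtraction: b is excluded
    }
}
```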
Predefined Character Classes
    .     any character (by default, line terminators are excluded)
    \d    a digit, same as [0-9]
    \D    a non-digit, same as [^0-9]
    \s    a whitespace character, same as [ \t\n\x0B\f\r]
    \S    a non-whitespace character, same as [^\s]
    \w    a word character, same as [a-zA-Z_0-9]
    \W    a non-word character, same as [^\w]
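A quick way to get a feel for the predefined classes is to test single characters against them. This is a minimal sketch; PredefinedClassDemo and fullMatch are hypothetical names, and the last line uses the standard Pattern.DOTALL flag, which these notes do not cover, to show how the dot can be made to match a line terminator as well.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration.
class PredefinedClassDemo {

    // True only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Backslashes are doubled because these are Java string literals.
        System.out.println(fullMatch("\\d", "7"));   // digit
        System.out.println(fullMatch("\\D", "x"));   // non-digit
        System.out.println(fullMatch("\\s", "\t"));  // whitespace
        System.out.println(fullMatch("\\S", " "));   // a space is not non-whitespace
        System.out.println(fullMatch("\\w", "_"));   // underscore is a word character
        System.out.println(fullMatch("\\W", "!"));   // non-word character
        System.out.println(fullMatch(".", "\n"));    // default: dot skips line terminators
        // With DOTALL, the dot also matches line terminators.
        System.out.println(Pattern.compile(".", Pattern.DOTALL).matcher("\n").matches());
    }
}
```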
Quantifiers
* A quantifier specifies how many occurrences of the preceding element to match.
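The notes do not list the individual quantifiers, so the sketch below fills that in with the standard ones from java.util.regex: ? (zero or one), * (zero or more), + (one or more), {n} (exactly n), {n,} (at least n), and {n,m} (between n and m). The class name QuantifierDemo and the helper fullMatch are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration.
class QuantifierDemo {

    // True only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a?", ""));          // zero or one: empty input is fine
        System.out.println(fullMatch("a*", "aaa"));       // zero or more
        System.out.println(fullMatch("a+", ""));          // one or more: empty input fails
        System.out.println(fullMatch("a{3}", "aaa"));     // exactly three
        System.out.println(fullMatch("a{2,}", "aaaaa"));  // at least two
        System.out.println(fullMatch("a{2,4}", "aaaaa")); // two to four: five is too many
    }
}
```

A quantifier applies to the element immediately before it, so foo+ means "fo" followed by one or more "o", while (foo)+ repeats the whole group.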
References
* Official Java Tutorial: Regular Expressions