Friday, July 4, 2014

Open Sourced my JavaScript Regular Expression Generator - RegexGen.js

Hi there, I've open-sourced my new library, RegexGen.js, a JavaScript regular expression generator. Please give it a try; comments and issue reports are welcome. Thank you!

RegexGen.js - JavaScript Regular Expression Generator

RegexGen.js is a JavaScript regular expression generator, inspired by JSVerbalExpressions, that helps you construct complex regular expressions.

RegexGen.js is designed mainly for people who know how the regular expression engine works but don't work with it regularly, i.e., people who know how to make a regex work but may not remember every meta-character needed to construct it.

With RegexGen.js, you don't have to remember meta-characters, shortcuts, which characters to escape, or tricks for corner cases.

RegexGen.js also helps you reuse regex patterns (check out the [Matching an IP Address] example below).

The Problems

RegexGen.js tries to ease two problems.

  1. When creating a regular expression, it's hard to remember the correct syntax and which characters to escape.
  2. Once a regular expression is written, it's hard to read it and remember what it does.
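Both problems can be seen with plain JavaScript RegExp objects, without any library. The snippet below is only an illustration: the host name and the 24-hour time pattern are made-up examples, not taken from RegexGen.js.

```javascript
// Problem 1: a literal string used as a pattern without escaping.
// "." is a meta-character matching any character, so this pattern,
// meant for the host "example.com", is too permissive.
const naive = new RegExp("^example.com$");
console.log(naive.test("example.com")); // true
console.log(naive.test("exampleXcom")); // true (unintended match)

// Escaping the dot by hand fixes it, but you must remember to do so.
const escaped = new RegExp("^example\\.com$");
console.log(escaped.test("exampleXcom")); // false

// Problem 2: a dense hand-written pattern (a 24-hour HH:MM matcher)
// is hard to read back later without decoding it token by token.
const time = /^(?:[01]\d|2[0-3]):[0-5]\d$/;
console.log(time.test("23:59")); // true
console.log(time.test("24:00")); // false
```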

The Goals

RegexGen.js is designed to achieve the following goals.

  1. The written code should be easy to read and understand.
  2. The generated code should be as compact as possible, e.g., no redundant brackets or parentheses.
  3. No more character escaping required (except for '\', or if you use regex overwrite).
  4. If the generated code is not good enough, the bad parts can easily be replaced directly in the written code.
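Goal 2 (compact output) comes down to a decision any generator has to make before appending a quantifier: wrap the pattern in a non-capturing group only when it is not already a single atom. The following is a minimal sketch of that idea under our own assumptions; the helpers quantify(), isSingleAtom() and isWrappedGroup() are hypothetical and are not RegexGen.js's code.

```javascript
// Hypothetical helpers, not RegexGen.js's code: add a quantifier and wrap
// the pattern in a non-capturing group only when it is not a single atom.

// True when src is one atom: a single character, a single escape such as
// \d, \. or \u0041, a whole character class, or one fully wrapping group.
function isSingleAtom(src) {
  if (src.length === 1) return true;
  if (/^\\(?:u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}|.)$/.test(src)) return true;
  if (/^\[(?:\\.|[^\]\\])*\]$/.test(src)) return true;
  return isWrappedGroup(src);
}

// True when the "(" at index 0 is closed by the ")" at the very end.
function isWrappedGroup(src) {
  if (src[0] !== "(" || src[src.length - 1] !== ")") return false;
  let depth = 0;
  let inClass = false;
  for (let i = 0; i < src.length; i++) {
    const ch = src[i];
    if (ch === "\\") { i++; continue; } // skip the escaped character
    if (inClass) {
      if (ch === "]") inClass = false; // parentheses inside [...] are literal
      continue;
    }
    if (ch === "[") inClass = true;
    else if (ch === "(") depth++;
    else if (ch === ")") {
      depth--;
      // The first group closed before the end, e.g. "(a)|(b)".
      if (depth === 0 && i !== src.length - 1) return false;
    }
  }
  return depth === 0;
}

function quantify(src, quantifier) {
  return (isSingleAtom(src) ? src : `(?:${src})`) + quantifier;
}

console.log(quantify("a", "+"));       // a+
console.log(quantify("[a-z]", "*"));   // [a-z]*
console.log(quantify("abc", "?"));     // (?:abc)?
console.log(quantify("(ab|cd)", "+")); // (ab|cd)+
console.log(quantify("(a)|(b)", "*")); // (?:(a)|(b))*
```

The check errs on the side of wrapping: anything it cannot prove to be one atom (for example a surrogate-pair character) gets a group, which costs a few characters but never changes the meaning.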

Getting Started

The generator is exported as a regexGen() function.

To generate a regular expression, pass sub-expressions as parameters to the regexGen() function.

Sub-expressions passed as comma-separated parameters are concatenated together to form the whole regular expression.

A sub-expression can be a string, a number, a RegExp object, or any value generated by the functions owned by the regexGen() function object, i.e., the regex-generator()s in the informal BNF syntax below.

Strings passed to the regexGen(), text(), maybe(), anyCharOf() and anyCharBut() functions are always escaped as necessary, so you don't have to worry about which characters to escape.
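Escaping "as necessary" depends on context: outside a character class many characters are special, while inside [...] only a few are. The sketch below illustrates both cases with two hypothetical helpers, escapeText() and escapeClassChars(); it is an assumption about how such escaping can be done, not RegexGen.js's implementation.

```javascript
// Hypothetical helpers sketching context-dependent escaping, in the spirit
// of what text() and anyCharOf() are described as doing; not the library's code.

// Outside a character class, these characters are meta-characters.
function escapeText(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Inside [...] only "\", "]", "^" and "-" need escaping.
function escapeClassChars(chars) {
  return chars.replace(/[\\\]^-]/g, "\\$&");
}

const price = new RegExp("^" + escapeText("$9.99 (USD)") + "$");
console.log(price.test("$9.99 (USD)")); // true
console.log(price.test("$9X99 (USD)")); // false

// A class built from the three characters "a", "-", "z": the escaped
// hyphen keeps it from being read as the range a-z.
const aDashZ = new RegExp("[" + escapeClassChars("a-z") + "]");
console.log(aDashZ.test("-")); // true
console.log(aDashZ.test("m")); // false
```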

The result of calling the regexGen() function is a RegExp object.
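To make the concatenation rule concrete, here is a minimal sketch, under our own assumptions, of turning a list of string, number and RegExp sub-expressions into one RegExp object. The names concatToRegExp() and toSource() are hypothetical, and details such as wrapping alternations and ignoring embedded flags are our design choices, not documented RegexGen.js behavior.

```javascript
// Hypothetical sketch, not RegexGen.js's code: turn a list of
// sub-expressions into one RegExp by concatenating their sources.

// Escape characters that are special outside a character class.
function escapeText(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function toSource(part) {
  if (part instanceof RegExp) {
    // Use the pattern as-is, but wrap it if it contains "|" so the
    // alternation stays local to this part (conservative: may add an
    // unneeded group). The part's own flags are ignored.
    return part.source.includes("|") ? `(?:${part.source})` : part.source;
  }
  if (typeof part === "string" || typeof part === "number") {
    // Numbers are escaped too: String(1.5) contains ".".
    return escapeText(String(part));
  }
  throw new TypeError(`unsupported sub-expression: ${String(part)}`);
}

function concatToRegExp(parts, flags = "") {
  return new RegExp(parts.map(toSource).join(""), flags);
}

const version = concatToRegExp(["v", 1.5, /-(?:beta|rc)/], "i");
console.log(version.source);          // v1\.5(?:-(?:beta|rc))
console.log(version.test("V1.5-RC")); // true
console.log(version.test("v105-rc")); // false
```

Wrapping the RegExp part matters: without the group, the source would be v1\.5-(?:beta|rc) here, which is fine, but a part like /a|b/ would otherwise split the whole concatenated pattern into two alternatives.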

The basic usage can be expressed as the following informal BNF syntax.

  
RegExp object = regexGen( sub-expression [, sub-expression ...] [, modifier ...] )

sub-expression ::= string | number | RegExp object | term

term ::= regex-generator() [.term-quantifier()] [.term-lookahead()]

regex-generator() ::= regexGen.startOfLine() | regexGen.endOfLine()
    | regexGen.wordBoundary() | regexGen.nonWordBoundary()
    | regexGen.text() | regexGen.maybe() | regexGen.anyChar() | regexGen.anyCharOf() | regexGen.anyCharBut()
    | regexGen.either() | regexGen.group() | regexGen.capture() | regexGen.sameAs()
    | regex() | ... (see regexGen.js for all termGenerator()s.)

term-quantifier() ::= .term-quantifier-generator() [.term-quantifier-modifier()]

term-quantifier-generator() ::= term.any() | term.many() | term.maybe() | term.repeat() | term.multiple()

term-quantifier-modifier() ::= term.greedy() | term.lazy() | term.reluctant()

term-lookahead() ::= term.contains() | term.notContains() | term.followedBy() | term.notFollowedBy()

modifier ::= regexGen.ignoreCase() | regexGen.searchAll() | regexGen.searchMultiLine()
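The term chain in the BNF (a term, then an optional quantifier, an optional greedy/lazy modifier and an optional lookahead) can be pictured as building a small string in pieces. The sketch below is a hypothetical Term class whose method names mirror the BNF; the mappings it uses (any() to *, many() to +, maybe() to ?, lazy() to a trailing ?, followedBy() to (?=...), and the modifiers to the i, g and m flags) are our assumptions, not RegexGen.js's documented semantics, and contains(), notContains() and multiple() are left out because their meaning is not stated here.

```javascript
// Hypothetical sketch: one way the BNF's term chain could map onto regex
// syntax. Method names mirror the BNF; their semantics here are assumptions.
class Term {
  constructor(body) {
    this.body = body;     // pattern source of the term itself
    this.quantifier = ""; // "*", "+", "?" or "{n}" / "{n,m}"
    this.lazySuffix = ""; // "?" turns the quantifier lazy
    this.lookahead = "";  // "(?=...)" or "(?!...)"
  }
  any()   { this.quantifier = "*"; return this; } // zero or more
  many()  { this.quantifier = "+"; return this; } // one or more
  maybe() { this.quantifier = "?"; return this; } // zero or one
  repeat(min, max) {
    this.quantifier = max === undefined ? `{${min}}` : `{${min},${max}}`;
    return this;
  }
  greedy()    { this.lazySuffix = "";  return this; }
  lazy()      { this.lazySuffix = "?"; return this; }
  reluctant() { return this.lazy(); } // assumed synonym of lazy()
  followedBy(src)    { this.lookahead = `(?=${src})`; return this; }
  notFollowedBy(src) { this.lookahead = `(?!${src})`; return this; }
  toString() {
    // Conservative: anything other than one character or one "\x"-style
    // escape is wrapped so the quantifier applies to the whole body.
    const isAtom = /^\\?[\s\S]$/.test(this.body);
    const body = this.quantifier && !isAtom ? `(?:${this.body})` : this.body;
    const quantifier = this.quantifier ? this.quantifier + this.lazySuffix : "";
    return body + quantifier + this.lookahead;
  }
}

// Assumed mapping of the BNF modifiers onto JavaScript RegExp flags.
const MODIFIER_FLAGS = { ignoreCase: "i", searchAll: "g", searchMultiLine: "m" };

const quoted = new RegExp(`"${new Term(".").any().lazy()}"`, MODIFIER_FLAGS.searchAll);
console.log('say "hi" and "bye"'.match(quoted)); // [ '"hi"', '"bye"' ]

const pixels = new RegExp(`${new Term("\\d").many().followedBy("px")}`, "g");
console.log("12px 34em".match(pixels)); // [ '12' ]

console.log(String(new Term("ab").repeat(2, 3))); // (?:ab){2,3}
```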

Please check out regexgen.js and the wiki for API documentation, and test.js for more examples.