Greedy matching in FindExpr


arjun


I'd appreciate some help/clarification with the 'FindExpr' function.

FindExpr("123", "\d") evaluates to {0, 1}, which is as expected.

However,

FindExpr("123", "\d+") also evaluates to {0, 1}, which is not as expected. I would expect the second expression to evaluate to {0, 3}, since "\d+" should match all three digits.

Given the above behavior, how can I find the starting index and length of the longest possible number in a string? For example, if the string was "hello123world" the index and length of the longest possible number in the string are 5 and 3 respectively.

Thanks.
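One way to get the index and length asked about above without relying on the "+" quantifier is to scan the string directly. The following is a minimal sketch in Python rather than DAQFactory script, so the loop would need to be ported by hand; the function name longest_digit_run is hypothetical and is not part of DAQFactory.

```python
def longest_digit_run(text):
    """Return (start, length) of the longest run of ASCII digits.

    Returns (-1, 0) when the string contains no digits. If two runs
    have the same length, the earlier one is reported.
    """
    best_start, best_len = -1, 0
    run_start = None
    # The appended sentinel is a non-digit, so a run that reaches the
    # end of the string is still closed and measured.
    for i, ch in enumerate(text + "\0"):
        if "0" <= ch <= "9":
            if run_start is None:
                run_start = i  # a new run of digits begins here
        elif run_start is not None:
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    return best_start, best_len


print(longest_digit_run("hello123world"))  # (5, 3)
```

For "hello123world" this gives index 5 and length 3, the values expected in the question.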


  • 3 years later...

Does FindExpr allow ranges in expressions? To check for alphanumeric characters I tried FindExpr(totest, "[A-Za-z0-9]") with totest = "CheckThis", and it returns {-1,-1}.

 

What I want to do is check for valid user input, so I want to be able to test for alphanumeric, numeric, integer, passwords, usernames, etc.

 

I looked around on the Internet and it seems that what I was trying is correct, but I'm not getting the results I want in DAQFactory. Could you provide some examples for, say, alphanumeric, numeric, and integer input to get me started?


Here are the docs from the toolkit we use to evaluate the regular expression. Based on them, I'd say ranges are not supported. I tested it, and if you type out all the valid letters instead of using a range, it works fine.
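Typing out every valid character by hand is error-prone, so the explicit set can be generated instead. Below is a small sketch in Python, not DAQFactory script; the helper names expand_ranges and explicit_charset are hypothetical. It only builds the pattern text. Whether FindExpr accepts the result rests on the test described above, not on anything this code checks.

```python
def expand_ranges(spec):
    """Expand ranges such as 'A-Za-z0-9' into an explicit string of characters."""
    chars = []
    i = 0
    while i < len(spec):
        # A range is "first-last": a hyphen with a character on each side.
        if i + 2 < len(spec) and spec[i + 1] == "-":
            first, last = ord(spec[i]), ord(spec[i + 2])
            if first > last:
                raise ValueError("range out of order: " + spec[i:i + 3])
            chars.extend(chr(code) for code in range(first, last + 1))
            i += 3
        else:
            # Anything that is not part of a range is copied as-is.
            chars.append(spec[i])
            i += 1
    return "".join(chars)


def explicit_charset(spec):
    """Wrap the expanded characters in [...] to form a character set."""
    return "[" + expand_ranges(spec) + "]"


print(explicit_charset("A-Za-z0-9"))
# [ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789]
```

The printed string could then be pasted in as the second argument to FindExpr in place of "[A-Za-z0-9]".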

 

A tag is that part of a regular expression which consists of any of the symbols, or characters, listed in the table below. For example, the regular expression "s+.{2,4}" has two (2) tags: the "s+" which specifies that the "s" will be searched for one or more times; and ".{2,4}", which specifies that there must be two (2) to four (4) characters (except newline characters) following the occurrence of the "s". In essence, tags allow you to break up a rule into separate search components.

There are different formats and different implementations for regular expressions. This particular implementation is closest to the one described in the MSDN library which comes with Visual Studio 6.0. For more details search for "Regular Expressions" in the MSDN library.

Rules (or regular expressions) are comprised of standard characters plus the following symbols (which define tags within a rule):

Character   Description
\           Marks the next character as special. Precede any special character that
            you would actually like to search for with this symbol.
^           A match occurs only if the character following this symbol is found at
            the beginning of a line or input. This character cannot be defined in
            the charset.
$           A match occurs only if the character following this symbol is found at
            the end of the input, or line. This character cannot be defined in the
            charset.
*           Searches for the preceding character zero or more times. For this
            implementation you must define more than one character. If only one
            character is specified in the regular expression then no matches will
            be found. That means that /zo*/ matches z and zoo, but /z*/ will match
            nothing because only one character has been specified.
+           Searches for the preceding character one or more times.
?           Searches for the preceding character either one time or not at all. It
            cannot be defined if only one character is specified in the regular
            expression.
.           Matches any single character except '\n'.
(pattern)   Matches the specified pattern and remembers each match via unique
            indexes for each match found. Only characters enclosed within
            parentheses will be considered patterns. The found substring can then
            be retrieved by using '\0'-'\9' in the regular expression, where
            '0'-'9' is the identifying index number of that particular pattern.
            For example:
              '(re).*\0s+ion' will match 'regular expression' because it first
              finds the pattern 're' and remembers the pattern with an index of
              0. '.*' will then match 'gular exp' in 'regular expression'. Then
              we look for the same pattern again by inserting '\0' into the
              regular expression, which matches the second occurrence of 're' in
              'regular expression'. Finally, 's+ion' matches 'ssion'.
x|y         Matches either character 'x' or 'y'. You can combine more than two
            characters (e.g. 'x|y|z').
{n}         The preceding character must match exactly 'n' times (non-negative
            values only).
{n,}        The preceding character must match at least 'n' times (non-negative
            values only).
{n,m}       The preceding character must match at least 'n' times and at most 'm'
            times. (n,m - non-negative numbers only).
[xyz]       A character set. A match occurs if any one of the enclosed characters is
            found.
[^xyz]      A non-matching character set. A match occurs if any character that is
            not in the set is found.
\b          Matches a word boundary. A boundary occurs between two non-space
            characters. Also, the characters "\f\n\r\t\v" do not define a word
            boundary. For example, the expression "me\n" only has one word boundary
            which occurs between the "m" and the "e".
\B          Searches for a non-word boundary. The exact opposite of the previous
            symbol ("\b"). A match occurs for any boundary between space characters
            or between a non-space character and a space character. For example,
            the expression " me\n " has three (3) non-word boundaries: between the
            first space and the "m"; between the "e" and the newline character; and
            between the newline character and the last space.
\d          A match occurs for any digit /0-9/.
\D          Matches any non-digit character.
\f          Matches a formfeed.
\n          Matches a new-line character.
\r          Matches a carriage return character.
\s          Matches any white space character.
\S          Matches any non-white space character.
\t          Matches a tab character.
\v          Matches any vertical tab character.
\w          Matches any alphabetic character including underscores. [A-Z a-z 0-9 _].
\W          Matches any non-alphabetic character.
\num        Matches any characters defined as a pattern with a unique index between
            0 and 9. A match occurs if the pattern identified by 'num' is found
            (see the pattern description for an example).
/n/         A match occurs if a character is located with an ascii code of 'n'. 'n'
            must be between 1 and 255.
