Regex match at least one non-alpha char

I want to match "A1" or "A&B" but not "A". In other words, I want to match a contiguous range of chars that contains at least one non-alphabetic char.

Here is my boost regex string to match only numeric values and '&':

[[:digit:]&]+

I'm new to regex so this might be obvious. Thanks for any help,

Brian

sohguanh (1236)

Since you already have the code, you can just compile and run it to see if it achieves what you want, correct?

Duthomhas (13290)

Your description is a little vague, I think.

To match zero or more characters, use .*. To match any character other than an alphabetic, use [^[:alpha:]]

Hence, to "match a contiguous range of chars that contains at least one non-alphabetic char" use:

.*[^[:alpha:]].*

This will likely match the entirety of any text you give it. So:

What exactly are you trying to match?

What regex engine (POSIX: BRE, ERE; Tcl ARE; Perl; etc) are you using?

Brian H (4)

Thank you for the replies.

I'm using the Perl syntax, by supplying the flag boost::regex_constants::perl when building my pattern.

Yes, I have compiled my string and it only finds cases with digits or '&'.

Duoas, thanks for the fragment and you're right, it would match the entirety of text. What I'm ultimately trying to do is remove something like ", Room A&1" from "Anyplace, Room A&1", but not remove ", Room A" because the room designator is only alphabetic.

The following regex will match any contiguous range of chars following Room and some leading, internal and trailing white space/punctuation, but it doesn't enforce the presence of at least one non-alphabetic character in the room number:

(^|[ ,-]+){1}Room[ .,-]+[A-Za-z0-9&]+[ ,-]*

Before/After using boost::regex_replace(txt, pattern, "")

Science Hall Room - A&B -
Science Hall

Room 1C, Science Hall
Science Hall

Room C, Science Hall
<no change, as desired>

I think I need to AND sub-expressions to add an additional condition like ([^A-Za-z]){1} that will also ensure at least one of those contiguous characters is non-alphabetic:

(^|[ ,-]+){1}Room[ .,-]+([A-Za-z0-9&#\"]+)([^A-Za-z]){1,}[ ,-]*

This regex doesn't work, however.

-Brian

Duthomhas (13290)

What exactly are the patterns you are trying to match? (Because your regular expressions don't match what you said.)

If I understand you, you have some text something like:
    sometext, Room A&1
    anytext, Room B
and you want to change it to:
    sometext
    anytext, Room B
If that is it, all you need is to match:
    ", *Room +[[:alpha:]]+\&[[:digit:]]+"
This will only match strings like the following:
    ,Room aBc&123
    , Room xyZ&7
Hope this helps.

Brian H (4)

Thanks for sticking with me on this. The variable makeup of the room "number" is what is tripping me up. It could be a combination of letters, numbers or punctuation like &, but not necessarily include all of those types, and not in any particular order. Here is a sample of the variety of strings I'm trying to match and the desired results:

anytext, Room 1
anytext Room 1A
anytext, Room B52
anytext, Room A&B
anytext, Room 5&H
anytext Room - Axb12&3
Room B52 - anytext
Room - 3C&6, anytext
<all match and reduce to "anytext">

anytext, Room Abc
anytext, Room - Abc
Room xyz,- anytext
<no change, as the room number is only alphabetic>

I now see that password validation expressions do something similar to what I want, but they are designed to suck up all the given text. I should probably take one of those and try to change it to limit the scope of text it matches to just the room "number".

Thanks again, I appreciate your help.
Brian

Duthomhas (13290)

This works on all your examples. I'm using an ARE to test, but I think it'll pass for Perl too. (Let me know if it doesn't.)

",? *Room +(?:- *)?[[:alpha:]]*[^[:alpha:][:space:],-]+[^[:space:]]*(?: *[-,] *)?"

Hope this helps.

PS My test harness (the global list xs holds the sample strings that p prints):


% proc p {} {
  foreach x $::xs { puts $x }
  }
% proc r e {
  foreach x $::xs {
    puts [regsub -- $e $x {}]
    }
  }
% p
anytext, Room 1
anytext Room 1A
anytext, Room B52
anytext, Room A&B
anytext, Room 5&H
anytext Room - Axb12&3
Room B52 - anytext
Room - 3C&6, anytext
sometext, Room Abc
sometext, Room - Abc
Room xyz,- sometext
% # notice that the ones we want left unchanged have "sometext" instead of "anytext"
% r {,? *Room +(?:- *)?[[:alpha:]]*[^[:alpha:][:space:],-]+[^[:space:]]*(?: *[-,] *)?}
anytext
anytext
anytext
anytext
anytext
anytext
anytext
anytext
sometext, Room Abc
sometext, Room - Abc
Room xyz,- sometext
%

Brian H (4)

Thanks Duoas, that helps a lot! I was trying too hard to logically AND sub-expressions to check for at least one non-alphabetic char, but this is simpler and works great for boost C++ regexes in Visual Studio.

-Brian
