removing leading+trailing dashes from strings

Pages: 12
Mar 6, 2011 at 9:49pm
Hello everyone in this great forum, which i finally joined!

I parse wordlists:

1
2
3
4
5
6
7
8
9
10
11
ifstream StmFI("wordlist");
string s;

while (StmFI >> s) { // e.g. "-string-with-dashes--"
    while (s.find('-') == 0)
        s.erase(0, 1);

    while (s.rfind('-') == s.size() - 1)
        s.erase(s.size()-1, 1);
    // ...
}

if string is "-" ==>boom!

s.size() - 1 == 0 - 1 == -1 and also string::npos == -1
==>s.erase(-1, 1) !!!

circumstantially solution:

1
2
    while (s.size() && s.rfind('-') == s.size() - 1)
        s.erase(s.size()-1, 1);


Has anyone a more elegant solution?
Last edited on Mar 6, 2011 at 9:55pm
Mar 6, 2011 at 10:16pm
If the string is "-" wouldn't the first while loop take care of everything?
Mar 6, 2011 at 11:31pm
@firedraco: But then the second while loop would run, breaking as TC has stated.
Mar 6, 2011 at 11:35pm
Ah...In that case, I don't see anything better than what you have...although testing for s.empty() is probably more easily read.
Mar 7, 2011 at 1:04am
Thank you, guys! (3800 posts are a number!)

a solution to kick one loop for the price of an extra int:

1
2
while (! s.empty() && (! (i = s.find('-')) || (i = s.rfind('-')) == s.size() - 1))
    s.erase(i, 1);


I parse the whole alphabet with a call to find_first_not_of() and this little char wants two loops alone... ;-)
Mar 7, 2011 at 1:05am
Tbh, I would probably prefer the original, it was a lot easier to figure out what you were doing.
Mar 7, 2011 at 10:09pm
Damn, it's that simple:

1
2
3
4
5
while (s.find('-') == 0)
    s.erase(0, 1);

while (s.rfind('-') == static_cast<unsigned char>( s.size() - 1 ))
    s.erase(s.size()-1, 1);


or as one-liner:

1
2
while (! (i = s.find('-')) || (i = s.rfind('-')) == (UCHAR) (s.size() - 1)) // typedef'd
    s.erase(i, 1);


...same story like EOF!
Last edited on Mar 7, 2011 at 10:10pm
Mar 7, 2011 at 10:33pm
If all you want to do is remove leading and trailing characters, then you only need to use the standard string algorithms:

 
#include <string> 
1
2
s.erase( s.find_last_not_of( '-' ) +1 );  // trim right
s.erase( 0, s.find_first_not_of( '-' ) );  // trim left 

To remove all instances of a character, you'll need one of the general algorithms:

 
#include <algorithm> 
 
s.erase( std::remove( s.begin(), s.end(), '-' ), s.end() );

Hope this helps.
Mar 8, 2011 at 12:31am
...i have fun with my wordlists... millions of words... :-)

==>but better cast to ushort:

1
2
while (! (i = s.find('-')) || (i = s.rfind('-')) == (USHORT) (s.size() - 1))
    s.erase(i, 1);


@Duoas: good idea, i will check this...
Last edited on Mar 8, 2011 at 12:48am
Mar 8, 2011 at 3:13am
Er, why are you casting to random Windows #defined type aliases? Use the proper thing:

if (i = s.rfind('-')) != std::string::npos)
Mar 8, 2011 at 8:55pm
...you're right, STL at it's best:

1
2
3
4
5
6
7
8
9
10
const char *pcc_Char2Cut = "!\"#'()*+,-./:;<>?@[]_`{|}~"; // lead + tail
ifstream StmFI;

void S::F_Parse()
{
    string s;
    while (StmFI >> s) {
        s.erase(s.find_last_not_of(pcc_Char2Cut) + 1).erase(0, s.find_first_not_of(pcc_Char2Cut));
     // ...
}


a one-liner, and w/o any loop :-)


Remember:
  erase(0, 0) is okay (does nothing)
  erase(size(), npos) is okay (dito)

  but erase(-1) crashes!


one can think this gives:
~find_last_not_of()
~erase()
~find_first_not_of()
~erase()

but apparently the compiler is free to code:
~find_first_not_of()
push result
~find_last_not_of()
~erase()
~erase()


Remember:
  erase last before first

  else crash!



@Duoas: Er, why are you casting to random Windows #defined type aliases?

To avoid a crash. And uchar gives strlen 255, but ushort 65K.

@Duoas: if (i = s.rfind('-')) != std::string::npos)

want only delete lead/trail chars


Thanx again for your inspiration!
Last edited on Mar 9, 2011 at 12:20am
Mar 9, 2011 at 12:31am
To avoid a crash. And uchar gives strlen 255, but ushort 65K.
If you are crashing, it is because you are doing something the wrong way. Applying casts to unrelated types doesn't solve the problem, it only hides it... Which is exactly why I referred you to the documented behavior: if the item is not found, npos is returned. Don't cast and compare with random stuff (and by random, I mean whatever it is that you think belongs there), use the documented results.

Hope this helps.
Mar 9, 2011 at 1:13am
seems you don't like the one-liner... :-(

The reason for the crash i want avoid is absolutly clear:

the not-found constant npos is defined with -1, hex 0xFFFFFFFF

an empty string has size 0, my formula subtracts 1, that gives also -1

the compare is true and its calls erase(-1) and crashes

a cast to uchar gives 0x000000FF, ushort 0x0000FFFF

this compares false ==>no call ==>no crash


sorry, i thought this was clear... maybe i'm too short sometimes ;-)
Last edited on Mar 9, 2011 at 1:22am
Mar 9, 2011 at 1:19am
...documented behavior


btw... the kick is the undocumented behavior...

think on the great Andrew Schulman "Undocumented Windows"! ;-)
Mar 9, 2011 at 12:03pm
No, your formula is wrong.

Also, you have no guarantee that npos is -1. Don't assume it is.

It is always better to code it correctly. If you want one-liners, use the code I first gave you:
http://www.cplusplus.com/forum/general/38086/#msg205010

Good luck!
Mar 9, 2011 at 12:35pm
closed account (z05DSL3A)
dutChBZ,
npos maybe defined as static const size_t npos = -1; but semantically it is greatest possible value for an element of type size_t. If the type of size_t is not as expected or changed for some reason in any particular implementation of STL, your 'magic number' code may well break.
Mar 9, 2011 at 1:39pm
Thanxs, but rather vice versa.

I would like to see npos as a 'magic number' i must not care of. But with the code:

1
2
while (s.rfind('-') == s.size() - 1)
    s.erase(s.size()-1, 1);


i must have learnt the hard way that this isn't the case.


If you want one-liners, use the code I first gave you


Thats exactly what i have done. And w/o any npos :-)
Mar 9, 2011 at 1:45pm
Yes, i like one-liner. I like highly-optimized, cool looking code. Thats my ASM-origin.

Mea culpa.
Mar 9, 2011 at 2:25pm
"highly optimized one-liner" is not a tautology.

If you want fast, you'll want an algorithm that performs the fewest memory allocations and memory accesses possible.

For that, you probably want something like this:

1
2
3
4
5
6
7
8
9
10
std::string remove_dashes( const std::string& src )
{
    std::string::size_type first_non_dash = src.find_first_not_of( '-' );
    if( first_non_dash == std::string::npos ) return src;

    std::string::size_type last_non_dash = src.find_last_not_of( '-' );
    if( last_non_dash == first_non_dash ) return std::string();

    return std::string( src, first_non_dash, last_non_dash - first_non_dash );
}

Mar 9, 2011 at 8:48pm
@jsmith
I think we're wasting our time here.
Pages: 12