removing leading+trailing dashes from strings

Pages: 12
Hello everyone in this great forum, which i finally joined!

I parse wordlists:

1
2
3
4
5
6
7
8
9
10
11
ifstream StmFI("wordlist");
string s;

while (StmFI >> s) { // e.g. "-string-with-dashes--"
    while (s.find('-') == 0)
        s.erase(0, 1);

    while (s.rfind('-') == s.size() - 1)
        s.erase(s.size()-1, 1);
    // ...
}

if string is "-" ==>boom!

s.size() - 1 == 0 - 1 == -1 and also string::npos == -1
==>s.erase(-1, 1) !!!

circumstantially solution:

1
2
    while (s.size() && s.rfind('-') == s.size() - 1)
        s.erase(s.size()-1, 1);


Has anyone a more elegant solution?
Last edited on
If the string is "-" wouldn't the first while loop take care of everything?
@firedraco: But then the second while loop would run, breaking as TC has stated.
Ah...In that case, I don't see anything better than what you have...although testing for s.empty() is probably more easily read.
Thank you, guys! (3800 posts are a number!)

a solution to kick one loop for the price of an extra int:

1
2
while (! s.empty() && (! (i = s.find('-')) || (i = s.rfind('-')) == s.size() - 1))
    s.erase(i, 1);


I parse the whole alphabet with a call to find_first_not_of() and this little char wants two loops alone... ;-)
Tbh, I would probably prefer the original, it was a lot easier to figure out what you were doing.
Damn, it's that simple:

1
2
3
4
5
while (s.find('-') == 0)
    s.erase(0, 1);

while (s.rfind('-') == static_cast<unsigned char>( s.size() - 1 ))
    s.erase(s.size()-1, 1);


or as one-liner:

1
2
while (! (i = s.find('-')) || (i = s.rfind('-')) == (UCHAR) (s.size() - 1)) // typedef'd
    s.erase(i, 1);


...same story like EOF!
Last edited on
If all you want to do is remove leading and trailing characters, then you only need to use the standard string algorithms:

 
#include <string> 
1
2
s.erase( s.find_last_not_of( '-' ) +1 );  // trim right
s.erase( 0, s.find_first_not_of( '-' ) );  // trim left 

To remove all instances of a character, you'll need one of the general algorithms:

 
#include <algorithm> 
 
s.erase( std::remove( s.begin(), s.end(), '-' ), s.end() );

Hope this helps.
...i have fun with my wordlists... millions of words... :-)

==>but better cast to ushort:

1
2
while (! (i = s.find('-')) || (i = s.rfind('-')) == (USHORT) (s.size() - 1))
    s.erase(i, 1);


@Duoas: good idea, i will check this...
Last edited on
Er, why are you casting to random Windows #defined type aliases? Use the proper thing:

if (i = s.rfind('-')) != std::string::npos)
...you're right, STL at it's best:

1
2
3
4
5
6
7
8
9
10
const char *pcc_Char2Cut = "!\"#'()*+,-./:;<>?@[]_`{|}~"; // lead + tail
ifstream StmFI;

void S::F_Parse()
{
    string s;
    while (StmFI >> s) {
        s.erase(s.find_last_not_of(pcc_Char2Cut) + 1).erase(0, s.find_first_not_of(pcc_Char2Cut));
     // ...
}


a one-liner, and w/o any loop :-)


Remember:
  erase(0, 0) is okay (does nothing)
  erase(size(), npos) is okay (dito)

  but erase(-1) crashes!


one can think this gives:
~find_last_not_of()
~erase()
~find_first_not_of()
~erase()

but apparently the compiler is free to code:
~find_first_not_of()
push result
~find_last_not_of()
~erase()
~erase()


Remember:
  erase last before first

  else crash!



@Duoas: Er, why are you casting to random Windows #defined type aliases?

To avoid a crash. And uchar gives strlen 255, but ushort 65K.

@Duoas: if (i = s.rfind('-')) != std::string::npos)

want only delete lead/trail chars


Thanx again for your inspiration!
Last edited on
To avoid a crash. And uchar gives strlen 255, but ushort 65K.
If you are crashing, it is because you are doing something the wrong way. Applying casts to unrelated types doesn't solve the problem, it only hides it... Which is exactly why I referred you to the documented behavior: if the item is not found, npos is returned. Don't cast and compare with random stuff (and by random, I mean whatever it is that you think belongs there), use the documented results.

Hope this helps.
seems you don't like the one-liner... :-(

The reason for the crash i want avoid is absolutly clear:

the not-found constant npos is defined with -1, hex 0xFFFFFFFF

an empty string has size 0, my formula subtracts 1, that gives also -1

the compare is true and its calls erase(-1) and crashes

a cast to uchar gives 0x000000FF, ushort 0x0000FFFF

this compares false ==>no call ==>no crash


sorry, i thought this was clear... maybe i'm too short sometimes ;-)
Last edited on
...documented behavior


btw... the kick is the undocumented behavior...

think on the great Andrew Schulman "Undocumented Windows"! ;-)
No, your formula is wrong.

Also, you have no guarantee that npos is -1. Don't assume it is.

It is always better to code it correctly. If you want one-liners, use the code I first gave you:
http://www.cplusplus.com/forum/general/38086/#msg205010

Good luck!
closed account (z05DSL3A)
dutChBZ,
npos maybe defined as static const size_t npos = -1; but semantically it is greatest possible value for an element of type size_t. If the type of size_t is not as expected or changed for some reason in any particular implementation of STL, your 'magic number' code may well break.
Thanxs, but rather vice versa.

I would like to see npos as a 'magic number' i must not care of. But with the code:

1
2
while (s.rfind('-') == s.size() - 1)
    s.erase(s.size()-1, 1);


i must have learnt the hard way that this isn't the case.


If you want one-liners, use the code I first gave you


Thats exactly what i have done. And w/o any npos :-)
Yes, i like one-liner. I like highly-optimized, cool looking code. Thats my ASM-origin.

Mea culpa.
"highly optimized one-liner" is not a tautology.

If you want fast, you'll want an algorithm that performs the fewest memory allocations and memory accesses possible.

For that, you probably want something like this:

1
2
3
4
5
6
7
8
9
10
std::string remove_dashes( const std::string& src )
{
    std::string::size_type first_non_dash = src.find_first_not_of( '-' );
    if( first_non_dash == std::string::npos ) return src;

    std::string::size_type last_non_dash = src.find_last_not_of( '-' );
    if( last_non_dash == first_non_dash ) return std::string();

    return std::string( src, first_non_dash, last_non_dash - first_non_dash );
}

@jsmith
I think we're wasting our time here.
Pages: 12