removing leading+trailing dashes from st

Forum

Forum
General C++ Programming
removing leading+trailing dashes from st

removing leading+trailing dashes from strings

Pages: 12

Mar 6, 2011 at 9:49pm

Hello everyone in this great forum, which i finally joined!

I parse wordlists:

ifstream StmFI("wordlist");
string s;

while (StmFI >> s) { // e.g. "-string-with-dashes--"
    while (s.find('-') == 0)
        s.erase(0, 1);

    while (s.rfind('-') == s.size() - 1)
        s.erase(s.size()-1, 1);
    // ...
}

if string is "-" ==>boom!

s.size() - 1 == 0 - 1 == -1 and also string::npos == -1
==>s.erase(-1, 1) !!!

circumstantially solution:

1
2

    while (s.size() && s.rfind('-') == s.size() - 1)
        s.erase(s.size()-1, 1);

Has anyone a more elegant solution?

Last edited on Mar 6, 2011 at 9:55pm

Mar 6, 2011 at 10:16pm

firedraco (6245)

If the string is "-" wouldn't the first while loop take care of everything?

Mar 6, 2011 at 11:31pm

Zhuge (4664)

@firedraco: But then the second while loop would run, breaking as TC has stated.

Mar 6, 2011 at 11:35pm

firedraco (6245)

Ah...In that case, I don't see anything better than what you have...although testing for s.empty() is probably more easily read.

Mar 7, 2011 at 1:04am

dutChBZ (21)

Thank you, guys! (3800 posts are a number!)

a solution to kick one loop for the price of an extra int:

1
2

while (! s.empty() && (! (i = s.find('-')) || (i = s.rfind('-')) == s.size() - 1))
    s.erase(i, 1);

I parse the whole alphabet with a call to find_first_not_of() and this little char wants two loops alone... ;-)

Mar 7, 2011 at 1:05am

firedraco (6245)

Tbh, I would probably prefer the original, it was a lot easier to figure out what you were doing.

Mar 7, 2011 at 10:09pm

dutChBZ (21)

Damn, it's that simple:

while (s.find('-') == 0)
    s.erase(0, 1);

while (s.rfind('-') == static_cast<unsigned char>( s.size() - 1 ))
    s.erase(s.size()-1, 1);

or as one-liner:

1
2

while (! (i = s.find('-')) || (i = s.rfind('-')) == (UCHAR) (s.size() - 1)) // typedef'd
    s.erase(i, 1);

...same story like EOF!

Last edited on Mar 7, 2011 at 10:10pm

Mar 7, 2011 at 10:33pm

Duthomhas (13277)

If all you want to do is remove leading and trailing characters, then you only need to use the standard string algorithms:

#include <string>

1
2

s.erase( s.find_last_not_of( '-' ) +1 );  // trim right
s.erase( 0, s.find_first_not_of( '-' ) );  // trim left

To remove all instances of a character, you'll need one of the general algorithms:

#include <algorithm>

s.erase( std::remove( s.begin(), s.end(), '-' ), s.end() );

Hope this helps.

Mar 8, 2011 at 12:31am

dutChBZ (21)

...i have fun with my wordlists... millions of words... :-)

==>but better cast to ushort:

1
2

while (! (i = s.find('-')) || (i = s.rfind('-')) == (USHORT) (s.size() - 1))
    s.erase(i, 1);

@Duoas: good idea, i will check this...

Last edited on Mar 8, 2011 at 12:48am

Mar 8, 2011 at 3:13am

Duthomhas (13277)

Er, why are you casting to random Windows #defined type aliases? Use the proper thing:

if (i = s.rfind('-')) != std::string::npos)

Mar 8, 2011 at 8:55pm

dutChBZ (21)

...you're right, STL at it's best:

const char *pcc_Char2Cut = "!\"#'()*+,-./:;<>?@[]_`{|}~"; // lead + tail
ifstream StmFI;

void S::F_Parse()
{
    string s;
    while (StmFI >> s) {
        s.erase(s.find_last_not_of(pcc_Char2Cut) + 1).erase(0, s.find_first_not_of(pcc_Char2Cut));
     // ...
}

a one-liner, and w/o any loop :-)


Remember:
  erase(0, 0) is okay (does nothing)
  erase(size(), npos) is okay (dito)

  but erase(-1) crashes!

one can think this gives:
~find_last_not_of()
~erase()
~find_first_not_of()
~erase()

but apparently the compiler is free to code:
~find_first_not_of()
push result
~find_last_not_of()
~erase()
~erase()


Remember:
  erase last before first

  else crash!

@Duoas: Er, why are you casting to random Windows #defined type aliases?

To avoid a crash. And uchar gives strlen 255, but ushort 65K.

@Duoas: if (i = s.rfind('-')) != std::string::npos)

want only delete lead/trail chars

Thanx again for your inspiration!

Last edited on Mar 9, 2011 at 12:20am

Mar 9, 2011 at 12:31am

Duthomhas (13277)

To avoid a crash. And uchar gives strlen 255, but ushort 65K.

If you are crashing, it is because you are doing something the wrong way. Applying casts to unrelated types doesn't solve the problem, it only hides it... Which is exactly why I referred you to the documented behavior: if the item is not found, npos is returned. Don't cast and compare with random stuff (and by random, I mean whatever it is that you think belongs there), use the documented results.

Hope this helps.

Mar 9, 2011 at 1:13am

dutChBZ (21)

seems you don't like the one-liner... :-(

The reason for the crash i want avoid is absolutly clear:

the not-found constant npos is defined with -1, hex 0xFFFFFFFF

an empty string has size 0, my formula subtracts 1, that gives also -1

the compare is true and its calls erase(-1) and crashes

a cast to uchar gives 0x000000FF, ushort 0x0000FFFF

this compares false ==>no call ==>no crash

sorry, i thought this was clear... maybe i'm too short sometimes ;-)

Last edited on Mar 9, 2011 at 1:22am

Mar 9, 2011 at 1:19am

dutChBZ (21)

...documented behavior

btw... the kick is the undocumented behavior...

think on the great Andrew Schulman "Undocumented Windows"! ;-)

Mar 9, 2011 at 12:03pm

Duthomhas (13277)

No, your formula is wrong.

Also, you have no guarantee that npos is -1. Don't assume it is.

It is always better to code it correctly. If you want one-liners, use the code I first gave you:
http://www.cplusplus.com/forum/general/38086/#msg205010

Good luck!

Mar 9, 2011 at 12:35pm

closed account (z05DSL3A)

dutChBZ,
npos maybe defined as static const size_t npos = -1; but semantically it is greatest possible value for an element of type size_t. If the type of size_t is not as expected or changed for some reason in any particular implementation of STL, your 'magic number' code may well break.

Mar 9, 2011 at 1:39pm

dutChBZ (21)

Thanxs, but rather vice versa.

I would like to see npos as a 'magic number' i must not care of. But with the code:

1
2

while (s.rfind('-') == s.size() - 1)
    s.erase(s.size()-1, 1);

i must have learnt the hard way that this isn't the case.

If you want one-liners, use the code I first gave you

Thats exactly what i have done. And w/o any npos :-)

Mar 9, 2011 at 1:45pm

dutChBZ (21)

Yes, i like one-liner. I like highly-optimized, cool looking code. Thats my ASM-origin.

Mea culpa.

Mar 9, 2011 at 2:25pm

jsmith (5804)

"highly optimized one-liner" is not a tautology.

If you want fast, you'll want an algorithm that performs the fewest memory allocations and memory accesses possible.

For that, you probably want something like this:

std::string remove_dashes( const std::string& src )
{
    std::string::size_type first_non_dash = src.find_first_not_of( '-' );
    if( first_non_dash == std::string::npos ) return src;

    std::string::size_type last_non_dash = src.find_last_not_of( '-' );
    if( last_non_dash == first_non_dash ) return std::string();

    return std::string( src, first_non_dash, last_non_dash - first_non_dash );
}

Mar 9, 2011 at 8:48pm

Duthomhas (13277)

@jsmith
I think we're wasting our time here.

Pages: 12

C++

Forum

removing leading+trailing dashes from strings