I would like to incorporate the Snowball stemming library (snowball.tartarus.org) into my code. The library is written in C, so I need some casts to convert between the string representation it uses and std::string.
My code works, but I don't know whether I used the casts correctly. Can someone tell me whether my code is correct and efficient?
Thanks.
The library represents words as “sb_symbol”, which is a typedef for unsigned char:
typedef unsigned char sb_symbol;
To stem a word, the function “sb_stemmer_stem” is used.
const sb_symbol * sb_stemmer_stem(struct sb_stemmer * stemmer,
const sb_symbol * word, int size);
Here is the code I use to wrap this in a C++ class (see my function std::string snowball_stemmer::stem(std::string to_stem)).
To pass a std::string where a pointer to sb_symbol is expected, I use
(sb_symbol *) to_stem.c_str()
And to turn a pointer to sb_symbol back into a std::string
std::string( (const char *) stemmed )
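For illustration, here is a minimal sketch of both conversions as standalone helpers. The typedef is repeated from the library so the snippet compiles on its own, and the names to_sb and from_sb are made up for this example (they are not part of libstemmer). Because char and unsigned char are distinct types, the C++-style equivalent of the C casts above is reinterpret_cast. Passing the length explicitly avoids relying on a terminating '\0'.

```cpp
#include <cstddef>
#include <string>

// Same typedef as in the library, repeated so this snippet is self-contained.
typedef unsigned char sb_symbol;

// Hypothetical helper: view the bytes of a std::string as sb_symbol.
// The returned pointer is only valid while `s` is alive and unmodified.
inline const sb_symbol* to_sb(const std::string& s) {
    return reinterpret_cast<const sb_symbol*>(s.data());
}

// Hypothetical helper: copy `len` symbols into a new std::string.
inline std::string from_sb(const sb_symbol* p, std::size_t len) {
    return std::string(reinterpret_cast<const char*>(p), len);
}
```

Keeping const on the pointer (const sb_symbol * rather than sb_symbol *) matches the parameter type of sb_stemmer_stem and avoids casting away constness.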
stemmer.h
class snowball_stemmer {
private:
    sb_stemmer * stemmer;
public:
    snowball_stemmer(char* , char* );
    ~snowball_stemmer();
    std::string stem(std::string);
};
Thanks for your advice. It seems that because unsigned char and char are unrelated types, I need to use reinterpret_cast. With static_cast I get this error:
error: invalid static_cast from type ‘std::basic_string<char>::size_type {aka long unsigned int}’ to type ‘const sb_symbol* {aka const unsigned char*}’
auto stemmed = sb_stemmer_stem(stemmer, static_cast<const sb_symbol *> to_stem.c_str() , to_stem.size());
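The call above has two problems: the operand of the cast is not wrapped in parentheses, and static_cast cannot convert const char* to const unsigned char* in any case. Below is a minimal sketch of the same call using reinterpret_cast. To keep it runnable without the library, fake_stem and g_last_length are hypothetical stand-ins (fake_stem only drops a trailing 's' and is not the Snowball algorithm). In real code the call would be sb_stemmer_stem(stemmer, ...) and the result length would come from sb_stemmer_length(stemmer).

```cpp
#include <cstddef>
#include <string>

// Same typedef as in the library, repeated so this snippet is self-contained.
typedef unsigned char sb_symbol;

// Hypothetical stand-in for the length the library reports after a call.
static int g_last_length = 0;

// Hypothetical stand-in with the same parameter shape as sb_stemmer_stem,
// minus the stemmer handle. It "stems" by dropping one trailing 's'.
const sb_symbol* fake_stem(const sb_symbol* word, int size) {
    g_last_length = (size > 0 && word[size - 1] == 's') ? size - 1 : size;
    return word;
}

std::string stem(const std::string& to_stem) {
    // reinterpret_cast is required here, and its operand must be in parentheses.
    const sb_symbol* in = reinterpret_cast<const sb_symbol*>(to_stem.c_str());
    // size() returns an unsigned type; the C function expects int.
    const sb_symbol* out = fake_stem(in, static_cast<int>(to_stem.size()));
    // Build the result from pointer + length instead of scanning for '\0'.
    return std::string(reinterpret_cast<const char*>(out),
                       static_cast<std::size_t>(g_last_length));
}
```

Taking the argument by const reference instead of by value also saves one string copy per call.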