MinGW custom codecvt facet VTABLE error

Hey gurus,

I've finally found the little beast I've been looking for to play with UTF conversions: the std::codecvt facets.

I've written a custom template class in an .hpp file (that derives from std::codecvt, of course) and everything seems to compile smoothly but then I get these errors from the linker:
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x10): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_out(duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t>&, char const*, char const*, char const*&, wchar_t*, wchar_t*, wchar_t*&) const'
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x14): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_unshift(duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t>&, wchar_t*, wchar_t*, wchar_t*&) const'
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x18): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_in(duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t>&, wchar_t const*, wchar_t const*, wchar_t const*&, char*, char*, char*&) const'
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x1c): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_encoding() const'
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x20): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_always_noconv() const'
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x24): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_length(duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t>&, wchar_t const*, wchar_t const*, unsigned int) const'
C:\DOCUME~1\Michael\LOCALS~1\Temp/ccLsuc2w.o:a.cpp:(.rdata$_ZTVSt7codecvtIcwN9duthomhas3utf8internal20utf8_codecvt_state_tIwEEE[vtable for std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >]+0x28): undefined reference to `std::codecvt<char, wchar_t, duthomhas::utf::internal::utf8_codecvt_state_t<wchar_t> >::do_max_length() const'
collect2: ld returned 1 exit status

This is infuriating me because I have defined all these methods, exactly as specified, but the linker doesn't seem to like it.

For example, here's my do_length() method:
        virtual int do_length(
                state_type&  state,
          const extern_type* from_begin,
          const extern_type* from_end,
                size_t       max
                )            const
          {
          size_t dist = from_end - from_begin;
          return (int)((max < dist) ? max : dist);
          }

Does anyone know what is going on? How do I get the linker to play nice with the vtable?

Thank you for your time.
Have you tried a different compiler? It could be a bug. Would it be possible for you to post the entire code? I'd like to try compiling it myself.
The only time I've gotten those kinds of errors are when I have circular inclusion somewhere, but I doubt you'd have those and not notice them.
It appears to be a compiler problem. Borland's bcc32.exe had no problem with it.

I found something online that says GCC has problems with pure virtual functions that are only defined in header files, so I tried putting them in a source file, but as the whole class is a template class, I don't know what else to do. GCC still complains. (Hence this post.)

And no, there are no circular dependencies... :-)

I'll post the whole thing once I'm done, and helios can tell me how to make the GCC behave.
Silly question... and it may not be the solution to your problem. But is the codecvt a C or C++ compiled library?

If C, perhaps try to wrap the header usage and any function forward declarations for that library with:
 #ifdef __cplusplus
 extern "C" {
 #endif 

   // Library include and forward declarations here
 
 #ifdef __cplusplus
 }
 #endif  
The more I play with it, the more I think it unlikely that I'll ever make it work.

Borland's C++ doesn't seem to use the codecvt class. So, while it compiles cleanly, it doesn't do anything (not even throw errors).

I can get GCC to compile now, thanks to
Reading UTF-8 with C++ streams
http://www.codeproject.com/KB/stl/utf8facet.aspx
but it seems like so much ad-hockery that I'm thoroughly disgusted. I can't make my type a template on the internal type (which may or may not be wchar_t) without all the obnoxious VTABLE crap, and even once I get it to compile, when I finally say:
 
  outf << s << endl;

the program crashes with a bad_cast exception. What!? Is it because I used my own type instead of std::mbstate_t? (None of my custom codecvt class's methods are getting called, so it is happening before any of them would be called.)

Maybe I'll just derive my own fstream class and make it do UTF-8/CESU-8/UTF-16[BE/LE]/UTF-32 conversions without all the silly nonsense.

AAARRRGGHHH!
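
A note on the bad_cast: a file stream asks its locale for std::codecvt<CharT, char, Traits::state_type>, and std::use_facet throws std::bad_cast when the locale has no facet of exactly that type. Whether that is what happened here is not confirmed anywhere in this thread, so treat it as one possible cause. A minimal sketch of a check, assuming a plain std::wofstream (the names stream_codecvt, wide_stream_can_convert and check_stream_locale are made up for illustration):

```cpp
#include <cassert>
#include <cwchar>
#include <fstream>
#include <ios>
#include <locale>
#include <type_traits>

// The facet type a std::wofstream looks up in its locale.
typedef std::codecvt<wchar_t, char,
                     std::wofstream::traits_type::state_type> stream_codecvt;

// With the standard character traits, the state type is std::mbstate_t.
// A facet built on any other state type is a different facet type, so
// the stream will not find it under this lookup.
static_assert(std::is_same<stream_codecvt::state_type, std::mbstate_t>::value,
              "wide file streams use std::mbstate_t as conversion state");

// Returns true when loc holds the facet a std::wofstream would ask for.
bool wide_stream_can_convert(const std::locale& loc)
{
    return std::has_facet<stream_codecvt>(loc);
}

// Debug-time guard: call on the stream before the first write.
void check_stream_locale(const std::wios& stream)
{
    assert(wide_stream_can_convert(stream.getloc()));
}
```

Calling check_stream_locale(outf) right before the failing write would at least tell you whether the codecvt lookup is the part that is missing.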
Hence the downfall of template programming.
I guess I'll complicate my code and use the std::mbstate_t and see if I can't get it to work... Oh wait, emilio already did that! Might as well use his code. Even though I can't stand it.

What a bunch of cruft.
Is whatever advantage this will bring really worth all this trouble?
A guilt-free, inline, automatic UTF filter? Of course. :-)

I just found a wonderful site:
http://www.unc.edu/depts/case/pgi/pgC++_lib/stdlibug/def_8655.htm

If the nice explanation there works, I'll post my results so you can all use it.
Remember when I said "be clever with your algorithms, not with your syntax" (or something to that effect)? Wouldn't all this effort be better spent on a nice, simple, and robust conversion function?
I already have those.

The problem is that I have to know something about the stream to read it. That is, once I open it, I have to check what kind of stream it is, then everywhere in my code that I read data from said stream, I must select the appropriate transformer and use it. That's a lot of work just to read from any UTF stream.

I'd rather simply open the stream and begin reading.
Or open the stream, specify an encoding, and begin writing.

The C++ standard provides the codecvt facet stuff for that very purpose: transparent I/O transformation. Its structure localizes all the messy details into one spot, so you don't have to deal with it elsewhere. That's a clever algorithm.

I'm trying to learn how to use it.
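
For readers following along, here is a minimal sketch of that kind of transparent transformation, assuming std::mbstate_t is kept as the state type (so the facet replaces the standard std::codecvt<wchar_t, char, std::mbstate_t> instead of introducing a new state type). The names utf8_out_facet, encode_with_facet and write_utf8_file are made up for illustration. Only the output direction is handled, and each wchar_t is assumed to hold one whole code point, so surrogates are rejected rather than paired.

```cpp
#include <cassert>
#include <cwchar>
#include <fstream>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical facet: encodes each wchar_t as one code point in UTF-8.
// Only output is customized; input keeps the base class behaviour.
class utf8_out_facet : public std::codecvt<wchar_t, char, std::mbstate_t>
{
protected:
    bool do_always_noconv() const noexcept override { return false; }
    int  do_encoding()      const noexcept override { return 0; }  // variable width
    int  do_max_length()    const noexcept override { return 4; }  // bytes per wchar_t

    result do_unshift(state_type&, extern_type* to, extern_type*,
                      extern_type*& to_next) const override
    {
        to_next = to;          // stateless encoding: nothing to flush
        return noconv;
    }

    result do_out(state_type&,
                  const intern_type* from, const intern_type* from_end,
                  const intern_type*& from_next,
                  extern_type* to, extern_type* to_end,
                  extern_type*& to_next) const override
    {
        from_next = from;
        to_next = to;
        for (; from_next != from_end; ++from_next) {
            const unsigned long cp = static_cast<unsigned long>(*from_next);
            if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
                return error;                  // not a Unicode scalar value
            const int n = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (to_end - to_next < n)
                return partial;                // output buffer is full
            if (n == 1) {
                *to_next++ = static_cast<char>(cp);
                continue;
            }
            // Lead byte: n high bits set, followed by the top payload bits.
            static const unsigned char lead[] = { 0, 0, 0xC0, 0xE0, 0xF0 };
            *to_next++ = static_cast<char>(lead[n] | (cp >> (6 * (n - 1))));
            // Continuation bytes: 10xxxxxx, six payload bits each.
            for (int shift = 6 * (n - 2); shift >= 0; shift -= 6)
                *to_next++ = static_cast<char>(0x80 | ((cp >> shift) & 0x3F));
        }
        return ok;
    }
};

// Calls the facet through the locale, using the same facet type a
// std::wofstream asks its locale for.
std::string encode_with_facet(const std::wstring& text)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const std::locale loc(std::locale::classic(), new utf8_out_facet);
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    std::string bytes(text.size() * 4 + 1, '\0');   // worst case, never empty
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    const std::codecvt_base::result r =
        cvt.out(state, text.data(), text.data() + text.size(), from_next,
                &bytes[0], &bytes[0] + bytes.size(), to_next);
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("UTF-8 conversion failed");
    assert(from_next == text.data() + text.size());  // everything was consumed
    bytes.resize(static_cast<std::size_t>(to_next - &bytes[0]));
    return bytes;
}

// Stream usage: imbue before opening, then write wide text as usual.
void write_utf8_file(const char* path, const std::wstring& text)
{
    std::wofstream out;
    out.imbue(std::locale(std::locale::classic(), new utf8_out_facet));
    out.open(path, std::ios::out | std::ios::binary);
    out << text;
}
```

The locale takes ownership of the facet, so the bare new is intentional. The code that writes to the stream never mentions the encoding; only the imbue call does.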
Alright then.

EDIT: Oh, I just remembered. Which version of MinGW did you try compiling with? 3.x has given me quite a few headaches, which is why I'm asking. You should try 4.x if that's the case.
Thanks. I've been using 4.3.0 all this time, alas.
Well, this is what I have learned.

The codecvt facet can be modified, and I figured out how to do it in GCC... It required me to first provide a partial specialization of std::codecvt for my state type, derived from the libstdc++ (GLIBCXX) base class __codecvt_abstract_base, after which I can derive my own extension from it! Like so:
#ifdef __GLIBCXX__
namespace std
  {
  using duthomhas::utf::utf_state_t;

  template <typename CharType>
  class codecvt <CharType, char, utf_state_t> :
    public __codecvt_abstract_base <CharType, char, utf_state_t>
    {
    public:
      static locale::id id;

    protected:
      explicit codecvt( size_t refs = 0 ):
        __codecvt_abstract_base <CharType, char, utf_state_t> ( refs )
        { }
    };
  }
#endif 
I didn't do that initially, hence the compiler was giving me the vtable errors -- meaning that essentially there was a disconnect between a (missing) base class and my derived class over the templated state type utf_state_t.


Also, I have learned that all this stuff is left to be implementation defined by the C++ standard -- meaning that I can make it work in some specific compiler/standard library combination, but that it is not possible to do it portably.

Which leaves me where I started: the STL is broken. It provides no internationalization support other than that provided by incompatible (vendor-specific) extensions, and it does not let me provide my own extension (using the proper thing) in a portable way.


At this point, I'm not sure what I will do. Perhaps I will write an iostream wrapper that will do it properly.

Fooey.