I am still working on my project, which reads some old data from old DOS files. The data stored there is, naturally, char*. Once I read in my character array, how do I assign it to a wstring, since my application is UNICODE?
Here is my current solution:
const wchar_t* Class::Function(char* pName)
{
    // I verify the pointer and such first, then do the below
    this->_Name.assign(pName, pName + strlen(pName));
    return this->_Name.c_str();
}
1) How is the source data encoded? (UTF-8 seems unlikely if these are old DOS files... but I guess it depends on HOW old.)
2) Do you care about preserving anything beyond the ASCII set?
3) How is the destination data to be encoded (I would assume UTF-16)? Given that you said "UNICODE", I'm assuming this is on Windows and you want UTF-16.
WinAPI provides a function (MultiByteToWideChar) which can convert pretty much any codepage, as well as UTF-8, to UTF-16.
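For example, a minimal sketch of that call (the codepage is an assumption on my part; old DOS files are often codepage 437, but use whatever yours were actually written in):

#include <windows.h>
#include <string>

// Convert a null-terminated narrow string in the given codepage to UTF-16.
// 437 (DOS Latin US) is only a guess at the source encoding.
std::wstring ToUtf16(const char* src, UINT codepage = 437)
{
    // First call asks for the required length, including the null terminator.
    int len = MultiByteToWideChar(codepage, 0, src, -1, nullptr, 0);
    if (len <= 0)
        return std::wstring();
    std::wstring result(len, L'\0');
    MultiByteToWideChar(codepage, 0, src, -1, &result[0], len);
    result.resize(len - 1); // drop the terminating null from the wstring
    return result;
}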
Or... if you don't care and all you care about is the basic ASCII set... it's a straight 1:1 copy (just converting a 1 byte character to a 2+ byte character):
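Something like this (a minimal sketch; the byte-by-byte widening is only correct for 7-bit ASCII input):

#include <cstring>
#include <string>

// Widen a plain-ASCII char string into a wstring, one byte per character.
// Bytes >= 0x80 would need a real codepage conversion instead.
std::wstring WidenAscii(const char* src)
{
    return std::wstring(src, src + std::strlen(src));
}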
Yes, I am on Windows using wchar_t, which is 16-bit, but I will also be using this on Linux, where wchar_t is 32-bit. The good news? I am only READING the data on both systems, so there will be no new writing at this point. If I decide to write data and share it, I will write it as 16-bit for Windows and easily read that into the Linux equivalent.
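If it comes to that, the Linux-side read could look something like this (a sketch only, assuming the shared data is 16-bit code units that stay within the Basic Multilingual Plane, which holds for ASCII content; FromUtf16 is a name made up for illustration):

#include <cstdint>
#include <string>
#include <vector>

// Widen 16-bit code units into Linux's 32-bit wchar_t string.
// Assumes no surrogate pairs, i.e. every unit is a whole character.
std::wstring FromUtf16(const std::vector<std::uint16_t>& units)
{
    return std::wstring(units.begin(), units.end());
}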
Now, it is old DOS, not UTF-8. I am assuming old ASCII only, and I do not plan on converting it to any other languages at this point. That would be a project in itself, so the ASCII set is fine with me. I can jump any conversion hurdles down the road.
Finally, it looks as though my call to the "assign" method is correct? I am asking because this will be a library, and as such I will not be able to test it until the program which uses this library has also reached a certain stage of development.
Finally, it looks as though my call to the "assign" method is correct?
Yes.
Though I would question your use of wchar_t in this library, as chars are typically much easier to work with. Case in point... you just mentioned that wchar_t is different sizes on different platforms... which makes it more difficult to write portable code.
Yes, but everything is UNICODE now. Plus it will allow me to target other languages and such in the future if I need to. Just because it is easy doesn't mean it is right, after all. I honestly don't know anybody not coding UNICODE apps anymore, and I enjoy learning as I go with the new UNICODE stuff.
I know you can use UTF-8 with a char array, but I have never come across a UTF-8 file that wasn't using some other form of storage. Most of the time it is wchar_t. This particular file was out about the same time Windows 95 was released, so no worries of UTF-8 there. If it was a Linux system, maybe. Linux always seems to be ahead of the game.
I know you can use UTF-8 with a char array, but I have never come across a UTF-8 file that wasn't using some other form of storage.
Different strokes I guess. I've found that UTF-8 is more common pretty much everywhere. Especially in places that had to be 'upgraded' while still maintaining backward compatibility (switching from ANSI to UTF-8 is much easier than switching to UTF-16).
The POSIX API is an example... none of it, from what I've seen, uses UTF-16. Nor does any kind of binary file with embedded comments (zip, png, etc.). In fact, the only place I can think of where I've seen UTF-16 in widespread use is in WinAPI.
But whatever... it's your code and you can do what you want. =) Don't let me bully you.
I just wanted to answer Cubbi. I have to read the file into a char array because the file is binary. It contains 3D object data as well as ANSI names for said objects.
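In case it helps, that pattern could be sketched like this (the helper names and the offset are placeholders for illustration; the real layout depends on the file format):

#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Slurp an entire binary file into a byte buffer.
std::vector<char> ReadAll(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>{});
}

// Pull a null-terminated ANSI name out of the buffer at a known offset,
// then widen it (ASCII assumption, as above).
std::wstring NameAt(const std::vector<char>& buf, std::size_t offset)
{
    std::string name(&buf[offset]); // reads up to the embedded '\0'
    return std::wstring(name.begin(), name.end());
}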