How to search unicode string in unsigned char *buffer?

Hi friends,

I am new to C++, so maybe my question is childish.
unsigned char * buffer = NULL;
/*some code for assign data for buffer */
wchar_t data[] = L"cplusplus"; //Unicode

My buffer is an unsigned char buffer, but I need to search for a Unicode string in it, and I do not know how. strncmp does not work. Can you please help me?

Regards,
Kuluoz

Peter87 (11226)

Is this what you want?

buffer = reinterpret_cast<unsigned char*>(data);

Kuluoz (19)

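I need something like this, for example:
unsigned char *buffer = NULL; /* assigned elsewhere */
unsigned char data[] = "cplus";

if (0 == strncmp(reinterpret_cast<const char*>(buffer), reinterpret_cast<const char*>(data), 4))
    printf("true");

That works for narrow strings. Now I need to check whether data (Unicode) exists in the buffer.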

JLBorges (13770)

#include <iostream>
#include <cstring>
#include <cstdlib>
#include <cwchar> // std::wcscmp, std::wcsstr
#include <memory> // std::addressof
#include <vector>

// convert narrow character string (NTMBS) to sequence of wide characters
// return result as a vector of (null terminated) wide characters
std::vector<wchar_t> to_wcs( const char* mbs ) // invariant: not nullptr
{
   std::vector<wchar_t> wstr( std::strlen(mbs)+1, 0 ) ;

   // http://en.cppreference.com/w/cpp/string/multibyte/mbstowcs
   /* const auto n = */ std::mbstowcs( std::addressof( wstr.front() ), mbs, wstr.size() ) ;
   // if( n == std::size_t(-1) ) throw std::domain_error( "badly formed multi-byte string" ) ;

   return wstr ;
}

// compare narrow character string (NTMBS) with wide character string (c-style)
// semantics are similar to that of std::strcmp
int mbs_wcs_cmp( const char* mbs, const wchar_t* wcs )
{ return std::wcscmp( std::addressof( to_wcs(mbs).front() ), wcs ) ; }

// locate the logical equivalent of wide character string (c-style)
// inside a narrow character string (NTMBS)
// semantics are similar to that of std::strstr
// note: the returned offset is only correct if every character in mbs occupies exactly one byte
const char* mbs_wcs_str( const char* mbs, const wchar_t* wcs )
{
    const auto vec = to_wcs(mbs) ;
    const wchar_t* wstr = std::addressof( vec.front() ) ;
    const auto p = std::wcsstr( wstr, wcs ) ;
    return p == nullptr ? nullptr : mbs + (p-wstr) ;
}

// non-const overload for the above
char* mbs_wcs_str( char* mbs, const wchar_t* wcs )
{ return const_cast<char*>( mbs_wcs_str( const_cast< const char* >(mbs), wcs ) ) ; }

int main()
{
    const char cstr[] = "abcdefghijkl" ;
    const wchar_t wstr[] = L"abcdefghijkl" ;

    std::cout << mbs_wcs_cmp( cstr, wstr ) << '\n' ; // 0

    const wchar_t wstr2[] = L"ghijkl" ;
    std::cout << mbs_wcs_cmp( cstr, wstr2 ) << '\n' // -1 (typical)
              << mbs_wcs_str( cstr, wstr2 ) << '\n' ; // ghijkl

    const wchar_t wstr3[] = L"MNOP" ;
    std::cout << mbs_wcs_cmp( cstr, wstr3 ) << '\n' // +1 (typical)
              << (const void*) mbs_wcs_str( cstr, wstr3 ) << '\n' ; // nullptr (not found)
}

http://coliru.stacked-crooked.com/a/07ace56d5b455147

coder777 (8443)

Actually there is a reason why you have unsigned char on the one hand and wchar_t on the other. They represent different character encodings and hence they are not compatible. You need to convert one into the other in order to compare them, which means you first need to identify which encodings are involved. Under Windows, these may be what Windows calls ANSI and UNICODE.

Under Windows you can use WideCharToMultiByte(...) and/or MultiByteToWideChar(...):

https://msdn.microsoft.com/en-us/library/windows/desktop/dd374130(v=vs.85).aspx
https://msdn.microsoft.com/en-us/library/windows/desktop/dd319072(v=vs.85).aspx

Kuluoz (19)

Hi JLBorges,

Thanks, the code above works. But my buffer size is 100 and I need to check only 10 Unicode characters. What should I do now?

JLBorges (13770)

Something like this, perhaps:

#include <iostream>
#include <cstddef> // std::size_t
#include <cstdlib>
#include <cwchar> // std::wcsncmp
#include <algorithm>

// convert narrow character string (NTMBS, array of size SZ)
// to sequence of wide characters. return pointer to statically allocated buffer
// so the function is not thread-safe; use the result immediately (before the next call)
template < std::size_t SZ > const wchar_t* to_wcs( const char (&mbs)[SZ] )
{
    static wchar_t wstr[SZ] ;
    std::mbstowcs( wstr, mbs, SZ ) ;
    return wstr ;
}

// compare narrow character string (NTMBS) with wide character string
// of wcs_sz characters ie. wcs_sz*sizeof(wchar_t) bytes
// the wide character string need not be null terminated
// semantics are similar to that of std::strncmp
template < std::size_t SZ >
int mbs_wcs_ncmp( const char (&mbs)[SZ], const wchar_t* wcs, std::size_t wcs_sz )
{ return std::wcsncmp( to_wcs(mbs), wcs, std::min( SZ, wcs_sz ) ) ; }

// locate the logical equivalent of wide character string of wcs_sz characters inside
// a narrow character string (NTMBS). the wide character string need not be null terminated.
// other than that, the semantics are similar to that of std::strstr
template < std::size_t SZ >
const char* mbs_wcs_nstr( const char (&mbs)[SZ], const wchar_t* wcs, std::size_t wcs_sz )
{
    const wchar_t* wstr = to_wcs(mbs) ;
    const auto p = std::search( wstr, wstr+SZ , wcs, wcs + std::min(SZ,wcs_sz) ) ;
    return p == (wstr+SZ) ? nullptr : mbs + (p-wstr) ;
}

// non-const overload for the above
template < std::size_t SZ >
char* mbs_wcs_nstr( char (&mbs)[SZ], const wchar_t* wcs, std::size_t wcs_sz )
{ return (char*) mbs_wcs_nstr( (const char (&)[SZ])mbs, wcs, wcs_sz ) ; }

int main()
{
    const char cstr[] = "abcdefghijkl" ;
    const wchar_t wstr[] = L"abcdefghijkl" ;

    std::cout << mbs_wcs_ncmp( cstr, wstr, 10 ) << '\n' ; // 0

    const wchar_t wstr2[] = L"ghijkl" ;
    std::cout << mbs_wcs_ncmp( cstr, wstr2, 5 ) << '\n' // negative (typical)
              << mbs_wcs_nstr( cstr, wstr2, 4 ) << '\n' ; // ghijkl

    const wchar_t wstr3[] = L"MNOP" ;
    std::cout << mbs_wcs_ncmp( cstr, wstr3, 4 ) << '\n' // positive (typical)
              << (const void*) mbs_wcs_nstr( cstr, wstr3, 4 ) << '\n' ; // nullptr (not found)
}

http://coliru.stacked-crooked.com/a/ea599b51f7ef53da

Kuluoz (19)

Thanks. It is working well :)
