Undefined behaviour?

If I compile this code with clang and input a two-character string consisting of a digit followed by some other character (e.g. "2a"), the result depends on that character: if it is in the set {a b c d e f i n p x} the program prints 0, but if it is in the set {g h j k l m o q r s t u v w y z} or is punctuation it prints 2. If I run the same code at http://cpp.sh, it prints 2 for all characters.

#include <iostream>
using namespace std;

int main() {

    double d;

    cout << "Input a double followed by a char (eg. '2h'): ";
    cin >> d;
    cout << "The double was " << d << "\n";
}


LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Input a double followed by a char (eg. '2h'): 2a
The double was 0


http://cpp.sh (GCC 4.9.2)
Input a double followed by a char (eg. '2h'): 2a
The double was 2

I would expect the value to be 0 if 2e was input: the e indicates that an exponent follows, and since no exponent can be extracted, the extraction fails and the variable is set to 0. (And, despite your claim, cpp.sh gives 0 for 2e.) This is the behavior I get with VC++, ideone and clang at rextester.com as well.

http://rextester.com/YBKOC76769
http://ideone.com/F7Cg5K

Which makes me wonder if there isn't something wrong with your test code.
@cire

Thanks for the correction. I was mistaken: cpp.sh does give 0 for 2e. But I compiled your test program and it gives:

For "2a" val = 0, buffer = 
For "2b" val = 0, buffer = 
For "2c" val = 0, buffer = 
For "2d" val = 0, buffer = 
For "2e" val = 0, buffer = 
For "2f" val = 0, buffer = 
For "2g" val = 2, buffer = g
For "2h" val = 2, buffer = h
For "2i" val = 0, buffer = 
For "2j" val = 2, buffer = j
For "2k" val = 2, buffer = k
For "2l" val = 2, buffer = l
For "2m" val = 2, buffer = m
For "2n" val = 0, buffer = 
For "2o" val = 2, buffer = o
For "2p" val = 0, buffer = 
For "2q" val = 2, buffer = q
For "2r" val = 2, buffer = r
For "2s" val = 2, buffer = s
For "2t" val = 2, buffer = t
For "2u" val = 2, buffer = u
For "2v" val = 2, buffer = v
For "2w" val = 2, buffer = w
For "2x" val = 0, buffer = 
For "2y" val = 2, buffer = y

If you or someone else would like to try to replicate this, I'm using the g++ command in a Mac terminal to compile the program:
g++ --version
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.3.0
Thread model: posix
#include <iostream>
#include <sstream>
#include <string>
#include <iomanip>

int main()
{
    for( std::string str : { "2a", "2e", "0x2a", "0x2ab", "0x2ab.cp2", "abcd", "NaN", "-INF" } )
    {
        std::cout << std::setw(12) << std::quoted(str) << ' ' ;
        std::istringstream stm(str) ;
        double value ;
        if( stm >> value ) std::cout << value << '\n' ;
        else std::cout << "*** error ***\n" ;
    }
}


LLVM libc++:
        "2a" *** error ***
        "2e" *** error ***
      "0x2a" 42
     "0x2ab" 683
 "0x2ab.cp2" 2735
      "abcd" *** error ***
       "NaN" nan
      "-INF" -inf


GNU libstdc++:
        "2a" 2
        "2e" *** error ***
      "0x2a" 0
     "0x2ab" 0
 "0x2ab.cp2" 0
      "abcd" *** error ***
       "NaN" *** error ***
      "-INF" *** error ***

http://coliru.stacked-crooked.com/a/20fde32093c1d397
Microsoft:
        "2a" 2
        "2e" *** error ***
      "0x2a" 0
     "0x2ab" 0
 "0x2ab.cp2" 0
      "abcd" *** error ***
       "NaN" *** error ***
      "-INF" *** error ***

http://rextester.com/YQBTL52681

It appears that libc++ is trying to parse "2a" as a hexadecimal floating point literal and then reports an error because of the missing 0x prefix.
I was able to replicate the behaviour using wandbox.
http://melpon.org/wandbox/
@JLBorges

Thanks for the explanation. So does that mean it's undefined behaviour to extract doubles from strings in c++?
> So does that mean it's undefined behaviour to extract doubles from strings in c++?

It is not undefined behaviour. But it is inconsistent behaviour.

#include <iostream>
#include <sstream>
#include <string>
#include <iomanip>
#include <cstdlib>
#include <cstdio>
#include <cerrno>   // errno
#include <memory>   // std::addressof

int main()
{
    for( std::string str : { "2a", "2e", "0x2a", "0x2ab", "0x2ab.cp2", "abcd", "NaN", "-INF" } )
    {
        std::cout << std::setw(12) << std::quoted(str) << "  stream: " ;

        std::istringstream stm(str) ;
        double value ;
        if( stm >> value ) std::cout << std::setw(5) << value  ;
        else std::cout << "*err*" ;

        char* end = nullptr ;
        errno = 0 ;
        std::cout << "  std::strtod: " ;
        const double value2 = std::strtod( str.c_str(), std::addressof(end) ) ;
        if( errno == 0 && end != str.c_str() )
            std::cout << std::setw(5) << value2  ;
        else std::cout << "*err*" ;

        double value3 ;
        std::cout << "  std::sscanf: " ;
        if( std::sscanf( str.c_str(), "%lg", std::addressof(value3) ) == 1 )
            std::cout << std::setw(5) << value3 << "\n\n" ;
        else std::cout << "*err*\n\n" ;
    }
}


AFAIK, none of the mainstream library implementations have got everything right,
as specified by the IS: http://en.cppreference.com/w/cpp/locale/num_get/get
(I do not know if there have been defect reports; so I've asked Cubbi about this; he hasn't replied as yet.)

------- libc++ -------

        "2a"  stream: *err*  std::strtod:     2  std::sscanf:     2

        "2e"  stream: *err*  std::strtod:     2  std::sscanf:     2

      "0x2a"  stream:    42  std::strtod:    42  std::sscanf:    42

     "0x2ab"  stream:   683  std::strtod:   683  std::sscanf:   683

 "0x2ab.cp2"  stream:  2735  std::strtod:  2735  std::sscanf:  2735

      "abcd"  stream: *err*  std::strtod: *err*  std::sscanf: *err*

       "NaN"  stream:   nan  std::strtod:   nan  std::sscanf:   nan

      "-INF"  stream:  -inf  std::strtod:  -inf  std::sscanf:  -inf

----- libstdc++ ------

        "2a"  stream:     2  std::strtod:     2  std::sscanf:     2

        "2e"  stream: *err*  std::strtod:     2  std::sscanf:     2

      "0x2a"  stream:     0  std::strtod:    42  std::sscanf:    42

     "0x2ab"  stream:     0  std::strtod:   683  std::sscanf:   683

 "0x2ab.cp2"  stream:     0  std::strtod:  2735  std::sscanf:  2735

      "abcd"  stream: *err*  std::strtod: *err*  std::sscanf: *err*

       "NaN"  stream: *err*  std::strtod:   nan  std::sscanf:   nan

      "-INF"  stream: *err*  std::strtod:  -inf  std::sscanf:  -inf

http://coliru.stacked-crooked.com/a/91a1207e0ef96e5d
This has to do with the defect in the standard where the set of characters filtered out by num_get::do_get stage 2 does not match the characters that stage 3's as-if-strtod would accept/reject.

The character-after-a-double issue was reported as a bug against libc++, and its maintainers defend the behaviour in this comment: https://llvm.org/bugs/show_bug.cgi?id=17782#c6

The NaN/INF issue is an open LWG issue http://cplusplus.github.io/LWG/lwg-active.html#2381 and libc++ implemented a fix for it by explicitly allowing 'n' and 'i' in stage 2.
