A question about libraries, the #include mechanism, and the design of C++

Jun 27, 2010 at 9:12pm
I warn in advance that some of you may find this question stupid or over the top. Nonetheless, I am trying to make sense of the world I live in, so I still ask.

Consider the simple hello world program:

#include <iostream>
using namespace std;

int main()
{
    cout << "Hello World!\n";
    return 0;
}

It's a very simple program. So simple, in fact, that it only uses the cout object, which is an instance of the ostream class. However, the "#include <iostream>" preprocessor directive includes the contents of the entire iostream file. That file also contains other ostream objects like cerr and clog, as well as the istream class and objects of it such as cin. None of this extra material is needed for this specific program, and yet the include directive still pulls in the whole iostream file.

Doesn't this ultimately increase the size of the executable?

Yes, yes, I know:
- Premature optimization and overoptimization are the root of all evil.
- Life is short and memory is cheap.
- The size overhead really isn't that big.

Nonetheless, is there a way to include what I need, and only what I need, short of opening up the iostream file for a copy-paste nightmare? Why can't C/C++ do something like "from math import sin" as Python does, or "import java.util.Scanner" as Java does? Why does C/C++ use such a primitive, unaware, and nonmodular library system? I may not understand the design ideas Ritchie/Stroustrup had in mind, and am most likely missing something.

I just find it strange that a low-level programming language has such an imprecise set of library mechanisms. Isn't the point of a low-level language like C++ precision and control?

Now, I've asked this question in other places and got responses of varying quality, but one of the good ones stated:

"It would be possible to break this include file down into smaller #include files, but now you've complicated the using programmer's job. Which #include file should he include? In what order? And you've complicated the designing programmer's job. How small should he go? How many #include files should there be?"

Maybe I'm missing something, but I think the answer to the question this individual poses is that you should be able to #include the general library file, as we do now, but also be able to #include the smaller files that make it up, down to each individual function or class, to gain more control over what you import. This at least offers the user of the library convenience, though, as was pointed out, it can be somewhat harder on the library's author. But isn't the point of programming to build programs from independent building blocks?

Why does C++ work the way it does? Every language is designed the way it is for a reason, and I want to know why C/C++ work this way. After all, Ritchie and Stroustrup have had decades to fix the problem, if it is one, yet it wasn't done. Why? What the bloody hell is going on here?
Jun 27, 2010 at 10:03pm
#include and executable size are not in any way related.

It is really a problem of the compiler generating enough symbol information in the object files to allow the linker to optimize out unneeded symbols. But as it stands today, the linker does not know which symbols are needed (directly or indirectly), and thus the entire library has to be linked.
Jun 27, 2010 at 10:06pm
It is up to the compiler to decide what to compile and what to leave out. Modern compilers are smart enough to know that if you don't use some feature, they won't litter the executable with unnecessary code.

I compiled the hello world example. VC++ 2010 (with default settings) made an 8192-byte executable, which in my opinion is exceptionally small. I looked into the binary code, and it only compiled std::cout and std::ostream; I found neither std::cin nor std::istream.1

1: VC++ 2010 generates executables with readable parts containing the names of the compiled functions and classes.
Jun 27, 2010 at 10:18pm
When you import in a scripting language, that's what happens: you import, parse, and compile that module's code.

C++ and C are compiled languages, which means library code is linked in, not imported into your program directly.

A statically linked library will be bundled with your executable, but most libraries in this day and age are shared objects, which means your code and any other running processes share the library. The library code is stashed in virtual memory somewhere, used by anything on the system. When your program runs, clever magic goes on to link the bits you need into your executable.

Jun 27, 2010 at 10:21pm
This approach works because only those functions that are actually used are linked/imported (or created, in the case of template functions).
A header file generally just consists of forward declarations that announce to the compiler that a function or object exists in some module. However, if that function/object is never referenced, the linker won't import it or even bother checking whether it actually exists somewhere.
Jun 28, 2010 at 12:44am
Mhm, interesting. I think I get the gist of what you guys are saying.
However, can someone explain why it is this way and not the way I described as a possibility?
I'm trying to understand what it is that I am missing.
And in the unlikely event it turns out I am on to something and that C++ can be improved in this respect, let me know.
Jun 28, 2010 at 8:57am
In my opinion that wouldn't be an improvement but a very radical change in the language. It would most likely break existing code, and I don't think it would turn out to be anything more powerful than what we have now with simple #includes.
Last edited on Jun 28, 2010 at 8:57am
Jun 28, 2010 at 12:54pm
Let's say I have a library, library.cpp/library.h, that has two functions:

#include <cstdio>   // for printf

void foo();         // declared (in library.h) so bar() can call it

void bar() {
    printf( "Calling foo\n" );
    foo();
}

void foo() {
    printf( "foo called!\n" );
}


Let's say I write my main:

#include "library.h"

int main() {
    bar();
}


When I compile main.cpp, obviously the compiler knows that I need the symbol "bar" from the library. But what the compiler doesn't know is that in order to use bar, I also indirectly need foo() from the library, and also printf() from the C library.

How can the compiler know this? Well, I suppose it could build a symbol dependency graph, embed that in all object files, and link it into the library. But:

1) You've increased the size of all compiled files, including the libraries;
2) You've significantly increased compilation time as a result of having to build dependency graphs.
Jun 28, 2010 at 4:14pm
But if Java and Python can do it, why not C++?
Jun 28, 2010 at 4:58pm
I may be wrong, but isn't there a very clear distinction between code generation and linking? I feel you're not making a distinction between the two. #include doesn't really have that much to do with libraries (.lib/.dll files).
Last edited on Jun 28, 2010 at 5:08pm
Jun 28, 2010 at 5:50pm
But if Java and Python can do it, why not C++?

Exactly what is "it"?
A static library consists of several object files linked together. When you use a function/object from the library, the object file containing that symbol (and all dependencies from the same library) is added to your own program, but not the rest of the library.
When you link against a shared library, only the symbols you use are imported (no code from the library is added to your own executable, obviously).
Last edited on Jun 28, 2010 at 5:51pm
Topic archived. No new replies allowed.