Passing arrays to and from functions safely and securely

I wrote this article to address common problems that programmers run into and ask about on the forums. I'll provide numerous examples along with a detailed explanation of each.

Organization
Part 1) Passing arrays to functions
Part 2) Returning arrays from functions

First, here are links to the arrays and templates tutorials on cplusplus.com. It would be good to review these briefly before continuing. For the templates tutorial you only need to read the first section, on function templates. Consider this article an amplification of the C++ tutorials on arrays and function templates.
http://www.cplusplus.com/doc/tutorial/arrays/
http://www.cplusplus.com/doc/tutorial/templates/

Terminology
The word array immediately causes some confusion, for a number of reasons. First, other languages have built-in smart array types which do not work the same way as C/C++ arrays. Second, the term array is defined in numerous dictionaries, so there is a generalized concept of an array, which leads to confusion when discussing the specific array types defined by C++ or some other language. Third, the std::vector is described by the C++ standard as a sequence container, but C++ programmers sometimes refer to it as a dynamic array. In fact, any of the standard sequence containers that provide random access fit a more general definition of the term array. For instance, consider these definitions:
dictionary.com

Computers. a block of related data elements, each of which is usually identified by one or more subscripts.

Merriam Webster

(1) : a number of mathematical elements arranged in rows and columns (2) : a data structure in which similar elements of data are arranged in a table b : a series of statistical data arranged in classes in order of magnitude

When I use the term array I am talking about the more general definition of the concept that you would find in any dictionary. When referring to the "data structure" described by section 8.3.4 of the C++ std I will use the term C-Array. In the following example, data is a C-Array. This type of data structure existed in the C language and likewise exists within C++. However, I'll show you in numerous examples why it is sometimes better to consider using one of the standard sequence containers.

const int SIZE(5);
int data[SIZE];
std::generate(data, data + SIZE, rand);


PART I - Passing arrays to functions

Compile and execute the following program. It contains a defect which will be evident when you analyze the output.
#include <iostream>

void printArray(int data[])
{
    for(int i = 0, length = sizeof(data); i < length; ++i)
    {
        std::cout << data[i] << ' ';
    }
    std::cout << std::endl;
}

int main()
{
    int data[] = { 5, 7, 8, 9, 1, 2 };
    printArray(data);
    return 0;
}
On a 32-bit build you will see that only the first 4 elements of the array are printed. The sizeof(data) returns a value of 4! That happens to be the size of the pointer used to pass the array to printArray, not the size of the array. (On a typical 64-bit build the pointer is 8 bytes, so the loop reads past the end of the 6-element array, which is undefined behavior.) That has a couple of implications. First, the array does not get copied; only the pointer to the first element of the array is copied. Second, C-Arrays do not have copy constructors, assignment operators, or functional interfaces. In the following examples you will see std::vector, std::deque, and std::list, which are dynamic sequence containers provided by the C++ standard library. This is not a full tutorial on those containers, but they are used to show the flexibility of the proposed improvements to the flawed program.

Let's look at another example. In it, I have created numerous printArray functions that are overloaded so that I can show a number of solutions. I will then analyze each of those solutions and explain the pros and cons of each.

#include <iostream>
#include <vector>
#include <deque>
#include <list>

// Method 1: works but very little security.  It is impossible to validate
// the inputs since the size of data still cannot be validated. If length is too large
// undefined behavior will occur.
void printArray(int data[], int length)
{
    for(int i(0); i < length; ++i)
    {
        std::cout << data[i] << ' ';
    }
    std::cout << std::endl;
}

// Method 2: Type safe and more generic.  Works with any container that supports forward iterators.
// Limitation - cannot validate iterators so caller could pass null or invalid pointers.  Typesafe - won't
// allow you to pass inconsistent iterator types.  Allows you to pass any valid range of a container.
template <class ForwardIteratorType> 
void printArray(ForwardIteratorType begin, ForwardIteratorType end)
{
    while(begin != end)
    {
        std::cout << *begin << ' ';
        ++begin;
    }
    std::cout << std::endl;
}

// Method 3 - This implementation is as typesafe and secure as you can get but
// does not allow a subrange since the entire container is expected.  It could
// be useful if you want that extra security and know that you want to operate
// on the entire container.
template <class ContainerType> 
void printArray(const ContainerType& container)
{
    // typename is required because const_iterator is a type that depends on ContainerType.
    typename ContainerType::const_iterator current(container.begin()), end(container.end());
    for( ; 
        current != end; 
        ++current)
    {
        std::cout << *current << ' ';
    }
    std::cout << std::endl;
}

int main()
{
    // Method 1.
    const int LENGTH(6);
    int data[LENGTH] = { 5, 7, 8, 9, 1, 2 };
    printArray(data, LENGTH);

    // Method 2.
    printArray(data, data + LENGTH);
    std::vector<int> vData(data, data + LENGTH);
    printArray(vData.begin(), vData.end());
    std::list<int> lData(data, data + LENGTH);
    printArray(lData.begin(), lData.end());
    std::deque<int> dData(data, data + LENGTH);
    printArray(dData.begin(), dData.end());
    // won't compile if caller accidentally mixes iterator types.
    //printArray(dData.begin(), vData.end());

    // method 3.
    printArray(vData);
    printArray(dData);
    printArray(lData);
    return 0;
}


Method 2 is unique in that it allows you to specify any range of the array, whereas methods 1 and 3 accomplish the same goal of printing the entire container. If that was your intention all along then I submit to you that method 3 is the best. It is the most secure and typesafe. There is very little if any chance that a caller could specify invalid parameters. An empty container would not cause any problem. The function simply wouldn't print any values.

It is important to note that a C-Array cannot be passed using method 3. Method 3 requires the use of a container such as the std::vector. C-Arrays are a holdover from the C language and do not have functional interfaces. Method 1 or 2 would need to be used if you are dealing with C-Arrays. I'm sure that there are other ways as well but it is up to you to determine which method is best for your project.

One could produce hundreds of example programs that demonstrate these points even further, but I'll leave it up to the readers to copy the program and build other kinds of examples. The beauty of templates is that they reduce repetitive programming tasks. Define the function once so that it can be called multiple times, each time with a different type. It is simply a matter of making sure that the type supports the minimum requirements of the function. The method 3 printArray function requires that the ContainerType has begin() and end() member functions that return forward iterators and that the objects within the container are instances of classes that support the operator<< function. The operator<< can be defined for user-defined types as well, so method 3 is not limited to containers of built-in types.
PART II - Returning arrays from functions

What follows is an example containing two typical problems with returning arrays from functions. For the record, I do not believe that returning arrays from a function is necessary. It may seem natural to return the result of a function but it isn't necessary. You can provide out parameters to a function using pointers or references.

The following program produces this output using MS Visual Studio C++ express 2008.
13 8 9 10 11 12
-858993460 -858993460 -858993460 -858993460 -858993460 3537572
41 18467 6334 26500 19169 15724
41 18467 6334 26500 19169 15724
#include <algorithm>
#include <cstdlib>   // rand
#include <iostream>

// Prints out array elements. Method 2 from PART I.
template <class ForwardIteratorType> 
void printArray(ForwardIteratorType begin, ForwardIteratorType end)
{
    while(begin != end)
    {
        std::cout << *begin << ' ';
        ++begin;
    }
    std::cout << std::endl;
}

// This function is a poor design which will lead to undefined behavior when the caller
// tries to use the pointer that is returned.  data is allocated on the stack and destroyed
// after the function returns.  The pointer to the memory is returned but it is a dangling
// pointer to memory that has already been released.
int* generateArray()
{
    int data[6] = { 13, 8, 9, 10, 11, 12 };
    int* pointer = data;
    printArray(pointer, pointer + 6);
    return pointer;
}

// The *& symbol means reference to a pointer so that modification of the array 
// results in modification of lotteryNumbers back in main.  In this case the pointer
// updated back in main is valid but the caller has to remember to release the memory
// at some point.  Therefore this approach is error prone.
void generateArray(int *&array, int length)
{
    int* pointer = new int[length];
    // std::generate requires the <algorithm> header
    std::generate(pointer, pointer + length, rand);
    printArray(pointer, pointer + length);
    array = pointer;
}

int main()
{
    int* lotteryNumbers = generateArray();
    printArray(lotteryNumbers, lotteryNumbers + 6);

    const int LENGTH(6);
    generateArray(lotteryNumbers, LENGTH);
    printArray(lotteryNumbers, lotteryNumbers + LENGTH);
    // Memory allocated with new[] must be released with delete[].
    delete[] lotteryNumbers;
    return 0;
}


The first call to printArray occurred within the version of generateArray that returns a value. At that time the array named data was valid and had been allocated from stack memory since it was created local to the function. Once generateArray returns the memory is returned to the stack and available for the program to reuse for some other purpose. Therefore the pointer that is returned to main points to memory that can and will be overwritten and the second line of output is garbage. The behavior is undefined. There is no way to predict how a program like this will behave. The output that I witnessed may not be the output that you see with another compiler and/or run-time environment.

There is another problem with that same version of generateArray. The function can only return one value. How does main know how big the array is, even if it was properly constructed using heap memory? In this case the programmer who wrote both functions coded this assumption which is a bad design.

Notice that there is another version of generateArray that takes two parameters and has a void return type. The first argument is a reference to a pointer so that the lotteryNumbers pointer of main is modified. The second argument is the length, which I require the caller to supply. Although the function can accomplish the task successfully, is it the best approach? In a complicated, large-scale application memory leaks can cause serious problems, and it may not be so easy to manage memory yourself.

I think that we can do even better. One question that arises is why would you want a function that builds an array? You can easily instantiate an array in place. Let me create a function that reads console input, and fills an array for the user. The below example allows the array to be constructed by a function without the caller having to worry about memory leaks or stack vs. heap memory allocations. There are many ways to do this. In this case I chose to allow the caller to pass an array of any size and the function will simply add to it. It could start out empty but it doesn't have to. The std::vector is managing the memory so when the main function exits it is destroyed without the programmer having to worry about garbage collection.
#include <vector>
#include <iostream>
#include <limits>

// Prints out array elements. Method 2 from PART I.
template <class ForwardIteratorType> 
void printArray(ForwardIteratorType begin, ForwardIteratorType end)
{
    while(begin != end)
    {
        std::cout << *begin << ' ';
        ++begin;
    }
    std::cout << std::endl;
}

// The caller must decide whether to pass an empty container.  This function will 
// add to it.  
void readScores(std::vector<int>& container)
{
    std::cout << "Type the list of scores followed by a non-numeric character and press enter when finished. " 
              << "For instance: 22 25 26 f <enter>" << std::endl;
    int temp(0);
    while(std::cin >> temp)
    {
        container.push_back(temp);
    }
    // clear and discard any leftover data from the input stream.
    std::cin.clear();
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}

int main()
{
    std::vector<int> scores; // empty.  Let readScores fill it.
    readScores(scores);
    printArray(scores.begin(), scores.end());
    return 0;
}


I chose not to make readScores a template function this time. It doesn't have to be and I wanted to keep the example fairly simple. It could be changed to be more generic. Try it if you dare and see what happens when you run the program. The point is that the function doesn't really need to build the array. Building the array within the function and returning it is tricky. You will either have to deal with garbage collection or return a std container by value which could result in unnecessary copy construction.

Unfortunately return by value means that at the very least you are probably going to have an assignment that would result in the caller's vector allocating memory to hold the copied data. The best way is really to pass by reference with a void return as I did in the earlier example. That example is more flexible as well since the caller can decide whether to add to an existing array or fill a new array.

std::vector<int> readScores()
{
    std::vector<int> container;
    std::cout << "Type the list of scores followed by a non-numeric character and press enter when finished. " 
              << "For instance: 22 25 26 f <enter>" << std::endl;
    int temp(0);
    while(std::cin >> temp)
    {
        container.push_back(temp);
    }
    // clear and discard any leftover data from the input stream.
    std::cin.clear();
    std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
    
    // return by value. Container will be destroyed but data will be copied into callers vector instance which could result
    // in additional memory allocation.  
    return container;
}


I'll finish by stating that there are other ways of accomplishing these kinds of programming tasks and I'd like to encourage anyone to post some examples using template functions of their own or boost libraries.
Am I correct in understanding that you started off wanting to pass arrays into and back from functions, but in the end put the data in a different data structure and passed that instead?
I don't believe that I used any kind of data structure. Can you clarify the question please?
I believe you've copied the array content into STL containers and passed those instead. If this is the case, you haven't passed arrays at all, you've copied the content into different data structures and passed that instead. Right?
Wrong. The STL containers are sequence containers not data structures. The std::vector is a dynamic array in C++. The word array as a general term does not always have to refer to the C-Array as you seem to be implying.

As far as the main function goes, it was convenient to build the C++ sequence containers from the original C-Arrays simply to show that you can use those sequence containers instead of C-Arrays. That does not mean that std::vector is any less of an array than a C-Array. It is out of scope for this article to explain the subtle differences between all of those container types, but I did want to use them to show the effectiveness of the template functions.
I thought a stack, double ended queue, linked list, vector, array, dictionary, ... are distinctly different data structures with different properties. None of those listed have identical properties.

Anyway, the point is that you copy the data from the array into another data structure. Copies are not always cheap or possible. I don't think you're solving the problem you set out to address, which is to pass arrays safely in and out of functions in a type-independent way.
Method 2 would work just as easily if given pointers to the first element and one past the last element of an array, instead of copying to an STL container.

template< typename T, size_t N >
T* abegin( T (&a)[ N ] )
{ return &a[0]; }

template< typename T, size_t N >
T* aend( T (&a)[ N ] )
{ return &a[ N ]; }

int main() {
    static const int array[] = { 1, 2, 3, 4, 5, 6 };

    printArray( abegin( array ), aend( array ) );
}


Method 2 is generally the preferred approach (compared to Method 3) for a couple of reasons:
1) It is container agnostic, since all containers support forward iterators;
2) It allows you to operate on only a subrange of elements instead of the entire container;
3) It works with pointers/arrays also.

A fourth solution is this:
template< typename T, size_t N >
void printArray( T (&array)[ N ] )   // reference to an array, so N can be deduced
{ /* ... */ }

Note that the above only works (compiles) for fixed-length arrays.
I thought a stack, double ended queue, linked list, vector, array, dictionary, ... are distinctly different data structures with different properties. None of those listed have identical properties.


Thank you for helping to prove my point. They are different types which shows how flexible the templates can be.

Anyway, the point is that you copy the data from the array into another data structure. Copies are not always cheap or possible. I don't think you're solving the problem you set out to address, which is to pass arrays safely in and out of functions in a type-independent way.


Well this is an article. Obviously you wouldn't create an array in a real program for the sole purpose of initializing a vector or deque. I do not understand why you continue to call them structures. In a real program you might use a vector, deque, list, or some other sequence container instead of a C-Array. In this case the main function is simply creating a variety of different inputs in order to test the template functions. Perhaps you are more of a C programmer and have not used those containers before. They are in fact, different kinds of arrays but they are still arrays in C++. When I get some free time I will have to code a quick example that shows how a vector can be passed to a function that expects a C-Array to demonstrate the point even further. This isn't a C article. It's a C++ article.
Nonsense. If a deque, a list and a vector are not data structures, what is?
Good question KBW. According to wikipedia and many other sites a data structure is nothing more than an organized collection of data. By that definition a C-Array is also a "data structure". Do you have some other definition of the term that you would like to share? I guess I am a bit confused as to why you think that a vector is a data structure but that a C-Array isn't.

The word array is defined in numerous dictionaries. For instance, consider this definition from dictionary.com
Computers. a block of related data elements, each of which is usually identified by one or more subscripts.


From Merriam Webster,
(1) : a number of mathematical elements arranged in rows and columns (2) : a data structure in which similar elements of data are arranged in a table b : a series of statistical data arranged in classes in order of magnitude


When I use the term array I am talking about the more general definition of the concept that you would find in any dictionary. When referring to the "data structure" described by section 8.3.4 of the C++ std I used the term C-Array. I'll have to think about it and consider adding some definitions at the top of the article to explain that. The C++ std doesn't really have a cute, concise definition of the word array.

I did not write this for the sole purpose of discussing C-Arrays and only C-Arrays. Never did I suggest that one should ever copy a C-Array into a std sequence container for the purpose of passing that to a function. What is implemented within the main function of the examples are nothing more than tests that show the different combinations of inputs that can be used to call the functions. What I did suggest is that the std sequence containers are also kinds of dynamic arrays that can be used instead of C-Arrays.

Maybe I should have included JSmith's example, but I knew that he would post his solution anyway, and he did. Come to think of it, I should have made method 1 above a template function as well. I may edit that and credit JSmith since his solution is more generic.
hey, guys, check this out:

#include <iostream>
#include <algorithm>
#include <cstdlib>   // system
using namespace std;

template<class T, int N>
class SmartArray
{
      private:
      T array[N]; 
      static T foo;
      
      public:
      SmartArray(){}
      ~SmartArray(){}
      
      inline int size() const {return N;}       
      inline T * get_ptr(){return array;}      

      T& operator[](int i) 
      {
            if (i>=0 && i<N) return array[i];
         
            cerr << "out of bounds!...\n";
            return foo;
      }           
      
      const T& operator[](int i) const 
      {
            if (i>=0 && i<N) return array[i]; 
            
            cerr << "out of bounds!...\n";
            return foo;
      }
      
};

template<class T, int N>
T SmartArray<T,N>::foo;

template<class T, int N>
void read_arr(SmartArray<T,N> * arr)
{
     int i;
     for (i=0; i<N; i++)
     {
         cout << "enter element " << i+1 << ": ";
         cin >> (*arr)[i];
     }
}

template<class T, int N>
void print_arr(SmartArray<T,N> * arr)
{
     int i;
     
     cout << "{ ";
     for (i=0; i<N; i++)
     {
         cout << (*arr)[i] << ' ';
     }
     cout << '}' << endl;
}

int main()
{ 
    SmartArray<int,10> arr;
    
    read_arr(&arr);
    
    cout << "your array is:" << endl;
    print_arr(&arr);
    
    sort(arr.get_ptr(),arr.get_ptr()+arr.size());
    cout << "your sorted array is:" << endl;
    print_arr(&arr);
    
    cout << "let's try something bad..." << endl;
    arr[1000]=5;
    
    system("pause");
    return 0;
}


OR if you prefer dynamic to static memory allocation...

#include <iostream>
#include <algorithm>
#include <cstdlib>   // system
using namespace std;

template<class T>
class SmartArray
{
      private:
      T * array; 
      int sz;
      static T foo;
      
      public:
      SmartArray(int s){sz=s; array=new T[sz];}
      ~SmartArray(){delete[] array;}
      
      inline int size() const {return sz;}    
      inline T * get_ptr(){return array;}   
      
      T& operator[](int i) 
      {
            if (i>=0 && i<sz) return array[i];
         
            cerr << "out of bounds!...\n";
            return foo;
      }           
      
      const T& operator[](int i) const 
      {
            if (i>=0 && i<sz) return array[i]; 
            
            cerr << "out of bounds!...\n";
            return foo;
      }
      
};

template<class T>
T SmartArray<T>::foo;

template<class T>
void read_arr(SmartArray<T> * arr)
{
     int i;
     int size=arr->size();
     for (i=0; i<size; i++)
     {
         cout << "enter element " << i+1 << ": ";
         cin >> (*arr)[i];
     }
}

template<class T>
void print_arr(SmartArray<T> * arr)
{
     int i;
     int size=arr->size();
     
     cout << "{ ";
     for (i=0; i<size; i++)
     {
         cout << (*arr)[i] << ' ';
     }
     cout << '}' << endl;
}

int main()
{ 
    SmartArray<int> arr(10);
    
    read_arr(&arr);
    
    cout << "your array is:" << endl;
    print_arr(&arr);
    
    sort(arr.get_ptr(),arr.get_ptr()+arr.size());
    cout << "your sorted array is:" << endl;
    print_arr(&arr);
    
    cout << "let's try something bad..." << endl;
    arr[1000]=5;
    
    system("pause");
    return 0;
}

Your first example reinvents boost::array (http://www.boost.org/doc/libs/1_42_0/doc/html/array.html);
your second reinvents std::vector.





I wanted to follow up with an example that demonstrates why the std::vector is also referred to as a dynamic array. You'll find the details of the language requirement in section 23.2.4.

The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().


#include <algorithm>  // std::generate
#include <cstdlib>    // rand
#include <iostream>
#include <vector>

// Imagine that this is a legacy API function within an older library.
// It requires a C-Array as an input.
void printArray(int data[], int length)
{
    for(int i(0); i < length; ++i)
    {
        std::cout << data[i] << ' ';
    }
    std::cout << std::endl;
}


int main(int argc, char* argv[]) {
    const int SIZE(20);

    // Certainly you can use printArray for a C-Array.  
    int mainCArray[SIZE];
    std::generate(mainCArray, mainCArray + SIZE, rand);
    printArray(mainCArray, SIZE);

    // The C++ std guarantees that you can also call it even if you are
    // using a std::vector!  
    std::vector<int> mainData(mainCArray, mainCArray + SIZE);
    printArray(&mainData[0], SIZE);
}


It produces the following output:

30732 18699 23847 19795 3349 8875 31982 14716 13436 5349 16440 11945 1841 8521 4403 21459 23726 16394 31433 112 
30732 18699 23847 19795 3349 8875 31982 14716 13436 5349 16440 11945 1841 8521 4403 21459 23726 16394 31433 112


The point is that when writing new C++ code it is sometimes better to choose the C++ sequence containers. As the aforementioned example demonstrates the std::vector can be used instead of a C-Array. On the other hand if you need a sequence container that provides other kinds of behaviors and you are not concerned about maintaining compatibility with older libraries you can use std::list, std::deque, and so forth. The advantage of using the std::vector within newer code is that it easily supports growth and provides new functionality while also supporting the C like interface for accessing and writing elements.

Other languages actually provide a built in array type with functionality similar to the vector. Since the term array was already used in the C/C++ standard (what I have been referring to as the C-Array) and is a rather primitive kind of data structure this can be confusing to programmers coming from other languages. It probably isn't obvious to new C++ programmers that the vector has such capabilities.

Original article was modified
1) I added a paragraph on terminology at the beginning in order to more clearly state how I am using some terms throughout.
2) I updated a few comments above the examples in part II.