Hi! I'm building a header-only statistics library that uses a lot of templates. I have created a class template that takes an array-like container as input:
//I removed all the member functions/variables
//so you could see the general layout of the class:
template <typename T_cont>
class LinearRegression{
public:
    LinearRegression(const T_cont& X, const T_cont& Y);
};
template <typename T_cont>
LinearRegression<T_cont>::LinearRegression(const T_cont& X, const T_cont& Y){}
This works for simple linear regression (1d arrays), but I want to expand it to allow for multiple linear regression (2d arrays as input). I decided to do this by specializing the LinearRegression class to accept a 2d vector. However, I want the nested vector to be able to hold any numeric type.
I guess my two questions are:
1.) Is the following code-snippet a viable option (does it actually specialize the above class)?
2.) I know my code is bad, but am I being a complete potato by designing the class this way? This is my first time using templates, so any general guidance is more than welcome!
My reasoning was that it would simplify the interface for users. Rather than knowing the names/interfaces of two classes, they would only have to know one. Ideally, the member functions would be identical in both classes. I'm planning on having a bunch of different regression models in my library and I don't want a unique class for each one, so I'm trying to cluster them together when they are similar and have nearly identical interfaces.
Your suggestion raises a good point though, it would make it more readable if I created a second class whose name suggested that it takes a 2d array instead. In your opinion, which do you think would be easier for the users of the library?
Since there's no concrete type defined, no, you can't write a full (explicit) specialization this way. You're asking the compiler to specialize for vector<vector<T>> where T could be any type, and that's not sufficient for an explicit specialization. With T left open it is a partial specialization, not an explicit one.
If T were known, then yes, you could.
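For reference, here is a minimal sketch of the form that leaves T open, assuming std::vector is the container. The `dimensions` member is a hypothetical marker added only to show which version the compiler selects; it is not part of the original class.

```cpp
#include <vector>

// Primary template: any 1d array-like container.
template <typename T_cont>
class LinearRegression {
public:
    static constexpr int dimensions = 1;  // hypothetical marker, illustration only
    LinearRegression(const T_cont&, const T_cont&) {}
};

// Partial specialization: selected for std::vector<std::vector<T>>, T left open.
template <typename T>
class LinearRegression<std::vector<std::vector<T>>> {
public:
    static constexpr int dimensions = 2;  // hypothetical marker, illustration only
    LinearRegression(const std::vector<std::vector<T>>&,
                     const std::vector<std::vector<T>>&) {}
};
```

Fixing T as well (for example `template <> class LinearRegression<std::vector<std::vector<double>>>`) would be the explicit form.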
However, in order to advise I'd really need to better understand what you're going to do with the data. For example, is it more than merely how the container is traversed?
Keep one notion in mind before we go further: the STL algorithms library (and parts of <utility>) consists not of classes but of non-member function templates. That's how the algorithms apply to multiple containers.
Perhaps what you need isn't a full explicit template specialization, but a design built around the algorithm that processes the data, using function templates that react to the type they are given.
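As a rough illustration of that idea, overloaded non-member function templates can pick an algorithm from the argument type alone. The name `count_predictors` and its bodies are hypothetical placeholders, not part of any existing library.

```cpp
#include <cstddef>
#include <vector>

// 1d input: simple regression has a single predictor.
template <typename T>
std::size_t count_predictors(const std::vector<T>& x) {
    (void)x;  // the data itself is not needed for this placeholder
    return 1;
}

// 2d input: one predictor per column (read from the first row here).
// For a nested vector this overload is more specialized, so it wins.
template <typename T>
std::size_t count_predictors(const std::vector<std::vector<T>>& x) {
    return x.empty() ? 0 : x.front().size();
}
```

The caller writes `count_predictors(data)` in both cases and never names the dimension.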
Thank you for clarifying the specialization question!
As for what I'm doing with the data: it's definitely more than just how the data is traversed. The entire algorithm that uses the data is different, but the interface for accessing the end result is almost identical. To be honest, I guess I could just overload the constructor for when a 2d vector is passed?
//I'm not sure if this would be valid, though.
//Does the first constructor need to have T in it somewhere
//in order for the class to be properly instantiated?
//(and would the second constructor need to have a T_cont as a parameter?)
template <typename T_cont, typename T>
class LinearRegression{
public:
    LinearRegression(const T_cont& X, const T_cont& Y);
    //new:
    LinearRegression(const std::vector<std::vector<T>>& X, const std::vector<std::vector<T>>& Y);
};
template <typename T_cont, typename T>
LinearRegression<T_cont, T>::LinearRegression(const T_cont& X, const T_cont& Y){}
template <typename T_cont, typename T>
LinearRegression<T_cont, T>::LinearRegression(const std::vector<std::vector<T>>& X, const std::vector<std::vector<T>>& Y){}
I think I see what you're saying with the second half of your post. I am trying to save the state of the LinearRegression object in order to cut down on the time it takes for the calculations (new data might be added later on, and rather than recomputing everything again I can just add this new data with a much smaller calculation cost), so I would think that keeping it in a class is the best way to go?
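To make the state-keeping idea concrete, here is a minimal sketch for the simple (1d) case only. The class name `RunningSimpleRegression` and its members are hypothetical, and it assumes ordinary least squares computed from running sums, which may not be how the library ends up doing it.

```cpp
#include <cstddef>

// Hypothetical sketch: keeps running sums so each new point updates the fit in O(1).
class RunningSimpleRegression {
public:
    // Fold one (x, y) observation into the stored sums.
    void add(double x, double y) {
        ++n_;
        sum_x_ += x;
        sum_y_ += y;
        sum_xx_ += x * x;
        sum_xy_ += x * y;
    }

    // Least-squares slope; assumes at least two points with distinct x values.
    double slope() const {
        const double n = static_cast<double>(n_);
        return (n * sum_xy_ - sum_x_ * sum_y_) / (n * sum_xx_ - sum_x_ * sum_x_);
    }

    // Least-squares intercept, derived from the slope and the means.
    double intercept() const {
        const double n = static_cast<double>(n_);
        return (sum_y_ - slope() * sum_x_) / n;
    }

private:
    std::size_t n_ = 0;
    double sum_x_ = 0.0, sum_y_ = 0.0, sum_xx_ = 0.0, sum_xy_ = 0.0;
};
```

Raw sums like these can lose precision on large values, so this is only meant to show why keeping the object around saves work.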
Edit: Ah, Niccolo, I think I finally understand the last sentence of your post. Essentially you're saying that maybe it would be best to create non-member function templates that handle different classes in specific ways? Wouldn't that be similar to just creating a separate class for the object that takes a 2d vector as input?
Also, I could possibly check if the given container "T_cont" is a 1d or 2d array and then branch to the appropriate member functions according to the correct dimension. Since member functions in templated classes will only be instantiated if they have the possibility of being called, wouldn't this mean that both the 1d and 2d member functions are instantiated? That would just be extra code with no possibility of being called added to the executable?
Ideally, the library shouldn't require the user to specify different classes or different functions only because some feature of the input data has changed.
Offer a polymorphic interface, and keep the details in the background.
Also, I could possibly check if the given container "T_cont" is a 1d or 2d array and then branch to the appropriate member functions according to the correct dimension. Since member functions in templated classes will only be instantiated if they have the possibility of being called, wouldn't this mean that both the 1d and 2d member functions are instantiated? That would just be extra code with no possibility of being called added to the executable?
Yes.
Consider some hypothetical (broken) code:
template <typename T>
struct A
{
    int dispatch(T t)
    {
        if (is_2d_array(t))
            return array_2d(t);
        else
            return array_1d(t);
    }
private:
    int array_2d(T t) { return t[0][0]; }
    int array_1d(T t) { return t[0]; }
};
Both member functions will always be instantiated (following a call to dispatch), and it doesn't matter whether you put in a 1d or 2d array; one of the member functions is not going to compile.
It is possible to avoid the potential problem by doing the branching at compile time.
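One way to sketch that compile-time branch is with C++17's `if constexpr`. Treating "the element type is arithmetic" as the test for a 1d container is an assumption made for this example; a real library might want a stricter trait.

```cpp
#include <type_traits>
#include <vector>

template <typename T>
struct A
{
    int dispatch(const T& t)
    {
        // Type of t[0] with references and cv-qualifiers stripped.
        using element = std::decay_t<decltype(t[0])>;

        // The branch not taken is discarded and never instantiated,
        // so only the matching helper has to compile for a given T.
        if constexpr (std::is_arithmetic_v<element>)
            return array_1d(t);
        else
            return array_2d(t);
    }
private:
    int array_2d(const T& t) { return t[0][0]; }
    int array_1d(const T& t) { return t[0]; }
};
```

With this version, `A<std::vector<int>>` never instantiates `array_2d`, so the `t[0][0]` expression is never checked against a 1d container.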
If you want your class to accept two kinds of construction parameters, such as std::vector and a plain C array, but still keep a single class or single constructor to handle both cases, you may need to check the type.
A simple approach to do this:
#include<vector>
#include<type_traits>
template<typename T>
struct is_vector :
public std::false_type { };
template<typename T>
struct is_vector<std::vector<T>> :
public std::true_type { };
Then, somewhere in your class template's member function, you can check the type.
Note this is a hypothetical example done in the constructor, which is not the best way, but it shows how to check the type in any member function:
template <typename T_cont, typename T>
class LinearRegression
{
public:
    LinearRegression(const T_cont& X, const T_cont& Y)
    {
        if(is_vector<T_cont>::value)
        {
            // we are dealing with vector
        }
        else
        {
            // dealing with plain C arrays
        }
    }
};
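One caveat: a plain runtime `if` still compiles both branches, so code that is valid for only one kind of container would fail to build. A pre-C++17 way around that is tag dispatch on the trait. The sketch below repeats the trait so it stands alone; `element_count` and `element_count_impl` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

template <typename T>
struct is_vector : std::false_type {};

template <typename T>
struct is_vector<std::vector<T>> : std::true_type {};

// Chosen when the argument is a std::vector.
template <typename T_cont>
std::size_t element_count_impl(const T_cont& c, std::true_type) {
    return c.size();
}

// Chosen for built-in arrays, which have no size() member.
template <typename T_cont>
std::size_t element_count_impl(const T_cont&, std::false_type) {
    return std::extent<T_cont>::value;
}

// Overload resolution on the tag picks one helper at compile time,
// so only the matching body is instantiated.
template <typename T_cont>
std::size_t element_count(const T_cont& c) {
    return element_count_impl(c, is_vector<T_cont>{});
}
```

Here `c.size()` is never compiled for a C array, because the helper containing it is not selected for that type.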