building a multilayer perceptron

closed account (E093605o)
I want to build a multilayer perceptron and am almost done, but I cannot initialize a test object. I have written a basic linear algebra library, "Matrix.h", for it to use.
In main() in MLP.cpp I have put the error message in a comment above the line that produces it.
//MLP.h
#pragma once
#include "Matrix.h"

template<typename T>
class MLP {
 public:
  std::vector<size_t> units_per_layer;
  std::vector<Matrix<T>> bias_vectors;
  std::vector<Matrix<T>> weight_matrices;
  std::vector<Matrix<T>> activations;

  double lr = .001d;

  MLP(std::vector<size_t> units_per_layer):
    units_per_layer(units_per_layer),
    weight_matrices(),
    bias_vectors(),
    activations()
    {

  for (size_t i = 0; i < units_per_layer.size() - 1; ++i) {
    size_t in_channels{units_per_layer[i]};
    size_t out_channels{units_per_layer[i+1]};

    // initialize to random Gaussian
    auto W  = mtx<T>::randn(out_channels, in_channels);
    weight_matrices.push_back(W);

    auto b  = mtx<T>::randn(out_channels, 1);
    bias_vectors.push_back(b);

    activations.resize(units_per_layer.size());
  }
}

inline auto sigmoid(double x) {
  return 1.0 / (1 + exp(-x));
}

inline auto d_sigmoid(double x){
  return (x * (1 - x));
}   

template <typename T>
auto MLP::forward(Matrix<T> x) {
  assert(std::get<0>(x.shape) == units_per_layer[0] && std::get<1>(x.shape));

  activations[0] = x;
  Matrix prev(x);
  for (int i = 0; i < units_per_layer.size() - 1; ++i) {

    Matrix y = weight_matrices[i].matmul(prev);
    y = y + bias_vectors[i];
    y = y.apply_function(sigmoid);
    activations[i+1] = y;
    prev = y;
  }
  return prev;
}

template<typename T>
void MLP<T>::backprop(Matrix<T> target) {
  assert(get<0>(target.shape) == units_per_layer.back());

  // determine the simple error
  // error = target - output
  auto y = target;
  auto y_hat = activations.back();
  auto error = (target - y_hat);

  // backprop the error from output to input and step the weights
  for(int i = weight_matrices.size() - 1 ; i >= 0; --i) {
    //calculating errors for previous layer
    auto Wt = weight_matrices[i].T();
    auto prev_errors = Wt.matmul(delta);

    // apply derivative of function evaluated at activations
    //backprop for biases
    auto d_outputs = activations[i+1].apply_function(d_sigmoid);
    auto gradients = error.multiply_elementwise(d_outputs);
    gradients = gradients.multiply_scalar(lr);

    // backprop for weights
    auto a_trans = activations[i].T();
    auto weight_gradients = gradients.matmul(a_trans);

    //adjust weights
    bias_vectors[i] = bias_vectors[i].add(gradients);
    weight_matrices[i] = weight_matrices[i].add(weight_gradients);
    error = prev_errors;
  }
}

};

//MLP.cpp

#include "Matrix.h"
#include "MLP.h" 


int main(){
    
    std::vector<size_t> layers = {3,3};
    /* the following line throws an error: no instance of constructor "MLP" matches 
    the argument list */
    MLP test(layers);
    return 0;
}
I can't see what the specific problem is, and since I don't have your Matrix.h, I can't compile it.

However, the symbol units_per_layer in the constructor is ambiguous: not for the compiler, but for you.

In the initializer list, you really should use:
 
  units_per_layer(std::move(units_per_layer))
as there's no reason to copy that vector. You should also move it in main to remove that copy:
 
  MLP test(std::move(layers));

When you refer to units_per_layer in the constructor, you're picking up the parameter, not the member. It's best to give the parameter a different name.
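As a minimal sketch of the constructor with the parameter renamed (the name layer_sizes is just an illustration):

MLP(std::vector<size_t> layer_sizes)
    : units_per_layer(std::move(layer_sizes)) {
    // the body can now refer unambiguously to the member units_per_layer
}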

Distilling your code to:
#include <vector>

template<typename T>
class MLP {
        std::vector<size_t> units_per_layer;

public:
        MLP(std::vector<size_t> units_per_layer):
                units_per_layer(units_per_layer) {
        }
};

int main() {
        MLP<int>{{2, 3, 4}};
}

It seems the problem is that MLP is a class template, and you didn't specify the template argument.
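For example, assuming T = double here:

MLP<double> test(std::move(layers));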
closed account (E093605o)
thanks for your answer. That was indeed one of the problems, but it still does not compile. The compiler complains:

error: extra qualification ‘MLP<T>::’ on member ‘forward’ [-fpermissive]
   45 | auto MLP<T>::forward(Matrix<T> x)

Do you know why this happens? The same error is raised for the backprop method, as it has the same syntactical structure.
There are some issues with the templates in MLP. The syntax for defining a member function separately from the class definition (as in Matrix) differs from the syntax for defining it within the class definition (as in MLP). Also, delta isn't defined. Note too that there are some copies which possibly aren't needed and could reduce performance, but I'd suggest getting the code to work first, and only looking at further optimisation if the performance is not acceptable for your real-world data. I've changed the parameters of a couple of functions to pass by const reference instead of by value, to remove some copying of data structures.
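As a minimal sketch of the two definition styles (Wrapper here is just an illustration):

template<typename T>
struct Wrapper {
    T value;
    T inside() { return value; }   // defined in-class: no Wrapper<T>:: prefix
    T outside();                   // declared here, defined out-of-class below
};

template<typename T>
T Wrapper<T>::outside() {          // out-of-class: repeat the template header
    return value;                  // and qualify the name with Wrapper<T>::
}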

This compiles:

#include <vector>
#include <cmath>
#include <cassert>
#include <iostream>
#include <tuple>
#include <random>
#include <functional>

template<typename Type>
class Matrix {

	size_t cols {};
	size_t rows {};

public:
	std::vector<std::vector<Type>> data;
	std::tuple<size_t, size_t> shape;
	size_t elementCount {};

	/* constructors */
	Matrix(size_t rowsArg, size_t colsArg) : cols(colsArg), rows(rowsArg),
		elementCount(rows* cols), shape(std::tuple<size_t, size_t>(rows, cols)) {
		data = std::vector<std::vector<Type>>(rows, std::vector<Type>(cols));
	}

	Matrix() {};

	//methods
	void print();

	Matrix<Type> matmul(Matrix<Type>& m);

	Matrix<Type> multiply_elementwise(Matrix<Type>& m);

	Matrix<Type> multiply_scalar(Type scalar);

	Matrix<Type> square();

	Matrix<Type> add(Matrix<Type>& m);

	Matrix<Type> sub(Matrix& target);

	Matrix<Type> T();

	Matrix<Type> apply_function(Type(*func)(Type));

	Type& operator()(size_t row, size_t col) {
		assert(row < data.size() && col < data[0].size());
		return data[row][col];
	}

	Matrix operator+(Matrix& target) {
		return add(target);
	}

	Matrix operator-() {
		Matrix output(rows, cols);
		for (size_t r = 0; r < rows; ++r) {
			for (size_t c = 0; c < cols; ++c) {
				output(r, c) = -(*this)(r, c);
			}
		}
		return output;
	}

	Matrix operator-(Matrix& target) {  // for cleaner usage
		return sub(target);
	}


};

// methods
template<typename Type>
void Matrix<Type>::print() {
	for (int i = 0; i < rows; i++) {
		for (int j = 0; j < cols; j++) {
			std::cout << data[i][j] << " ";
		}
		std::cout << std::endl;
	}
}

template <typename Type>
Matrix<Type> Matrix<Type>::matmul(Matrix<Type>& target) {
	assert(cols == target.rows);
	Matrix output(rows, target.cols);

	for (size_t r = 0; r < output.rows; ++r) {
		for (size_t c = 0; c < output.cols; ++c) {
			for (size_t k = 0; k < target.rows; ++k)
				output(r, c) += (*this)(r, k) * target(k, c);
		}
	}
	return output;
};

template <typename T>
struct mtx {
	static Matrix<T> randn(size_t rows, size_t cols) {
		Matrix<T> M(rows, cols);

		std::random_device rd {};
		std::mt19937 gen { rd() };

		// init Gaussian distr. w/ N(mean=0, stdev=1/sqrt(numel))
		T n = static_cast<T>(M.elementCount);
		T stdev { 1 / sqrt(n) };
		std::normal_distribution<T> d { 0, stdev };

		// fill each element w/ draw from distribution
		for (size_t r = 0; r < rows; ++r) {
			for (int c = 0; c < cols; ++c) {
				M(r, c) = d(gen);
			}
		}
		return M;
	}
};

template <typename Type>
Matrix<Type> Matrix<Type>::multiply_elementwise(Matrix<Type>& target) {
	assert(shape == target.shape);
	Matrix output((*this));
	for (size_t r = 0; r < output.rows; ++r) {
		for (size_t c = 0; c < output.cols; ++c) {
			output(r, c) = target(r, c) * (*this)(r, c);
		}
	}
	return output;
}

template<typename Type>
Matrix<Type> Matrix<Type>::square() {
	Matrix output((*this));
	output = multiply_elementwise(output);
	return output;
}

template<typename Type>
Matrix<Type> Matrix<Type>::multiply_scalar(Type scalar) {
	Matrix output((*this));
	for (size_t r = 0; r < output.rows; ++r) {
		for (size_t c = 0; c < output.cols; ++c) {
			output(r, c) = scalar * (*this)(r, c);
		}
	}
	return output;
}

template<typename Type>
Matrix<Type> Matrix<Type>::add(Matrix& target) {
	assert(shape == target.shape);
	Matrix output(rows, std::get<1>(target.shape));

	for (size_t r = 0; r < output.rows; ++r) {
		for (size_t c = 0; c < output.cols; ++c) {
			output(r, c) = (*this)(r, c) + target(r, c);
		}
	}
	return output;
}

template<typename Type>
Matrix<Type> Matrix<Type>::sub(Matrix& target) {
	Matrix neg_target = -target;
	return add(neg_target);
}

template<typename Type>
Matrix<Type> Matrix<Type>::T() {
	size_t new_rows { cols }, new_cols { rows };
	Matrix transposed(new_rows, new_cols);

	for (size_t r = 0; r < new_rows; ++r) {
		for (size_t c = 0; c < new_cols; ++c) {
			transposed(r, c) = (*this)(c, r);  // swap row and col
		}
	}
	return transposed;
}

template<typename Type>
Matrix<Type> Matrix<Type>::apply_function(Type(*func)(Type)) {
	Matrix output((*this));
	for (size_t r = 0; r < rows; ++r) {
		for (size_t c = 0; c < cols; ++c) {
			output(r, c) = func((*this)(r, c));
		}
	}
	return output;
}


template<typename T>
class MLP {
public:
	std::vector<size_t> units_per_layer;
	std::vector<Matrix<T>> bias_vectors;
	std::vector<Matrix<T>> weight_matrices;
	std::vector<Matrix<T>> activations;

	double lr = .001;

	MLP(const std::vector<size_t>& units_per_layer) :
		units_per_layer(units_per_layer) {
		//weight_matrices(),
		//bias_vectors(),
		//activations() {

		for (size_t i = 0; i < units_per_layer.size() - 1; ++i) {
			size_t in_channels { units_per_layer[i] };
			size_t out_channels { units_per_layer[i + 1] };

			// initialize to random Gaussian
			auto W = mtx<T>::randn(out_channels, in_channels);
			weight_matrices.push_back(W);

			auto b = mtx<T>::randn(out_channels, 1);
			bias_vectors.push_back(b);

			activations.resize(units_per_layer.size());
		}
	}

	// static, so they convert to plain function pointers for apply_function
	static double sigmoid(double x) {
		return 1.0 / (1 + exp(-x));
	}

	static double d_sigmoid(double x) {
		return (x * (1 - x));
	}

	auto forward(const Matrix<T>& x) {
		assert(std::get<0>(x.shape) == units_per_layer[0] && std::get<1>(x.shape));

		activations[0] = x;
		auto prev(x);

		for (int i = 0; i < units_per_layer.size() - 1; ++i) {
			auto y = weight_matrices[i].matmul(prev);

			y = y + bias_vectors[i];
			y = y.apply_function(sigmoid);
			activations[i + 1] = y;
			prev = y;
		}
		return prev;
	}

	void backprop(const Matrix<T>& target) {
		assert(get<0>(target.shape) == units_per_layer.back());

		// determine the simple error
		// error = target - output
		auto y = target;
		auto y_hat = activations.back();
		auto error = (y - y_hat);  // y is a non-const copy, so operator- can bind

		// backprop the error from output to input and step the weights

		for (int i = weight_matrices.size() - 1; i >= 0; --i) {
		  //calculating errors for previous layer
			auto Wt = weight_matrices[i].T();

			//// delta NOT DEFINED
			//auto prev_errors = Wt.matmul(delta);

			// apply derivative of function evaluated at activations
			//backprop for biases
			auto d_outputs = activations[i + 1].apply_function(d_sigmoid);
			auto gradients = error.multiply_elementwise(d_outputs);
			gradients = gradients.multiply_scalar(lr);

			// backprop for weights
			auto a_trans = activations[i].T();
			auto weight_gradients = gradients.matmul(a_trans);

			//adjust weights
			bias_vectors[i] = bias_vectors[i].add(gradients);
			weight_matrices[i] = weight_matrices[i].add(weight_gradients);
			////error = prev_errors;
		}
	}
};


int main() {
	const std::vector<size_t> layers { 3,3 };

	MLP<double> test (layers);
}

Your indentation seems to be out, making it difficult to see that you've written:
template<typename Type>
class Matrix {
public:
    template <typename T>
    auto MLP::forward(Matrix<T> x) {
        //...
    }
};

The error should now be obvious. (That MLP:: shouldn't be there.)
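Roughly, what was intended is for forward to be defined inside MLP itself, unqualified:

template<typename T>
class MLP {
public:
    auto forward(Matrix<T> x) {   // no MLP:: prefix when defined in-class
        //...
    }
};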

I suggest you use tabs or a shiftwidth of 4, and ensure your indentation is consistent. clang-format can fix it automatically.
closed account (E093605o)
thanks for all your answers. The MLP works now. One last thing: I want to log the error and the outputs to a file so I can plot some graphs later. How do I do the logging part? I want to use fstream in Main.cpp, but I did not find helpful information right away.
//MLP.h
#pragma once
#include "Matrix.h"

template<typename T>
class MLP {
 public:
  std::vector<size_t> units_per_layer;
  std::vector<Matrix<T>> bias_vectors;
  std::vector<Matrix<T>> weight_matrices;
  std::vector<Matrix<T>> activations;
  std::vector<Matrix<T>> zs;

  double lr = .001d;

  MLP(std::vector<size_t> units_per_layer):
    units_per_layer(units_per_layer),
    weight_matrices(),
    bias_vectors(),
    zs(),
    activations()
    {

  for (size_t i = 0; i < units_per_layer.size() - 1; ++i) {
    size_t in_channels{units_per_layer[i]};
    size_t out_channels{units_per_layer[i+1]};

    // initialize to random Gaussian
    auto W  = mtx<T>::randn(out_channels, in_channels);
    weight_matrices.push_back(W);

    auto b  = mtx<T>::randn(out_channels, 1);
    bias_vectors.push_back(b);

    auto z = mtx<T>::randn(out_channels,1);
    zs.push_back(z);

    activations.resize(units_per_layer.size());
  }
}

inline auto sigmoid(double x) {
  return 1.0 / (1 + exp(-x));
}

inline auto d_sigmoid(double x){
  return (x * (1 - x));
}   


auto forward(Matrix<T> x) {
  assert(std::get<0>(x.shape) == units_per_layer[0] && std::get<1>(x.shape));

  activations[0] = x;
  Matrix prev(x);
  for (int i = 0; i < units_per_layer.size() - 1; ++i) {

    Matrix y = weight_matrices[i].matmul(prev);
    y = y + bias_vectors[i];
    y = y.apply_function(sigmoid);
    activations[i+1] = y;
    prev = y;
  }
  return prev;
}

void backprop(Matrix<T> &target) {
  assert(std::get<0>(target.shape) == units_per_layer.back());

  // determine the simple error
  // error = target - output
  Matrix<T> y = target;
  Matrix<T> y_hat = activations.back();
  Matrix<T> error = (target - y_hat);
  Matrix<T> last_z = zs[zs.size()-1];
  Matrix<T> delta_L = error.multiply_elementwise(last_z.apply_function(d_sigmoid));

  // backprop the error from output to input and step the weights
  for(int i = weight_matrices.size() - 2 ; i >= 0; --i) {
    // calculating error for previous layer
    Matrix<T> delta_l = weight_matrices[i].T().matmul(delta_L).multiply_elementwise(zs[i].apply_function(d_sigmoid));

    // calculating the change of weights and biases
    Matrix<T> nabla_w = delta_l.matmul(activations[i].T());
    Matrix<T> nabla_b = delta_l;

    // updating weights and biases
    weight_matrices[i] = weight_matrices[i] - nabla_w.multiply_scalar(lr);
    bias_vectors[i] = bias_vectors[i] - delta_l.multiply_scalar(lr);

    // updating the error term delta
    delta_L = delta_l;
  }
}

};

//Main.cpp
#pragma once
#include "Matrix.h"
#include "MLP.h"
#include <vector>
#include <iostream>
#include <fstream>
#include <math.h>

int main() {

  // init model
  std::vector<size_t> layers = {1,8,8,8,1};


  // open file to save loss, x, y, and model(x)
  std::ofstream my_file; 
  my_file.open ("data.txt");

  int max_iter{10000};

  const double PI {3.14159};
  

  MLP<double> model(layers);

  for (int i = 0; i < max_iter; i++){
    auto x = mtx<double>::randn(1, 1).multiply_scalar(PI);
    auto y = x.apply_function([](double v) -> double { return sin(v) * sin(v); });

  // forward and backward
  auto y_hat = model.forward(x); 
  model.backprop(y); // loss and grads computed in here

  // function that logs (loss, x, y, y_hat)
  log(my_file, x, y, y_hat); 
  }

  my_file.close();
}
I want to log the error and outputs in a file to plot some graphs later, how do I do the logging part?

I'm not sure exactly what you're after; just open an output file and write to it. You can also write to std::clog for logging; that output goes to stderr. It's up to you.

However, personally I'd use spdlog for logging, and a std::ofstream to write the data to a file.
https://github.com/gabime/spdlog
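A minimal sketch of the plain std::ofstream route (the values here are stand-ins, not your MLP's):

#include <fstream>

int main() {
    std::ofstream my_file("data.txt");

    for (int i = 0; i < 10; ++i) {
        double x = 0.1 * i;                      // stand-in input
        double y = x * x;                        // stand-in target
        double y_hat = y + 0.01;                 // stand-in model output
        double loss = (y - y_hat) * (y - y_hat); // squared error
        my_file << loss << ' ' << x << ' ' << y << ' ' << y_hat << '\n';
    }
}   // my_file flushes and closes automatically when it goes out of scope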
closed account (E093605o)
the thing is, on line 133 it says "no instance of overloaded function "log" matches the argument list". Could it be that I need to implement some function in Matrix.h for the logging to work?
If I'm reading the code right, x, y and y_hat are all of type Matrix?

So you need a function:

template<typename T>
void log (std::ostream& file, const Matrix<T>& x, const Matrix<T>& y, const Matrix<T>& y_hat) {
    // use file as the stream to write data to
}


Note L133 should be my_file and not file

eg like the Matrix print function but use file instead of std::cout

For each Matrix, how do you want the data written?

What program is going to use this data? Does it have a format requirement?
closed account (E093605o)
Yes, they're all of type Matrix. Once this is logged, I thought of writing a small Python script that plots these values to see if the neural net actually works.
I think it would be easiest to have one line per log call.
Each Matrix denotes a vector here: the input vector, the true output vector, and the output vector of the NN. I would like them to be logged on one line.
If you want one line per Matrix, and Matrix is a 'vector of vectors', how do you want to separate each row? Or should the line start with the number of rows/cols, or something else?

When devising log formats, this often depends upon what's easiest for the reader. As I don't use Python: what's the easiest log format for Python to read?
You might find it useful to overload operator<< for the Matrix class, eg:

template<typename T>
std::ostream& operator<<(std::ostream& os, const Matrix<T>& mat) {
    for (const auto& r : mat.data) {
        for (const auto& e : r)
            os << e << ' ';

        os << '\n';
    }

    return os;
}


As data is public, this doesn't need to be a friend function.

Then log becomes something like:

template<typename T>
void log (std::ostream& file, const Matrix<T>& x, const Matrix<T>& y, const Matrix<T>& y_hat) {
    file << x << '\n';
    file << y << '\n';
    file << y_hat << '\n';
}

closed account (E093605o)
sorry, I went for a run, that's why I'm replying so late. I wanted to put all the contents of one log() call onto one line; maybe that's easier to parse, because you can then read the contents line by line and split on the separator characters. I think the easiest format for Python would be newline-separated records.
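A sketch of that one-line-per-call format, building on the operator<< idea above but flattening each Matrix (the helper name write_flat is illustrative, and it assumes the Matrix class above with its public data):

template<typename T>
void write_flat(std::ostream& os, const Matrix<T>& m) {
    for (const auto& row : m.data)
        for (const auto& e : row)
            os << e << ' ';
}

template<typename T>
void log(std::ostream& file, const Matrix<T>& x, const Matrix<T>& y, const Matrix<T>& y_hat) {
    write_flat(file, x);
    write_flat(file, y);
    write_flat(file, y_hat);
    file << '\n';   // one complete record per line
}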
Topic archived. No new replies allowed.