WriteProgram for emulator

closed account (L6b7X9L8)
I have just finished writing my WriteProgramV2()

Version 2 accepts a syntax similar to Assembly language and then converts the words into bytecode instructions for the emulated processor.

void WriteProgramV2()
{
	std::string input;								// Store input string
	std::string token;								// Place to store a token from input
	int LineNumber = 0;
	while(std::getline(std::cin, input, '\n'))		// get input string
	{
		if(input == "^") 
		{
			break;
		}
		else
		{
			int pass = 0; 	
			std::istringstream iss(input);				// Store input into the stream
			
			for(pass; pass < 3; pass++) 				// Count what pass it is on 0,1 or 2 ( Instruction(str), reg(str), value(reg(str) or int) )
			{
				iss >> token;
				switch(pass)
				{
				case 0:
						if(token == "MOV")
						{
							std::cout << "Move\t";
							Memory[LineNumber] = 10;
						}
						else if(token == "STORE")
						{
							std::cout << "Store\t";
							Memory[LineNumber] = 4;
						}
						else if(token == "ADD")
						{
							std::cout << "Add\n";
							Memory[LineNumber] = 3;
							pass += 2;
						}
						else if(token == "PRINT")
						{
							std::cout << "Print\t";
							Memory[LineNumber] = 5;
							pass += 1;
						}
				break;

				case 1:
						if(token == "A")
						{
							std::cout << "Accumulator\t";
							Memory[LineNumber] = 0;
						}
						else if(token == "B")
						{
							std::cout << "Base\t";
							Memory[LineNumber] = 1;
						}
				break;

				case 2:
						int val;
						val = atoi(token.c_str()); 
						std::cout << val << "\n"; 
						Memory[LineNumber] = val;
				break;

				default: std::cout << "Something messed up. Invalid instruction probably.\n";
				}
				LineNumber++; // Move to next clear block of memory for the next instruction
			}// END FOR
		}// END IF
	}// END WHILE
	std::cout << "\n\n";
	for(int i = 0; i < 256; i++)
	{
		std::cout << "$" << i << "\t" << (int)Memory[i] << "\n";
	}
}// END FUNCTION 



I am wondering if anyone would like to refactor this code to make it more readable, faster, more efficient, or better styled. I wrote this beyond my current abilities and learned from it; now I am wondering if someone would like to teach me a better way.

Thanks in advance.
closed account (j3Rz8vqX)
Another means:
void WriteProgramV2()
{
    std::string token;								// Store input string
    int LineNumber = 0;

    const std::string command[][4] = {{"MOV","STORE","ADD","PRINT"},{"A","B"}};
    const std::string prompt[][4] = {{"Move\t","Store\t","Add\n","Print\t"},{"Accumulator\t","Base\t"}};
    const int memory[][4] = {{10,4,3,5},{0,1}};
    const int passIncrement[][4] = {{0,0,2,1},{0,0}};
    const int n[] = {4, 2};  // valid entries per row; sizeof(memory[1])/sizeof(int) would give 4, because every row is padded to 4 ints

    while(std::getline(std::cin, token))		// get input string
    {
        if(token == "^")
            break;
        else
        {
            for(int pass=0; pass < 3; ++pass){
                //case 0 && 1:
                if(pass>=0 && pass<2){
                    for(int i=0;i<n[pass];++i){
                        if(token == command[pass][i]){
                            std::cout << prompt[pass][i];           //Display the appropriate prompt.
                            Memory[LineNumber] = memory[pass][i];   //Assign the appropriate memory.
                            pass += passIncrement[pass][i];         //Increment pass appropriately.
                            break;
                        }
                    }
                //case 2:
                }else if(pass==2){  // third pass (index 2); pass never reaches 3 inside this loop
                    int val = atoi(token.c_str());
                    std::cout << val << "\n";
                    Memory[LineNumber] = val;
                //case 'Default':
                }else
                    std::cout << "Something messed up. Invalid instruction probably.\n";
                ++LineNumber;
            }// END FOR
        }// END IF
    }// END WHILE
    std::cout << "\n\n";
    for(int i = 0; i < 256; ++i)
        std::cout << "$" << i << "\t" << (int)Memory[i] << "\n";
}// END FUNCTION 
closed account (L6b7X9L8)
Your version looks much more compact, but I'm struggling to follow it, particularly lines 18-26 and line 10.
closed account (L6b7X9L8)
Also, how do you tokenise the instructions without the sstream? Sorry if you have done this; I'm struggling to follow your code as I have just woken up. :)

Sample Input: MOV A 10
Sample Out : Move Accumulator 10
Instructions in Memory: 10 0 10 (10 Opcode, 0 Register 0, 10 value to put in register)
I'm not an expert on writing script parsers, but here's my input:


Unless you're planning on having more than 256 opcodes, a 2 byte opcode probably isn't necessary. You could condense this into 1 byte pretty easily.

IE: right now you have 10,0,X being MOV A, 'X'
and 10,1,X being MOV B, 'X'

Instead you could do something like 10,X being MOV A,'X' and 11,X being MOV B,'X'.

Of course exceeding 256 opcodes is certainly a possibility. So it might not be the best way to go. You could also do some variable width thing where common instructions like MOV get 1 byte, but more obscure ones get 2 or 3.

But whatever... that's a minor point.
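
A minimal sketch of the one-byte idea, assuming the example values 10 and 11 for MOV A and MOV B suggested above. The names MOV_A, MOV_B and encodeMov are hypothetical and are not part of the emulator:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical one-byte opcodes: the register is folded into the opcode itself.
enum : std::uint8_t { MOV_A = 10, MOV_B = 11 };

// Emit "MOV <reg>, value" as two bytes instead of three.
// 'reg' is 0 for A and 1 for B, matching the register numbers used in the thread.
std::vector<std::uint8_t> encodeMov(int reg, std::uint8_t value)
{
    std::uint8_t opcode = static_cast<std::uint8_t>(MOV_A + reg);
    return { opcode, value };
}
```

With this layout "MOV B 7" would assemble to the two bytes 11, 7 rather than 10, 1, 7.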



The bigger concern here is that you are assuming all instructions are arranged the same way. That is, the first token is the instruction, the 2nd is either A or B, and the 3rd is an integer.

This won't make sense for a lot of instructions. MOV for example -- this will let you move something into A,B but how would you move something from A,B into memory? (Or is that what STORE does?)

It also doesn't really make sense for PRINT. What does "PRINT A 5" do? Does it print A, or does it print 5, or both?


Maybe instead... you could have a table that dictates the grammar for all these instructions. Then you parse the line and examine what kind of token you have. Then run those tokens through the table to see which opcode it matches.
closed account (L6b7X9L8)
First of all thanks for replying.


Unless you're planning on having more than 256 opcodes, a 2 byte opcode probably isn't necessary. You could condense this into 1 byte pretty easily.

IE: right now you have 10,0,X being MOV A, 'X'
and 10,1,X being MOV B, 'X'

Instead you could do something like 10,X being MOV A,'X' and 11,X being MOV B,'X'.


This is how it was originally done, with LOADA and LOADB, but as usual when I program I try to find patterns, and the pattern here is 'LOAD' moving one value into another. Then I wanted LOAD to accept arguments: the register to load into and the value to put in it. I did want a MOV A,B but the only way I can think of doing it is with another opcode. Because the program is loaded into memory, I'm unsure how to get the value of the registers without using the CPU..? I hope I make sense.



The bigger concern here is that you are assuming all instructions are arranged the same way. That is, the first token is the instruction, the 2nd is either A or B, and the 3rd is an integer.


Not always the case. In my original code, PRINT made the pass increment one more time before the end of the loop, so it skipped right to storing the value in memory. ADD on the other hand, completely skipped the rest of the loop, as the opcode doesn't take arguments, it just adds B into A.



Maybe instead... you could have a table that dictates the grammar for all these instructions. Then you parse the line and examine what kind of token you have. Then run those tokens through the table to see which opcode it matches.


I guess that's what you did in your example with the const strings?


Sorry Disch, I thought you were DPut ._.
I did want a MOV A,B but the only way I can think of doing it is with another opcode. Because the program is loaded into memory, I'm unsure how to get the value of the registers without using the CPU..? I hope I make sense.


Yes that would probably need to be another opcode.

Not always the case. In my original code, PRINT made the pass increment one more time before the end of the loop, so it skipped right to storing the value in memory. ADD on the other hand, completely skipped the rest of the loop, as the opcode doesn't take arguments, it just adds B into A.


Ah... I missed that. That's a little confusing. Manipulating loop counters usually trips me up because I don't expect it.

It also might lead to confusion for the user. They might try to do "ADD A, 1" which would assemble without error... but then they'd be surprised when the program is actually doing "ADD A,B"
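
One way to catch that kind of mistake, sketched under the assumption that every mnemonic takes a fixed number of operands. The counts follow how the original post treats each instruction (ADD none, PRINT one, MOV and STORE two); operandCount and hasCorrectArity are hypothetical names:

```cpp
#include <map>
#include <sstream>
#include <string>

// Assumed operand counts per mnemonic, taken from how the original post treats them.
const std::map<std::string, int> operandCount = {
    {"MOV", 2}, {"STORE", 2}, {"ADD", 0}, {"PRINT", 1}
};

// Returns true only if the line has a known mnemonic followed by exactly
// the expected number of whitespace-separated operands.
bool hasCorrectArity(const std::string& line)
{
    std::istringstream iss(line);
    std::string mnemonic;
    if (!(iss >> mnemonic))
        return false;                       // empty line

    auto it = operandCount.find(mnemonic);
    if (it == operandCount.end())
        return false;                       // unknown instruction

    int seen = 0;
    std::string operand;
    while (iss >> operand)
        ++seen;                             // count everything after the mnemonic

    return seen == it->second;
}
```

Here "ADD A, 1" is rejected because two operands follow a mnemonic that expects none, instead of silently assembling as a plain ADD.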


I guess that's what you did in your example with the const strings?


That was actually DPut's example, not mine.


But I was thinking of something even more involved. Have a table to associate strings with specific keywords... but then have another table to associate keywords and grammar with opcodes.
closed account (L6b7X9L8)

Ah... I missed that. That's a little confusing. Manipulating loop counters usually trips me up because I don't expect it.

It also might lead to confusion for the user. They might try to do "ADD A, 1" which would assemble without error... but then they'd be surprised when the program is actually doing "ADD A,B"


I agree it's rather awful code, but my inferior mind couldn't really devise another way. I have been writing this as a 'prototype' if you will, just getting the theory down and whatnot. I didn't want to add this to my GitHub version even though it works, just because it seems pretty crappy.


That was actually DPut's example, not mine.


Yeah my bad I didn't realise.. sorry!


But I was thinking of something even more involved. Have a table to associate strings with specific keywords... but then have another table to associate keywords and grammar with opcodes.


I don't know if it's me or it sounds worse than it is, could you elaborate a little with a code example?

If you need to know all the ins and outs of how the processor works:

https://github.com/ZorgSpace/CPU_1001A
I don't want to write the whole thing... but here's some pseudo-code to give you the idea.

enum class Token
{
  // keywords
  mov,
  store,
  add,
  print,
  a,
  b,
  
  // other kinds of tokens
  integer,
  
  
  // reserved to indicate 'nothing'
  nothing
};

//
//  .. have a table or a map or something which will let you get the above Token from a given string
//


struct Grammar
{
    Token   tokens[3];
    int     output[3];
};

// the 'output' will be what you write to the file.
// values >= 0 could be the actual opcodes and register numbers.
// values < 0 could be markers to replace with a provided literal
// and you could use 'end' as a sentinel to indicate no more output
const int end = -1000; // just something that isn't going to be used normally


// then have a lookup to get the output based on the given grammar

const Grammar grammarTable[] = 
{
    {  {Token::mov,     Token::a,       Token::integer},        { 10,  0,  -1}   },// -1 indicates to use the first provided integer
                                                                                   //   so in this case... the 3rd token
    {  {Token::mov,     Token::b,       Token::integer},        { 10,  1,  -1}   },
    {  {Token::store,   Token::a,       Token::integer},        {  4,  0,  -1}   },
    {  {Token::store,   Token::b,       Token::integer},        {  4,  1,  -1}   },
    {  {Token::add,     Token::nothing, Token::nothing},        {  3,end, end}   },
      // ...
};




I'm a big believer of separating data from logic as much as possible. When you have an elseif chain like in your original post... the data and the logic are strongly intertwined which makes it very cumbersome.
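
To make the lookup step concrete, here is a minimal sketch of how a parsed line could be matched against such a table and turned into output. It assumes the line has already been converted into three Tokens plus a list of the integers the user typed; grammarTable, END_OF_OUTPUT and assemble are hypothetical names, and the table only contains the rows shown above:

```cpp
#include <cstddef>
#include <vector>

enum class Token { mov, store, add, print, a, b, integer, nothing };

const int END_OF_OUTPUT = -1000;   // sentinel: no more output for this rule

struct Grammar
{
    Token tokens[3];
    int   output[3];
};

const Grammar grammarTable[] =
{
    { {Token::mov,   Token::a,       Token::integer}, { 10, 0, -1} },
    { {Token::mov,   Token::b,       Token::integer}, { 10, 1, -1} },
    { {Token::store, Token::a,       Token::integer}, {  4, 0, -1} },
    { {Token::store, Token::b,       Token::integer}, {  4, 1, -1} },
    { {Token::add,   Token::nothing, Token::nothing}, {  3, END_OF_OUTPUT, END_OF_OUTPUT} },
};

// 'line' holds exactly three tokens (padded with Token::nothing);
// 'literals' holds the integers the user typed, in order.
// Returns the values to write, or an empty vector if no rule matches.
std::vector<int> assemble(const Token (&line)[3], const std::vector<int>& literals)
{
    for (const Grammar& rule : grammarTable)
    {
        bool matches = true;
        for (int i = 0; i < 3; ++i)
            if (rule.tokens[i] != line[i])
                matches = false;
        if (!matches)
            continue;

        std::vector<int> out;
        for (int value : rule.output)
        {
            if (value == END_OF_OUTPUT)
                break;
            if (value < 0)               // -1 = first literal, -2 = second, ...
            {
                std::size_t index = static_cast<std::size_t>(-value - 1);
                if (index >= literals.size())
                    return {};           // rule wants a literal that was not given
                out.push_back(literals[index]);
            }
            else
                out.push_back(value);    // plain opcode or register number
        }
        return out;
    }
    return {};                           // no rule matched: invalid instruction
}
```

Adding an instruction then means adding one row to the table; the matching loop itself does not change.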
closed account (L6b7X9L8)
I'm really sorry if I am wasting your time Disch...

Although your code looks great I don't feel comfortable implementing it. I don't know what an enum class is, I know what an enum and a class is, not an enum class, the only thing that comes to mind is an enum that can have multiple instances. I'll have to read up on them.

// the 'output' will be what you write to the file.
// values >= 0 could be the actual opcodes and register numbers.
// values < 0 could be markers to replace with a provided literal <---------- What?

// and you could use 'end' as a sentinel to indicate no more output
const int end = -1000; // just something that isn't going to be used normally

// then have a lookup to get the output based on the given grammar <-------- Again not sure how to do this 


I think the only thing I perfectly understand is the Grammar table; it just seems to be every possible series of tokens that are legal and what it would represent in memory. That's all I can decipher from your code, I'm sorry I'm a noob..
Although your code looks great I don't feel comfortable implementing it


That's fine. Do it however you want. I'm just throwing out ideas. Feel free to take or leave whatever you want.


I don't know what an enum class is, I know what an enum and a class is, not an enum class, the only thing that comes to mind is an enum that can have multiple instances.


An enum class is the same as an enum. Only more strongly typed.

The only differences are:
- You can't implicitly convert a Token to an int (like you could if it was an enum)
- You must refer to the Token identifiers with the Token scope.

Example:
enum Soft { sa, sb };
enum class Hard { ha, hb };

Soft s = sa;  // OK
Hard h = ha;  // ERROR .. 'ha' not in this scope
Hard h1 = Hard::ha;  // OK

int i = sa;  // OK
int j = Hard::ha;  // ERROR... 'ha' is not an int, it's a 'Hard' 


// values < 0 could be markers to replace with a provided literal <---------- What?


Take a look at line 44:
{ {Token::store, Token::a, Token::integer}, { 4, 0, -1} },

The output here is 4,0,X ... where X is whatever integer the user input. So here, a negative number indicates to use that integer.

Compare to something like this:

{ {Token::mov, Token::integer, Token::integer}, { 14, -1, -2} },

Since the user is inputting 2 integers here... -1 would be "output the first integer they gave you" and -2 would be "output the 2nd integer they gave you".

// then have a lookup to get the output based on the given grammar <-------- Again not sure how to do this


That's the 'Grammar' table below.


I think the only thing I perfectly understand is the Grammar table; it just seems to be every possible series of tokens that are legal and what it would represent in memory. That's all I can decipher from your code, I'm sorry I'm a noob..


That's really the only part that mattered. The rest was just giving it context.
closed account (L6b7X9L8)
That's fine. Do it however you want. I'm just throwing out ideas. Feel free to take or leave whatever you want.


My problem is that I do want to use it haha, it seems more legible and has room for easier expansion, I don't like copy pasting code that I don't understand. I'll keep reading it and try and implement it another test.



An enum class is the same as an enum. Only more strongly typed.

The only differences are:
- You can't implicitly convert a Token to an int (like you could if it was an enum)
- You must refer to the Token identifiers with the Token scope.

Example:

enum Soft { sa, sb };
enum class Hard { ha, hb };

Soft s = sa;  // OK
Hard h = ha;  // ERROR .. 'ha' not in this scope
Hard h1 = Hard::ha;  // OK

int i = sa;  // OK
int j = Hard::ha;  // ERROR... 'ha' is not an int, it's a 'Hard'  




Thank you for the explanation! One question though, Why would one use an enum class? If you cannot convert them to int like a normal enum. How do you use them in context? Like in your example:

enum class Token
{
  // keywords
  mov,  // If these are a 'Token' If you want to check against the input string..
  store, //Would you do: if(string == Token::mov)? It doesn't seem possible
  add,
  print,
  a,
  b,
  
  // other kinds of tokens
  integer,
  
  
  // reserved to indicate 'nothing'
  nothing
};
I don't like copy pasting code that I don't understand. I'll keep reading it and try and implement it another test.


That's a good philosophy.

One question though, Why would one use an enum class? If you cannot convert them to int like a normal enum.


You'd use them because you can't convert them to an int. It makes them strongly typed.

An integer is kind of generic... it can represent pretty much anything.
A Token is a token -- it can only represent a token.

A specific, strong type makes your code clear and makes it more difficult to misuse a type.
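
A small sketch of what that looks like in practice. Converting to an integer is still possible, but it has to be written out with static_cast, so it cannot happen by accident. Register, toByte and describe are hypothetical names used only for illustration:

```cpp
#include <cstdint>

enum class Register : std::uint8_t { a = 0, b = 1 };

// The conversion is allowed, but it has to be spelled out explicitly.
constexpr std::uint8_t toByte(Register r)
{
    return static_cast<std::uint8_t>(r);
}

// A function that takes a Register cannot be called with a bare integer:
//   describe(1);            // would not compile
//   describe(Register::b);  // OK
constexpr const char* describe(Register r)
{
    return r == Register::a ? "Accumulator" : "Base";
}
```

The explicit cast is typically only needed at the boundary where the value is written into emulator memory; everywhere else the code can pass Register values around as their own type.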


How do you use them in context?


// create/initialize/assign them like normal
Token foo = Token::mov;

// compare them like normal
if(foo == Token::store)
{
  //...
}
else
{
  //..
}

// can even put them in a switch like normal
switch(foo)
{
case Token::mov:
  // ...
  break;

case Token::store:
  // ...
  break;
}
closed account (L6b7X9L8)
That last example has hit home I think.

Going back to original code:

while(std::getline(std::cin, input, '\n'))		// get input string
	{
		if(input == "^") 
		{
			break;
		}
		else
		{
			int pass = 0; 	
			std::istringstream iss(input);				// Store input into the stream
			
			for(pass; pass < 3; pass++) 				// Count what pass it is on 0,1 or 2 ( Instruction(str), reg(str), value(reg(str) or int) )
			{
				iss >> token;
				switch(pass)
				{



And I am really sorry about pestering you with this, I just want to get it solid.

My code here keeps grabbing input from the user until the 'break' character is entered. Line 10 puts the input string into a stringstream, and line 14 extracts the next token to be checked. My question is: using this method, how do I convert the string input into these Token types? Would I still use an if-else chain on the string and then assign the Token accordingly?

Again I am sorry if it's staring me in the face I don't know what's wrong with me today.
A map would probably be easier:

std::map<std::string,Token> dictionary;

dictionary["MOV"] = Token::mov;   // keys match the upper-case input; the enumerator is 'mov'
dictionary["ADD"] = Token::add;
// etc



iss >> token;
auto i = dictionary.find(token);

if(i == dictionary.end())
{
    // given string was not in our dictionary
    //   here you'd probably check to see if the string is a literal
}
else
{
    // the string was in our dictionary.
    //  'i->second' will be the Token
}
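
Filled out a little, the same idea could look like the sketch below. It assumes the upper-case spellings from the original post's input and treats any run of decimal digits as an integer literal; classify and isIntegerLiteral are hypothetical names:

```cpp
#include <cctype>
#include <map>
#include <string>

enum class Token { mov, store, add, print, a, b, integer, nothing };

// Keys are the spellings used in the original post's input (upper case).
const std::map<std::string, Token> dictionary = {
    {"MOV", Token::mov}, {"STORE", Token::store}, {"ADD", Token::add},
    {"PRINT", Token::print}, {"A", Token::a}, {"B", Token::b}
};

// True if the word is a non-empty run of decimal digits.
bool isIntegerLiteral(const std::string& word)
{
    if (word.empty())
        return false;
    for (unsigned char c : word)
        if (!std::isdigit(c))
            return false;
    return true;
}

// Map one whitespace-separated word to a Token.
// Token::nothing is returned for anything unrecognised.
Token classify(const std::string& word)
{
    auto it = dictionary.find(word);
    if (it != dictionary.end())
        return it->second;
    return isIntegerLiteral(word) ? Token::integer : Token::nothing;
}
```

Each word pulled out of the istringstream would be passed through classify, and the resulting Tokens are what get compared against the grammar table.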
closed account (j3Rz8vqX)
Also how do you tokenise the instructions without the sstream?
I wouldn't be able to... haha.

Apologies, it was written out quickly; attention was geared towards the management of prints and procedures in ifs.

You can reapply your token method:
void WriteProgramV2()
{
    std::string input;								// Store input string
    std::string token;
    int LineNumber = 0;

    const std::string command[][4] = {{"MOV","STORE","ADD","PRINT"},{"A","B"}};
    const std::string prompt[][4] = {{"Move\t","Store\t","Add\n","Print\t"},{"Accumulator\t","Base\t"}};
    const int memory[][4] = {{10,4,3,5},{0,1}};
    const int passIncrement[][4] = {{0,0,2,1},{0,0}};
    const int n[] = {4, 2};  // valid entries per row; sizeof(memory[1])/sizeof(int) would give 4, because every row is padded to 4 ints

    while(std::getline(std::cin, input))		// get input string
    {
        if(input == "^")
            break;
        else
        {
            std::istringstream iss(input);
            for(int pass=0; pass < 3; ++pass){
                iss >> token;
                //case 0 && 1:
                if(pass>=0 && pass<2){
                    for(int i=0;i<n[pass];++i){
                        if(token == command[pass][i]){
                            std::cout << prompt[pass][i];           //Display the appropriate prompt.
                            Memory[LineNumber] = memory[pass][i];   //Assign the appropriate memory.
                            pass += passIncrement[pass][i];         //Increment pass appropriately.
                            break;
                        }
                    }
                //case 2:
                }else if(pass==2){  // third pass (index 2); pass never reaches 3 inside this loop
                    int val = atoi(token.c_str());
                    std::cout << val << "\n";
                    Memory[LineNumber] = val;
                //case 'Default':
                }else
                    std::cout << "Something messed up. Invalid instruction probably.\n";
                ++LineNumber;
            }// END FOR
        }// END IF
    }// END WHILE
    std::cout << "\n\n";
    for(int i = 0; i < 256; ++i)
        std::cout << "$" << i << "\t" << (int)Memory[i] << "\n";
}// END FUNCTION  

Your questions:

Line 10: n holds the number of entries in each row of the tables, and each row corresponds to a pass.
Row [0] is the first pass and row [1] is the second pass; the third pass is hard-coded. n[0] would be 4 and n[1] would be 2.

Lines 18-26 do what your switch, cases, and ifs did.

"pass" selects which row of the tables is used: the first or the second.

The counter "i" selects which element of that row is used. There are two rows: row [0] has 4 entries and row [1] has 2, and "n" bounds the loop for each.

-

The idea behind the const tables was to reduce code size and possibly improve performance by looking values up in a table instead ("hashing" is possibly the wrong word for it).

In our case, there was some give and take:

We lost the 4-comparison switch and gained a 3-comparison if chain, which is not much of an improvement, because the third case and the default case differ too much from the others to share the tables.

The inner if conditions stay similar, apart from a small cost from always adjusting pass, since originally only ADD and PRINT changed it.

With more cases that share similar data, this could improve performance considerably, because each case is accessed directly instead of being compared against; if all the cases took similar arguments, the if conditions would not be needed at all. My original intent was to have an array of function pointers and call each case directly, instead of using if conditions for the third case and the default.

Possibly if you're okay with pseudo global: (using namespace)
namespace caseNamespace{
    const std::string command[][4] = {{"MOV","STORE","ADD","PRINT"},{"A","B"}};
    const std::string prompt[][4] = {{"Move\t","Store\t","Add\n","Print\t"},{"Accumulator\t","Base\t"}};
    const int memory[][4] = {{10,4,3,5},{0,1}};
    const int passIncrement[][4] = {{0,0,2,1},{0,0}};
    const int n[] = {4, 2};  // valid entries per row; sizeof(memory[1])/sizeof(int) would give 4, because every row is padded to 4 ints

    void case1(std::string token, int &pass, int LineNumber){
        for(int i=0;i<n[pass];++i){
            if(token == command[pass][i]){
                std::cout << prompt[pass][i];           //Display the appropriate prompt.
                Memory[LineNumber] = memory[pass][i];   //Assign the appropriate memory.
                pass += passIncrement[pass][i];         //Increment pass appropriately.
                break;
            }
        }
    }
    void case3(std::string token, int &pass, int LineNumber){
        int val = atoi(token.c_str());
        std::cout << val << "\n";
        Memory[LineNumber] = val;
    }
    void caseDefault(std::string token, int &pass, int LineNumber){
        std::cout << "Something messed up. Invalid instruction probably.\n";
    }
}
void WriteProgramV2()
{
    std::string input;								// Store input string
    std::string token;
    int LineNumber = 0;
    // One entry per pass: passes 0 and 1 are table lookups, pass 2 parses the integer.
    void (*ptrFunction[])(std::string token, int &pass, int LineNumber) =
        {&caseNamespace::case1,&caseNamespace::case1,&caseNamespace::case3};

    while(std::getline(std::cin, input))		// get input string
    {
        if(input == "^")
            break;
        else
        {
            std::istringstream iss(input);
            for(int pass=0; pass < 3; ++pass){
                iss >> token;
                ptrFunction[pass](token,pass,LineNumber);
                ++LineNumber;
            }// END FOR
        }// END IF
    }// END WHILE
    std::cout << "\n\n";
    for(int i = 0; i < 256; ++i)
        std::cout << "$" << i << "\t" << (int)Memory[i] << "\n";
}// END FUNCTION 

All of the functions were deliberately given the same signature, even where a parameter goes unused, so that they fit in one function-pointer array.
closed account (L6b7X9L8)
Sorry guys been catching up on sleep.

@Disch

That Map keyword looks interesting, you are able to assign strings to equal other things? I guess not just an Enum Class? I'm going to read up on it now.


@Dput

Thank you for such a detailed post! I am copying and pasting this thread into a text file so I can use it as a reference. I have never used function pointers before.


So it seems this wasn't going to be as easy as I originally thought. I'm gonna knuckle down and get this rewritten using the advice given to me. I will keep this thread open for a while to see if anyone else would like to bring something to the table.

Cheers ever so much lads.
That Map keyword looks interesting, you are able to assign strings to equal other things? I guess not just an Enum Class? I'm going to read up on it now.


A map is an associative container. It has "keys" which it associates with "values". Both keys and values can be any type.... in this case, we have strings for keys, and Tokens for values.
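
For instance, the same container works just as well with plain integers as values. The sketch below maps mnemonics straight to the opcode numbers from the original post; opcodes and opcodeFor are hypothetical names:

```cpp
#include <map>
#include <string>

// Keys are mnemonics, values are the opcode numbers from the original post.
const std::map<std::string, int> opcodes = {
    {"MOV", 10}, {"STORE", 4}, {"ADD", 3}, {"PRINT", 5}
};

// Look a key up without inserting it; returns -1 if the key is absent.
int opcodeFor(const std::string& mnemonic)
{
    auto it = opcodes.find(mnemonic);
    return it == opcodes.end() ? -1 : it->second;
}
```

find is used rather than operator[] because operator[] inserts a default-constructed value for a missing key, which is usually not wanted in a lookup table (and is not available on a const map).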