Some questions about Text Comparison

Hi guys, I'm new to C++ and I have some problems with text comparison.
I searched Google and followed some tutorials on YouTube.

Sorry in advance, my English is not very good.

First of all, the limitation is that I have no idea how to read the individual words and punctuation into a string array, so all I can do is put everything into a char array. As a result, if the two paragraphs are not the same length, the results may be wrong. I also have no idea how to count the mistakes when some words are missing.

For example: Hello world! Today is Tuesday. (Test1)
Helol world! Today Tuesday. (Ans)

The case above should count as 2 mistakes. However, if I use a char array, I don't know how to detect the missing words.

The following is my code:

if (fin.is_open() && fin1.is_open())
	{
		//file opened successfully so we are here
		cout << "File Opened successfully!!!. Reading data from file" << endl;
		
		while (!fin.eof() && position < array_size)
		{
			fin.get(array[position]); //reading one character from file to array
			position++;
		}
		array[position - 1] = '\0'; //placing character array terminating character

		while (!fin1.eof() && position1 < array_size)
		{
			fin1.get(array1[position1]); //reading one character from file to array
			position1++;
		}
		
		int j = 0;

		cout << "Calculating..." << endl << endl;
		//this loop walks through the characters in array until \0
		for (int i = 0; array[i] != '\0'; i++)
		{


			while (array[i + 1] == char(32) && array1[j + 1] != char(32))
			{
				if (space == -1)
				{
					count++;
					space = 0;
				}
				j++;
			}

			while (array[j + 1] == char(32) && array1[i + 1] != char(32))
			{
				
				if (space == -1)
				{
					count++;
					space = 0;
				}
				i++;
			}

			if (array[i] != array1[j] && space == -1)
			{
				count++;
				space = 0;
			}

			if (array[i] == char(32) && array1[j] == char(32))
				space = -1;
			
			j++;
		}



		cout << "The number of mistake(s) is/are : "<< count << "\n";
		cout << "Your expected number of mistakes is:" << expect << "\n";

		WINPAUSE;
	}
	else //file could not be opened
	{
		cout << "File could not be opened." << endl;
	}
	WINPAUSE;
	return 0;

SakurasouBusters (732)

What is this macro WINPAUSE?

starf15h (3)

It just keeps my debug window from closing.

starf15h (3)

Could anyone help me please T_T

mbozzi (3944)

You can use std::getline to read an entire line, and std::stringstream to break the line into individual tokens. That assumes the two units of text you want to compare are on adjacent lines. If each unit of text is in a different file, you can do something like this:

std::ifstream my_file("my-file.txt");
std::vector <std::string> text;
for (std::string tok; my_file >> tok; text.emplace_back(tok));

After the loop, each element of text holds one whitespace-separated token from the file (unless the stream extraction failed; you can query the state of the std::ifstream object after the loop to check).
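To make the two ideas above concrete (tokenizing adjacent lines, and checking the stream state once the extraction loop ends), here is a minimal sketch. The helper names read_tokens and read_line_pair are hypothetical and not part of the standard library; the sketch also assumes that punctuation should stay attached to its word, which is what whitespace splitting gives you.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect every whitespace-separated token from `in`.
// `complete` is set to true only if the loop stopped because the input ran
// out, not because the stream was already broken or a read error occurred.
std::vector<std::string> read_tokens(std::istream& in, bool& complete) {
    std::vector<std::string> text;
    for (std::string tok; in >> tok; ) {
        text.push_back(tok);
    }
    complete = in.eof() && !in.bad();
    return text;
}

// Hypothetical helper for the "adjacent lines" case: read two lines with
// std::getline and split each one with std::istringstream.
// Returns false if two lines could not be read.
bool read_line_pair(std::istream& in,
                    std::vector<std::string>& first,
                    std::vector<std::string>& second) {
    std::string a, b;
    if (!std::getline(in, a) || !std::getline(in, b)) {
        return false;
    }
    bool ok_a = false, ok_b = false;
    std::istringstream sa(a), sb(b);
    first = read_tokens(sa, ok_a);
    second = read_tokens(sb, ok_b);
    return ok_a && ok_b;
}
```

Since both helpers take a std::istream&, the same code should work with a std::ifstream, std::cin, or a std::istringstream. A std::ifstream that failed to open will report complete == false and an empty vector.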

Here is a character-based implementation of the Wagner-Fischer edit-distance algorithm I had lying around. I don't have a templated or token-based version, but it should be easy to modify to accept a vector of strings (tokens).

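#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

int edit_dist(std::string const& a, std::string const& b) {
  /* A substitution costs as much as one deletion plus one insertion. */
  constexpr std::size_t subst_cost = 2;

  std::vector <std::size_t> r0(b.size() + 1, 0);
  std::vector <std::size_t> r1(r0);

  /* Fill the row-based edit distance relative to the empty string. */
  std::iota(r0.begin(), r0.end(), 0);

  for (std::size_t i = 0; i < a.size(); i ++) {
    /* First column: cost of deleting the first i + 1 characters of a.
       Set outside the inner loop so it is also correct when b is empty. */
    r1[0] = i + 1;

    for (std::size_t j = 0; j < b.size(); j ++) {
      bool const subst_needed = a[i] != b[j];

      /* dynamically optimize: deletion, substitution/match, insertion */
      r1[j + 1] = std::min({r0[j + 1] + 1,
                            r0[j]     + (subst_needed? subst_cost: 0),
                            r1[j]     + 1});
    }

    r0.swap(r1);
  }

  return static_cast<int>(r0[b.size()]);
}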
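As a rough illustration of the token-based modification suggested above, here is a sketch that runs the same two-row scheme over vectors of strings. The name token_edit_dist is hypothetical. It also assumes a substitution cost of 1 rather than 2, so that a misspelled word counts as a single mistake; that choice is an assumption made to match the count the original question asks for, not something taken from the function above.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical token-level variant of the edit distance: counts how many
// tokens must be inserted, deleted, or replaced to turn `a` into `b`.
// Assumption: every operation, including replacement, costs 1.
std::size_t token_edit_dist(const std::vector<std::string>& a,
                            const std::vector<std::string>& b) {
    std::vector<std::size_t> prev(b.size() + 1);
    std::vector<std::size_t> curr(b.size() + 1);

    // Distance from the empty token list to each prefix of b.
    std::iota(prev.begin(), prev.end(), std::size_t{0});

    for (std::size_t i = 0; i < a.size(); ++i) {
        // Distance from the first i + 1 tokens of a to the empty list.
        curr[0] = i + 1;

        for (std::size_t j = 0; j < b.size(); ++j) {
            const std::size_t replace_or_match =
                prev[j] + (a[i] != b[j] ? 1 : 0);
            curr[j + 1] = std::min({prev[j + 1] + 1,   // delete a[i]
                                    replace_or_match,  // replace or keep
                                    curr[j] + 1});     // insert b[j]
        }
        prev.swap(curr);
    }
    return prev[b.size()];
}
```

With the example from the question, the tokens {"Hello", "world!", "Today", "is", "Tuesday."} and {"Helol", "world!", "Today", "Tuesday."} give a distance of 2: one replaced word and one missing word.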

What is char(32)? Yeah, I know that it's ' ', but why not just write that -- magic numbers aren't good. Character literals are in the language for a reason! Use them! If you don't, you're assuming an ASCII-compatible character encoding by using the character codes directly.

(At the very least the functional-style cast should probably be removed in favor of a list-initialization expression char{32} or a static_cast, to silence compiler warnings.)
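A small sketch of what the character-literal advice looks like in practice. Both function names are hypothetical. The second one uses std::isspace as one possible way to treat tabs and newlines as separators too, which goes slightly beyond what the original code checks.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: count space characters using the literal ' '
// instead of the numeric code 32.
std::size_t count_spaces(const std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if (c == ' ') {
            ++n;
        }
    }
    return n;
}

// Hypothetical variant: count any whitespace (space, tab, newline, ...).
// The cast to unsigned char avoids undefined behavior in std::isspace
// when char is signed and holds a negative value.
std::size_t count_whitespace(const std::string& s) {
    std::size_t n = 0;
    for (char c : s) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            ++n;
        }
    }
    return n;
}
```

Neither version depends on the numeric value of the space character, so the intent stays readable at the call site.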
