Problem with deletion of a map element and insertion of a new element in its place

Hello everyone.
This question is from the book "Jumping into C++", chapter 19. I have removed "<" from HTML tags, so that the whole question is visible.

Write a program that reads in HTML text that the user types in (don’t worry, we’ll cover how to
read from a file later). It should support the following HTML tags: html>, head>, body>, b>,
i>, and a>. Each HTML tag has an open tag, e.g. html>, and a closing tag which has a forwardslash
at the start: /html>. Inside the tag is text that is controlled by that tag: b>text to be
bolded/b> or i>text to be italicized/i>. The head> /head> tags control text that is
metadata, and the body> /body> tags surround text that is to be displayed. a> tags are used
for hyperlinks, and have an URL in the following format: <a href=URLtext /a>.
Once your program has read in some HTML, it should simply ignore html>. It should remove
any text from the head> section so that it doesn't show up when you output it. It should then
display all text in the body, modifying it so that any text between b> and /b> will show up
with asterisks (*) around it, any text inside i> and /i> will show up with underscores (_)
around it, and any text with a <a href=linkurl link text /a> tag shows up as link text (linkurl).

--------------------------------------------------------------------------------------------------
Brief explanation of my attempt -
Step 1 - assigned the string between body tags to a new string variable "body".
Step 2 - created a map "check" and inserted parts of the string in "body" leaving out the string between b> /b> tags.
Step 3 - tried to replace the inserted elements of map "check" which contain i> /i> tags, by first assigning the values of itr->first and itr->second to new variables, then deleting that element and inserting new elements in the map after removing the string between the i> /i> tags.

**This is where I face the problem. The code doesn't give an error but the program doesn't return 0. If I remove the "check.erase(index)", it works, but that doesn't solve the purpose.**

I tried doing a similar thing in a separate code and it worked. I have included the other code after the following one, which is the attempted solution to the question above.

  int main()
    {
    map<int,string>check;

    string x = "<html>";
    string xx = "</html>";
    string y = "<head>";
    string yy = "</head>";
    string z = "<body>";
    string zz = "</body>";
    string a = "<b>";
    string aa = "</b>";
    string b = "<i>";
    string bb = "</i>";
    string c = "<a href";
    string ccc = ">";
    string cc = "</a>";

    string type = "<html>\n<head>I dont know what to write here</head>\n<body>\ngurasees is <b>my</b> good <i>name</i>. You can find <b>me</b> on the <i>web</i>. Link is <a href = www.google.com>gura</a>\n</body>\n</html>";

    int f = type.find(z)+z.size()+1;
    int g = type.find(zz)-f;
    string body = type.substr(f,g);

     map<int,string>::iterator itr;
     map<int,string>::iterator ends = check.end();

     //--------------------------STEP 2-----------------------------------------

     check.insert ({body.find(body[0]), body.substr (body.find(body[0]), body.find(a,0)-1)});
     
    int k = body.find(a, 0);
    k++;
    int p = body.find(aa,0);

    for (int i = body.find(a,k); i != string::npos && p != string::npos; i = body.find(a, i), p = body.find(aa, p))
    {
        check.insert({p+aa.size(),body.substr(p+aa.size(), i-1-(p+aa.size()))});
        i++; p++;
    }
        check.insert({p + aa.size() , body.substr(p + aa.size() , body.size() - 1 - p ) } ) ;

     //------------------------STEP 3-------------------------------------------

    for (itr = check.begin(); itr != ends; itr++)
    {
        cout << itr->second.find(b) << endl; //prints out correctly
        if (itr->second.find(b) != string::npos)
        {
            int index = itr->first;
            string italic_component = itr->second;
            
            check.erase(index);
            check.insert ({index, italic_component.substr(italic_component.find(italic_component[0]), italic_component.find(b, 0)-1)});
            int k = italic_component.find(b, 0);
            k++;

            int j = italic_component.find(bb, 0);

            for (int i = italic_component.find(b, k); i != string::npos && j != string::npos; i = italic_component.find(b, k), j = italic_component.find(bb, j) )
            {
                check.insert({index + j + bb.size() , italic_component.substr(j + bb.size(), i - 1 - (j+bb.size()))});
                i++; j++ ;
            }

            check.insert({j + index + bb.size() ,italic_component.substr(j + bb.size() , italic_component.size() - 1 - j ) } ) ;
        }
    }

    //------------------------------STEP 4-------------------------------------

     for ( int i = body.find( a , 0 ), j = body.find( aa , 0), k = body.find( b , 0 ), l = body.find( bb , 0); i != string::npos && j != string::npos && j != string::npos && k != string::npos; i = body.find(a, i ), j = body.find(aa,j), k = body.find(b, k), l = body.find(bb,l) )
    {
        check.insert( {i, "*" + body.substr(i+a.size(), j-(i+a.size())) + "*" } );
        check.insert( {k, "_" + body.substr(k+b.size(), l-(k+b.size())) + "_" } );
        i++; j++; k++; l++;
    }
    }


-------------------------THE OTHER CODE--------------------------------

       int main()
    {
    map<int,string>check;

    check[10] = "gurasees is there for you";
    check[20] = "aka is not born yet";
    check[30] = "sam please wait for years";

    map<int,string>::iterator itr;
    map<int,string>::iterator ends = check.end();

    for(itr = check.begin(); itr!= ends; itr++)
    {
        if(itr->second.find("born") != string::npos)
         {
             
             int index = itr->first;
             string italic = itr->second;

             check.erase(itr->first);
             

             check.insert({index, italic.substr(italic.find(italic[0]), 5)});
         }
    }
    }
"The code doesn't compile, doesn't give an error"

If the code didn't compile, error messages would be generated.

That code compiles without error using VS 2022 (although there are several warnings re conversions).

If you have your compiler set to treat warnings as errors, then it won't compile - but again you'd have messages generated.

If you use a range-for and structured bindings, your code can be simplified. For the other code (notwithstanding that the .substr() on L23, inside the check.insert() call, just gives the first 5 chars), perhaps:

	map<int, string> check;

	check[10] = "gurasees is there for you";
	check[20] = "aka is not born yet";
	check[30] = "sam please wait for years";

	for (auto& [index, italic] : check)
		if (italic.find("born") != string::npos)
			italic = italic.substr(italic.find(italic[0]), 5);


Similarly for your main code.
If I've understood the requirements, IMO you seem to be over-complicating this. Consider simply:

#include <string>
#include <iostream>

int main()
{
	const std::string type {"<html>\n<head>I dont know what to write here</head>\n<body>\ngurasees is <b>my</b> good <i>name</i>. You can find <b>me</b> on the <i>web</i>. Link is <a href = www.google.com>gura</a>\n</body>\n</html>"};

	const auto bods {type.find("<body>")};
	const auto bode {type.find("</body>", bods)};
	auto body {type.substr(bods + 6, bode == std::string::npos ? bode : bode - bods - 6)};

	for (size_t pos {}, tag {}; (tag = body.find('<', pos)) != std::string::npos; pos = tag + 3) {
		if (tag < body.size() - 2) {
			if (body[tag + 1] == 'b' && body[tag + 2] == '>')
				body.replace(tag, 3, "*");

			if (body[tag + 1] == 'i' && body[tag + 2] == '>')
				body.replace(tag, 3, "_");
		}

		if (tag < body.size() - 3) {
			if (body[tag + 1] == '/' && body[tag + 2] == 'b' && body[tag + 3] == '>')
				body.replace(tag, 4, "*");

			if (body[tag + 1] == '/' && body[tag + 2] == 'i' && body[tag + 3] == '>')
				body.replace(tag, 4, "_");
		}

		if (body[tag + 1] == 'a') {
			const auto equ {body.find('=', tag)};
			const auto tend {body.find('>', equ + 1)};
			const auto web {body.substr(equ + 1, tend - equ - 1)};
			const auto term {body.find("</a>", tend + 1)};

			body.replace(tag, term - tag + 4, web);
		}
	}

	std::cout << body << '\n';
}


which displays:


gurasees is *my* good _name_. You can find *me* on the _web_. Link is  www.google.com

"The code doesn't compile, doesn't give an error"

I had changed the statement to the following:

"The code doesn't give an error but the program doesn't return 0."


Only the first iteration in step 3 goes through (I confirmed it by printing values), and after that the program returns a huge negative number, -1073741819 (0xC0000005), rather than returning 0.

I am trying to find out what's happening using the debugger but haven't been able to figure it out completely yet. Basically, after going through the first iteration in step 3, the debugger reports "Segmentation fault" while pointing at L45 (the for loop header of step 3).

I've not come across auto and replace yet. I will read about them.
Thank you.
Works for me with clang 12.0.0
You need to compile with C++17
"the debugger gives out "Segmentation fault", while pointing at L45."

What makes you think that ++itr is valid every time through the for loop? You're erasing and inserting into the map potentially within each loop - which can invalidate itr and hence invalidate ++itr.

For an alternative take, which may be more useful in some situations, consider:

#include <string>
#include <iostream>
#include <stack>

int main()
{
	const std::string type {"<html>\n<head>I dont know what to write here</head>\n<body>\ngurasees is <b>my</b> good <i>name</i>. You can find <b>me</b> on the <i>web</i>. Link is <a href = www.google.com>gura</a>\n</body>\n</html>"};

	std::stack<std::string> tags;
	std::string data;
	bool intag {};
	bool gotbody {};
	bool notatag {true};

	for (const auto& ch : type) {
		if (ch == '<') {
			intag = true;
			tags.push({});
		} else if (ch == '>' && intag) {
			if (tags.top() == "body")
				gotbody = true;

			if (tags.top() == "/body")
				gotbody = false;

			if (gotbody) {
				if (tags.top() == "i" || tags.top() == "/i")
					data += '_';

				if (tags.top() == "b" || tags.top() == "/b")
					data += '*';

				if (tags.top()[0] == 'a') {
					notatag = false;
					data += tags.top().substr(tags.top().find('=') + 1);
				}

				if (tags.top() == "/a")
					notatag = true;
			}

			if (const auto tg {tags.top()}; tg[0] == '/')
				if (tags.pop(); tags.top() == tg.substr(1))
					tags.pop();

			intag = false;
		} else
			if (intag)
				tags.top() += ch;
			else if (notatag && gotbody)
				data += ch;
	}

	std::cout << data << '\n';
}



gurasees is *my* good _name_. You can find *me* on the _web_. Link is  www.google.com

@thmm Thank you for the response.

@seeplus
Thank you for another solution to the problem.

BTW the result should be
gurasees is*my* good_name_. You can find*me* on the_web_. Link is gura.


not
gurasees is*my* good_name_. You can find*me* on the_web_. Link is www.google.com


"What makes you think that ++itr is valid every time through the for loop? You're erasing and inserting into the map potentially within each loop - which can invalidate itr and hence invalidate ++itr."


I understand that now. itr keeps pointing to memory that has been freed (I guess freed is the right word?), which makes that memory invalid to use. This invalidates the iterator as well, so trying to increment it is undefined behaviour and can crash the program.

I learnt through debugging (in the smaller code) that the memory itr was pointing to when I called check.erase() was handed back by the check.insert() that followed. In that case there was no segmentation fault, because itr++ happened to work. This is what was happening in the smaller code every time I ran it before I raised the question. Once I started stepping through it in the debugger again and again to understand what was happening, the program crashed a few times and I realised the problem.

The same applies to the main code: commenting out L66 (the last check.insert() in step 3) made the code run smoothly without any segmentation fault, for the same reason as described above.

But if I inserted another element (that is, two elements after deleting one, once at L54 and again at L66), it became highly unlikely that the erased memory (which itr was still pointing to) would be allocated back again. Only once out of all the runs did the whole code run and return 0 in this case.
Please correct me if I am wrong.

So I reassigned itr = check.begin() after L66, making it valid again, which solves the problem.

