Bigram

count the number of occurrences of each bigram in a text
bigrams.size();

https://en.cppreference.com/w/cpp/container/vector/size
OK. I'll ask the 'stupid question of the day'. What's a bigram? I've looked on the internet and I'm still not really any wiser. Would you post some sample text with the calculations for the required result? Then I'll have a look once I understand what the code is supposed to do. :)
Hello, bommoddu,
If I do it like this, I think it's not great because it's slow. Can you help me?


I don't see anywhere in your second code where you call vector::size(), so on what basis do you judge that it's "slow"?

I found the code for the frequencies on your forum.
If you're trying to make a program by copy/pasting random code found on the internet into your IDE, that's not how to write programs.

You need solid foundations first, jumping ahead of time won't get you far.

EDIT:

Your loop is wrong:
for (int i = 0, len = s.size(); i < len; i++)

Correct:

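for (int i = 0, len = s.length(); i < len; i++)
{
	// cast avoids undefined behavior when char is signed and s[i] is negative
	if (ispunct(static_cast<unsigned char>(s[i])))
	{
		s.erase(i--, 1);   // step back so the character that shifts into slot i is checked too
		len = s.size();    // the string just got shorter, so refresh the cached length
	}
}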


Remove #include <bits/stdc++.h> because it's not needed and it's a non-standard header.
If you want to make it faster, don't use push_back(); use operator[] instead (on a vector sized up front).

This applies to your second for loop inside the main()

The tokenize and make_bigrams functions run in under 1 millisecond and about 1 millisecond respectively, which is very fast compared to the for loop; your loop eats all of the remaining time.



Here is a somewhat faster version of the loop:

using iter_type = std::vector<std::pair<std::string, std::string>>::const_iterator;
for (std::string::iterator it = s.begin(); it != s.end(); ++it)
{

	const auto& bigrams = sentence_bigram(s);

	for (iter_type e = bigrams.cbegin(); e != bigrams.cend(); ++e)
	{

		testconcat = e->first + " " + e->second;
		m[*e] = 0;


		if (*it == ' ')
		{

			if (freq.find(testconcat) == freq.end())
			{
				freq[testconcat] = 1;
			}
			else
			{
				freq[testconcat]++;
			}

			testconcat = "";
		}
		else
		{

			testconcat += *it; // append; writing through testconcat[length()] is undefined behavior
		}
	}
}


EDIT:

Note that you need to turn on compiler optimizations (for example -O2 with GCC or Clang, or a Release build in Visual Studio), which results in much faster code; it's almost instant on my computer.
> Knowing this I think that learning a programming language by heart at university
> is like learning to use a manual loom in the age of mechanization !
Yet you're here anyway, stumped for what to do.

Universities (the good ones) don't teach you programming languages.
They teach you how to program.
It's the difference between "how" you do things, and "what" you choose to perform any given task.

It's the difference between learning "how" to drive, and "what" you choose to do whatever you need doing. Everyone learns in something slow and underpowered. What you choose afterwards (sports car, MPV, SUV, truck, big-rig) depends on what you want.

You just jumped in behind the wheel and thought 'sh!t, what do I do now?'.
You've seen it done, it looked simple enough from the outside, but the cold hard reality of the situation is that you don't know anything.
Instead of driving to your destination, you're in a ditch having failed to negotiate the first bend in the road.
Asking the cplusplus tow-truck team to drag your @ss out of the mess you've made for yourself.


This is the code 'simplified':

#include <iostream>
#include <map>
#include <string>
#include <sstream>
#include <fstream>
#include <cctype>
#include <algorithm>
#include <vector>
#include <iterator>
#include <utility>

using StrPair = std::pair<std::string, std::string>;

auto tokenize(const std::string& s) {
	std::istringstream iss(s);
	std::vector<std::string> v { std::istream_iterator <std::string>(iss), std::istream_iterator< std::string>() };

	return v;
}

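auto make_bigrams(const std::vector<std::string>& tokens) {
	std::vector<StrPair> bigrams;

	// fewer than two tokens means no pairs; also avoids std::prev() on an empty range
	if (tokens.size() < 2)
		return bigrams;

	for (auto it = std::begin(tokens); it != std::prev(std::end(tokens)); ++it)
		bigrams.emplace_back(*it, *std::next(it));

	return bigrams;
}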

auto sentence_bigram(const std::string& s) {
	return make_bigrams(tokenize(s));
}

int main() {
	std::ifstream ifs("myfile.txt");
	std::string s((std::istreambuf_iterator<char>(ifs)), (std::istreambuf_iterator<char>()));
	std::map<std::string, int> freq;

	s.erase(std::remove_if(s.begin(), s.end(), [](auto ch) {return std::ispunct(static_cast<unsigned char>(ch)); }), s.end());

	for (const auto& its : s)
		for (const auto& e : sentence_bigram(s))
			if (its == ' ')
				++freq[e.first + " " + e.second];

	for (const auto& f : freq)
		std::cout << f.first << " => " << f.second << '\n';
}


Note that in the OP's code, the loop on lines 72-91 does just what the code above does on lines 41-44 (the nested for loops in main()).
Now he's taken his toys and walked away in a hissy-fit as he didn't like some truths...
Good riddance to bad rubbish.
We've seen lately a spate of ill-tempered new users expecting free help that caters to their programming world-view, and they get all uppity when reality doesn't conform to their whims.

Bye, and thanks for all the fish. 42!
Aw, man. I don't suppose anyone got any screenshots.
Cobrix wrote:
The little powers you have on this website is a danger to yourself. You insult people without knowing who they are in real life. And even if this is not the case of those who leave you to your pride when leaving the forum, one day you may have surprises. Your pretty code does not complete what you can have between the legs. So revel in the miserable power you have left!
wow, that's deep. Touched me in all the tickly places.
LOL. If I was scared a dumb Internet argument might spill into IRL, I'd be so scared of people I meet in person I wouldn't even be able to leave the house.
I don't know what happened here, but I very much doubt you were attacked gratuitously. If you did that thing you're doing now where you're making every post from a different account, it would not surprise me if the others didn't like that.
Either way, acting like a lost little lamb who doesn't know what it did wrong after the shitstorm has passed is rather disingenuous.
I've done some fairly dangerous things, but I don't think telling a weirdo that he's not wanted here is one of them. Just go, man. You're wasting your time.
Damn, this thread certainly has gone off into the weeds of the Twilight Zone, but good.