Strange error in string::substr

I'm solving a problem in codeforces
link: https://codeforces.com/edu/course/2/lesson/2/3/practice/contest/269118/problem/A

I'm using two different ways to do something but they give a pretty different results although are identical.

First method:

bool exist(string x){
int l = 0, r = n-1, ans = -1;
while(l <= r){
int mid = (l + r)/2;
string sub1 = s.substr(p[mid], s.length() - p[mid]);
string sub = sub1.substr(0, min(sub1.length(), x.length()));
if(sub >= x){
if(sub == x) ans = mid;
r = mid-1;
} else {
l = mid+1;
}
}
return (ans != -1);
}

Second method:
int first_occ(string x){
int l = 0, r = n-1, ans = -1;
while(l <= r){
int mid = (l + r)/2;
string sub = s.substr(p[mid], min(s.length() - p[mid], x.length()));
if(sub >= x){
if(sub == x) ans = mid;
r = mid-1;
} else {
l = mid+1;
}
}
return ans;
}

The difference is that first method gives TLE(Time Limit exceeded) which means that it takes more than two seconds while the first one runs in about 350ms which definitely means there is something going wrong

Full problem implementation: https://ideone.com/SIj1pr

Sorry for the bad style of the question but it seems format options don't work well :\
If there is something that isn't understandable because of question format just ask me please and I guide you
Last edited on
I find the code difficult to follow. Perhaps if you could describe the algorithm, that'll help my feeble mind.

I never use substr because it creates a new string object.

If you're interested in speed, you should avoid all the string copies, perhaps work with string_views instead.
Last edited on
Your link is just an advertisement; because we have not paid for the class we can't see the problem text.
that + the lack of code tags or any explains makes it unpossible to make much sense of what you are trying to do.
@kbw
Thx for your regard

The used algorithm is suffix array
https://en.wikipedia.org/wiki/Suffix_array#:~:text=In%20computer%20science%2C%20a%20suffix,Type

But I'm concerned in only two methods "first_occ" and "exist"
I guarantee everything outside these two functions have the same effect

and I used substr in both of them so the time should be doubled but it multiplied by 6
@jonnin

It isn't an advertisement, and the course is free, maybe you can't see the problem text because you're not registered

The used algorithm is suffix array
https://en.wikipedia.org/wiki/Suffix_array#:~:text=In%20computer%20science%2C%20a%20suffix,Type

But I'm concerned in only two methods "first_occ" and "exist"
I guarantee everything outside these two functions have the same effect

In both of them I do a binary search two check if the string x is exist as a substring in any suffix(array p is the suffix array)
One obvious difference is that exist uses substr twice per iteration while first_occ uses substr only once per iteration.
Why don't you return as soon as you determine the outcome? This would save any further search through the array.

if(sub == x) ans = mid;
could be (in the second case):
if(sub == x) return mid;


Anyway, why don't you just use std::string::find?
https://cplusplus.com/reference/string/string/find/
Last edited on
@peter87

Yes that's correct but this should double the time, what really happens is that the time multiplied nearly by 6
But isn't sub1 often considerably larger than sub? In that case it seems likely that it will increase the time much more than double it.
Last edited on
As re-formatted:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
#include <string>
#include <algorithm>

bool exist(const std::string& x) {
	int l = 0, r = n - 1, ans = -1;

	while (l <= r) {
		int mid = (l + r) / 2;
		std::string sub1 = s.substr(p[mid], s.length() - p[mid]);
		std::string sub = sub1.substr(0, std::min(sub1.length(), x.length()));

		if (sub >= x) {
			if (sub == x)
				ans = mid;

			r = mid - 1;
		} else
			l = mid + 1;
	}

	return ans != -1;
}

int first_occ(const std::string& x) {
	int l = 0, r = n - 1, ans = -1;

	while (l <= r) {
		int mid = (l + r) / 2;
		std::string sub = s.substr(p[mid], std::min(s.length() - p[mid], x.length()));

		if (sub >= x) {
			if (sub == x)
				ans = mid;

			r = mid - 1;
		} else
			l = mid + 1;
	}

	return ans;
}


BUT n, s, p are undefined - so the code doesn't compile!

When posting code, it's very helpful if you post a complete, compilable example!

What are n, s, p supposed to be??

Post a main() function that defines and initialises these and uses these functions.

PS.
I do a binary search two check if the string x is exist as a substring in any suffix (array p is the suffix array)


So p is an array of std::string - so is n the number of elements in that array?

So what's s - or do you mean x not s??

If you're doing a binary search, then p needs to be sorted...
Last edited on
@seeplus, see:
ahmad alghadban wrote:
Full problem implementation: https://ideone.com/SIj1pr
Thanks. I missed that.

... OMG - what the *$%&! Where did this pre-historic mutant come from?

Paraphrasing a well-known 80's song:

It's your code, and I'll cry if I want to
Cry if I want to
Cry if I want to
You would cry too, if it happened to you


The code as formatted that compiles (although with warnings):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
#include <string>
#include <iostream>
#include <vector>
#include <utility>
#include <cstdio>

using namespace std;
using ll = long long;

struct suffix_array {
	string s;
	int n;
	vector<int> c, p;

	suffix_array(string s) {
		s += '$';
		this->s = s;
		this->n = s.length();

		int n = s.length();
		vector<int> c(n, 0), p(n, 0), cnt(max(n, 256), 0);

		for (char x : s)
			cnt[x]++;

		for (int i = 1; i < 256; i++)
			cnt[i] += cnt[i - 1];

		for (int i = 0; i < n; i++)
			p[--cnt[s[i]]] = i;

		c[p[0]] = 0;

		for (int i = 1; i < n; i++)
			c[p[i]] = (s[p[i]] != s[p[i - 1]]) ? c[p[i - 1]] + 1 : c[p[i - 1]];

		for (int i = 2; i <= 2 * n; i <<= 1) {
			vector<pair<int, int>> v;
			vector<int> pn = p, cn = c;

			for (int j = 0; j < n; j++) {
				v.push_back({ j, (j + i / 2) % n });
			}

			for (int& j : cnt)
				j = 0;

			for (int j = 0; j < n; j++)
				cnt[c[v[j].second]]++;

			for (int j = 1; j < n; j++)
				cnt[j] += cnt[j - 1];

			for (int j = 0; j < n; j++)
				pn[--cnt[c[v[j].second]]] = j;

			for (int& j : cnt)
				j = 0;

			for (int j = 0; j < n; j++)
				cnt[c[pn[j]]]++;

			for (int j = 1; j < n; j++)
				cnt[j] += cnt[j - 1];

			for (int j = 0; j < n; j++)
				p[--cnt[c[pn[n - 1 - j]]]] = pn[n - 1 - j];

			cn[p[0]] = 0;

			int classes = 1;

			for (int j = 1; j < n; j++) {
				pair<int, int> cur = { c[p[j]], c[(p[j] + i / 2) % n] };
				pair<int, int> prev = { c[p[j - 1]], c[(p[j - 1] + i / 2 + n) % n] };

				if (cur != prev)
					++classes;

				cn[p[j]] = classes - 1;
			}
			c.swap(cn);
		}

		// c.pop_back();
		// for(int &x : c) x--;
		// p.erase(p.begin());
		this->p = p;
		this->c = c;
	}

	void prt() {
		for (int x : p)
			cout << x << " ";

		cout << endl;
	}

	int first_occ(string x) {
		int l = 0, r = n - 1, ans = -1;

		while (l <= r) {
			int mid = (l + r) / 2;
			string sub = s.substr(p[mid], min(s.length() - p[mid], x.length()));

			if (sub >= x) {
				if (sub == x)
					ans = mid;

				r = mid - 1;
			} else {
				l = mid + 1;
			}
		}

		return ans;
	}

	bool exist(string x) {
		int l = 0, r = n - 1, ans = -1;

		while (l <= r) {
			int mid = (l + r) / 2;
			string sub1 = s.substr(p[mid], s.length() - p[mid]);
			string sub = sub1.substr(0, min(sub1.length(), x.length()));

			if (sub >= x) {
				if (sub == x)
					ans = mid;

				r = mid - 1;
			} else {
				l = mid + 1;
			}
		}
		return (ans != -1);
	}
};

char buf[300010];

string read() {
	scanf("%s", buf);
	return buf;
}

int main() {
	string s = read();
	suffix_array suf(s);

	int q;

	scanf("%d", &q);

	while (q--) {
		string x = read();

		puts(suf.exist(x) ? "Yes" : "No");
	}
}

Last edited on
OK. As a minor refactor, this is C++ code that compiles without any warnings:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
#include <string>
#include <iostream>
#include <vector>
#include <utility>

struct suffix_array {
	std::string s;
	size_t n {};
	std::vector<size_t> c, p;

	suffix_array(const std::string& ss) {
		s = ss + '$';
		n = s.length();

		std::vector<size_t> cnt(std::max<size_t>(n, 256));

		c.resize(n);
		p.resize(n);

		for (char x : s)
			++cnt[x];

		for (size_t i { 1 }; i < 256; ++i)
			cnt[i] += cnt[i - 1];

		for (size_t i {}; i < n; ++i)
			p[--cnt[s[i]]] = i;

		c[p[0]] = 0;

		for (size_t i { 1 }; i < n; ++i)
			c[p[i]] = (s[p[i]] != s[p[i - 1]]) ? c[p[i - 1]] + 1 : c[p[i - 1]];

		for (size_t i {2}; i <= 2 * n; i <<= 1) {
			std::vector<std::pair<size_t, size_t>> v;
			std::vector<size_t> pn { p }, cn { c };

			for (size_t j {}; j < n; ++j)
				v.push_back({ j, (j + i / 2) % n });

			for (auto& j : cnt)
				j = 0;

			for (size_t j {}; j < n; ++j)
				++cnt[c[v[j].second]];

			for (size_t j { 1 }; j < n; ++j)
				cnt[j] += cnt[j - 1];

			for (size_t j {}; j < n; ++j)
				pn[--cnt[c[v[j].second]]] = j;

			for (auto& j : cnt)
				j = 0;

			for (size_t j {}; j < n; ++j)
				++cnt[c[pn[j]]];

			for (size_t j ={1}; j < n; ++j)
				cnt[j] += cnt[j - 1];

			for (size_t j {}; j < n; ++j)
				p[--cnt[c[pn[n - 1 - j]]]] = pn[n - 1 - j];

			cn[p[0]] = 0;

			size_t classes { 1 };

			for (size_t j { 1 }; j < n; ++j) {
				std::pair<size_t, size_t> cur { c[p[j]], c[(p[j] + i / 2) % n] };
				std::pair<size_t, size_t> prev { c[p[j - 1]], c[(p[j - 1] + i / 2 + n) % n] };

				if (cur != prev)
					++classes;

				cn[p[j]] = classes - 1;
			}

			c.swap(cn);
		}

		//std::cout << "n = " << n << ", s = " << s << '\n';
		prt();
	}

	void prt() {
		for (const auto& x : p)
			std::cout << s[x] << ' ';

		std::cout << '\n';
	}

	int first_occ(const std::string& x) {
		size_t l {}, r { n - 1 };
		int ans { -1 };

		while (l <= r) {
			const auto mid { (l + r) / 2 };
			const auto sub { s.substr(p[mid], std::min(s.length() - p[mid], x.length())) };

			if (sub >= x) {
				if (sub == x)
					ans = static_cast<int>(mid);

				r = mid - 1;
			} else
				l = mid + 1;
		}

		return ans;
	}

	bool exist(const std::string& x) {
		size_t l {}, r { n - 1 };
		int ans { -1 };

		while (l <= r) {
			const auto mid { (l + r) / 2 };
			const auto sub1 { s.substr(p[mid], s.length() - p[mid]) };
			const auto sub { sub1.substr(0, std::min(sub1.length(), x.length())) };

			if (sub >= x) {
				if (sub == x)
					ans = static_cast<int>(mid);

				r = mid - 1;
			} else
				l = mid + 1;
		}

		return (ans != -1);
	}
};

std::string read() {
	std::string buf;

	std::getline(std::cin, buf);
	return buf;
}

int main() {
	const auto s { read() };
	suffix_array suf(s);

	size_t q {};

	std::cin >> q;
	std::cin.ignore();

	while (q--) {
		const auto x { read() };

		if (!x.empty())
			std::cout << (suf.exist(x) ? "Yes" : "No") << '\n';
	}
}


p is a vector of indexes into s with the ordering indicating sorted order.


qwerty
$ e q r t w y
2
ty
Yes
wa
No

For the 2 search functions, perhaps something like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
int first_occ(const std::string& x) {
		size_t l {}, r { n - 1 };

		while (l <= r) {
			const auto mid { (l + r) / 2 };
			const auto bitr { s.begin() + p[mid] };
			const std::string_view sub { bitr, bitr + std::min(n - p[mid], x.length()) };

			if (sub == x)
				return static_cast<int>(mid);

			if (sub > x)
				r = mid - 1;
			else
				l = mid + 1;
		}

		return -1;
	}

	bool exist(const std::string& x) {
		return first_occ(x) != -1;
	}


Note requires:

 
#include <string_view> 

Last edited on
Building vector p can be greatly simplified by using a std::multimap. Consider as more 'modern' C++:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
#include <string>
#include <iostream>
#include <vector>
#include <utility>
#include <string_view>
#include <map>

class suffix_array {
	std::string s;
	std::vector<size_t> p;

public:
	suffix_array(const std::string& ss) : s(ss) {
		std::multimap<char, size_t> chars;

		p.reserve(s.size());

		for (size_t i {}; const auto& c : s)
			chars.emplace(c, i++);

		for (const auto& [ch, indx] : chars)
			p.push_back(indx);

		//prt();
	}

	void prt() const {
		for (const auto& x : p)
			std::cout << s[x] << ' ';

		std::cout << '\n';
	}

	int first_occ(const std::string& x) const {
		for (size_t l {}, r { s.size() - 1 }; r < s.size() && l <= r; ) {
			const auto mid { (l + r) / 2 };
			const auto bitr { s.begin() + p[mid] };
			const std::string_view sub { bitr, bitr + std::min(s.size() - p[mid], x.length())};

			if (sub == x)
				return static_cast<int>(mid);

			if (sub > x)
				r = mid - 1;
			else
				l = mid + 1;
		}

		return -1;
	}

	bool exist(const std::string& x) const {
		return first_occ(x) != -1;
	}
};

std::string read() {
	std::string buf;

	std::getline(std::cin, buf);
	return buf;
}

int main() {
	const suffix_array suf(read());
	size_t q {};

	for (std::cin >> q >> std::ws; q--; )
		if (const auto x { read() }; !x.empty())
			std::cout << (suf.exist(x) ? "Yes" : "No") << '\n';
}


NB. Note changes to first_occ()
Last edited on
now, now... I only take prehistoric if it predates vector. This is clearly at least stone age.

Artist: Lesley Gore
Album: I'll Cry If I Want To
Released: 1963

which predates my first car, which had a 7 liter engine, by 3 years.
Last edited on
I was thinking of Barbara Gaskin's version from 1981.
Topic archived. No new replies allowed.