How many distinct substrings does a given string S have?
For example, if S = "abc", S has 7 distinct substrings: {"","a","b","c","ab","bc","abc"}. Note that
the empty string and S itself are considered substrings of S.
On the other hand, if S = "aaa", S has only 4 distinct substrings: {"","a","aa","aaa"}.
The first line of the input file contains N, the number of test cases. For each test case, a line
follows giving S, a string of from 1 to 1000 alphanumeric characters. Your output consists of one
line per case, giving the number of distinct substrings of S. Try to write an efficient program.
Sample Input
2
abc
aaa
Output for Sample Input
7
4
****please dont use pointer. i haven't learnt them yet. please help me, i sont even know from where to begin****
hi:
i think i have an idea about the solution, try searching for the combinatorial function in maths, this function calculates the number of subsets of a given set of values, you can then make an easy algorithm to subtract the identical subsets.
i'm not putting any code, you have to do that part yourself, programming is about experimenting.
In the first line i is the beginning of substring, in the second line j is the end position. It cannot be less than i. If j=i, then you have the empty string. You can start from i+1, but then you need to add it to your unique substrings. Of course, you are generating the empty substring a lot of times, but the program is supposed to keep only one of them. And it does :)
In the 3rd line j-i is the length of the substring. But I think the two loops are similar
1. Why have you used array instead of vector? Is it to optimize on time complexity (removes the need for multiple memory allocation in code)?
2. What is the significance of the number 1024*128? Why couldn't you just multiply it and define MAX_UNIQUE as const std::size_t MAX_UNIQUE = 131072 ; ? Any specific reason?
sorry but i can't use vector...can u tell me a simple way to do it?
and
****please dont use pointer. i haven't learnt them yet.
> without the need for sorting
The versions I posted are not more efficient. Standard library algorithms were not allowed; the intent was to make the code easy to understand; ergo linear search and append instead of sort/unique/erase.
> What is the significance of the number 1024*128? Why couldn't you just multiply it and define MAX_UNIQUE as const std::size_t MAX_UNIQUE = 131072 ; ?
I find it simpler to visualize the magnitude of the number as 128K if it is written as 1024*128 (or 128*1024 )
A string with distinct components has exactly s*(s+1)/2+1 substrings where s is the string size(e.g. the string "abc" has 3*4/2+1=7 substrings and the string "abc...xyz" i.e. english alphabet with 26 characters has 26*27/2+1=352 substrings). This fact can be proved as follows:
Let S be a string with s distinct characters.
S = "C1C2...Cs"
Now, selecting substrings whith one component, we have exactly ssubstrings:
"C1", "C2",...,"Cs".
Now substrings of two components:
"C1C2", "C2C3",...,"Cs-1Cs".
Important note: the index of the first component of each substring gives its rank in the sequence and, of course, the last one gives the number of substrings, s-1.
So, the last substring with three components will be Cs-2Cs-1Cs and its rank will be s-2 who is the very number of substrings whith 3 components, s-2.
Continuing this process we have for the penultimate selection by s-1:
"C1C2...Cs-1", "C2C3...Cs" and, obviously, 2 substrings whith s-1 components each one.
And finally, the last substring is S itself:
"C1C2...Cs", one substring.
Reviewing the entire process we have the final result, adding all the obtained results:
Total_number = s + s-1 + ... + 2 + 1 or:
Total_number = 1 + 2 + ... + s-1 + s and adding the two sums we have:
-------------------------------------------------------------------------------
2*Total_number =(s+1)+(s+1) + ... + (s+1)+(s+1) here we have s times (s+1) and finally:
Total_number = s*(s+1)/2.
Adding the empty string we have the number of distinct substrings of a given string S whith s distinct components.
Important note: this process may be used as a guide to write the necessary code for the final step: removing duplicates if S has some identical components.