finding duplicates in a string

Hi guys I am doing some challenges on codewars.com andthe objective of this kata is too find any duplicates in a string

I have been trying to solve this for almost 2 hours now and the solution I came up with isn't doing the trick.

it does find duplicates but it leaves out the duplicated letter from even appearing once

anybody know any good algorithms I can study to solve this problem?


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
  #include <iostream>
#include <vector>

using namespace std;

class TwoToOne
{
public:
    static bool testDuplicate(char a,const string& s1){

        bool found = false;
        bool duplicate = false;


        for(int i = 0; i < s1.size(); i++){

            if(a == s1.at(i) && !found){

                cout << "found" << endl;
                found = true;

                continue;
            }
            if(a == s1.at(i) && found){

                cout << "return true" << endl;
                duplicate = true;
                return true;
            }
        }
        return false;
    }

    static std::string longest(const std::string &s1, const std::string &s2){

           string newStr1;
           cout << s1.size() << endl;

           for(int i = 0; i < s1.size(); i++){

              if(!testDuplicate(s1.at(i),s1)){

                   cout << "adding" << endl;
                   char ch = s1.at(i);
                   newStr1.push_back(ch);
              }
           }

           return newStr1;
    }
};



int main()
{

   
    string str = "abbbbcdefgggghijklmnooooopqrstuvwxyz";
    string str2 = TwoToOne::longest(str,str);
    cout << str2 << endl;

    // prints acdefhijklmnpqrstuvwxyz

}
Last edited on
is there anything I can add to my function to help find the duplicates of letters

thanks
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#include <iostream>
#include <string>
#include <set>
using namespace std;


string removeDuplicates( string str, string &duplicates )
{                                                                    
   set<char> S, Sd;
   string reduced;
   duplicates = "";

   for ( char c : str )
   {
           if ( S .insert(c).second ) reduced    += c;
      else if ( Sd.insert(c).second ) duplicates += c;
   }
   return reduced;
}


int main()
{
   string duplicates;
   string str = "abbbbcdefgggghijklmnooooopqrstuvwxyz";

   string reduced = removeDuplicates( str, duplicates );

   cout << "Reduced string: " << reduced << '\n';
   cout << "Duplicates: " << duplicates << '\n';
}


Reduced string: abcdefghijklmnopqrstuvwxyz
Duplicates: bgo
Would've been nice if you posted the exact specifications with examples instead of just paraphrasing ;D
Count the number of Duplicates
Write a function that will return the count of distinct case-insensitive alphabetic characters and numeric digits that occur more than once in the input string. The input string can be assumed to contain only alphabets (both uppercase and lowercase) and numeric digits.

Example
"abcde" -> 0 # no characters repeats more than once
"aabbcde" -> 2 # 'a' and 'b'
"aabBcde" -> 2 # 'a' occurs twice and 'b' twice (bandB)
"indivisibility" -> 1 # 'i' occurs six times
"Indivisibilities" -> 2 # 'i' occurs seven times and 's' occurs twice
"aA11" -> 2 # 'a' and '1'
"ABBA" -> 2 # 'A' and 'B' each occur twice
source: https://www.codewars.com/kata/counting-duplicates
Last edited on
Create what I like to call a "frequency hash", which maps some token to a count of how often it was seen.
- every character seen (in this case an upper or lowercase version, up to you), increments the structure's count by 1
- analyze the map for anything that occurred twice or more

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
#include <map>

using namespace std;

map<char, int> Freq(const string& s)
{
    map<char, int> m;
    for(auto& c : s)
        m[tolower(c)]+=1;
    return m;
}


int main() 
{
    vector<string> examples = 
    {
        "abcde",
        "aabbcde",
        "aabBcde",
        "indivisibility",
        "Indivisibilities",
        "aA11",
        "ABBA"
    };
    int count;
    for (auto& e : examples)
    {
        auto freq = Freq(e);
        count = 0;
        for (auto& kv : freq)
        {
            if (kv.second > 1)
                count += 1;
        }
        cout << setw(18) << e << " => " << count << endl;
    }

    return 0;
}

             abcde => 0
           aabbcde => 2
           aabBcde => 2
    indivisibility => 1
  Indivisibilities => 2
              aA11 => 2
              ABBA => 2
Thank-you for the clarification, @icy1.

Sticking with sets (though I definitely like your use of maps) I came up with the following.
(You'd be amazed by how long it took me to type "indivisibilities" properly!)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#include <iostream>
#include <string>
#include <cstring>
#include <vector>
#include <set>
using namespace std;


int numberOfDuplicates( string str )
{                                                                    
   set<char> S, Sd;
   for ( char c : str )
   {
      if ( !isalnum( c ) ) continue;
      c = tolower( c );
      if ( !S.insert(c).second ) Sd.insert(c);
   }
   return Sd.size();
}


int main()
{
   vector<string> tests = { "abcde",
                            "aabbcde",
                            "aabBcde",
                            "indivisibility",
                            "indivisibilities",
                            "aA11",
                            "ABBA-yeah!" };
   
   for ( string s : tests ) cout << s << ": " << numberOfDuplicates( s ) << '\n';
}


abcde: 0
aabbcde: 2
aabBcde: 2
indivisibility: 1
indivisibilities: 2
aA11: 2
ABBA-yeah!: 2
Last edited on
Sticking with sets (though I definitely like your use of maps)

A plain old array would be faster and smaller.
dhayden wrote:
A plain old array would be faster and smaller.

Yep, this is true of course with a little math for the index offsets, though I like the readability and extensibility of map. Can use plain arrays and go straight into the count, discarding the generated data structure:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>

using namespace std;

// Use plain arrays with index offsets.  
//   Note that '0' is 48, and
//             'Z' is 90
int CountDups(const string& s)
{
    int count = 0;
    int i;
    int arr[43]{};
    for(auto& c : s)
    {
        i = toupper(c)-48;
        arr[i]+=1;
        if (arr[i]==2)
            count += 1;
    }
    return count;
}

int main() 
{
    vector<string> examples = 
    {
        "abcde",
        "aabbcde",
        "aabBcde",
        "indivisibility",
        "Indivisibilities",
        "aA11",
        "ABBA"
    };

    for (auto& e : examples)
        cout << setw(18) << e << " => " << CountDups(e) << endl;

    return 0;
}
for duplicates only you don't need any magic in ascii.

char count[256] = {0};
for(... the string)
count[string[index]]++;

and then count tells you what was duplicated.

it wastes a little space, but it skips the magic offsets etc.

the type of count may be more efficient on some machines as default word sized (usually, int) or it may be faster as char (or unsigned char). You can poke around with it to see if your hardware favors one or the other.

this is an adaptation of the 'bucket sort' algorithm which can be used to sort your data or count frequency / duplicates or similar tasks for very limited types of data (chars being a good use case).

is this a fastest wins contest, or cleanest code, or just 'do it' contest?
Last edited on
thanks guys :)
Topic archived. No new replies allowed.