finding duplicates in a string

Forum

Forum
Beginners
finding duplicates in a string

finding duplicates in a string

Apr 30, 2018 at 9:54pm

adam2016 (1529)

Hi guys I am doing some challenges on codewars.com andthe objective of this kata is too find any duplicates in a string

I have been trying to solve this for almost 2 hours now and the solution I came up with isn't doing the trick.

it does find duplicates but it leaves out the duplicated letter from even appearing once

anybody know any good algorithms I can study to solve this problem?

  #include <iostream>
#include <vector>

using namespace std;

class TwoToOne
{
public:
    static bool testDuplicate(char a,const string& s1){

        bool found = false;
        bool duplicate = false;


        for(int i = 0; i < s1.size(); i++){

            if(a == s1.at(i) && !found){

                cout << "found" << endl;
                found = true;

                continue;
            }
            if(a == s1.at(i) && found){

                cout << "return true" << endl;
                duplicate = true;
                return true;
            }
        }
        return false;
    }

    static std::string longest(const std::string &s1, const std::string &s2){

           string newStr1;
           cout << s1.size() << endl;

           for(int i = 0; i < s1.size(); i++){

              if(!testDuplicate(s1.at(i),s1)){

                   cout << "adding" << endl;
                   char ch = s1.at(i);
                   newStr1.push_back(ch);
              }
           }

           return newStr1;
    }
};



int main()
{

   
    string str = "abbbbcdefgggghijklmnooooopqrstuvwxyz";
    string str2 = TwoToOne::longest(str,str);
    cout << str2 << endl;

    // prints acdefhijklmnpqrstuvwxyz

}

Edit & run on cpp.sh

Last edited on Apr 30, 2018 at 9:55pm

May 1, 2018 at 12:42pm

adam2016 (1529)

is there anything I can add to my function to help find the duplicates of letters

thanks

May 1, 2018 at 1:37pm

lastchance (6980)

#include <iostream>
#include <string>
#include <set>
using namespace std;


string removeDuplicates( string str, string &duplicates )
{                                                                    
   set<char> S, Sd;
   string reduced;
   duplicates = "";

   for ( char c : str )
   {
           if ( S .insert(c).second ) reduced    += c;
      else if ( Sd.insert(c).second ) duplicates += c;
   }
   return reduced;
}


int main()
{
   string duplicates;
   string str = "abbbbcdefgggghijklmnooooopqrstuvwxyz";

   string reduced = removeDuplicates( str, duplicates );

   cout << "Reduced string: " << reduced << '\n';
   cout << "Duplicates: " << duplicates << '\n';
}

Edit & run on cpp.sh

Reduced string: abcdefghijklmnopqrstuvwxyz
Duplicates: bgo

May 1, 2018 at 2:05pm

icy1 (596)

Would've been nice if you posted the exact specifications with examples instead of just paraphrasing ;D

Count the number of Duplicates
Write a function that will return the count of distinct case-insensitive alphabetic characters and numeric digits that occur more than once in the input string. The input string can be assumed to contain only alphabets (both uppercase and lowercase) and numeric digits.

Example
"abcde" -> 0 # no characters repeats more than once
"aabbcde" -> 2 # 'a' and 'b'
"aabBcde" -> 2 # 'a' occurs twice and 'b' twice (bandB)
"indivisibility" -> 1 # 'i' occurs six times
"Indivisibilities" -> 2 # 'i' occurs seven times and 's' occurs twice
"aA11" -> 2 # 'a' and '1'
"ABBA" -> 2 # 'A' and 'B' each occur twice

source: https://www.codewars.com/kata/counting-duplicates

Last edited on May 1, 2018 at 2:06pm

May 1, 2018 at 2:20pm

icy1 (596)

Create what I like to call a "frequency hash", which maps some token to a count of how often it was seen.
- every character seen (in this case an upper or lowercase version, up to you), increments the structure's count by 1
- analyze the map for anything that occurred twice or more

#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
#include <map>

using namespace std;

map<char, int> Freq(const string& s)
{
    map<char, int> m;
    for(auto& c : s)
        m[tolower(c)]+=1;
    return m;
}


int main() 
{
    vector<string> examples = 
    {
        "abcde",
        "aabbcde",
        "aabBcde",
        "indivisibility",
        "Indivisibilities",
        "aA11",
        "ABBA"
    };
    int count;
    for (auto& e : examples)
    {
        auto freq = Freq(e);
        count = 0;
        for (auto& kv : freq)
        {
            if (kv.second > 1)
                count += 1;
        }
        cout << setw(18) << e << " => " << count << endl;
    }

    return 0;
}

Edit & run on cpp.sh

             abcde => 0
           aabbcde => 2
           aabBcde => 2
    indivisibility => 1
  Indivisibilities => 2
              aA11 => 2
              ABBA => 2

May 1, 2018 at 2:30pm

lastchance (6980)

Thank-you for the clarification, @icy1.

Sticking with sets (though I definitely like your use of maps) I came up with the following.
(You'd be amazed by how long it took me to type "indivisibilities" properly!)

#include <iostream>
#include <string>
#include <cstring>
#include <vector>
#include <set>
using namespace std;


int numberOfDuplicates( string str )
{                                                                    
   set<char> S, Sd;
   for ( char c : str )
   {
      if ( !isalnum( c ) ) continue;
      c = tolower( c );
      if ( !S.insert(c).second ) Sd.insert(c);
   }
   return Sd.size();
}


int main()
{
   vector<string> tests = { "abcde",
                            "aabbcde",
                            "aabBcde",
                            "indivisibility",
                            "indivisibilities",
                            "aA11",
                            "ABBA-yeah!" };
   
   for ( string s : tests ) cout << s << ": " << numberOfDuplicates( s ) << '\n';
}

Edit & run on cpp.sh

abcde: 0
aabbcde: 2
aabBcde: 2
indivisibility: 1
indivisibilities: 2
aA11: 2
ABBA-yeah!: 2

Last edited on May 1, 2018 at 2:33pm

May 1, 2018 at 2:49pm

dhayden (5799)

Sticking with sets (though I definitely like your use of maps)

A plain old array would be faster and smaller.

May 1, 2018 at 4:05pm

icy1 (596)

dhayden wrote:
A plain old array would be faster and smaller.

Yep, this is true of course with a little math for the index offsets, though I like the readability and extensibility of map. Can use plain arrays and go straight into the count, discarding the generated data structure:

#include <iostream>
#include <iomanip>
#include <string>
#include <vector>

using namespace std;

// Use plain arrays with index offsets.  
//   Note that '0' is 48, and
//             'Z' is 90
int CountDups(const string& s)
{
    int count = 0;
    int i;
    int arr[43]{};
    for(auto& c : s)
    {
        i = toupper(c)-48;
        arr[i]+=1;
        if (arr[i]==2)
            count += 1;
    }
    return count;
}

int main() 
{
    vector<string> examples = 
    {
        "abcde",
        "aabbcde",
        "aabBcde",
        "indivisibility",
        "Indivisibilities",
        "aA11",
        "ABBA"
    };

    for (auto& e : examples)
        cout << setw(18) << e << " => " << CountDups(e) << endl;

    return 0;
}

Edit & run on cpp.sh

May 1, 2018 at 7:06pm

jonnin (11494)

for duplicates only you don't need any magic in ascii.

char count[256] = {0};
for(... the string)
count[string[index]]++;

and then count tells you what was duplicated.

it wastes a little space, but it skips the magic offsets etc.

the type of count may be more efficient on some machines as default word sized (usually, int) or it may be faster as char (or unsigned char). You can poke around with it to see if your hardware favors one or the other.

this is an adaptation of the 'bucket sort' algorithm which can be used to sort your data or count frequency / duplicates or similar tasks for very limited types of data (chars being a good use case).

is this a fastest wins contest, or cleanest code, or just 'do it' contest?

Last edited on May 1, 2018 at 7:11pm

May 8, 2018 at 4:21pm

adam2016 (1529)

thanks guys :)

Topic archived. No new replies allowed.

C++

Forum

finding duplicates in a string