C++ special characters

I am trying to solve the problem below using C++. Any idea how to display special characters using the STL and open-source libraries?
https://www.chegg.com/homework-help/questions-and-answers/write-c-code-able-analyse-text-block-covers-following-requirements-1-computes-start-positi-q89748284

I don't see how this task requires you to "display" special characters?!

The task requires you to do two things, given a certain input text, which we can assume is given as a string (e.g. std::string):

1. Find all "smileys" (i.e. a colon followed by an optional dash followed by a bracket)

2. Find the top ten words (excluding smileys)


So, you will have to break down the given text (string) into space-delimited tokens. Each token that matches their definition of a "smiley" needs to be treated as such. Any other tokens need to be treated as "regular" words. I'd probably use a std::unordered_map to count the word occurrences: If a "new" word (i.e. a word not already in the map) is encountered, insert it into the map with an initial value of 1. And, if a re-occurring word (i.e. a word that's already in the map) is encountered, then simply increment its value in the map by one.

Also have a look at:
https://www.boost.org/doc/libs/1_36_0/libs/tokenizer/tokenizer.htm

#include <algorithm>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

/* return a lower-case copy of the given string */
static std::string to_lower(const std::string& str)
{
    std::string buffer(str.length(), '\0');
    std::transform(str.cbegin(), str.cend(), buffer.begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return buffer;
}

/* case-insensitive hash: keys that compare equal must also hash equally */
struct hash_ci
{
    std::size_t operator()(const std::string& s) const
    {
        return std::hash<std::string>()(to_lower(s));
    }
};

/* case-insensitive key equality (std::unordered_map takes an equality predicate, not a "less than" comparison) */
struct equal_ci
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return to_lower(a) == to_lower(b);
    }
};

/* map to store word counts; the iterator typedefs must refer to the same map type */
typedef std::unordered_map<std::string, std::size_t, hash_ci, equal_ci> word_map;
typedef word_map::iterator word_iter;
typedef word_map::const_iterator word_citer;
static word_map g_word_map;

/* increment count of the given word */
static std::size_t update_word_count(const std::string& word)
{
    word_iter iter = g_word_map.find(word);
    if (iter == g_word_map.end())
    {
        iter = g_word_map.insert(std::pair<std::string, std::size_t>(word, 0U)).first; // <-- does not exist in map yet, so insert!
    }
    return ++iter->second;
}

/* find the most frequent word in map */
static const std::string &get_most_frequent_word(void)
{
    static const std::string EMPTY;
    const std::string *most_frequent_word = NULL;
    std::size_t max = 0U;
    for (word_citer iter = g_word_map.cbegin(); iter != g_word_map.cend(); ++iter)
    {
        if (iter->second > max)
        {
            most_frequent_word = &iter->first;
            max = iter->second;
        }
    }
    return most_frequent_word ? (*most_frequent_word) : EMPTY;
}
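
A minimal sketch of the tokenising step described above, assuming (as that post does) that smileys and words are separated by white-space. The names is_smiley, count_words and top_words are hypothetical, and std::partial_sort is swapped in here only as one possible way to get the ten most frequent words; it is not part of the code above.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: a colon, an optional dash, then exactly one bracket.
bool is_smiley(const std::string& token) {
    if (token.size() < 2 || token.size() > 3 || token[0] != ':')
        return false;
    if (token.size() == 3 && token[1] != '-')
        return false;
    return std::string("()[]{}").find(token.back()) != std::string::npos;
}

// Count every white-space-delimited token that is not a smiley.
std::unordered_map<std::string, std::size_t> count_words(const std::string& text) {
    std::unordered_map<std::string, std::size_t> counts;
    std::istringstream in(text);
    for (std::string token; in >> token; )
        if (!is_smiley(token))
            ++counts[token];  // operator[] value-initialises a new entry to 0
    return counts;
}

// Return up to n (word, count) pairs, most frequent first; ties are broken
// alphabetically so the result is deterministic.
std::vector<std::pair<std::string, std::size_t>>
top_words(const std::unordered_map<std::string, std::size_t>& counts, std::size_t n) {
    std::vector<std::pair<std::string, std::size_t>> v(counts.begin(), counts.end());
    n = std::min(n, v.size());
    std::partial_sort(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(n), v.end(),
        [](const std::pair<std::string, std::size_t>& a,
           const std::pair<std::string, std::size_t>& b) {
            return a.second != b.second ? a.second > b.second : a.first < b.first;
        });
    v.resize(n);
    return v;
}
```

Calling top_words(count_words(text), 10) would give the ten most frequent tokens. Note that this sketch is case-sensitive and keeps punctuation attached to words, so "mat" and "mat." count separately.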
Quoting the post above: "...into space-delimited tokens. Each token that matches their definition of a 'smiley' needs to be treated as such. Any other tokens need to be treated as 'regular' words"


I don't see anything in the requirements that means a 'smiley' has to be delimited by white-space. Also, a definition of a 'word' is not given (other than delimited by white-space). If a word is simply a sequence of chars delimited by white-space, then "what" and "what?" are two different words. Also what about "what" and "what's" - are these both "what"? Also what about terminating punctuation? Is 'mat' different from 'mat.'? So is "a smiley:-]" one word including the ':-]' or one word 'smiley' together with the smiley ':-]'??

IMO those who set these sorts of challenges really, really should be more precise with the requirements!
I'd assume punctuation marks (and other "special" chars) should be treated as separators too.

If a colon character is encountered anywhere, the next two characters can be checked to test for a "smiley" according to their definition.

But yeah, we can only speculate here...
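
A minimal sketch of the colon check described above, assuming a smiley is a colon, an optional dash, and one of the brackets ()[]{} and that it may appear anywhere in the text, even inside a word. The function name find_smileys is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: start offsets of every ":" + optional "-" + bracket.
std::vector<std::size_t> find_smileys(const std::string& text) {
    const std::string brackets = "()[]{}";
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != ':')
            continue;
        std::size_t j = i + 1;
        if (j < text.size() && text[j] == '-')
            ++j;  // skip the optional dash
        if (j < text.size() && brackets.find(text[j]) != std::string::npos) {
            positions.push_back(i);  // record where the smiley starts
            i = j;                   // continue scanning after the smiley
        }
    }
    return positions;
}
```

For the input "a smiley:-] and :) but not :-x" this reports offsets 8 and 16; the trailing ":-x" is not reported because 'x' is not a bracket.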
Hi Kigar

Thanks for your valuable input. For the Unicode Transformation Formats (e.g. UTF-8 and UTF-16), how can I use std::string for better portability? What test cases can we use to validate it?

For example, consider the case below.
"hello 🌏"

What sits between the quotation marks is not simply a number of "characters"; it is:
seven grapheme clusters, seven code points, ten bytes and ten code units in UTF-8, and sixteen bytes and eight code units in UTF-16.
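
A small sketch to check those numbers, assuming the std::string holds valid UTF-8. The helper utf8_code_points is a hypothetical name; the globe (U+1F30F) is written as explicit bytes so the example does not depend on the source-file encoding. Grapheme clusters are not counted here, since the standard library has no facility for that.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of code points in a valid UTF-8 string.
// Every byte that is not a continuation byte (10xxxxxx) starts a code point.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}

// "hello " followed by U+1F30F, as UTF-8 bytes and as a UTF-16 literal.
const std::string    hello_utf8  = "hello \xF0\x9F\x8C\x8F";
const std::u16string hello_utf16 = u"hello \U0001F30F";
```

Here hello_utf8.size() is 10 (bytes and UTF-8 code units), utf8_code_points(hello_utf8) is 7, and hello_utf16.size() is 8 code units, i.e. 16 bytes with a 2-byte char16_t.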

I found a good presentation here.
https://thephd.dev/_presentations/unicode/CppCon/2019/2019.09.20%20-%20Catching%20%E2%AC%86%EF%B8%8F%20-%20The%20(Baseline)%20Unicode%20Plan%20for%20C++23%20-%20ThePhD%20-%20CppCon%202019.pdf

Below is some sample test code that I tried out.

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

auto main() -> int {
    std::vector<std::string> specialchars{"😀", "🔍", "🦑", "😁"};
    // std::string comparison is byte-wise; for UTF-8 that matches code point order
    std::sort(specialchars.begin(), specialchars.end());
    for (const auto &f : specialchars) {
        std::cout << f << '\n';
    }
}


typedef std::unordered_map<std::string, std::size_t>::iterator word_iter;
typedef std::unordered_map<std::string, std::size_t>::const_iterator word_citer;


So retro C++98...
As a starter to compute the top 10 used words. This assumes that each word is white-space delimited and will remove leading and trailing non-alpha chars and also any chars following a ' (eg what's becomes what) and makes all words lower-case. Also any smileys found within a word will be removed (any at the start or end will also be removed as a smiley is non-alpha chars). Any other non-alpha chars (except ') within a word are kept.

#include <string_view>
#include <string>
#include <fstream>
#include <iostream>
#include <algorithm>
#include <sstream>
#include <map>
#include <functional>
#include <utility>
#include <array>
#include <cctype>

class Word {
public:
	Word(const std::string& s_) : ss(s_) {}

	std::string get() const {
		std::string w;
		auto ret { w.begin() };
		auto ret1 { w.rbegin() };

		do {
			if (!(ss >> w))
				return {};

			ret1 = std::find_if(w.rbegin(), w.rend(), [](unsigned char c) {return std::isalpha(c); });
			ret = std::find_if(w.begin(), ret1.base(), [](unsigned char c) {return std::isalpha(c); });
		} while (ret == w.end() || ret1 == w.rend());

		auto last { ret1.base() };

		if (const auto ret2 { std::find(ret, last, '\'') }; ret2 != last)
			last = ret2;

		std::string word { ret, last };

		for (const auto& s : smileys)
			for (auto f { word.find(s) }; f != std::string::npos; f = word.find(s))
				word.erase(f, s.size());

		std::transform(word.begin(), word.end(), word.begin(), [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

		return word;
	}

private:
	const std::array<std::string_view, 12> smileys { ":(", ":)", ":[", ":]", ":{", ":}", ":-(", ":-)", ":-[", ":-]", ":-{", ":-}" };
	mutable std::istringstream ss;
};

int main() {
	constexpr size_t to_display { 10 };

	if (std::ifstream ifs { "text.txt" }) {
		const std::string text { std::istreambuf_iterator<char>(ifs), std::istreambuf_iterator<char>() };
		std::map<std::string, size_t> wcnt;

		for (auto [wrd, w] {std::pair { Word { text }, std::string{} }}; !(w = wrd.get()).empty(); )
			++wcnt[w];

		std::multimap<size_t, std::string, std::greater<size_t>> topcnt;

		for (const auto& [wrd, cnt] : wcnt)
			topcnt.emplace(cnt, wrd);

		for (size_t i {}; const auto & [cnt, wrd] : topcnt)
			if (i++ < to_display)
				std::cout << wrd << "  " << cnt << '\n';
			else
				break;
	} else
		std::cout << "Cannot open file\n";
}

Hi seeplus

Can you please tell me the contents of your text.txt? How can I provide Unicode values and verify the result?
For example, I would like to use something like the test data below.

# group: Smileys & Emotion

# subgroup: face-smiling
1F600 # 😀 E1.0 grinning face
1F603 # 😃 E0.6 grinning face with big eyes
1F604 # 😄 E0.6 grinning face with smiling eyes
1F601 # 😁 E0.6 beaming face with smiling eyes
1F606 # 😆 E0.6 grinning squinting face
1F605 # 😅 E0.6 grinning face with sweat
1F923 # 🤣 E3.0 rolling on the floor laughing
1F602 # 😂 E0.6 face with tears of joy
1F642 # 🙂 E1.0 slightly smiling face
1F643 # 🙃 E1.0 upside-down face
1F609 # 😉 E0.6 winking face
1F60A # 😊 E0.6 smiling face with smiling eyes
1F607 # 😇 E1.0 smiling face with halo
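
One possible way to turn code point values like those listed above into test input is to encode them as UTF-8 yourself. This is only a sketch: to_utf8 and from_hex are hypothetical names, and no validation is done (for example, surrogates and values above 10FFFF are not rejected).

```cpp
#include <string>

// Hypothetical helper: encode a single code point as UTF-8 (no validation).
std::string to_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: parse the leading hex field of a line such as "1F600 # ...".
std::string from_hex(const std::string& line) {
    return to_utf8(static_cast<char32_t>(std::stoul(line, nullptr, 16)));
}
```

The resulting strings can be written to a text file or placed in a std::string test fixture; from_hex("1F600") yields the four bytes F0 9F 98 80.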
Is it something like this?

#include<iostream>
#include <map>
#include <string>

using namespace std;

namespace specialchar {

    static std::map<std::string, std::string> SPECCHAR = {
        {":grinning face:" , u8"\U0001F600"},
        {":grinning face with big eyes:" , u8"\U0001F603"},
        {":grinning face with smiling eyes:" , u8"\U0001F604"},
        {":beaming face with smiling eyes:" , u8"\U0001F601"},
        {":grinning squinting face:" , u8"\U0001F606"},
        {":grinning face with sweat:" , u8"\U0001F605"},
        {":rolling on the floor laughing:" , u8"\U0001F923"},
        {":face with tears of joy:" , u8"\U0001F602"},
        {":slightly smiling face:" , u8"\U0001F642"},
        {":upside-down face:" , u8"\U0001F643"},
        {":winking face:" , u8"\U0001F609 "},
        {":smiling face with smiling eyes:" , u8"\U0001F60A "},
        {":smiling face with halo:" , u8"\U0001F607"}
    }; 
    
    std::string specialch(std::string s, bool esc=true) {
        int val = -1;
        int len = s.size();
        for (int i = 0; i < len; i++) {
            if (s[i] == *L":") {
                
                if(esc && i!=0 && s[i-1]=='\\')
                    continue;
                if (val == -1) {
                    val = i;
                }
                else {
                    if (i - val ==1) {
                        val = i;
                        continue;
                    }
                    std::map<std::string, std::string>::iterator it;
                    it = SPECCHAR.find(s.substr(val, i - val + 1));
                    if (it == SPECCHAR.end()) {
                        val = i;
                        continue;
                    }
                    std::string spe = it->second;
                    std::cout << s.substr(val, i - val + 1) << std::endl; 
                    s.replace(val, i - val + 1 , spe);
                    int return = i - val + 1 - spe.size();
                    len -= return;
                    i -= return;
                    val = -1;
                }
            }
        }
        return s;
    }
}


int main() {
    std::cout << specialchar::specialch("\n\n\n\nHappy for c++ :+1:") << std::endl;
    return 0;
}

// I got the errors below, and I am not sure what went wrong here.

main.cpp:9:47: error: no matching constructor for initialization of 'std::map<std::string, std::string>' (aka 'map<basic_string<char, char_traits<char>, allocator<char>>, basic_string<char, char_traits<char>, allocator<char>>>')
    static std::map<std::string, std::string> SPECCHAR = {
                                              ^          ~
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1052:9: note: candidate constructor template not viable: requires 4 arguments, but 13 were provided
        map(_InputIterator __f, _InputIterator __l,
        ^
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1043:9: note: candidate constructor template not viable: requires at most 3 arguments, but 13 were provided
        map(_InputIterator __f, _InputIterator __l,
        ^
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1062:5: note: candidate constructor template not viable: requires 3 arguments, but 13 were provided
    map(_InputIterator __f, _InputIterator __l, const allocator_type& __a)
    ^
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1116:5: note: candidate constructor not viable: requires 3 arguments, but 13 were provided
    map(initializer_list<value_type> __il, const key_compare& __comp, const allocator_type& __a)
    ^
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1038:14: note: candidate constructor not viable: requires 2 arguments, but 13 were provided
    explicit map(const key_compare& __comp, const allocator_type& __a)
             ^
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1098:5: note: candidate constructor not viable: requires 2 arguments, but 13 were provided
    map(map&& __m, const allocator_type& __a);
    ^
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1109:5: note: candidate constructor not viable: requires at most 2 arguments, but 13 were provided
    map(initializer_list<value_type> __il, const key_compare& __comp = key_compare())
    ^
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1124:5: note: candidate constructor not viable: requires 2 arguments, but 13 were provided
    map(initializer_list<value_type> __il, const allocator_type& __a)
    ^
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1144:5: note: candidate constructor not viable: requires 2 arguments, but 13 were provided
    map(const map& __m, const allocator_type& __a)
    ^
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1031:14: note: candidate constructor not viable: requires single argument '__comp', but 13 arguments were provided
    explicit map(const key_compare& __comp)
             ^
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1067:5: note: candidate constructor not viable: requires single argument '__m', but 13 arguments were provided
    map(const map& __m)
    ^
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1092:5: note: candidate constructor not viable: requires single argument '__m', but 13 arguments were provided
    map(map&& __m)
    ^
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1138:14: note: candidate constructor not viable: requires single argument '__a', but 13 arguments were provided
    explicit map(const allocator_type& __a)
             ^
/root/emsdk/upstream/emscripten/cache/sysroot/include/c++/v1/map:1023:5: note: candidate constructor not viable: requires 0 arguments, but 13 were provided
    map()
    ^
main.cpp:50:25: error: expected unqualified-id
                    int return = i - val + 1 - spe.size();
                        ^
main.cpp:51:28: error: expected expression
                    len -= return;
                           ^
main.cpp:52:26: error: expected expression
                    i -= return;
                         ^
4 errors generated.
This compiles as C++20 with VS:

#include <iostream>
#include <map>
#include <string>

namespace specialchar {
	static const std::map<std::string, std::u8string> SPECCHAR  {
		{":grinning face:" , u8"\U0001F600"},
		{":grinning face with big eyes:" , u8"\U0001F603"},
		{":grinning face with smiling eyes:" , u8"\U0001F604"},
		{":beaming face with smiling eyes:" , u8"\U0001F601"},
		{":grinning squinting face:" , u8"\U0001F606"},
		{":grinning face with sweat:" , u8"\U0001F605"},
		{":rolling on the floor laughing:" , u8"\U0001F923"},
		{":face with tears of joy:" , u8"\U0001F602"},
		{":slightly smiling face:" , u8"\U0001F642"},
		{":upside-down face:" , u8"\U0001F643"},
		{":winking face:" , u8"\U0001F609 "},
		{":smiling face with smiling eyes:" , u8"\U0001F60A "},
		{":smiling face with halo:" , u8"\U0001F607"}
	};

	std::string specialch(const std::string& s, bool esc = true) {
		int val { -1 };
		auto len { s.size() };

		for (size_t i {}; i < len; ++i) {
			if (s[i] == ':') {
				if (esc && i && s[i - 1] == '\\')
					continue;

				if (val == -1)
					val = static_cast<int>(i);
				else {
					if (i - val == 1) {
						val = static_cast<int>(i);
						continue;
					}

					const auto it { SPECCHAR.find(s.substr(val, i - val + 1)) };

					if (it == SPECCHAR.end()) {
						val = static_cast<int>(i);
						continue;
					}

					const auto spe { it->second };

					std::cout << s.substr(val, i - val + 1) << '\n';

					// ??? What's this supposed to do???
					//s.replace(val, i - val + 1, spe);

					const auto r { i - val + 1 - spe.size() };

					len -= r;
					i -= r;
					val = -1;
				}
			}
		}

		return s;
	}
}

int main() {
	std::cout << specialchar::specialch("\n\n\n\nHappy for c++ :+1:") << '\n';
}


but what are you trying to do with L52-59 (the lines after the commented-out replace, which compute r and then adjust len and i)?? What is the function overall supposed to do?
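
For context on the errors in the original post: the map error is most likely because the code was compiled as C++20 or later, where a u8"..." literal has type const char8_t[] and no longer converts to std::string; the other three errors arise because return is a keyword and cannot be used as a variable name. Below is a sketch of one way the replacement could be written, under assumptions that are not confirmed by the poster: the function builds a new string instead of editing in place, a backslash-escaped colon is emitted as a plain colon, and unknown codes such as ":+1:" are left untouched. The names kShortcodes and replace_shortcodes are hypothetical, and the table is cut down to two entries.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical, shortened table. The values are UTF-8 bytes written as plain
// escapes, so they fit in std::string under any language standard.
static const std::map<std::string, std::string> kShortcodes {
    {":grinning face:", "\xF0\x9F\x98\x80"},  // U+1F600
    {":winking face:",  "\xF0\x9F\x98\x89"},  // U+1F609
};

// Replace every known ":name:" code by its UTF-8 sequence.
std::string replace_shortcodes(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        // Assumption: "\:" means a literal colon that never starts a code.
        if (in[i] == '\\' && i + 1 < in.size() && in[i + 1] == ':') {
            out += ':';
            i += 2;
            continue;
        }
        if (in[i] == ':') {
            const std::size_t close = in.find(':', i + 1);
            if (close != std::string::npos) {
                const auto it = kShortcodes.find(in.substr(i, close - i + 1));
                if (it != kShortcodes.end()) {
                    out += it->second;  // known code: emit the replacement
                    i = close + 1;
                    continue;
                }
            }
        }
        out += in[i++];  // ordinary character, or a colon that starts no known code
    }
    return out;
}
```

With this sketch, replace_shortcodes("Happy :grinning face: :+1:") keeps ":+1:" as it is and replaces only the known code.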
Quoting: "Can you please tell me contents of your text.txt"



Fou:(r score and seven ye:{ars ago our fathers brought forth on this cont:-{inent, a:-{ new nation, conceived in Liberty,
and dedicated to the proposition th:-{a:-{t all !!men!! are created equal.
Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated,
can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field,
as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting
and proper that we should do this.

But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground.
The? brave men, living and dead, who struggled here's, have consecrated it, far above our poor power to
add or detract. The world will little note, nor long remember what we say here, but it can never forget
what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which
they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great
task remaining before us -- that from these honored dead we take increased devotion to that cause for which
they gave ?the? last full measure of devotion -- that we here's highly resolve that these dead shall not have
died in vain -- that this nation, under God, shall have a new birth of freedom -- and that government
of the people, by the people, for the people, shall not perish from the earth.


displays:


that  13
the  11
we  10
here  8
to  8
a  7
and  6
can  5
for  5
have  5

Hi seeplus

Thanks for sharing the content.
In the function above, I am just trying to check whether a colon is escaped, then check the string to see whether any text was replaced and, if so, identify which text was replaced. I am not sure whether my approach is correct. Punctuation can be skipped here, but we might need to record its position as well.

For info, this is an update to my code above which will also split words delimited by specified chars - rather than just white-space. Also it will remove all non-alpha chars within a word.

#include <string_view>
#include <string>
#include <fstream>
#include <iostream>
#include <algorithm>
#include <sstream>
#include <map>
#include <functional>
#include <utility>
#include <array>
#include <queue>
#include <cctype>

class Word {
public:
	Word(const std::string& s_) : ss(s_) {}

	std::string getw() const {
		std::string w;
		typename std::string::iterator ret {};
		typename std::string::reverse_iterator ret1 {};

		do {
			if (w = get(); w.empty())
				return {};

			ret1 = std::find_if(w.rbegin(), w.rend(), [](unsigned char c) {return std::isalpha(c); });
			ret = std::find_if(w.begin(), ret1.base(), [](unsigned char c) {return std::isalpha(c); });
		} while (ret == w.end() || ret1 == w.rend());

		std::string word { ret, ret1.base()};

		for (const auto& s : smileys)
			for (auto f { word.find(s) }; f != std::string::npos; f = word.find(s))
				word.erase(f, s.size());

		if (const auto fnd { word.find_first_of(delims) }; fnd != std::string::npos) {
			words.push(word.substr(fnd + 1));
			word.assign(word, 0, fnd);
		}

		if (const auto ret2 { std::find(word.begin(), word.end(), '\'') }; ret2 != word.end())
			word.assign(word.begin(), ret2);

		std::erase_if(word, [](unsigned char ch) {return !std::isalpha(ch); });
		std::transform(word.begin(), word.end(), word.begin(), [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

		return word;
	}

private:
	const std::array<std::string_view, 12> smileys { ":(", ":)", ":[", ":]", ":{", ":}", ":-(", ":-)", ":-[", ":-]", ":-{", ":-}" };
	const char* const delims { ".?!-" };
	mutable std::istringstream ss;
	mutable std::queue<std::string> words;

	std::string get() const {
		std::string w;

		if (!words.empty()) {
			w = words.front();
			words.pop();
			return w;
		}

		return (ss >> w) ? w : "";
	}
};

int main() {
	constexpr size_t to_display { 10 };

	if (std::ifstream ifs { "text.txt" }) {
		//const std::string text { std::istreambuf_iterator<char>(ifs), std::istreambuf_iterator<char>() };

		const std::string text { "the!..c:(at!..s&at..!on.!.the's..m:]a:]t.." };
		std::map<std::string, size_t> wcnt;

		for (auto [wrd, w] {std::pair { Word { text }, std::string{} }}; !(w = wrd.getw()).empty(); )
			++wcnt[w];

		std::multimap<size_t, std::string, std::greater<size_t>> topcnt;

		for (const auto& [wrd, cnt] : wcnt)
			topcnt.emplace(cnt, wrd);

		for (size_t i {}; const auto & [cnt, wrd] : topcnt)
			if (i++ < to_display)
				std::cout << wrd << "  " << cnt << '\n';
			else
				break;
	} else
		std::cout << "Cannot open file\n";
}



the  2
cat  1
mat  1
on  1
sat  1

Hi seeplus

Thanks for the update. Do you mean that this will handle cases where there are multiple white-space characters (here I assume white-space means spaces, tabs and newlines)? For the same text file, it now shows only five words at the top rather than the ten words shown earlier.
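Note L74 and L76 of the code above! That version just uses a hard-coded string constant for testing (L76, the const std::string text line in main) and doesn't read the file (L74, the commented-out line just before it).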
OK, got it. I see that you hard-coded the text rather than reading it from the file.
Thanks, everyone, for your input; this thread can be closed.