smoothen vector<string_view>

I've split my string into a vector<string_view>, and now I'd like to get back the original string (or a list of substrings if elements were deleted). My question is: does string_view::substr allow for increasing the size, according to the standard? Any suggestions for using algorithms or ranges/views in smoothen()? (I'm aware the splitting could be done with split_view too.) And how about using a view_interface instead of a vector: would I still be able to distinguish the individual words after a join_view, or would it just become a complicated kind of string?
Here's my test code:

#include <vector>
#include <string_view>
#include <cassert>

using namespace std::string_view_literals;

std::vector<std::string_view>& smoothen(std::vector<std::string_view>& v){
    std::vector<bool> del{false};
    for(auto prev=v.begin();auto& e:v) if(&e!=&*prev) {
        if(prev->end()==e.begin()){
            *prev=prev->substr(0,prev->size()+e.size());
            del.push_back(true);
        } else del.push_back(false);
        ++prev;
    }
    size_t j{0};
    for(size_t i{0};i<v.size();++i) if(!del[i]){
        if(i!=j) v[j]=v[i];
        ++j;
    }
    v.resize(j);
    return v;
}
int main(){
    constexpr auto s="this is my string"sv;
    std::vector<std::string_view> v;
    size_t p=0;
    for(auto n=s.find(' ')+1;n!=std::string_view::npos;n=s.find(' ',n)+1){
        v.push_back(s.substr(p,n-p));
        p=n;
    }
    v.push_back(s.substr(p,s.size()-p));
    assert(smoothen(v).front()==s);
}
does string_view::substr allow for increasing size according to the standard?


I'm not sure I'm understanding what is meant by this.

std::string_view is a read-only view referencing the underlying data. You can have multiple views of the same underlying data, but you can't change the underlying data via a string_view. Neither can you concatenate (or otherwise append) data to a string_view.

Perhaps you could provide some more detail.
The "more detail" is in my example program!
My question is whether the C++20 standard allows what I'm doing in the line *prev=prev->substr(0,prev->size()+e.size()); (growing a view with substr), or whether that line must throw an exception instead, or whether it's undefined behaviour.
Do you mean like this test code:

#include <string_view>
#include <iostream>
using namespace std;

int main()
{
	const auto sv {"abcdefghijklmn"sv};

	const auto s1 {sv.substr(1, 4)};
	const auto s2 {s1.substr(0, 7)};

	cout << sv << '\n' << s1 << '\n' << s2 << '\n';
}


where s2 'increases' the view from s1 but within the scope of the underlying data??

Well this displays:


abcdefghijklmn
bcde
bcde


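Showing that the view s2 can't extend past the underlying view s1: substr(pos, count) returns a view of min(count, size() - pos) characters, so a count that is too large is silently clamped. It only throws (std::out_of_range) if pos > size().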

However, consider:

#include <string_view>
#include <iostream>
using namespace std;

int main()
{
	const auto sv {"abcdefghijklmn"sv};

	const auto s1 {sv.substr(1, 4)};
	const auto s2 {s1.substr(0, 7)};
	const auto s3 {string_view(s1.begin(), s1.begin() + 7)};

	cout << sv << '\n' << s1 << '\n' << s2 << '\n' << s3 << '\n';
}


which displays:


abcdefghijklmn
bcde
bcde
bcdefgh


where s3 is constructed from an iterator pair (the C++20 constructor string_view(first, last)).

This is undefined behaviour if the specified range is not a valid range of the underlying data.
Many thanks, that solves the mystery of why it doesn't work. en.cppreference.com didn't mention anything about that behaviour of string_view::substr() and string_view::size().
> Showing that the view s2 can't extend past the underlying view s1.

With an appropriate constructor, it can (if the underlying range of characters is a valid range).

#include <string_view>
#include <iostream>

int main()
{
    using namespace std::string_view_literals;
	
    constexpr auto sv {"abcdefghijklmn"sv}; // abcdefghijklmn

    constexpr auto s1 {sv.substr(1, 4)}; // bcde
	
    // const auto s2 {s1.substr(0, 7)};
    constexpr std::string_view s2( s1.data(), 7 ) ; // bcdefgh
    
    constexpr std::string_view s3( s1.data()-1, 8 ) ; // abcdefgh
    
    
    std::cout << sv << '\n' << s1 << '\n' << s2 << '\n' << s3 << '\n' ;
}

http://coliru.stacked-crooked.com/a/0a37a656f686bbcb
All that is nifty and stuff, but the actual problem can be solved by keeping a copy of the original:
std::string s1;
// ... whatever, get data into s1
std::string s2{s1}; // deep copy of the original
// ... code that messes up s1
s2 is still the original string.
If you deleted something, you can either apply the same deletion to s2 (if possible, and it should be) or you end up back at the complexity above (which seems best avoided in this specific use case?).
It seems to me that OP's problem would be better solved with just the original string and a vector of offsets/pointers into it. The point of a string_view is to be a read-only slice of an object. If you're talking about shifting or splicing a string_view, it's probably not what you want. The code that uses the string_view should not assume that there's any more memory to extend into.
> the actual problem can be solved by ...

Yes, one can always make a deep copy of a string (or construct a string from a C-style NTBS) instead of creating a view; that was how it was normally done before 2017.

What's smoothen() supposed to do?
It's not the views; it's the manipulation and then trying to go back to the original that I was saying can be solved via a copy. The view part is fine: views don't cost much and are a wonderful addition.
Well the first issue is that the split doesn't work! It goes into an infinite loop.

 
n=s.find(' ',n)+1


When find() finds no space it returns npos, and npos + 1 wraps around to 0, so the condition n != npos is always true and the loop never exits. Try:

for (auto n = s.find(' '); n != std::string_view::npos; p = n + 1, n = s.find(' ', p))
	v.push_back(s.substr(p, n - p));

v.push_back(s.substr(p, s.size() - p));

Do you want something like this?

#include <vector>
#include <string_view>
#include <iostream>

using namespace std::string_view_literals;

std::vector<std::string_view> smoothen(const std::vector<std::string_view>& v) {
	if (v.empty())
		return {};

	std::vector<std::string_view> sv {v.front()};

	for (auto itr = v.begin() + 1; itr != v.end(); ++itr)
		if (sv.back().data() + sv.back().size() == itr->data())
			sv.back() = std::string_view(sv.back().data(), itr->data() + itr->size());
		else
			sv.push_back(*itr);

	return sv;
}

std::vector<std::string_view> split(std::string_view s, bool incspace)
{
	std::vector<std::string_view> v1;
	size_t p = 0;

	for (auto n = s.find(' '); n != std::string_view::npos; p = n + 1, n = s.find(' ', p))
		v1.push_back(s.substr(p, n - p + incspace));

	v1.push_back(s.substr(p, s.size() - p));

	return v1;
}

int main() {
	constexpr auto s = "this is my string"sv;

	const auto v1 {split(s, false)};
	const auto v2 {split(s, true)};

	const auto sm1 {smoothen(v1)};
	const auto sm2 {smoothen(v2)};

	for (const auto& s1 : v1)
		std::cout << "!" << s1 << "!" << '\n';

	std::cout << '\n';

	for (const auto& s1 : sm1)
		std::cout << "!" << s1 << "!" << '\n';

	std::cout << '\n';

	for (const auto& s1 : v2)
		std::cout << "!" << s1 << "!" << '\n';

	std::cout << '\n';

	for (const auto& s2 : sm2)
		std::cout << "!" << s2 << "!" << '\n';
}



!this!
!is!
!my!
!string!

!this!
!is!
!my!
!string!

!this !
!is !
!my !
!string!

!this is my string!


smoothen() 'merges' adjacent string_views when the next one begins exactly where the current one ends.

If the split doesn't include the ' ', the next view doesn't follow on from the current one, so no joins are done and the result is unchanged.

If the split does include the ' ', the next view does follow on from the current one, so the two are merged into one.
Thanks, that's exactly the clear-headedness I was looking for. Hope you're aware that when split() is given the parameter "false", the same can be achieved with C++20 standard library functions in a more readable way.

As for what I'm trying to achieve: I discovered the library "dtl", which creates a diff between two vectors. So when I feed it vectors of string_view where each element contains either a whole word (or maybe an alphanumeric sequence) or a single character, it would tell me at which characters or words the two strings differ, just like some fancy diff-visualizer widgets do. That way I can program a widget to visualize the differences with colours, or further analyze why the words differ (when analyzing plagiarism or spam). All I have to do is smoothen the output wherever it's contiguous, to save space...
Hope you're aware that when split() is given the parameter "false", the same can be achieved with C++20 standard library functions in a more readable way.


I just 'fixed' the original code and did a quick mod to get what I wanted to test smoothen().

What's your version of split()?