ideas for multiple sorting/copying of arrays

Hello,

I'm working on a project which involves manipulating a two-dimensional array. Please bear with me while I explain the purpose of the program. Assume that we have this table:

A         B   C   D
SUCCESS   1   9   2
FAILURE   5   9   1
SUCCESS   4   8   4
SUCCESS   0   5   7
SUCCESS   7   8   6
FAILURE   6   9   9
FAILURE   8   0   2
SUCCESS   2   0   1
FAILURE   1   2   1
FAILURE   4   7   5
FAILURE   3   3   5


"SUCCESS" and "FAILURE" are actually represented by numerical values, I just made the table this way in order to explain it better. The actual tables are all consisted of double values and have several thousands of records and about 20-30 columns.

The idea is to find successive values of each column that give the highest "success rate". Something like: "when column D is between 3 and 5, column C is between 1 and 2, and column B is between 6 and 8, then the success rate is 90%".

Of course we also need a minimum number of records that fulfill the above criteria.

Anyway, I'm not looking for code here, just some ideas about how I should implement it. I'm currently thinking of this: begin by sorting by one column. E.g., let's start with column B:

A         B   C   D
SUCCESS   0   5   7
SUCCESS   1   9   2
FAILURE   1   2   1
SUCCESS   2   0   1
FAILURE   3   3   5
SUCCESS   4   8   4
FAILURE   4   7   5
FAILURE   5   9   1
FAILURE   6   9   9
SUCCESS   7   8   6
FAILURE   8   0   2


Then, get all possible combinations of successive records, as long as their total number is above a certain minimum, and copy them to a new array. E.g., get all records where column B is 1-2, then all records where column B is 1-3, then 1-4, 1-5, 1-6, 1-7, 1-8, 2-3, 2-4, 2-5, 2-6, 2-7, 2-8, etc.

In each of those new arrays, count the ratio of "SUCCESS" records, and if it is above a certain threshold, report it. Then do the same thing for column C, etc.
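To make the counting step concrete, here's a rough, untested sketch of what I have in mind, assuming each record is a vector of doubles whose first element holds 1.0 for SUCCESS and 0.0 for FAILURE (all names here are made up):

#include <cstddef>
#include <vector>

using Row = std::vector<double>; // row[0]: 1.0 = SUCCESS, 0.0 = FAILURE

// Given rows already sorted by column `col`, compute the success ratio
// of the records whose value in `col` lies in [lo, hi]. Returns 0 when
// the range holds fewer than `minRecords` records.
double successRate(const std::vector<Row>& rows, std::size_t col,
                   double lo, double hi, std::size_t minRecords)
{
    std::size_t total = 0, successes = 0;
    for (const Row& r : rows) {
        if (r[col] < lo) continue;
        if (r[col] > hi) break; // sorted by `col`, so nothing larger follows
        ++total;
        if (r[0] == 1.0) ++successes;
    }
    return total >= minRecords ? double(successes) / total : 0.0;
}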

Obviously this will be very time-consuming. If we have 20 columns and in each column we have 10 "ranges" of values, then we'd have to create something on the order of 10 trillion new arrays. I have some ideas about how to reduce the number of iterations, but for the time being I wonder whether my basic idea is efficient, or if there is another way to do the same thing faster.

If you've read to this point, congratulations :D

What you can do is create a sort comparator function for each column. First, sort your whole array according to the first column. Then locate each run of records whose first-column values are equal and sort that run according to the second column. Keep doing that for each column.

You can use std::equal_range() to locate those sub-ranges within the whole array:
http://cplusplus.com/reference/algorithm/equal_range/

I hope that made sense, and I think you're going to need recursion for that approach.
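Here's an untested sketch of the recursive part, assuming each record is a std::vector<double> (the names are just placeholders):

#include <algorithm>
#include <cstddef>
#include <vector>

using Row = std::vector<double>;

// Recursively sort rows[first, last) by column `col`, then sort each
// run of equal values in that column by the next column, and so on.
void sortByColumns(std::vector<Row>& rows, std::size_t first,
                   std::size_t last, std::size_t col, std::size_t numCols)
{
    if (col >= numCols || last - first < 2) return;

    auto byCol = [col](const Row& a, const Row& b) { return a[col] < b[col]; };
    std::sort(rows.begin() + first, rows.begin() + last, byCol);

    // Walk over runs of equal values; std::equal_range would find the
    // same boundaries by binary search instead of this linear scan.
    std::size_t runStart = first;
    for (std::size_t i = first + 1; i <= last; ++i) {
        if (i == last || rows[i][col] != rows[runStart][col]) {
            sortByColumns(rows, runStart, i, col + 1, numCols);
            runStart = i;
        }
    }
}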

EDIT:

Well, you probably don't need recursion, but that's what the problem suggests to me. It can be done otherwise.
Thanks, this makes sense because it doesn't involve any array copying. I should have realized that no copying is needed, since no data is being modified. It will also improve performance because only sub-ranges are sorted. And yes, recursion is probably the way to go.

I've been thinking of more ways to reduce iterations. E.g., don't check a large range of values if the smaller ranges inside it don't show any better "success rates". Since a combined range's success rate is a weighted average of its sub-ranges' rates, it can never exceed the best of them; so if, in a certain column, ranges 3-5 and 6-8 don't have an improved success rate over the full column, then there is no point in checking the cumulative range 3-8.
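For illustration, that pruning test could look something like this (hypothetical names; it just compares the two parts' rates against the baseline):

#include <cstddef>

// Success/total counts for a contiguous range of records.
struct RangeStats { std::size_t successes, total; };

// The combined range's rate is a weighted average of its parts' rates,
// so it lies between them; skip the union if neither part beats the baseline.
bool unionWorthChecking(RangeStats a, RangeStats b, double baseline)
{
    double rateA = a.total ? double(a.successes) / a.total : 0.0;
    double rateB = b.total ? double(b.successes) / b.total : 0.0;
    return rateA > baseline || rateB > baseline;
}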