So for a course I am taking I am have a question which asks me to do the following.
Given: A two column txt file, with in the first column being strings and the second column being integers,open/read text file into an array and find the top x(50) integers in the second column with its corresponding string.
Note:Txt file have 999 enterties in each column
Condition: Can not sort arrays.
So far: I have successful read the text file and stored each column into an individual array of its corresponding data type. So let first column array be X and second column array be Y.
So I know how I would find the top int value in the array, but my problem exist were how I would obtain the top 50.
What I have considered was to find the max num in the array. Store the int in a top50i int array and the corresponding string in a top50s string array, then assign the int value stored to zero.
Then when I go to find the max value again the previous max value will no longer be there and I will obtain the second max value store assign zero and so on.
Note: I will be using these values later on, so storing them in a separate array I think would be wise.
This problem is a good example of looking for ways to simplify the problem.
A sorted list of numbers would make this problem easy.
1,6,8,92,145,678,3456
It should be obvious that the top numbers are on the right side of this list.
But you do mention that sorting is not allowed.
So your only option is too scan through the list N times and each time taking the maximum, but ignoring values if they have already been encountered. Which is the solution that you wrote in your post. However, what if these integers are negative numbers? Assigning 0 would create a new maximum. So instead assigning a really large negative number would be better, but there is an even better solution than that. You can make an additional bool array that keeps track of what numbers are in use from the original array.
#include <iostream>
#include <vector>
int main() {
int unsorted[] = {5,7,12,-5,-4,3,2,1,56,890,-1043};
std::vector<bool> tracker(11, false);
constint n = 5; //we want the top n largest numbers
std::vector<int> topN;
for(int i = 0; i < n; i++) {
//an integer needs to store the maximum in the list.
//but it cannot be one that has already been marked
//as true in tracker.
//so first we will loop until the first un-marked number.
int unmarked_index = 0;
for(; unmarked_index < tracker.size(); unmarked_index++) {
if(!tracker[unmarked_index]) {
//we found one that has not been visited yet
break;
}
}
//the max variable will hold the maximum after the next
//loop finishes. first we must initialize it
//and the loop we just came from gives us the
//index to initialize with.
//also, we must keep track of where the max is found
//so that it can be marked in tracker after the loop
int max = unsorted[unmarked_index];
int max_index = unmarked_index;
for(int j = unmarked_index+1; j < tracker.size(); j++) {
if(!tracker[j] && unsorted[j] > max) {
//if j is not marked and j is > max
max = unsorted[j];
max_index = j;
}
}
//time to mark max as used
tracker[max_index] = true;
//and add max to topN
topN.push_back(max);
}
for(int i = 0; i < topN.size(); i++) {
std::cout << topN[i];
if(i != topN.size() - 1) std::cout << ", ";
}
std::cout << std::endl;
return 0;
}
Also say if you were to have x values being equal to the max.
How would I go about storing the each value x with it corresponding string
in an array of 50 elements.?
#include <iostream>
#include <algorithm>
#include <set>
struct myclass {
constint * Arr;
myclass( constint * arr )
: Arr( arr ) {}
booloperator() ( int lhs, int rhs ) {
return Arr[lhs] < Arr[rhs];
}
};
int main () {
constexpr size_t TopN = 4;
int myints[] = {3,7,2,5,9,4,7};
std::set<int> index( {0,1,2,3,4,5,6}); // indices of the elements of myints
std::set<int> tops;
while ( tops.size() < TopN ) {
auto it = std::max_element( std::begin(index), std::end(index), myclass( myints ) );
// move index of max element from 'index' to 'tops'
tops.insert( *it );
index.erase( it );
}
// show TopN values
for ( auto x : tops ) {
std::cout << x << ' ' << myints[x] << '\n';
}
return 0;
}
How would I go about storing the each value x with it corresponding string
Create a struct that has string and int members, or use std::pair<std:.string,int>
Yet another approach. Finds the top N values in an array of all positive integers.
It's done in a single pass through the array, so this could be done while reading the file.
#include<iostream>
#include<vector>
#include <cstdlib>
int main()
{
constint aSz = 50;
int a[aSz];
for(auto& x: a) x = rand()%99 + 1;// fill array with 50 random positive ints
for(auto x: a){ std::cout << x << ' '; }
std::cout << '\n';// show the array values
int N;// the number of highest values to find
std::cout << "How many largest values? (1 to " << aSz << "): ";
std::cin >> N;
// the maxs': mx[0] = highest max, mx[N-1] = lowest max
std::vector<int> mx( N, 0 );// Initialize all = 0
for(int i=0; i<aSz; ++i)// for each element in a
{
for(int j=0; j<N; ++j)// for each max starting with the highest
{
if( a[i] > mx[j] )// 1st one we hit only (hence the break below)
{
for(int k=N-1; k>j; --k)// copy them downward from the highest max
mx[k] = mx[k-1];
mx[j] = a[i];// the one we hit
break;
}
}
}
std::cout << "\nThe " << N << " highest values are:\n";
for(auto x: mx) std::cout << x << ' '; std::cout << '\n';
return 0;
}
@Cubbi. That's cool. Can it work as values are read one by one from a file? Or, is it necessary to read and store all 999 values from the file into an array, then process this array?
My method is meant to find the N highest values as they are found while reading a file, or while taking user input.
Generating values on the fly instead:
#include<iostream>
#include<vector>
#include <cstdlib> //for rand and srand
int main()
{
constint aSz = 50;
int N;// the number of highest values to find
std::cout << "How many largest values? (1 to " << aSz << "): ";
std::cin >> N;
// the maxs': mx[0] = highest max, mx[N-1] = lowest max
std::vector<int> mx( N, 0 );// Initialize all = 0
for(int i=0; i<aSz; ++i)// for each element in a
{
int value = rand()%99 + 1;// generate values on the fly
std::cout << value << ' ';
for(int j=0; j<N; ++j)// for each max starting with the highest
{
if( value > mx[j] )// 1st one we hit only (hence the break below)
{
for(int k=N-1; k>j; --k)// copy them downward from the highest max
mx[k] = mx[k-1];
mx[j] = value;// the one we hit
break;
}
}
}
std::cout << "\nThe " << N << " highest values are:\n";
for(auto x: mx) std::cout << x << ' '; std::cout << '\n';
return 0;
}
EDIT: Explaining method.
I wrote that code in response to a thread where the problem was to find the two highest values in an array (which is why an array was involved).
The code is just a generalization of:
#include <fstream>
#include <string>
#include <vector>
using std::string;
using std::ifstream;
using std::ofstream;
using std::vector;
using std::move;
using std::endl;
// This struct keep our pairs
struct S {
string s;
int x;
//this is used for indirect sorting resulting array by descending
booloperator<(const S& param) const
{
return x > param.x;
};
};
int main()
{
ifstream in("input.txt");
ofstream out("output.txt");
vector<S> v;
for (;;) {
S add;
in >> add.s >> add.x;
if (!in) break;
vector<S>::iterator pos = lower_bound(v.begin(), v.end(), add);
v.insert(pos, add);
if (v.size() > 50) v.resize(50);
}
for (S s : v)
out << s.s << ' ' << s.x << endl;
}
Solution 2. No sorting. Many scannings of input file.
#include <fstream>
#include <string>
using std::ifstream;
using std::ofstream;
using std::string;
using std::endl;
int main()
{
ifstream in("input.txt");
ofstream out("output.txt");
int cnt = 0;
int oldmax = 0;
while (cnt < 50) {
int max = 0;
for (;;) {
string foo;
int curval;
in >> foo >> curval;
if (!in) break;
if (max < curval && (curval < oldmax || !oldmax)) max = curval;
}
if (!max) break;
in.clear();
in.seekg(0, in.beg);
for (;;) {
string foo;
int curval;
in >> foo >> curval;
if (!in) break;
if (max != curval) continue;
out << foo << ' ' << curval << endl;
if (++cnt >= 50) break;
}
oldmax = max;
in.clear();
in.seekg(0, in.beg);
}
}
One scan, no sort. Requires a container "foo" to store/refer the TopN values.
Scanning has to happen in two parts. The first TopN input values are simply added to the foo.
The second part scans the remaining input. Each new value is compared against the then smallest element of foo and that element is updated if the new value is larger.
1 2 3 4
while ( in >> value ) {
auto it = std::min_element( std::begin(foo), std::end(foo), comp );
if ( comp( *it, value ) ) *it = value;
}
The 'comp' is a predicate functor.
Note: the Top3 of {2,2,1,2,2,3} are {3,2,2} so only half of the elements with value 2 are included in Top3. The various methods presented in this thread will select different elements from those four candidates.
Algorithms to do this are pretty cool. There may be others but I remember learning one that's very similar to quicksort, except that the goal is to find a pivot point at the n'th element.
The OP wanted a solution that did not use sorting.
Although I find this to be odd because essentially this is sorting the way it has been done so far.
Partitioning is a better more efficient way to go. I saw something somewhere that split a trillion elements let me find that.
Edit: I found the link finally http://matpalm.com/median/algorithm.html
Basically you need to find the kth order statistic where k is the length of the list - N + 1
And everything to the right of that is part of topN. Including the kth order statistic also.
@dhayden: I had not seen your post. I was merely commenting that most of the solutions including mine were giving sorted output when actually this is a partitioning problem that does not require output to be sorted.