Grouping Data Points

closed account (zbREy60M)
Hello, I am wondering how to "group" together data.

This is my scenario: I have a program that generates many values. I need to be able to graph these with "intensity" vs "value". To be a bit clearer, I need to know how many of the same numbers I am getting.

Let's say I generate 100,000 values. I need to have the values on the X axis, and how many times each value occurs on the Y axis.

For example, if I generate the numbers 2, 3, 2, 4, 5, then I want to be able to get a graph that will show that I have one 3, one 4, one 5, and two 2's. The problem is that I am graphing values that have a very high precision such as 2.7958349, and I will not get the same value twice. So I need a way of separating all the values from say 2.797300 to 2.797400, and adding up how many I have, and then graphing all of my data points.

Sorry if this seems convoluted! It's a bit difficult to explain without being able to draw you guys a picture. Thanks!
So you want a histogram?
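I think I can suggest two methods. In both you'll need a function that returns the same value for all numbers in the same range, and a different value for each range. This could be simply int f(double d){ return d*10000; }. Now all numbers in [ 2.7973; 2.7974 ) will return 27973 (give or take floating-point rounding right at the edges of a range). Note that the conversion to int truncates toward zero, so this only works as written for non-negative numbers.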
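As a rough sketch of such a function, here is one that also handles negative values. The name bin_index and the bin_width parameter are made up for illustration, not something from the posts above. It uses std::floor instead of a plain conversion to int, so every range has the same width on both sides of zero.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (name and signature are assumptions): returns the index
// of the bin [index * bin_width, (index + 1) * bin_width) that contains value.
long bin_index(double value, double bin_width) {
    assert(bin_width > 0.0);
    // std::floor rounds toward negative infinity, so negative values land in
    // the right bin; a plain cast to int would round them toward zero instead.
    return static_cast<long>(std::floor(value / bin_width));
}
```

With bin_width = 0.0001 this plays the role of f above. A value that sits exactly on the boundary between two ranges may end up on either side because of floating-point rounding, which usually doesn't matter for a histogram.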

method 1. Declare an std::map<int, int> M. Iterate through the array and for every value V in it do M[ f(V) ]++;. Now all the data is paired. For an example of how to read the pairs, see http://www.cplusplus.com/reference/stl/map/begin/

method 2. Sort the array (you can use std::sort). Now iterate through it. Have a counter variable. While f( current element ) == f( previous element ), increment counter. When this comparison fails, you have your pair. Notice that in this algorithm you can start using the points on the first iteration. You don't have to use any intermediate representation. Of course, you can, if you need to.
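A minimal sketch of method 2, assuming C++11 and that the values are in a std::vector<double>. The function name count_sorted_bins is made up for this example. For convenience it collects the (range, count) pairs in a vector, but you could just as well plot or print each pair at the moment the range changes and skip the vector entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical helper: returns (bin index, count) pairs in ascending bin order.
// The vector is taken by value, so the caller's data stays unsorted.
std::vector<std::pair<long, int>> count_sorted_bins(std::vector<double> values,
                                                    double bin_width) {
    assert(bin_width > 0.0);
    std::vector<std::pair<long, int>> result;
    std::sort(values.begin(), values.end());
    for (double v : values) {
        long bin = static_cast<long>(std::floor(v / bin_width));
        if (!result.empty() && result.back().first == bin)
            ++result.back().second;      // same range as the previous element
        else
            result.push_back({bin, 1});  // a new range starts here
    }
    return result;
}
```

Because the values are sorted first, all the numbers that belong to one range sit next to each other, so a single pass is enough to count them.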

While the first method is easier to write, if you are going to have many distinct ranges, it might be slow and use too much memory, since the map stores one entry per non-empty range. The second one needs no extra storage beyond the array itself, and the time it takes won't vary as much with different numbers of ranges.
closed account (zbREy60M)
Yes! A histogram is exactly what I need. I think I will try the first method. I clicked on the link that you gave me, but I still don't really get what I need to do. I'm very, very new at this. Here is an example of my code:

long double x = RadialComponents.GetRadius()*cos (AngularComponents.GetAngle()*6.283185307179586476925286766559)*15329.729; 
long double y = RadialComponents.GetRadius()*sin (AngularComponents.GetAngle()*6.283185307179586476925286766559)*15329.729;
long double z = ZedComponents.GetZed()*153.29729;
long double a = AgeComponents.GetAge() * 10000000.0 * 31556926.0; // Random number between 0 and 1, multiplied by 10 million years, multiplied by the number of seconds in a year
long double p = PeriodComponents.GetPeriod(); //p measured in seconds
long double prefrequency = 2.0 / p;
//WARNING: E is NOT the mathematical constant e.
//It is the const var of ellipticity
long double Ellipse2=Ellipse*Ellipse;
const long double pi=3.1415926535897932384626433832795,
	pi4=pi*pi*pi*pi,
	G=6.673E-11;
long double beta= (32.0 * pi4 * G * 1E+45) / (2.421E+42);
int value = -4;
long double othervalue = -.25;
long double postfrequency = pow( (pow(prefrequency, value) + beta * Ellipse2 * a), othervalue);

printf("Gravitar # %4f %16f %16f %16f %16f %16f %32f %16f %16f \n" , R+1 ,  x , y , z , a , p , Ellipse , prefrequency , postfrequency );

if (myfile.is_open())
{
//	myfile << " " << x << "  " << y << "  " << z << " " << endl;
//	myfile << " " << a << "  " << p << " " << prefrequency << " " << postfrequency << endl;
	myfile << postfrequency << " " << R+1 << endl; // separator added so the two numbers don't run together
}
}

myfile.close();

system("PAUSE");
return 0;
}


After all these objects are given a certain age, they "spindown" over time and produce all of those numbers, which are given by the variable postfrequency. How do I put the variable postfrequency in this "map" that you suggest? Thanks for the help, and sorry for the stupidity.
I don't know what your code does, but if you have the things you need in an array, you can do this:
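// needs <iostream> and <map>
double array[a_lot];            // a_lot: how many values you generated (filled in elsewhere)
std::map<int, int> histogram;   // key: index of the range, value: how many values fell into it

// Each range is 1/10000 wide. The cast truncates toward zero, which is fine for non-negative values.
for( int i = 0; i < a_lot; i++ ) histogram[ static_cast<int>( array[i]*10000 ) ]++;

// Print every non-empty range in ascending order.
for( std::map<int,int>::iterator it = histogram.begin(); it != histogram.end(); ++it )
   std::cout << "a value in range [ " << it->first/10000.0 << " ; " << (it->first+1)/10000.0 << ") appeared " << it->second << " times!\n";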
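If the values are produced one at a time inside a loop (like postfrequency in the code posted earlier), you don't need the array at all. Here is a rough sketch of that idea. The names add_sample and write_histogram are made up, and the right range width depends on how spread out your numbers are, so treat 0.0001 as a placeholder.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <ostream>

// Hypothetical helper: call once per generated value, inside the loop.
void add_sample(std::map<long, int>& histogram, double value, double bin_width) {
    assert(bin_width > 0.0);
    // operator[] creates the entry with a count of 0 the first time a range is seen.
    ++histogram[static_cast<long>(std::floor(value / bin_width))];
}

// Hypothetical helper: writes one "range_start count" line per non-empty range,
// which most plotting programs can read as X and Y columns.
void write_histogram(std::ostream& out, const std::map<long, int>& histogram,
                     double bin_width) {
    for (std::map<long, int>::const_iterator it = histogram.begin();
         it != histogram.end(); ++it)
        out << it->first * bin_width << ' ' << it->second << '\n';
}
```

In the posted program that would mean declaring std::map<long, int> histogram; before the loop, calling add_sample(histogram, postfrequency, 0.0001); where postfrequency is computed, and calling write_histogram(myfile, histogram, 0.0001); once after the loop ends. Passing a long double where a double is expected just converts it, which loses far less precision than the range width.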