Importing ASCII data files for analysis

OK, this may be a tad long, but I am trying to explain clearly what I am looking to do. If anyone can help or point me in the right direction, I would greatly appreciate it! I have a good start on the program and I am at the point where I need to load the data.

Basically, I will have multiple data files on my hard drive that I need to pick some info out of, for example files 01-02 shown below (the actual files have hundreds of rows).

File01.txt (contents below)
<name>,<date>,<vol>,<cost>
AAC,20090102,8.31,34.5
AAU,20090102,0.68,36.4

File02.txt (contents below)
<name>,<date>,<vol>,<cost>
AAC,20090103,8.31,65.2
AAU,20090103,0.68,34.4

**where date is in the form yearmonthday**
**The files on the hard drive will run from file01.txt through fileXX.txt. The number of files will keep growing, so I just need to load however many files are available.**

What I'm looking to do:

-First, I would like the user to select a 'name' (each name is listed once in each file), for example the name AAU.

-I would then like to:

-Load files 01-XX (for however many files exist on the hard drive; if only 65 files exist, load 01-65)

-Select all the data contained in the row labeled "AAU" in each of the files (65 in this case)

-Print this data in columns or an array (in this case, the name, date, vol, and cost)

-Keep the loaded data readily available to analyze using equations and such

It has been a few years since I've done this kind of programming, so I am a bit rusty. Any help would be awesome; I know this is probably quite easy for many of you.


Thanks!
For loading all the files, you *could* just open a vector of fstreams (or whatever you want) for reading only, so that nothing gets created (the old ios::nocreate flag is not part of standard C++, but std::ifstream never creates a missing file anyway), and loop through 01-99 until the open fails. Then loop through the files, grab the data into another vector, and parse it using the std::string functions. By the way, this could be slow and memory hungry depending on how many files you have and how big they are. I don't really know what you are doing, but could you just put everything in one file?
Would you happen to know of any examples of what you just described? Sorry, I guess that is a bit over my head at the moment.

As for what I am doing, the data sheets are data from the New York Stock Exchange. I checked, and each file is around 1500 lines long. I add a new sheet for each day, so I may want to run around 30 of these files (but only selecting one row from each). I am OK with it taking a long time to process; I just need to make sure to do it in a way that doesn't max out the RAM.
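Here is a minimal sketch of the loop-until-the-open-fails idea suggested above, assuming the files are named with a common prefix plus a two-digit number (File01.txt, File02.txt, ...). The names Record, file_name, parse_line, and load_rows_for are made up for illustration. Instead of a vector of open fstreams, it opens one file at a time, so only a single file is held open while it is being read.

```cpp
#include <cassert>
#include <fstream>
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One parsed row: <name>,<date>,<vol>,<cost>
struct Record {
    std::string name;
    std::string date;  // kept as text in yearmonthday form, e.g. "20090102"
    double vol = 0.0;
    double cost = 0.0;
};

// Build "<prefix>01.txt", "<prefix>02.txt", ... (the naming scheme is an assumption).
std::string file_name(const std::string& prefix, int index) {
    assert(index >= 1);
    std::ostringstream out;
    out << prefix << std::setw(2) << std::setfill('0') << index << ".txt";
    return out.str();
}

// Split a line such as "AAU,20090102,0.68,36.4" with std::string::find/substr.
// Returns false for the header row or any line without four usable fields.
bool parse_line(const std::string& line, Record& rec) {
    std::size_t a = line.find(',');
    if (a == std::string::npos) return false;
    std::size_t b = line.find(',', a + 1);
    if (b == std::string::npos) return false;
    std::size_t c = line.find(',', b + 1);
    if (c == std::string::npos) return false;
    rec.name = line.substr(0, a);
    rec.date = line.substr(a + 1, b - a - 1);
    try {
        rec.vol = std::stod(line.substr(b + 1, c - b - 1));
        rec.cost = std::stod(line.substr(c + 1));
    } catch (const std::exception&) {
        return false;  // e.g. "<vol>" in the header is not a number
    }
    return true;
}

// Open file 01, 02, ... until one is missing; keep the row for `wanted` from each.
std::vector<Record> load_rows_for(const std::string& prefix, const std::string& wanted) {
    std::vector<Record> rows;
    for (int i = 1;; ++i) {
        std::ifstream in(file_name(prefix, i));  // reading never creates a file
        if (!in) break;                          // first missing file ends the loop
        std::string line;
        while (std::getline(in, line)) {
            Record rec;
            if (parse_line(line, rec) && rec.name == wanted) {
                rows.push_back(rec);
                break;  // each name appears once per file
            }
        }
    }
    return rows;
}
```

Calling load_rows_for("File", "AAU") from main would return one Record per file found, in file order, ready to print or feed into calculations.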
Would a simple grep be of any use?

> grep ^AAU File*.txt

Searching through the files by a key such as AAU will be costly because of how the files are stored. Is it possible to change the format, or must you analyze them as is?
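If the format has to stay as it is, one way around the repeated searching is to read every file once and file each row under its name, so later lookups never touch the disk again. This is only a sketch: Index, index_stream, and rows_for are hypothetical names, and it takes any std::istream so it works equally with a std::ifstream per file.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// name -> every row seen for that name, in the order the files were read
using Index = std::map<std::string, std::vector<std::string>>;

// Read one file's worth of lines and store each row under its <name> field.
void index_stream(std::istream& in, Index& index) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '<') continue;  // skip blanks and the header row
        std::size_t comma = line.find(',');
        if (comma == std::string::npos) continue;      // not a data row
        index[line.substr(0, comma)].push_back(line);
    }
}

// Look up a name without rescanning any file; returns an empty list if absent.
std::vector<std::string> rows_for(const Index& index, const std::string& name) {
    Index::const_iterator it = index.find(name);
    if (it == index.end()) return std::vector<std::string>();
    return it->second;
}
```

Unlike `grep ^AAU`, which would also match a longer name such as AAUX, the lookup here compares the whole name field up to the first comma.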
Hmm, I have not heard of grep; I will look into that.

The other file format I can obtain is CSV (.csv), which imports into Excel nice and clean, but I think it looks almost identical in a text editor.

I suppose I could work on the "single" select part later, where I seek out only one particular name. In the meantime, would it be easier to just have it load all the data?

If that's the case and these are my files...
File01.txt (contents below)

<name>,<date>,<vol>,<cost>
AAC,20090102,8.31,34.5
AAU,20090102,0.68,36.4

File02.txt (contents below)

<name>,<date>,<vol>,<cost>
AAC,20090103,8.31,65.2
AAU,20090103,0.68,34.4

(file03.txt, file04.txt, ..., fileXX.txt, and so on)

...I need to load all the files and place the data in arrays as follows.

Starting at the top of each file with AAC, I would need 3 arrays:

array_1[] = {date from file01,date from file 02,.........,date from file XX}
array_2[] = {vol from file01,vol from file 02,.........,vol from file XX}
array_3[] = {cost from file01,cost from file 02,.........,cost from file XX}


Then it would move on to the next row of each file (in this case AAU):

array_1[] = {date from file01,date from file 02,.........,date from file XX}
array_2[] = {vol from file01,vol from file 02,.........,vol from file XX}
array_3[] = {cost from file01,cost from file 02,.........,cost from file XX}


...This would need to work for files 01-XX (the number of files varies), and continue down each file (over 1000 lines) until it ends.


**Perhaps I should note that I think it is OK if the arrays are overwritten each time, because I plan on doing some calculations on the data in the arrays between each step. The values I obtain from these calculations will be used to rank that particular row.**

By the way, to what extent would I be limited if I didn't want to overwrite these arrays? Let's say I wanted 3 arrays per name/row, each with 100 cells or so, and had 1000 or so names/rows. Is this just ridiculous, or is it possible?

Sorry, I'm not sure if that's the best way to explain it. I feel like I may not be making total sense, haha.
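For what it's worth, the three-arrays-per-name layout can be sketched with std::vector, which grows with the number of files. Series, append_row, mean, and estimated_bytes are hypothetical names, and storing the date as an int is an assumption. On a typical platform where int is 4 bytes and double is 8 bytes, 1000 names with 100 cells per array comes to roughly 2 MB of data before container overhead, so keeping everything instead of overwriting should be quite feasible.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Three parallel columns for one name, one entry per file (that is, per day).
struct Series {
    std::vector<int> dates;     // yearmonthday, e.g. 20090102
    std::vector<double> vols;
    std::vector<double> costs;
};

// Append the fields of one data row, e.g. "AAU,20090102,0.68,36.4".
// Precondition: the row has four comma-separated fields (not the header line).
void append_row(const std::string& row, Series& s) {
    std::size_t a = row.find(',');
    assert(a != std::string::npos);
    std::size_t b = row.find(',', a + 1);
    assert(b != std::string::npos);
    std::size_t c = row.find(',', b + 1);
    assert(c != std::string::npos);
    s.dates.push_back(std::stoi(row.substr(a + 1, b - a - 1)));
    s.vols.push_back(std::stod(row.substr(b + 1, c - b - 1)));
    s.costs.push_back(std::stod(row.substr(c + 1)));
}

// Example of a calculation that could feed a ranking: the average of a column.
double mean(const std::vector<double>& values) {
    assert(!values.empty());
    return std::accumulate(values.begin(), values.end(), 0.0) / values.size();
}

// Rough payload size if every name keeps its own three columns
// (ignores the small per-vector bookkeeping overhead).
std::size_t estimated_bytes(std::size_t names, std::size_t cells_per_column) {
    return names * cells_per_column * (sizeof(int) + 2 * sizeof(double));
}
```

One Series per name could be kept in a std::map keyed by the name, filled by calling append_row once per file, and then ranked with whatever calculation is needed.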