Count type of apparitions in a CSV file

Hi,
I have to count the occurrences of each category (Action, Adventure, Animation, ...) in a CSV file. The file is movies.csv, from http://files.grouplens.org/datasets/movielens/ml-latest-small.zip
I have to read it with one independent thread per category and save the results in a txt file, protecting the writes with a mutex or semaphores.
I don't know how to count them and write each category to a file.
I think it is easy for someone who has used this kind of function.
Thank you.
I would strongly suggest you solve this problem first without using any threads at all.

When you have a working program, and a far better understanding of what the problem is, then you can THINK about how threads might help with this problem.

When you've thought of a plan, then think how you can better structure the code you have (aka refactoring) to ease your transition into threads. It's all still single threaded at this stage, so you can keep testing your working code to make sure it still works.

Only then, when you're armed with a plan, and code amenable to threading, do you actually put threads in there.

Premature addition of threads is just a crap-shoot waiting to happen.
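
As a starting point for that single-threaded version, here is a minimal sketch. It assumes each row of movies.csv ends with the genres field and that the genres are separated by '|' (e.g. "Adventure|Animation"); check that against your copy of the file. The name count_genres is made up for illustration.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Count how often each genre appears in the CSV data.
// Assumption: the genres are the LAST comma-separated field of every row
// and are separated by '|'. Taking everything after the last comma avoids
// having to parse quoted titles that contain commas themselves.
std::map<std::string, int> count_genres(std::istream& in)
{
    std::map<std::string, int> counts;
    std::string line;
    std::getline(in, line);                       // skip the header row
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') // tolerate CRLF line endings
            line.pop_back();
        const auto comma = line.rfind(',');
        if (comma == std::string::npos)
            continue;                             // skip malformed rows
        std::istringstream genres(line.substr(comma + 1));
        std::string genre;
        while (std::getline(genres, genre, '|'))
            ++counts[genre];                      // new keys start at 0
    }
    return counts;
}
```

To use it, open the file with std::ifstream, pass the stream to count_genres, then loop over the returned map and write each name and count to your output file.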
I have tried to solve the problem without using threads but I have not succeeded.
My knowledge is quite limited, so I need help.
Thanks for the help anyway.
I have tried to solve the problem without using threads but I have not succeeded.

So show what you've tried, you'll have much better luck getting help after you show your effort.

ghosts in the machine?

If you are tasked to do this with threads, you have to cook up some dumb use for them: the problem as described does not need threads, and in fact they just get in the way.

Perhaps something like this, if it solves your problem:
Read the entire file into memory, or memory-map it, or the like.
Then kick off the threads. Each thread takes a category as a parameter (an enum, a string, or something similar, e.g. myenum::action or "Action").
Each thread goes through the entire file looking for the records that belong to its category and saves them to its own file (e.g. "Action.txt"). Count as you go, whatever is needed.

With one output file per thread you should not need a mutex or any other synchronization: the threads only read the shared in-memory data and never write to the same file, so nothing needs to block. But you can put one in if you need to. The OS already knows what to do when you have a bunch of open files.
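
If you do have to put a mutex in, for example because all threads must write to one shared result file, a rough sketch could look like the following. It assumes the rows are already in memory (header removed) with the '|'-separated genres as the last comma-separated field; has_genre and count_with_threads are made-up names for illustration, not anything from a library.

```cpp
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// True if the '|'-separated genre list contains exactly this category.
bool has_genre(const std::string& genres, const std::string& category)
{
    std::istringstream in(genres);
    std::string genre;
    while (std::getline(in, genre, '|'))
        if (genre == category)
            return true;
    return false;
}

// Start one thread per category. Every thread scans all rows (read-only,
// so no locking is needed for that part) and then appends one result line
// to the shared output stream while holding the mutex.
void count_with_threads(const std::vector<std::string>& rows,
                        const std::vector<std::string>& categories,
                        std::ostream& out)
{
    std::mutex out_mutex;
    std::vector<std::thread> workers;
    for (const std::string& category : categories) {
        workers.emplace_back([&rows, &out, &out_mutex, category] {
            int count = 0;
            for (const std::string& row : rows) {
                const auto comma = row.rfind(',');
                if (comma != std::string::npos &&
                    has_genre(row.substr(comma + 1), category))
                    ++count;
            }
            // Only the write to the shared stream is protected.
            std::lock_guard<std::mutex> lock(out_mutex);
            out << category << ": " << count << '\n';
        });
    }
    for (std::thread& t : workers)
        t.join();                      // wait for every thread to finish
}
```

Pass a std::ofstream as out to get the txt file. The order of the result lines depends on which thread finishes first, so do not rely on it.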