
The next thread will be one that handles and displays a progress bar.
And the last will be a thread that copies the files over.
Is it feasible to have more than one thread running the same function, specifically having two threads searching for files, one starting from the top, one starting from the bottom?
Would it be more feasible to have one thread search directories for files and pass said files to a different thread to evaluate the extensions?
While threading, do you need to use mutexes for variables that aren't written to, specifically my list of valid extensions that I am searching for?
Is this a valid understanding of them?
Where or when do detach and swap come into play?
Do mutexes lock themselves, or do they need to be specifically locked and unlocked? I'm leaning more towards the latter. Or is there some other process?
Is there any way to detect when a thread has finished its processing without the use of a separate variable?
Inter-thread communication variables seem to be something new altogether. I'll have to learn more about them, but an atomic variable, from what I read before, is similar to a static variable, correct? Is it also the most commonly used one?
If you, or someone else, could throw in an extra idea for a way of using threads in the program, even if it's a completely new feature, I'd seriously consider it, since I'm becoming determined to use them.
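For the mutex and "has it finished?" questions above, here is a minimal sketch rather than a definitive answer. It assumes C++11 standard threading instead of Boost, and every name in it (kExtensions, count_matches, run_demo) is made up for illustration. It shows three things: a list that is never written to after construction can be read by several threads without a mutex, a std::mutex has to be locked explicitly (here via std::lock_guard, which unlocks when it goes out of scope), and a std::future can be polled for completion without a hand-written flag.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Never modified after construction, so concurrent reads need no mutex.
const std::vector<std::string> kExtensions = {".gif", ".jpg", ".png"};

std::mutex g_count_mutex;  // protects g_matches, which IS written to
int g_matches = 0;

// True if name ends with one of the valid extensions.
bool has_valid_extension(const std::string& name) {
    for (const auto& ext : kExtensions) {
        if (name.size() >= ext.size() &&
            name.compare(name.size() - ext.size(), ext.size(), ext) == 0) {
            return true;
        }
    }
    return false;
}

// The same function can run in several threads at once.
void count_matches(const std::vector<std::string>& names) {
    for (const auto& n : names) {
        if (has_valid_extension(n)) {
            std::lock_guard<std::mutex> lock(g_count_mutex);  // locks here
            ++g_matches;
        }  // lock_guard's destructor unlocks here
    }
}

int run_demo() {
    const std::vector<std::string> top = {"a.gif", "b.txt"};
    const std::vector<std::string> bottom = {"c.png", "d.jpg", "e.doc"};

    std::thread t1(count_matches, std::cref(top));
    // std::async hands back a future that can be asked "are you done yet?".
    std::future<void> done =
        std::async(std::launch::async, count_matches, std::cref(bottom));

    while (done.wait_for(std::chrono::milliseconds(10)) !=
           std::future_status::ready) {
        // A real program could update a progress bar here.
    }
    t1.join();

    std::lock_guard<std::mutex> lock(g_count_mutex);
    return g_matches;
}
```

Calling run_demo() once counts the three names with a valid extension across both lists. Whether splitting the search across two threads actually speeds anything up on a single disk is a separate question that this sketch does not answer.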
So, you're recommending creating an MD5 hash for each file copied and checking it after it's copied to ensure it copied properly?
Wikipedia wrote:
Hash functions are destructive, akin to lossy compression, as the original data is lost when hashed. Unlike compression algorithms, where something resembling the original data can be decompressed from compressed data, the goal of a hash value is to uniquely identify a reference to the object so that it can be retrieved in its entirety.
Should I create my own algorithm to create a hash?
I don't believe the boost filesystem supports a native hash function.
The main thread launches the separate threads (mainly to keep the clutter down). I'll have one thread that scans the directories, adds the paths to a list, and notifies another thread that it's OK to process that item. Another thread will wait for notification, process the file (create a hash based on that file), tell a third thread that it can copy, then wait for confirmation that the copy completed and check the hash. The third thread purely copies the files, waiting for hashes to be created and then checked.
I don't believe the boost filesystem supports a native hash function.
Why would it? And what do you mean by "native"?
Hash the file as you're copying it; otherwise you'll have to read each file twice. To do this, you can simply load chunks of it into memory with standard file I/O functions, do the processing, then write each chunk to the destination.
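A rough sketch of that chunked approach, under a few assumptions: it uses std::ifstream and std::ofstream with a 64 KiB buffer, and it uses FNV-1a purely as a stand-in hash because the standard library has no MD5. FNV-1a is not cryptographic, and a real MD5 would come from a separate library. The names copy_and_hash and hash_file are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// FNV-1a 64-bit constants. Stand-in hash only: not MD5, not cryptographic.
constexpr std::uint64_t kFnvOffset = 14695981039346656037ULL;
constexpr std::uint64_t kFnvPrime = 1099511628211ULL;

// Folds n bytes into the running hash h and returns the new value.
std::uint64_t fnv1a_update(std::uint64_t h, const char* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        h ^= static_cast<unsigned char>(data[i]);
        h *= kFnvPrime;
    }
    return h;
}

// Copies src to dst one chunk at a time, hashing each chunk as it passes
// through memory, so the source is read only once. Returns false on I/O error.
bool copy_and_hash(const std::string& src, const std::string& dst,
                   std::uint64_t& hash_out) {
    std::ifstream in(src, std::ios::binary);
    std::ofstream out(dst, std::ios::binary | std::ios::trunc);
    if (!in || !out) return false;

    std::vector<char> buf(64 * 1024);
    std::uint64_t h = kFnvOffset;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();  // may be short at end of file
        if (got <= 0) break;
        h = fnv1a_update(h, buf.data(), static_cast<std::size_t>(got));
        out.write(buf.data(), got);
        if (!out) return false;
    }
    if (in.bad()) return false;
    out.flush();
    if (!out) return false;
    hash_out = h;
    return true;
}

// Hashes an existing file, e.g. to verify the destination after the copy.
bool hash_file(const std::string& path, std::uint64_t& hash_out) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    std::vector<char> buf(64 * 1024);
    std::uint64_t h = kFnvOffset;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();
        if (got <= 0) break;
        h = fnv1a_update(h, buf.data(), static_cast<std::size_t>(got));
    }
    if (in.bad()) return false;
    hash_out = h;
    return true;
}
```

To check a copy, compare the value from copy_and_hash against hash_file run on the destination. Note that verifying this way does read the destination once more, and the operating system may serve that read from its cache rather than from the disk.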
generate_destination function? I figured it would have been more beneficial to have three lists, three threads to process them. I suppose it doesn't matter either way as there is plenty going on already.
Wouldn't this be slower than using boost?
I see what you mean and could just use my main thread (hashing, new destination, copying, checking hash, moving to the next one), and a helper thread to scan the directories.
My problem is this, and it's a very large what-if: let's say the helper thread is scanning and comes across a very small .gif early on. The main thread hashes, copies, checks the hash, then what? Even if the main thread has a long list of files to copy over, once it catches up to the helper thread there will be no more paths in the list, but the main thread won't know whether the helper thread is done or still scanning. Obviously it would have to wait, but how would it be smart enough to know it's not waiting for another path when the other thread is actually done? From the condition variable that cubbi described, it looks like the thread would just wait until the condition is met. I don't want it to just wait.
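One common way to tell "nothing yet" apart from "nothing ever again" is to keep a closed flag under the same mutex as the list, and have the waiting thread wake up on either event. The sketch below assumes C++11 std::condition_variable rather than Boost's, and PathQueue and run_pipeline are hypothetical names. Which thread scans and which one copies is interchangeable here.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// A small queue of paths with an explicit "no more items will arrive" state.
class PathQueue {
public:
    void push(std::string path) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(path));
        }
        ready_.notify_one();
    }

    // Called by the scanning thread once it has finished.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a path is available or the queue is closed.
    // Returns false only when the queue is closed AND empty.
    bool pop(std::string& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;  // closed and drained: stop waiting
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> items_;
    bool closed_ = false;
};

// Stand-in for the real program: the "scanner" just pushes a fixed list.
std::vector<std::string> run_pipeline(const std::vector<std::string>& found) {
    PathQueue queue;
    std::vector<std::string> processed;

    std::thread scanner([&queue, &found] {
        for (const auto& p : found) queue.push(p);
        queue.close();  // tells the consumer that scanning is finished
    });

    std::string path;
    while (queue.pop(path)) {
        processed.push_back(path);  // hashing and copying would go here
    }
    scanner.join();
    return processed;
}
```

The consumer still blocks while the list is empty and the scanner is working, but it never blocks after close() has been called, so it cannot get stuck waiting for a path that will never come.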
Also, I'm not sure if you missed it, forgot, or it just doesn't matter, but I plan on having three separate folders depending on the media. Is this just going to be handled by the generate_destination function?
I figured it would have been more beneficial to have three lists, three threads to process them.
Suppose that using Boost's copy method lets you copy a file at 200 MiB/s, while using standard library functions lets you copy at 100 MiB/s (this is an exaggerated example; in reality the difference won't be so big if you're doing things right). If you copy a 200 MiB file using the standard library (because you need to hash), it'll take 2 seconds. If you hash first (using the standard library) and then copy with Boost, it'll take 3 seconds. You can't use Boost and hash at the same time*.
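Spelling out the arithmetic for that example, and assuming the separate hashing pass reads at the same 100 MiB/s as the standard library copy:

```latex
\begin{align*}
t_{\text{hash while copying}} &= \frac{200\ \text{MiB}}{100\ \text{MiB/s}} = 2\ \text{s} \\
t_{\text{hash, then Boost copy}} &= \frac{200\ \text{MiB}}{100\ \text{MiB/s}} + \frac{200\ \text{MiB}}{200\ \text{MiB/s}} = 2\ \text{s} + 1\ \text{s} = 3\ \text{s}
\end{align*}
```

So even with the faster copy, reading the file twice costs more than one slower pass that does both jobs.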
Scanning should happen in the main thread, since it's an I/O-heavy operation.
Let's suppose for a moment that we have a program that behaves like you're describing up to that last sentence. What would you do during that wait or how would you eliminate it? Since this is an I/O-bound program, sooner or later, the CPU will have to wait while the device catches up. There's no way around this problem, since the CPU is much, much faster.
What would you gain from this?