It turns out that when I update the v1 vector with two threads, the execution time is around 1600k. For comparison, when v1 is updated by one thread, the time is a little more than 100k.
I have an Intel Core i5-3230M with four cores. In theory, each thread should run on a different core. What is wrong? Why does performance go down with multithreading?
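Your computer can probably do roughly 2 billion calculations per second, so your test may simply be too small to benefit from a second thread. You may want to try a more processor-intensive test. Remember that starting a thread is nontrivial, and for a short job that overhead can dominate the measurement. You're also accessing the same object from both threads, so be careful with thread safety.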
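To illustrate, here is a minimal sketch of a heavier test, assuming the code is C++ and that v1 is something like a `std::vector<double>`. The names `heavy`, `fill_range`, `fill_parallel`, and `time_us` are made up for this example and are not from your code. Each thread writes only to its own contiguous slice of the vector, so no locking is needed for the element writes.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical per-element work, made heavy enough that the computation
// (rather than thread start-up) has a chance to dominate the timing.
double heavy(std::size_t i) {
    double x = static_cast<double>(i);
    for (int k = 0; k < 200; ++k) x = x * 0.5 + 1.0;
    return x;
}

// Fill v[begin, end). Callers give each thread a disjoint range.
void fill_range(std::vector<double>& v, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) v[i] = heavy(i);
}

// Split v into n_threads contiguous chunks and fill them in parallel.
void fill_parallel(std::vector<double>& v, unsigned n_threads) {
    if (n_threads == 0) n_threads = 1;
    const std::size_t chunk = (v.size() + n_threads - 1) / n_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t begin = std::min(v.size(), t * chunk);
        const std::size_t end = std::min(v.size(), begin + chunk);
        workers.emplace_back(fill_range, std::ref(v), begin, end);
    }
    for (auto& w : workers) w.join();  // wait for every slice to finish
}

// Run f once and return the elapsed wall-clock time in microseconds.
template <typename F>
long long time_us(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

You could then compare `time_us([&] { fill_range(v1, 0, v1.size()); })` against `time_us([&] { fill_parallel(v1, 2); })` for a vector of a few million elements, compiled with optimizations on. If the two-thread version is still slower, the problem is more likely in how the threads share data than in the amount of work.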
If you want to profile your code, there are many options available. Profiling lets you see what your process was doing during execution, so you can spot the stragglers.