For the last few days, I've been playing around with MySQL scalability. My original attempt was to insert a billion rows into a simple table: two ints, the first being the primary key (a sequence number) and the second an unsigned int holding a random 32-bit number, with a unique key on that second column.
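For reference, the table is roughly this (the names are mine, and I'm assuming MyISAM, which is what all the key_buffer/sort settings below apply to):

    CREATE TABLE seq_rand (
      id  INT UNSIGNED NOT NULL,   -- sequence number
      val INT UNSIGNED NOT NULL,   -- random 32-bit number
      PRIMARY KEY (id),
      UNIQUE KEY uk_val (val)
    ) ENGINE=MyISAM;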
The hardware is a cheap HP ML110 I bought a couple of years ago - a 2-core Xeon 3065 @ 2.33GHz with 4GB of RAM - the database is on a USB drive (for experimental purposes), and the OS is 32-bit Ubuntu.
Failing to get a completed run with 1 billion rows, I attempted smaller numbers with various settings, eventually discovering that --key_buffer_size, --sort_buffer_size, and --myisam_max_sort_file_size all have an impact on how quickly the run completes.
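Concretely, these are the knobs in question (the values here are illustrative only - they'd need tuning against the box's 4GB of RAM, and they can equally go into my.cnf under [mysqld]):

    -- Illustrative values, not a recommendation:
    SET GLOBAL key_buffer_size           = 1024 * 1024 * 1024;       -- cache for MyISAM index blocks
    SET GLOBAL sort_buffer_size          = 256 * 1024 * 1024;        -- per-thread sort buffer
    SET GLOBAL myisam_max_sort_file_size = 10 * 1024 * 1024 * 1024;  -- cap on the temp file used for sort-based index builds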
Eventually, I found that 10 million rows completes in 1m32s while 100 million rows completes in 23m23s - roughly 15x the time for 10x the rows, so it isn't scaling linearly. The load is memory-bound: once there isn't enough memory, the machine starts swapping, disk I/O becomes the limiting factor, and CPU usage drops to single-digit percentages.
There are also posts on the web describing the difference between key construction by sorting (fast) versus by keycache (slow - it works row by row) - this is probably the key factor in my attempts to scale up to a billion rows.
http://venublog.com/2010/01/04/performance-comparison-of-repair-by-sorting-or-by-keycache/
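If you want to see which path your own load is taking, watching the State column of SHOW PROCESSLIST from a second session during the load (or during a REPAIR TABLE / ALTER TABLE ... ENABLE KEYS) should tell you:

    -- From a second connection while the load is running:
    SHOW PROCESSLIST;
    -- State "Repair by sorting"    -> the fast, sort-based index build
    -- State "Repair with keycache" -> the slow, row-by-row build

As I understand it, giving MySQL room via myisam_max_sort_file_size and the sort buffers is what keeps it on the sorting path rather than falling back to the keycache.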
The interesting thing is that it's the primary and secondary key generation that really kills the load (which makes sense when you guesstimate how much memory that requires) - with no primary key and no secondary keys, I was able to load 1 billion rows in about 20 minutes using LOAD DATA INFILE. Rather amazing for a $400 box (excluding memory) I bought two years ago.
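That keyless run boils down to little more than this (file and table names made up):

    -- No PRIMARY KEY and no secondary keys, so the load is just
    -- sequential appends with zero index maintenance:
    CREATE TABLE seq_rand_nokeys (
      id  INT UNSIGNED NOT NULL,
      val INT UNSIGNED NOT NULL
    ) ENGINE=MyISAM;

    LOAD DATA INFILE '/tmp/billion_rows.csv'
      INTO TABLE seq_rand_nokeys
      FIELDS TERMINATED BY ',';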