MySQL scalability - insert a billion rows!

Jun 12, 2011 at 6:06pm
For the last few days, I've been playing around with MySQL scalability - my original attempt was to insert a billion rows into a simple table with two int columns: the first is the primary key (a sequence number), and the second is an unsigned 32-bit int holding a random number. There is also a unique key on the second column.
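Roughly, the schema looks like this (the names are just illustrative):

  CREATE TABLE rand_vals (
    id  INT UNSIGNED NOT NULL,   -- sequence number
    val INT UNSIGNED NOT NULL,   -- random 32-bit value
    PRIMARY KEY (id),
    UNIQUE KEY uk_val (val)
  ) ENGINE=MyISAM;               -- MyISAM, hence the myisam_* knobs below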

The hardware is a cheap HP ML110 with a 2-core Xeon 3065 @ 2.33GHz and 4GB of RAM, bought a couple of years ago - the database is on a USB drive (for experimental purposes), and the OS is 32-bit Ubuntu.

Failing to get a completed run with 1 billion rows, I attempted smaller numbers with various settings, eventually discovering that --key_buffer_size, --sort_buffer_size, and --myisam_max_sort_file_size all have an impact on how quickly the run completes.
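These can also be set at runtime instead of on the command line - the sizes here are placeholders, not the values I settled on:

  -- cache for MyISAM index blocks
  SET GLOBAL key_buffer_size = 1024 * 1024 * 1024;                   -- 1GB
  -- per-session sort buffer
  SET SESSION sort_buffer_size = 256 * 1024 * 1024;                  -- 256MB
  -- cap on the temp file used when building indexes by sorting
  SET GLOBAL myisam_max_sort_file_size = 10 * 1024 * 1024 * 1024;    -- 10GB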

Eventually, I found that 10 million rows completes in 1m32s while 100 million rows completes in 23m23s - not linear scaling, because the load is memory-bound: once there isn't enough memory, swapping sets in, disk I/O becomes the limiting factor, and CPU usage drops to single-digit percentages.

There are also posts on the web describing the difference between key construction by sorting (fast) and by keycache (slow - it inserts keys row by row) - this is probably the key factor as I attempted to scale up to a billion rows (a quick check is sketched after the link).

http://venublog.com/2010/01/04/performance-comparison-of-repair-by-sorting-or-by-keycache/
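A quick way to tell which method a run is using is to watch the thread state while the keys are being built:

  SHOW PROCESSLIST;
  -- State "Repair by sorting"    -> the fast path
  -- State "Repair with keycache" -> the slow, row-by-row path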

The interesting thing is that it's the primary and secondary key generation that really kills the load (which makes sense when you guesstimate how much memory it requires) - with no primary key and no secondary keys, I was able to load 1 billion rows in about 20m using LOAD DATA INFILE (sketched below) - rather amazing on a $400 box (excluding memory) I bought two years ago.
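The keyless load boiled down to something like this (the file name is hypothetical, and rand_vals_plain is a copy of the table above with both keys dropped):

  LOAD DATA INFILE '/tmp/billion_rows.csv'
  INTO TABLE rand_vals_plain
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  (id, val);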
Last edited on Jun 12, 2011 at 8:27pm
Jun 12, 2011 at 7:38pm
If this database is for a single client application, you might want to take a look at SQLite. It should be significantly faster than MySQL because there is no IPC - SQLite runs in-process rather than as a separate server.
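A rough sketch of what a bulk load there might look like (same illustrative table as above; these pragmas trade durability for speed, so they only make sense for experiments like this):

  PRAGMA journal_mode = OFF;   -- no rollback journal
  PRAGMA synchronous = OFF;    -- don't fsync on every commit
  BEGIN;
  INSERT INTO rand_vals (id, val) VALUES (1, 123456789);
  -- ... the other billion inserts, all inside one transaction ...
  COMMIT;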
Jun 12, 2011 at 8:06pm
thanks for the tip, helios!

since only my server will be talking to the database, this could be an excellent option for me - I will look into what APIs are available and what features have been implemented in SQLite
Jun 13, 2011 at 12:54pm
First load the data without indexes. Then create the indexes (the primary key constraint etc.) afterwards. MySQL can't do bulk inserts into indexes efficiently. Anyway, why MySQL? Try the H2 database. It is a lot faster.
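In other words, roughly these two steps (reusing the illustrative names from above):

  -- 1. bulk-load into the keyless table
  LOAD DATA INFILE '/tmp/billion_rows.csv' INTO TABLE rand_vals_plain;
  -- 2. build both indexes in one pass, so they can be constructed by sorting
  ALTER TABLE rand_vals_plain
    ADD PRIMARY KEY (id),
    ADD UNIQUE KEY uk_val (val);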
Last edited on Jun 13, 2011 at 12:54pm
Jun 17, 2011 at 2:32pm
Or you can try out sharding. I know I'm biased (I work there), but if you really want to speed up your inserts, try out http://sharding.scalebase.com
Jun 17, 2011 at 2:41pm
actually, I need to write only once - the rest of the time, I need to read
Jun 17, 2011 at 3:06pm
I think you can expect reads to run much faster when you shard as well, since the indexes are much smaller, and queries that span multiple databases run in a distributed way. It depends on how many reads you're doing.