MySQL scalability - insert a billion rows!

Jun 12, 2011 at 6:06pm
For the last few days, I've been playing around with MySQL scalability - my original attempt was to insert a billion rows into a simple table with two int columns: the first is the primary key (a sequence number), and the second is an unsigned 32-bit int holding a random number. There is also a unique key on the second column.
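Roughly, the schema looks like this (the names are just illustrative):

  CREATE TABLE rand_vals (
    id  INT UNSIGNED NOT NULL,   -- sequence number
    val INT UNSIGNED NOT NULL,   -- random 32-bit value
    PRIMARY KEY (id),
    UNIQUE KEY uk_val (val)
  ) ENGINE=MyISAM;               -- MyISAM, hence the myisam_* knobs below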

The hardware is a cheap HP ML110 with a 2-core Xeon 3065 @ 2.33GHz and 4GB of RAM, bought a couple of years ago - the database is on a USB drive (for experimental purposes), and the OS is 32-bit Ubuntu.

Failing to get a completed run with 1 billion rows, I attempted smaller numbers with various settings, eventually discovering that --key_buffer_size, --sort_buffer_size, and --myisam_max_sort_file_size all have an impact on how quickly the run completes.
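These can also be set at runtime instead of on the command line - the sizes here are placeholders, not the values I settled on:

  -- cache for MyISAM index blocks
  SET GLOBAL key_buffer_size = 1024 * 1024 * 1024;                   -- 1GB
  -- per-session sort buffer
  SET SESSION sort_buffer_size = 256 * 1024 * 1024;                  -- 256MB
  -- cap on the temp file used when building indexes by sorting
  SET GLOBAL myisam_max_sort_file_size = 10 * 1024 * 1024 * 1024;    -- 10GB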

Eventually, I found that 10 million rows completes in 1m32s while 100 million rows completes in 23m23s - not linear scaling, because the load is memory-bound: once there isn't enough memory, swapping sets in, disk I/O becomes the limiting factor, and CPU usage drops to single-digit percentages.

There are also posts on the web describing the difference between key construction by sorting (fast) and by keycache (slow - it inserts keys row by row) - this is probably the key factor as I attempted to scale up to a billion rows (a quick check is sketched after the link).

http://venublog.com/2010/01/04/performance-comparison-of-repair-by-sorting-or-by-keycache/
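A quick way to tell which method a run is using is to watch the thread state while the keys are being built:

  SHOW PROCESSLIST;
  -- State "Repair by sorting"    -> the fast path
  -- State "Repair with keycache" -> the slow, row-by-row path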

The interesting thing is that it's the primary and secondary key generation that really kills the load (which makes sense when you guesstimate how much memory it requires) - with no primary key and no secondary keys, I was able to load 1 billion rows in about 20m using LOAD DATA INFILE (sketched below) - rather amazing on a $400 box (excluding memory) I bought two years ago.
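The keyless load boiled down to something like this (the file name is hypothetical, and rand_vals_plain is a copy of the table above with both keys dropped):

  LOAD DATA INFILE '/tmp/billion_rows.csv'
  INTO TABLE rand_vals_plain
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  (id, val);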
Last edited on Jun 12, 2011 at 8:27pm
Jun 12, 2011 at 7:38pm
If this database is for a single client application, you might want to take a look at SQLite. It should be significantly faster than MySQL because there is no IPC - SQLite runs in-process rather than as a separate server.
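A rough sketch of what a bulk load there might look like (same illustrative table as above; these pragmas trade durability for speed, so they only make sense for experiments like this):

  PRAGMA journal_mode = OFF;   -- no rollback journal
  PRAGMA synchronous = OFF;    -- don't fsync on every commit
  BEGIN;
  INSERT INTO rand_vals (id, val) VALUES (1, 123456789);
  -- ... the other billion inserts, all inside one transaction ...
  COMMIT;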
Jun 12, 2011 at 8:06pm
thanks for the tip, helios!

since only my server will be talking to the database, this could be an excellent option for me - I will look into what APIs are available and what features have been implemented in SQLite
Jun 13, 2011 at 12:54pm
First load the data without indexes. Then create the indexes (the primary key constraint etc.) afterwards. MySQL can't do bulk inserts into indexes efficiently. Anyway, why MySQL? Try the H2 database. It is a lot faster.
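In other words, roughly these two steps (reusing the illustrative names from above):

  -- 1. bulk-load into the keyless table
  LOAD DATA INFILE '/tmp/billion_rows.csv' INTO TABLE rand_vals_plain;
  -- 2. build both indexes in one pass, so they can be constructed by sorting
  ALTER TABLE rand_vals_plain
    ADD PRIMARY KEY (id),
    ADD UNIQUE KEY uk_val (val);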
Last edited on Jun 13, 2011 at 12:54pm
Jun 17, 2011 at 2:32pm
Or you can try out sharding. I know I'm biased (I work there), but if you really want to speed up your inserts, try out http://sharding.scalebase.com
Jun 17, 2011 at 2:41pm
actually, I need to write only once - the rest of the time, I need to read
Jun 17, 2011 at 3:06pm
I think you can expect reads to run much faster when you shard as well, since the indexes are much smaller, and queries that span multiple databases run in a distributed way. It depends on how many reads you're doing.