RDBMS Model for Record Variants?

Forum

Forum
Lounge
RDBMS Model for Record Variants?

RDBMS Model for Record Variants?

I have a non-C++ question about RDBMS design.

Years ago, while working on an in-house OODB, we had a facility for creating record variants (sorry - I don't know the correct DB terminology for this feature).

For example, say record1 was created at t1, record2 was created at t2, and record3 at t3, where t3 came after t2 which came after t1.

These records are variants of each other, in the sense that field values were modified over time. There was an easy-to-use query where you could find a snapshot of your table at time t - underneath, the query engine was smart enough to recover the latest record at time t, while ignoring the others.

Internally, we used to call this feature time travel, because it allowed you to see your table at any time t, in the past - no information was ever lost.
Conceptually, it's like adding an axis Z called time, and allowing you to travel along this axis, seeing tables change over time.

Has anyone seen something like this modeled in MySQL or any RDBMS?
There must be a way this can be modeled cleanly (the tricky part seems to be the query to take only the newest variant for each record).

kfmfe04 (788)

I found an interesting like here:

http://stackoverflow.com/questions/125877/versioning-database-persisted-objects-how-would-you#126029

but if anyone has had experience in this area, please feel free to share - ty.

kfmfe04 (788)

I've almost finished implementing type 2 SCD as mentioned here:

http://en.wikipedia.org/wiki/Slowly_changing_dimension

looks like it's about 200 lines of template code, less than I thought would be needed

essentially, I have two template classes: TimeTraveler< typename T > and TimeMachine< typename T>

TimeTraveler represents an entity which changes over time - it will hold all the instances of DB records which represents the same entity over contiguous time periods - I also have an internal template class that decorates the core object with tm_begin, tm_end, and tm_seqno (time machine attributes). TimeTraveler ensures that the instances have increasing tm_seqno and contiguous tm_begin, tm_end periods.

TimeMachine will hold all instances of TimeTraveler for easy lookup of entities with a key. Each TimeMachine has its own DateTime so you can pull up all instances for a given time.

An important feature of TimeTraveler is not allowing update, insert, and deletes to be directly called to the DB. In fact, you want to create new time-contiguous records every time you intend to do one of these operations. To implement this model, accessors are strictly const. However, there is one method where you can ask for a mutable instance (internally, we will make a copy of the newest record and return a handle to this instance back to the user for modification). Internally, we keep track of which instances are dirty and need writing back out to the DB (inserts since old records are immutable).

In one call to TimeMachine, we will scan for all dirty records and dump out a CSV file for easy loading via LOAD DATA INFILE.

To ensure data consistency in the face of multithreading, we must be careful with assigning tm_seqno. The algorithm here is, provide a function whose sole argument is the number of instances (new seqno) needed. The caller will scan TimeMachine for dirty records (essentially, new records since historical records are immutable), count them up, and pass this number in (call it n). The function must, then, lock the database, query for the top seqno allocated, add n to this number, update the top seqno in the database, and then unlock the database. This atomicity is needed so that a call from a different thread will not interfere with this block of operations.

This last block is just a note to myself - the last part which I will implement next

kbw (9488)

I've worked on a system that used type 4 SCD. I thought it worked well, but the data wasn't "slow" and the history tables got large quickly.

kfmfe04 (788)

interesting - was it for work or for personal?

I just found SELECT ... FOR UPDATE for mysql:

http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html

guess I won't have to use an explicit lock after all - going to try it out now

edit:

looks like this block of code will do it

START TRANSACTION;

SELECT tm_seqno_next FROM sandbox_db.tbl_attrs 
WHERE db_name='sandbox_db' AND tbl_name='test_timemachine' 
FOR UPDATE;

UPDATE sandbox_db.tbl_attrs SET tm_seqno_next = tm_seqno_next + 25
WHERE db_name='sandbox_db' AND tbl_name='test_timemachine';

COMMIT;

suppose there are two processes A and B trying to update tm_seqno_next which is 0 to start

A reaches Line 5 first, so when B reaches Line 5, it gets blocked (potentially timing out if A takes too long, but under normal circumstances, not a problem)

B will continue to be blocked on Line 5 until A completes with an update to the value of 25

B will unblock on Line 5 with a value of 25 and completes with an update to the value of 50

so works as expected

Last edited on

Topic archived. No new replies allowed.