This is the API for the Btree module. The symbols mentioned in this document are exported from the db.btree package.
Background
Opening and Closing
open-btree
create-btree
create-memory-btree
close-btree
sync-btree
Storing and Retrieving data
get-btree
get-btree-ext
set-btree-ext
Cursors
create-cursor
position-cursor
position-cursor-ext
cursor-get
cursor-next
cursor-previous
cursor-get-ext
cursor-next-ext
cursor-previous-ext
cursor-delete
unbind-cursor
Miscellaneous
copy-btree
merge-btrees
analyze-btree
btree-user-data
dump-btree
Replace Hook
Encoding
Examples
Advanced Examples
Opening a Btree
Accessing data
Large Btrees and Merging
Index
Btrees are well suited for storage on disk and thus they are the basis for most database systems.
The format of a btree on disk is portable across all ACL platforms.
The Allegro implementation of btrees has these properties
In theory it's possible or more than once process to read the same btree. In practice it will only work on Unix since on the Windows operating system forbids multiple processes open the same file for read/write (and we don't yet suppport opening a btree file read-only).
If a process is modifying a btree then no other process should read it. On Unix there's nothing to prevent multiple writers or readers of the same file so it's the programmers responsibility to ensure that this does not occur.
(open-btree filename &key (if-exists :open) (if-does-not-exist :error) (unique-keys nil) cache-size split compare-function buffer-pool large-block-allocator readahead partial-writes read-only) |
Opens an existing btree or creates a new btree.
if-exists determines what open-btree does if the btree named already exists
if-does-not-exist determines what is done if the btree named does not exist yet:
Note that a btree can be opened by at most one process at a time. The btree code does not do the file locking necessary to allow more than one reader or writer.
If unique-keys is true then at most one value is associated with a given key. Attempting to store a second value for a given key will cause the first value to be deleted. You can only specify the unique-keys behavior when you create a btree. If you open an existing btree then the unique-keys argument is ignored.
If cache-size is given it should be an the integer number of bytes of buffers that the btree code may maintain for this btree. The actual number of bytes of buffers at any given time may be larger than this maximum. This number is used as a general guide to the size of the buffer cache. If cache-size is not given then the btree code will maintain all blocks it encounters in the buffer cache. This is undesirable if the btree is large.
When data is inserted into a btree it is placed inside the appropriate block. Eventually the block gets filled up and must be split into two blocks. If split is given it should be a floating point number between 0 and 1 and it specifies how much of a split block is left in the original block (the rest being moved to the new block). A value of 0.5 would split the data evenly. The default is 0.7. You would choose a split value closer to 1.0 if you felt that it's likely that the next keys to be inserted will be larger than the previous keys inserted.
If compare-function is given it should be a function of six values
arr1 start1 end1 arr2 start2 end2where arr1 and arr2 are (unsigned-byte 8) vectors and the start and end values denote the range of the array to consider.
This should compare the values and return
1 if arr2 > arr1 0 if arr2 = arr1 -1 if arr2 < arr1
The default comparison function is a simple lexicographic comparison function. With that comparison function
The btree code caches disk blocks in buffers created from lisp vectors allocated in old space. When the cache of buffers get too large the btree code frees buffer to reduce the cache size. Normally freeing a buffer means eliminating pointers to it so it can be garbage collected. However since the buffers are allocated in old space they only get garbage collected when a global gc is done. Also after freeing a buffer the btree code will likely need to allocate a new buffer at some time in the future. Thus we have the buffer-pool object. A buffer pool is a free list of buffers. When the btree code wants to free a buffer from the cache and the btree has an associated buffer-pool, the buffer is added to the buffer pool rather than being left to be garbage collected. When the btree code needs a buffer it first checks the buffer pool and reuses a buffer from there instead. A buffer-pool can be shared by multiple btrees thus ensuring that freed buffers soon find a new use.
The value of buffer-pool can be a buffer pool object. A buffer pool object is created by evaling (make-btree-buffer-pool). That buffer pool will by used by this btree. You share a buffer pool among btrees by passing in the same buffer pool object when the btrees are opened.
If buffer-pool is any other non-nil value then a buffer pool object will be created and used only by this btree.
large-block-allocator is an optional function that the btree code will call to allocate large simple vectors of (unsigned-byte 8). The function takes two arguments
(lambda (kind size) ....)
The value of kind is
size is the number of bytes needed (but returning a larger vector is ok.) The function can return either a vector of at least the desired size or nil. If the function returns nil then the btree code will do the allocation. If a block is returned by the function it will not be cached in the btree's buffer cache. |
readhead if given should be an integer. readahead is used to efficiently read the btree file into the operating system's file cache. The fact that btree blocks are read somewhat randomly will fail to trigger the code in the operating system to automatically read the btree file into the cache in the background, thus the need for the readahead argument. The value of readahead specifies how many bytes are read into the file cache at a time. A value such as 20,000,000 is reasonable. Using readahead a file that's being merged has been seen to speed up the merging operation by a factor of 10. Readahead is not so useful if the btree will be accessed infrequently or over a long period of time as the operating system's file cache will soon be replaced by other more active files.
partial-writes specifies whether the btree code can write out just some of the modified buffers when its forced to write out modified buffers due to the buffer cache filling up. Normally all modified buffer are written out since that ensures that the btree on disk is valid most of the time. If you specify true for partial-writes then the btree on disk will be invalid most of the time and will only become valid when you call sync-btree. Specifying true for partial-writes does make the btree code faster as it eliminates many unnecessary writes.
read-only of true will cause the btree file to be opened for reading only. Any action that will cause a btree block to be written will signal an error. Reading objects from the btree is not a problem. Adding new objects to the btree will work as well until a modified btree buffer needs to be written to the disk.
open-btree returns a btree object
(create-btree filename &key (if-exists :supersede) (if-does-not-exist :create) (unique-keys nil) split cache-size compare-function buffer-pool large-block-allocator partial-writes) |
Just like open-btree except that the defaults for the keyword arguments are such that a new btree will be created whether or not the btree already exists.
(create-memory-btree &key (unique-keys nil) split buffer-pool compare-function large-block-allocator partial-writes) |
Create a btree in memory only. This btree need not be closed as it doesn't have an associated file to close.
(close-btree btree &key abort) |
If the value of abort is true then the sync-btree is not done.
(sync-btree btree) |
Keys and values are always simple-vectors of type unsigned-byte 8.
There are two types of functions in the API:
(get-btree btree key) |
(setf (get-btree btree key) value) |
If unique-keys is true and if there is already a value for the given key then that existing value is deleted before the new value is stored.
returns: value
(get-btree-ext btree key kstart kend) |
the answer is in the returned vector from indicies start to end-1
(set-btree-ext btree key kstart kend value vstart vend) |
The return value is undefined.
A cursor is an object that traverses a given btree. Objects in a btree are sorted by key thus a cursor can give the user access to objects in the sorted order.
Cursors float with changes to the tree. If you position a cursor at an item and then insert or delete something in the btree before that item the cursor will continue to point at that item. If you delete the item the cursor is pointing to it will then point to the following item
You do not have to call a function to close a cursor when you're finished with it but you should call
(unbind-cursor cursor)to disassociate the cursor from a particular spot in the btree. If that isn't done the cursor object will never be garbage collected until the btree is. Also it will mean that the btree has to perform work to keep the cursor floating correctly.
A cursor can be primed. A primed cursor will treat a request for the next or previous item as a request for the current item at the cursor and will then set the cursor to not-primed mode.
The prime feature allows you to position the cursor and prime it and then use only next commands to read all the data beginning with the data found by the position command. Below is a function to count the items in a btree. If we had not specified :prime t in the call to position-cursor the code would have had to call cursor-get the first time around the loop and then cursor-next subsequently. By priming the cursor we can write the loop using only cursor-next.
(defun items-in-a-btree (btree) (let ((cur (create-cursor btree))) (position-cursor cur nil :kind :first :prime t) (let ((count 0)) (loop (if* (null (cursor-next cur)) then (return)) (incf count)) (unbind-cursor cur) count)))
(create-cursor btree) |
A program need not close a cursor when it's finished but it should call unbind-cursor.
(position-cursor cursor key &key value kind prime) |
If value is nil
If value is given
|
(position-cursor-ext cursor key kstart kend &key value vstart vend prime) |
Also if value is given then this acts like position-cursor with a value given. In this case vstart defaults to 0 and vend defaults to the length of value.
(cursor-get cursor &key (key t) (value t) (kind :get) prime) |
the keyword arguments key and value control whether the key and value are returned (or if nil is returned in place of the key or value not returned).
If kind is
(cursor-next cursor &key (key t) (value t) prime) |
If the cursor was already at the last entry of the btree then nil is returned.
(cursor-previous cursor &key (key t) (value t) prime) |
If the cursor was already at the first entry of the btree then nil is returned.
(cursor-get-ext cursor &key (key t) (value t) (kind :get)) |
key kstart kend value vstart vendThe key and value vectors returned are the buffers used by the btree code, thus they should not be modified and they are subject to being changed by subsequent btree operations.
kind can be :get, :next, :previous or :first
(cursor-next-ext cursor &key (key t) (value t)) |
Return nil if the cursor is already at the end of the btree.
(cursor-previous-ext cursor &key (key t) (value t)) |
Return nil if the cursor is already at the beginning of the btree.
(cursor-delete cursor &key prime) |
prime, if true, will prime the cursor after the delete
(unbind-cursor cursor) |
(copy-btree &key from to) |
The insertion is done via the set-btree-ext function so the unique-keys value of the to btree will have an effect.
This function is deprecated in favor of the more general merge-btrees function.
(merge-btrees target-btree source-btrees) |
If you're merging btrees to create a btree that will never be modified it makes sense to specify a split value close to 1.0 for the target-btree. This is because the target-btree will be populated with keys in ascending order and thus the new key values will always be stored in the new block after the split.
(analyze-btree btree) |
(btree-user-data btree) |
This setf'able slot is a place for user code to store whatever data it wants in the btree object.
(dump-btree btree &optional filename) |
You can specify a hook function that gets called when a value associated with a key is going to be replaced with another value. This gives the program a chance to modify the old value to include the new value.
You specify the hook function this way:
(setf (btree-replace-hook btree) 'my-hook-function)
Note that this hook function only gets called when a value is replaced. A value can only be replaced when the btree is created with :unique-keys t specified.
The hook function takes these seven arguments
btree
oldvalue oldvstart oldvend newvalue newvstart newvend |
where oldvalue and newvalue are (unsigned-byte 8) and vectors and the start and end values indicate a range of indicies inside those vectors (i.e. all indicies beginning with Xstart and less than Xend).
The hook function is called when the existing value (denoted by oldvalue) is going to be replaced with the new value (denoted by newvalue)
The hook function can do one of three things.
nil
altvalue - a usb8 vector altstart - integer altend - integer |
Lisp has a built-in function for encoding strings as usb8 vectors.
cl-user(7): (defun enc-string (str) (string-to-octets str :external-format :utf-8 :null-terminate nil)) enc-string cl-user(8): (enc-string "test") #(116 101 115 116) 4 cl-user(9):
The utf-8 external format is a good general purpose encoding for any size character.
Since most lisp objects can be printed and then that printed form read by the lisp reader, one option is to print the object to encode as a string and then use string-to-octets to encode it. This is a simple way to encode objects but not particularly fast.
cl-user(9): (defun enc-value (value) (enc-string (write-to-string value))) enc-valuecl-user(10): (enc-value '(a b c d)) #(40 97 32 98 32 99 32 100 41) 9
Decoding values is accomplished with octets-to-string:
cl-user(13): (defun dec-value (encoded) (read-from-string (octets-to-string encoded :external-format :utf-8))) dec-valuecl-user(14): (dec-value (enc-value '(a b c d))) (a b c d) 9 cl-user(15):
One word of caution: whenever you're reading forms which could have been modified by someone untrustworthy there's a chance that they introduced uses of the #. (sharp-period) reader macro to cause evaluation at read time. In that case you'll want to bind *read-eval* to nil around the user of the read function.
The enc-value function show above will work for integers but it's probably not the encoder you'll want to use.
The following function encodes a non-negative integer in a fixed sized vector in big-endian format.
(defun enc-pos-int (value bytes &optional arr) (let ((res nil)) (loop (if* (zerop value) then (return) else (push (logand #xff value) res) (setq value (ash value -8))))(if* (> bytes (length res)) then (dotimes (i (- bytes (length res))) (push 0 res)))
(if* (and arr (>= (length arr) (length res))) then (let ((i -1)) (dolist (val res) (setf (aref arr (incf i)) val)) arr) else (make-array (length res) :element-type '(unsigned-byte 8) :initial-contents res))))
Here we try it out:
cl-user(12): (enc-pos-int 10 5) #(0 0 0 0 10)cl-user(13): (enc-pos-int 23423410 5) #(0 1 101 105 178)
cl-user(14): (enc-pos-int (1- (expt 2 33)) 5) #(1 255 255 255 255)
If you want to do sorting using the btree's built-in comparison function then you must encode all integers into the same length vector. This does waste space for small numbers. If space is critical you could write an encoder that encoded each number in the fewest bytes and then write your own btree comparsion that understood this encoding.
This function will decode an encoded value:
(defun dec-pos-int (val &optional (start 0) (end (length val))) (let ((res 0)) (dotimes (i (- end start)) (setq res (+ (aref val (+ start i)) (ash res 8))))res))
as demonstrated here:
cl-user(16): (dec-pos-int (enc-pos-int 21234 6)) 21234
Using the encoding functions discussed in the previous section we'll show some examples of creating btrees.
Here we create a btree where we store for each number from 0 to 999 their square:
cl-user(6): (use-package :db.btree) t cl-user(7): (setq bt (create-btree "foo.bt")) #<db.btree::btree [1] foo.bt @ #x1001257ff2> cl-user(8): (dotimes (i 1000) (setf (get-btree bt (enc-pos-int i 4)) (enc-pos-int (* i i) 5))) nil
Now we use the btree to find the square of 25
cl-user(9): (get-btree bt (enc-pos-int 25 4)) #(0 0 0 2 113) cl-user(10): (dec-pos-int *) 625
A cursor is an object that can move through a btree allowing you to retrieve keys and values and to delete keys and values
When you create a cursor you specify the btree that it will scan.
cl-user(12): (setq cur (create-cursor bt)) #<db.btree::cursor @ #x71c5d62a>
When a cursor is first created it doesn't point anywhere in the btree. Thus the operations on the cursor just return nil.
cl-user(13): (cursor-get cur) nil cl-user(14): (cursor-next cur) nil
You can specify where a cursor should point in a number of ways. Here we tell the cursor to point at the first element in the btree (and since keys are sorted, this will be a pointer to the lowest key in the key sorting order).
cl-user(15): (position-cursor cur nil :kind :first) nil
We can retrieve the key and value at the cursor with cursor-get. It returns the key and value as two values:
cl-user(16): (cursor-get cur) #(0 0 0 0) #(0 0 0 0 0)
We can tell even without decoding these usb8 arrays that the key and value are both 0.
To advance the cursor to the next value and retrieve it you use cursor-next:
cl-user(17): (cursor-next cur) #(0 0 0 1) #(0 0 0 0 1) cl-user(18): (cursor-next cur) #(0 0 0 2) #(0 0 0 0 4) cl-user(19): (cursor-next cur) #(0 0 0 3) #(0 0 0 0 9) cl-user(20): (cursor-next cur) #(0 0 0 4) #(0 0 0 0 16)
You'll note that after positioning the cursor we used cursor-get to retrieve the first value and cursor-next to retrieve subsequent ones. If you're writing a loop to retrieve values it's undesireable to have to call one function the first time through the loop and another function on subsequent calls. Thus cursors can be primed which means that they are in a special state so that a cursor-next will not move the cursor before retrieving the value. Also calling cursor-next will un-prime the cursor.
Here we once again position the cursor at the first item but this time we prime it as well. Then we can just use cursor-next to retrieve all the values:
cl-user(21): (position-cursor cur nil :kind :first :prime t) nil cl-user(22): (cursor-next cur) #(0 0 0 0) #(0 0 0 0 0) cl-user(23): (cursor-next cur) #(0 0 0 1) #(0 0 0 0 1) cl-user(24): (cursor-next cur) #(0 0 0 2) #(0 0 0 0 4)
You can position the cursor at the last item in the tree and then scan backwards with cursor-previous.
cl-user(25): (position-cursor cur nil :kind :last :prime t) nil cl-user(26): (cursor-previous cur) #(0 0 3 231) #(0 0 15 58 113) cl-user(27): (dec-pos-int *) 999 cl-user(28): (cursor-previous cur) #(0 0 3 230) #(0 0 15 50 164) cl-user(29): (dec-pos-int *) 998
You can position the cursor at a particular key. Note that in this case position-cursor returns t indicating that it found the key in the table:
cl-user(36): (position-cursor cur (enc-pos-int 385 4)) t cl-user(37): (cursor-get cur) #(0 0 1 129) #(0 0 2 67 1)cl-user(38): (dec-pos-int *) 385
If you specify a key not in the btree then position-cursor returns nil and the cursor is set on the next item after the location where the key would have been found. Here we choose a key value 2000 which is bigger than the biggest key in the table: 999. Thus position-cursor returns nil and sets the cursor after the end of the table. cursor-previous brings the cursor to the last key in the table: 999.
cl-user(39): (position-cursor cur (enc-pos-int 2000 4)) nilcl-user(40): (cursor-get cur) nil
cl-user(41): (cursor-previous cur) #(0 0 3 231) #(0 0 15 58 113)
cl-user(13): (dec-pos-int *) 999
You can position the cursor at a particular key and value as well and position-cursor will return t if it found the pair and nil if it did not:
cl-user(41): (position-cursor cur (enc-pos-int 385 4) :value (enc-pos-int (* 385 385) 5)) t cl-user(42): (position-cursor cur (enc-pos-int 385 4) :value (enc-pos-int (* 200 200) 5)) nil
You use a cursor to specify which values to delete from the btree. In the example below we position the cursor at key value 5. We delete that value at which point the cursor moves to the next key value, 6. We move the cursor back one value and we end up at key value 4 since key value 5 was deleted.
cl-user(47): (position-cursor cur (enc-pos-int 5 4)) t cl-user(48): (cursor-get cur) #(0 0 0 5) #(0 0 0 0 25) cl-user(49): (cursor-delete cur) t cl-user(50): (cursor-get cur) #(0 0 0 6) #(0 0 0 0 36) cl-user(51): (cursor-previous cur) #(0 0 0 4) #(0 0 0 0 16)
When you're finished using a cursor for a while it's best to unbind it. This disassociates it from any block in the btree and this allows the cursor to be garbage collected should no references exist to the cursor from the heap.
cl-user(52): (unbind-cursor cur) #<db.btree::cursor @ #x71b06c82>
So far all the btree operations have been done on the copy of the btree in memory. In order to write the btree to the disk you must sync-btree:
cl-user(53): (sync-btree bt) nil
When you're finished with the btree you should close it. The close-btree function will call sync-btree to ensure that the btree is completely written to the disk before the btree file is closed.
cl-user(54): (close-btree bt) #<db.btree::btree [1] foo.bt @ #x1001139632>
If your btrees will have few items in them or will be rarely accessed then you can make use of the examples in the previous section to manipulate your btrees. However if you're going to be storing millions and even billions of items in your btrees or will be repeatedly accessing them then you'll want to program in a more space and time efficient manner. This section will show you how to use the btrees in the most efficient manner.
Here we ask for at most a one megabyte cache. Since each btree block is 4096 bytes this cache can hold up to 256 blocks from the btree.
cl-user(16): (setq bt (create-btree "bar.bt" :cache-size (* 1024 1024))) #<db.btree::btree [2] bar.bt @ #x1001225782>
If you don't specify a cache size then there is no limit to the size of the cache. This is ok for small btrees but if the btree is huge it could mean using up all of the Lisp heap for the btree cache.
The cache consists of lisp vectors with copies of blocks from the disk. When the cache gets larger than the limit specified some blocks are removed from the cache and allowed to be garbage collected. The garbage collection of these vectors can take some time to occur since the lisp vectors are all allocated in old space and only a global garbage collection will recover unused old space.
Rather than require a global garbage collection in order to recover and reuse a cache block you should specify that you want the btree to use a buffer pool to manage the reuse:
cl-user(18): (setq bt (create-btree "bar.bt" :cache-size (* 1024 1024) :buffer-pool t)) #<db.btree::btree [3] bar.bt @ #x1001229ed2>
If a btree uses a buffer pool then when blocks are freed from the cache they are put in the buffer pool and when a new block is needed the buffer pool is checked before allocating one from the heap.
You can even share a buffer pool between btrees.
cl-user(19): (let ((bp (make-btree-buffer-pool))) (setq bt (create-btree "bar.bt" :cache-size (* 1024 1024) :buffer-pool bp)) (setq bt2 (create-btree "bar2.bt" :cache-size (* 1024 1024) :buffer-pool bp))) #<db.btree::btree [5] bar2.bt @ #x1001236602>
Keys and values in the normal version are simple vectors of type (unsigned byte 8). In the extended version keys and values are subsequences of simple vectors of type (unsigned-byte 8).
One big caveat about the use of the extended version: any returned values are only valid until the next call to a btree function on that btree. This is because the returned values are pointers to the btree buffers and on any btree call a given btree buffer may be used to store some other value.
The following example will compare the use of the normal and extended functions. We're going to open the same btree we built in the section above named "Examples." We want to find the number of keys between 0 and 199 that have a given value in the btree. First we'll show how to do this without a cursor and next we'll show how to write this using a cursor.
Using the normal functions we write it this way.
(defun run-normal (bt checknum) (let ((found 0)) (dotimes (keynum 200) (let* ((key (enc-pos-int keynum 4)) (val (get-btree bt key))) (if* (and val (equal checknum (dec-pos-int val))) then (incf found)))) found))
We're aware that a given key may not appear in the btree so we only call our dec-pos-int function if an actual value was returned. In this function we're allocating a new usb8 vector when we call our enc-pos-int function and the btree get-btree function (should there be a value at that key).
If we use the extended functions we would write it this way:
(defun run-ext (bt checknum) (let ((kbuf (make-array 10 :element-type '(unsigned-byte 8))) (found 0)) (dotimes (keynum 200) (let* ((key (enc-pos-int keynum 4 kbuf))) (multiple-value-bind (value vstart vend) (get-btree-ext bt key 0 4) (if* (and value (equal checknum (dec-pos-int value vstart vend))) then (incf found))))) found))
Since our goal is to minimize allocation we've passed a vector to enc-pos-int for it to fill with the encoded value. We call get-btree-ext and tell it the bounds with the vector that contain the key. It returns nil or a vector and the start and end indicies showing where the value is located in the returned vector. dec-pos-int is able to decode just that region of the vector.
When you run the two functions you see that the extended version does a lot less allocation (32 bytes of other bytes vrs 9600, and 802 cons bytes vrs 1802.)
cl-user(14): (time (run-normal bt 16)) ; cpu time (non-gc) 10 msec user, 0 msec system ; cpu time (gc) 0 msec user, 0 msec system ; cpu time (total) 10 msec user, 0 msec system ; real time 1 msec ; space allocation: ; 1,802 cons cells, 9,600 other bytes, 0 static bytes 1
cl-user(15): (time (run-ext bt 16)) ; cpu time (non-gc) 0 msec user, 0 msec system ; cpu time (gc) 0 msec user, 0 msec system ; cpu time (total) 0 msec user, 0 msec system ; real time 1 msec ; space allocation: ; 802 cons cells, 32 other bytes, 0 static bytes 1 cl-user(16):
If you are scanning a btree through a range of key values it's usually more efficient to scan the values in the btree using a cursor.
Using normal functions the code is as follows. We create a cursor and set it at the first key value greater than or equal to zero. Then we scan through the btree until there are no more keys or the key is greater or equal to 200. Each time cursor-next is called two usb8 vectors must be allocated to hold the key and value vectors returned.
(defun cursor-normal (bt checknum) (let ((found 0) (cursor (create-cursor bt))) (position-cursor cursor (enc-pos-int 0 4) :prime t) (loop (multiple-value-bind (key value) (cursor-next cursor)
(if* (or (null key) (>= (dec-pos-int key) 200)) then (return))
(if* (equal checknum (dec-pos-int value)) then (incf found)))) found))
The extended function cursor-next-ext returns pointers to the btree buffers and thus doesn't need to allocate new vectors.
(defun cursor-extended (bt checknum) (let ((found 0) (cursor (create-cursor bt))) (position-cursor cursor (enc-pos-int 0 4) :prime t) (loop (multiple-value-bind (key kstart kend value vstart vend) (cursor-next-ext cursor) (if* (or (null key) (>= (dec-pos-int key kstart kend) 200)) then (return)) (if* (equal checknum (dec-pos-int value vstart vend)) then (incf found)))) found))
The difference in space allocation in the two cases is apparent from the time macro:
cl-user(23): (time (cursor-normal bt 16)) ; cpu time (non-gc) 0 msec user, 0 msec system ; cpu time (gc) 0 msec user, 0 msec system ; cpu time (total) 0 msec user, 0 msec system ; real time 1 msec ; space allocation: ; 12 cons cells, 9,760 other bytes, 0 static bytes 1cl-user(24): (time (cursor-extended bt 16)) ; cpu time (non-gc) 0 msec user, 0 msec system ; cpu time (gc) 0 msec user, 0 msec system ; cpu time (total) 0 msec user, 0 msec system ; real time 1 msec ; space allocation: ; 12 cons cells, 64 other bytes, 0 static bytes 1 cl-user(25):
Btrees can exhibit poor performance when the size of the btree on disk is much larger than the size of memory cache and random keys are accessed. When this occurs most of the time in the btree code is spent waiting to read a block from the disk.
This test program illustrates the problem. In this test we build a btree of 10,000,000 randomly chosen keys. We allocate a 1 MB cache and the btree file grows to 175 MB when it reaches 10,000,000 keys.
(defun build-random-btree (size) (let ((bt (create-btree "random.bt" :cache-size (* 1024 1024) :buffer-pool t)) (kbuf (make-array 10 :element-type '(unsigned-byte 8))) (value (enc-pos-int 222 3)) (last-time (get-universal-time)) (key)) (dotimes (i size) (setq key (enc-pos-int (random 10000000) 4 kbuf)) (set-btree-ext bt key 0 4 value 0 (length value)) (if* (zerop (mod (1+ i) 1000000)) then (sync-btree bt) (let ((time (get-universal-time))) (format t "~12d ~d secs~%" (1+ i) (- time last-time)) (setq last-time time)))) (close-btree bt)))
In the test program we print the amount of time it took for each 1,000,000 keys we added to the btree. The time printed isn't cumulative - it's just the time to insert the most recent 1,000,0000 entries.
cl-user(4): (build-random-btree 10000000) 1000000 24 secs 2000000 76 secs 3000000 130 secs 4000000 165 secs 5000000 254 secs 6000000 291 secs 7000000 395 secs 8000000 344 secs 9000000 445 secs 10000000 496 secs #<db.btree::btree [1] random.bt @ #x10012f48b2> cl-user(5):
It's pretty clear that the cost per insert is growing quite a bit as the btree grows.
There are a few ways to ease this problem. One would be to allocate as big a cache as you can when you're doing a large number of inserts.
The second method is to sort the keys before you insert them. Then instead of randomly inserting into the btree you'll insert in a way that the btree code is more likely to find the block it needs in the cache.
You can sort the keys in a variety of ways, one of them to use another btree. If you put the new keys in an empty btree they can be inserted quickly. Then you can use the btree merge function to add them to the big btree you're building.
This procedure is shown in this function, where we put each 1,000,000 keys in a btree and then merge it into the big btree. We could put the first 1,000,000 keys in the destination btree right away but for simplicity we insert the first 1,000,000 keys just like the other 9,000,000 keys.
(defun build-merge-random-btree (millions) (let* ((bp (make-btree-buffer-pool)) (bt (create-btree "random.bt" :cache-size (* 1024 1024) :buffer-pool bp)) (kbuf (make-array 10 :element-type '(unsigned-byte 8))) (value (enc-pos-int 222 3)) (last-time (get-universal-time)) (key)) (dotimes (i millions) (let ((newbt (create-btree "scratch.bt" :cache-size (* 40 1024 1024) :buffer-pool bp :partial-writes t))) (dotimes (j 1000000) (setq key (enc-pos-int (random 10000000) 4 kbuf)) (set-btree-ext newbt key 0 4 value 0 (length value))) (merge-btrees bt (list newbt)) (close-btree newbt :abort t)) (sync-btree bt) (let ((time (get-universal-time))) (format t "~12d ~d secs~%" (* (1+ i) 1000000) (- time last-time)) (setq last-time time)))))
Here are the timing for this method. It's dramatically better than the first method. In fact we barely see a slow down as we reach 10,000,000 keys.
cl-user(4): (build-merge-random-btree 10) 1000000 11 secs 2000000 16 secs 3000000 16 secs 4000000 16 secs 5000000 17 secs 6000000 17 secs 7000000 17 secs 8000000 17 secs 9000000 17 secs 10000000 18 secs nil cl-user(5):