Daybreak is a simple and very fast key-value store for Ruby. It has user-defined persistence, and all data is stored in an in-memory table, so Ruby niceties are available. In the benchmarks below, Daybreak is faster than PStore and DBM.

The source is on GitHub and you can install it with:

$ gem install daybreak

(v0.3.0) | API Docs | Issue Tracker

Overview

Daybreak stores data in an append-only file, and values inserted into the database are marshalled Ruby objects. It includes Enumerable for functional methods like map and reduce, and emulates the interface of a simple Ruby hash. Here is the basic API:

  require 'daybreak'

  db = Daybreak::DB.new "example.db"

  # set the value of a key
  db['foo'] = 2

  # set the value of a key and flush the change to disk
  db.set! 'bar', 2

  # You can also use atomic batch updates
  db.update :alpha => 1, :beta => 2
  db.update! :alpha => 1, :beta => 2

  # all keys are cast to strings via #to_s
  db[1] = 2
  db.keys.include? 1 # => false
  db.keys.include? '1' # => true

  # ensure changes are sent to disk
  db.flush

  # open up another db client on the same file
  db2 = Daybreak::DB.new "example.db"
  db2['foo'] = 3

  # Ruby objects work too
  db2['baz'] = {:one => 1}
  db2.flush

  # Reread the changed file in the first db
  db.load
  p db['foo'] #=> 3
  p db['baz'] #=> {:one => 1}

  # Enumerable works too! Each entry is yielded as a [key, value] pair
  1000.times {|i| db[i] = i }
  p db.reduce(0) {|m, (k, v)| m + v } # => 499500 when only these 1000 integer entries are stored

  # Compaction is always a good idea. It cuts down the size of the database file
  db.compact
  p db['foo'] #=> 3
  db2.load
  p db2['foo'] #=> 3

  # DBs can be accessed from multiple processes at the same
  # time. You can use #lock to make an operation atomic.
  db.lock do
    db['counter'] ||= 0 # start from zero if the key is not set yet
    db['counter'] += 1
  end

  # If you only need to synchronize between threads, prefer #synchronize over #lock
  db.synchronize do
    db['counter'] += 1
  end

  # DBs can have default values
  db3 = Daybreak::DB.new "example3.db", :default => 'hello!'
  db3['bingo'] #=> "hello!"

  # If you don't want Marshal as the serializer, you can write your own.
  # Inherit from Daybreak::Serializer::Default (MyJsonSerializer is a class you define)
  db4 = Daybreak::DB.new "example4.db", :serializer => MyJsonSerializer

  # close the databases
  db.close
  db2.close
  db3.close
  db4.close

If you want a different serialization strategy (for example, JSON), you can provide your own serializer; see Daybreak::Serializer::Default. If you want to format your database log differently, you can also provide your own format; see Daybreak::Format.
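
As a rough sketch of what a JSON serializer could look like: the class below assumes that a serializer only has to respond to key_for, dump and load, mirroring what Daybreak::Serializer::Default is understood to provide. Those method names are an assumption, so check the API docs before relying on them. MyJsonSerializer is the hypothetical name used in the example above.

```ruby
require 'json'

# Hypothetical JSON serializer. The method names key_for, dump and load are
# an assumption about the interface of Daybreak::Serializer::Default.
class MyJsonSerializer
  # Cast keys to strings, as the overview describes for the default behaviour.
  def key_for(key)
    key.to_s
  end

  # Wrap the value in an array so that bare scalars (2, "x", nil)
  # always produce a valid JSON document.
  def dump(value)
    JSON.generate([value])
  end

  # Undo the array wrapping applied in #dump.
  def load(bytes)
    JSON.parse(bytes).first
  end
end
```

Note that JSON has no symbols, so a value such as {:one => 1} comes back with string keys after a round trip.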

Architecture

When a Daybreak database is opened, it reads the append-only file and mirrors the data in an in-memory hash table for fast reads.

Writes to a Daybreak database are asynchronous: each write is queued. If you want to commit immediately to the file, call flush after a write.
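
The queued-write idea can be illustrated with a toy model. This is not Daybreak's implementation; QueuedLog and its methods are hypothetical names. Writes are pushed onto a queue and a worker thread drains it, while flush blocks until everything queued so far has been written.

```ruby
require 'thread'
require 'stringio'

# Toy model of queued writes (hypothetical, not Daybreak's code).
class QueuedLog
  def initialize(io)
    @io = io
    @queue = Queue.new
    @mutex = Mutex.new
    @drained = ConditionVariable.new
    @pending = 0
    @worker = Thread.new { work }
  end

  # Returns immediately; the worker thread writes the record later.
  def write(record)
    @mutex.synchronize { @pending += 1 }
    @queue << record
    self
  end

  # Blocks until every queued record has been handed to the IO object.
  def flush
    @mutex.synchronize do
      @drained.wait(@mutex) while @pending > 0
    end
    @io.flush
    self
  end

  # Flush, then stop the worker with a nil sentinel.
  def close
    flush
    @queue << nil
    @worker.join
  end

  private

  def work
    while (record = @queue.pop)
      @io.write(record)
      @mutex.synchronize do
        @pending -= 1
        @drained.broadcast if @pending.zero?
      end
    end
  end
end

io = StringIO.new
log = QueuedLog.new(io)
3.times { |i| log.write("record #{i}\n") }
log.flush # after this call, io.string contains all three records
log.close
```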

Daybreak is multi-process safe. Synchronization with other processes is done by calling load or lock. load updates the in-memory hash table with new database records from the filesystem. Use lock if you want to make operations atomic across process boundaries.
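
To illustrate what atomic across process boundaries means, here is a generic sketch built on an advisory file lock (File#flock). It is only an analogy and makes no claim about how Daybreak implements lock; with_file_lock and increment_counter are hypothetical helpers.

```ruby
# Hypothetical helper: run a block while holding an exclusive advisory
# lock on the file, so that only one holder at a time executes the block.
def with_file_lock(path)
  File.open(path, File::RDWR | File::CREAT, 0644) do |f|
    f.flock(File::LOCK_EX)
    begin
      yield f
    ensure
      f.flock(File::LOCK_UN)
    end
  end
end

# Read-modify-write of a counter stored in the file. Without the lock,
# two processes could read the same old value and lose an update.
def increment_counter(path)
  with_file_lock(path) do |f|
    value = f.read.to_i + 1
    f.rewind
    f.truncate(0)
    f.write(value.to_s)
    f.flush
    value
  end
end
```

The read, the increment and the write happen while the lock is held, which is the same shape as the db.lock block in the overview.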

If you only want to synchronize between different threads, prefer synchronize over lock. Be aware that Daybreak is not thread-safe by default, so every access has to be wrapped in synchronize. This is true at least on interpreters without a global interpreter lock, such as Rubinius and JRuby.

Writes with duplicate keys are simply appended to the end of the file. From time to time you will want to run compact which will remove old commits from the file and create a smaller logfile. This will shrink the space necessary to store the data on disk. You can also compact from a background process.
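
The effect of compaction can be shown with a toy log held in an Array. This is a simplified model, not Daybreak's code: replay, compact and the DELETED marker are hypothetical names. Replaying the log keeps only the last write per key, and compaction writes out exactly those surviving records.

```ruby
# Hypothetical tombstone marking a deleted key in the toy log.
DELETED = Object.new

# Rebuild the in-memory table by applying records in order:
# later writes win, tombstones remove the key.
def replay(log)
  log.each_with_object({}) do |(key, value), table|
    if value.equal?(DELETED)
      table.delete(key)
    else
      table[key] = value
    end
  end
end

# A compacted log holds one record per live key.
def compact(log)
  replay(log).to_a
end

log = []
log << ['foo', 1]
log << ['bar', 2]
log << ['foo', 3]       # duplicate key, simply appended
log << ['bar', DELETED] # deletion is also just an appended record

compacted = compact(log)
p log.size       # => 4
p compacted      # => [["foo", 3]]
```

Replaying the compacted log yields the same table as replaying the original, which is why compaction is safe to run at any time.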

File Format

Daybreak stores its data in a very simple file format. Each Daybreak file is an append-only log in which every record consists of a 32-bit big-endian key length, a 32-bit big-endian value length, the key data and the value data. Every key-value pair also has an associated 32-bit CRC field to protect against bad data. The special value 0xFFFFFFFF for the value length denotes a deleted record. Here is how a database of one record might look:

32-bit key length   32-bit value length   Key     Value                CRC32
(...)0000101        (...)0001010          hello   <marshalled value>   (...)11010

These values are all read into an in-memory hash table, and commits to the database are queued for writing. A reminder: call flush if you want commits to block until they are written to the filesystem.
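
The record layout described above can be sketched with Array#pack and Zlib.crc32. This is an illustration under stated assumptions, not Daybreak's actual code: it assumes the CRC covers all bytes that precede it in the record, and it ignores any file header. encode_record and decode_record are hypothetical helpers.

```ruby
require 'zlib'

# Value length that marks a deleted record, as described above.
DELETED_LENGTH = 0xFFFFFFFF

# Encode one record: key length, value length, key, value, CRC32.
# Pass nil as the value to encode a deletion.
# Assumption: the CRC covers everything that precedes it.
def encode_record(key, value)
  key = key.b
  body =
    if value.nil?
      [key.bytesize, DELETED_LENGTH].pack('NN') << key
    else
      value = value.b
      [key.bytesize, value.bytesize].pack('NN') << key << value
    end
  body << [Zlib.crc32(body)].pack('N')
end

# Decode one record and verify its CRC. Returns [key, value],
# with a nil value for a deleted record.
def decode_record(bytes)
  key_len, value_len = bytes.unpack('NN')
  deleted = value_len == DELETED_LENGTH
  data_len = key_len + (deleted ? 0 : value_len)
  body = bytes.byteslice(0, 8 + data_len)
  crc = bytes.byteslice(8 + data_len, 4).unpack('N').first
  raise 'CRC mismatch' unless crc == Zlib.crc32(body)
  key = bytes.byteslice(8, key_len)
  value = deleted ? nil : bytes.byteslice(8 + key_len, value_len)
  [key, value]
end

record = encode_record('hello', Marshal.dump(2))
key, value = decode_record(record)
p key                 # => "hello"
p Marshal.load(value) # => 2
```

'N' packs an unsigned 32-bit integer in big-endian byte order, which matches the length and CRC fields in the description.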

Testing & Benchmarks

Daybreak is tested using Travis-CI. We also run benchmarks there, which compare Daybreak against DBM, GDBM and Hash.

If you are interested in benchmarks, you can also take a look at the Moneta benchmarks, where Daybreak is compared to virtually all existing key/value stores. In the run shown below it is the fastest persistent store among the Moneta backends; only the in-memory backend is faster.

=============================================================================
Summary uniform_medium: 3 runs, 1000 keys
=============================================================================
                         Minimum  Maximum    Total     Mean   Stddev    Ops/s
Memory            sum         17       19       55       18        0    53725
Daybreak          sum         20       26       68       22        2    44036
LevelDB           sum         40       44      129       43        1    23176
TDB               sum         40       53      148       49        6    20192
GDBM              sum         39       70      151       50       14    19832
DBM               sum         38       77      171       57       16    17491
LRUHash           sum         56       99      211       70       20    14177
Sqlite            sum        134      167      438      146       15     6845
File              sum        333      444     1190      396       46     2519
HashFile          sum        471      494     1451      483        9     2066
Redis             sum        656      818     2218      739       65     1352
MemcachedDalli    sum        700     1051     2532      844      150     1184
MemcachedNative   sum        822      979     2661      887       66     1127
Client            sum        906      970     2814      938       26     1065
Sequel            sum       2090     2635     6992     2330      227      429
Mongo             sum       2053     2704     7108     2369      265      422
DataMapper        sum       7984    11287    27909     9303     1428      107
Couch             sum      15481    18786    51336    17112     1349       58
Riak              sum      15597    22437    56838    18946     2794       52
PStore            sum      15975    26684    59356    19785     4887       50
ActiveRecord      sum      27526    32525    89807    29935     2044       33
RestClient        sum     122103   122781   367042   122347      307        8

Change Log

0.3.0
Speed up read performance. Daybreak::Format changes slightly: it is now responsible for reading the entire database in one go and yielding records as they are parsed.
0.2.4
Fix possible infinite loops when the worker thread throws an error.
0.2.3
Fix a bug with utf-8 strings (thanks pepe).
0.2.2
Move file handling bits to Journal, fix a bug with compact!, and rename sync to load (or sunrise if you're feeling fun).
0.2.1
Add bulk updates with update and its friend update!, and add a subclass fix (thanks ch1c0t).
0.2.0
Pretty much a complete rewrite by minad to allow for multi-process safety and thread safety. Huge speed improvements and the ability to define custom formats and serializers.
Note: Old db formats from previous versions will need to be upgraded, use the converter to upgrade your old databases.
0.1.3
Simplify internals, and speed up both reading and writing.
0.1.2
Fix compact! segfault or deadlock on 1.8.7-p371, and huge cleanup and speedup thanks to minad!
0.1.1
Fix file handling and possible segfault on some systems when using clear.
0.1.0
Make Daybreak compatible with Moneta, and add a delete operation. This represents a slight change to the log file format. (thanks minad)
0.0.4
Fix a bug in compact! to allow for inherited DBs (thanks jlapier).
0.0.3
Add support for Windows rubies (thanks to rob99 for help tracking down the issue).
0.0.2
Fix bug with calls to empty!.
0.0.1
Initial release.

License

Copyright (c) 2012 - 2013 ProPublica

MIT License

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Daybreak is a project of ProPublica.