- Author:
- Walter Weinmann <walter.weinmann@gmail.com>
- Status:
- Draft
- Type:
- Standards Track
- Created:
- 06-Dec-2016
- Erlang-Version:
- OTP 20.0
- Post-History:
- 6-Dec-2016

This EEP proposes the creation of a new module named b_trees for the administration of b-trees. Both the optional persistence and the sort order should be implemented by pluggable functionality.

This document has been placed in the public domain.

```
{MinimumSubtrees,
MaximumKeys,
SizeKeyValues,
SortFunction/2,
State,
Tree}
```

`Tree`

is composed of nodes of the form

```
{KeyNumber,
SubtreeNumber,
[{Key, Value}],
[Tree]}
```

and the “empty b-tree” node

```
nil
```

`State`

is a tuple composed of the following parameters:

```
{StateTarget,
DeleteFunction/3,
InsertFunction/3,
LookupFunction/3}
```

Since the b-trees are always balanced, there is no need for a balance operation.

```
b_tree() = {pos_integer(),
pos_integer(),
non_neg_integer(),
sort_function(),
state(),
tree()}
```

A general balanced tree.

```
iterator() = [{key_values(), subtrees()}]
```

A general balanced tree iterator.

Types:

```
Tree1 = Tree2 = Tree3 = b_tree() | gb_trees:tree()
```

Copies tree Tree1 to an empty tree Tree2. Both trees may be either of type b-tree or binary tree (gb_trees). Returns the new tree Tree3 of the same type as tree Tree2.

Types:

```
Key = any()
B-Tree1 = B-Tree2 = b_tree()
```

Removes the node with key Key from b-tree B-Tree1 and returns the new b-tree B-Tree2. Assumes that key Key is present in b-tree B-Tree1, crashes otherwise.

Types:

```
Key = any()
B-Tree1 = B-Tree2 = b_tree()
```

Removes the node with key Key from b-tree B-Tree1 if key Key is present in b-tree B-Tree1, otherwise does nothing. Returns the new b-tree B-Tree2.

Types:

```
Order = pos_integer()
B-Tree = b_tree()
```

Returns a new empty b-tree. The order is defined as the maximum number of children nodes a non-leaf node may hold.

Types:

```
Key = any()
Value = any()
B-Tree1 = B-Tree2 = b_tree()
```

Inserts key Key with value Value into b-tree B-Tree1 if key Key is not present in b-tree B-Tree1, otherwise updates the current value of key Key to value Value in b-tree B-Tree1. Returns a new b-tree B-Tree2.

Types:

```
B-Tree1 = B-Tree2 = b_tree()
List = [{Key, Value}]
```

Turns an ordered list List of key value tuples into a b-tree. The given b-tree B-Tree1 must be empty. The list must not contain duplicate keys.

Types:

```
Key = any()
B-Tree = b_tree()
Value = any()
```

Retrieves the value stored with key Key in b-tree B-Tree. Assumes that key Key is present in b-tree B-Tree, crashes otherwise.

Types:

```
B-Tree = b_tree()
```

Returns the height of b-tree B-Tree as an integer. Assumes that b-tree B-Tree is non-empty.

Types:

```
Key = any()
Value = any()
B-Tree1 = B-Tree2 = b_tree()
```

Inserts key Key with value Value into b-tree B-Tree1 and returns the new
b-tree B-Tree2. Assumes that key Key is **not** present in b-tree B-Tree1,
crashes otherwise.

Types:

```
Key = any()
B-Tree = b_tree()
```

Returns `true`

if key Key is present in b-tree B-Tree, otherwise `false`

.

Types:

```
B-Tree = b_tree()
```

Returns `true`

if b-tree B-Tree is an empty b-tree, otherwise `false`

.

Types:

```
B-Tree = b_tree()
Iterator = iterator()
```

Returns iterator Iterator that can be used for traversing the entries
of b-tree B-Tree; see `next/1`

. The implementation of this iterator is
very efficient; traversing the whole b-tree using `next/1`

is only slightly
slower than getting the list of all key-value pairs using `to_list/1`

and traversing that. The main advantage of the iterator approach is that
it does not require the complete list of all key-value pairs to be built
in memory at one time.

Types:

```
Key = any(9
B-Tree = b_tree()
Iterator = iterator()
```

Returns iterator Iterator that can be used for traversing the entries
of b-tree B-Tree; see `next/1`

. The difference, as compared to the
iterator returned by iterator/1, is that the first key greater than
or equal to key Key is returned.

Types:

```
B-Tree = b_tree()
Key = any()
```

Returns the keys in b-tree B-Tree as an ordered list.

Types:

```
B-Tree = b_tree()
Key = any()
Value = any()
```

Returns a tuple {Key, Value}, where Key is the largest key in b-tree B-Tree, and Value is the value associated with this key. Assumes that b-tree B-Tree is not empty.

Types:

```
Key = any()
B-Tree = b_tree()
Value = any()
```

Looks up key Key in b-tree B-Tree. Returns {value, Value}, or none if key Key is not present.

Types:

```
Function = fun((Key, Value1) -> Value2)
B-Tree1 = B-Tree2 = b_tree()
Key = any()
Value1 = Value2 = any()
```

Maps function Function(Key, Value1) -> Value2 to all key value pairs of b-tree B-Tree1. Returns the new b-tree B-Tree2 with the same set of keys as b-tree B-Tree1 and the new set of values.

Types:

```
Iterator1 = Iterator2 = iterator()
Key = any()
Value = any()
```

Returns the tuple {Key, Value, Iterator2}, where Key is the smallest
key referred to by iterator Iterator1, and iterator Iterator2 is the
new iterator to be used for traversing the remaining nodes, or the
atom ‘**none**’ if no nodes remain.

Types:

```
B-Tree1 = B-Tree2 = b_tree()
Name : Value = sort : Function = fun((Key1, Key2) -> equal |
greater |
less)
| state : {StateTarget,
Function = fun(StateTarget, delete, Key)
-> true,
Function = fun(StateTarget, insert, Subtrees)
-> Key,
Function = fun(StateTarget, lookup, Key)
-> Subtrees}
```

Sets the parameter Name to value Value in the empty b-tree B-Tree1 and returns the new b-tree B-Tree2. This function can only be used in conjunction with an empty b-tree.

Types:

```
B-Tree = b_tree()
```

Returns the number of key value pairs in b-tree B-Tree as an integer. Returns 0 (zero) if b-tree B-Tree is empty.

Types:

```
B-Tree = b_tree()
```

Returns the number of total nodes and the number of leaf nodes in b-tree B-Tree as a tuple of two integers. Returns {0, 0} (zero) if b-tree B-Tree is empty.

Types:

```
B-Tree = b_tree()
Key = any()
Value = any()
```

Returns tuple {Key, Value}, where Key is the smallest key in b-tree B-Tree, and Value is the value associated with this key. Assumes that b-tree B-Tree is not empty.

Types:

```
Key1 = Key2 = any()
equal = greater = less = atom()
```

Returns the atom ‘**greater**’ if Key1 > Key2, the atom ‘**less**’
if Key1 < Key2 and otherwise the atom ‘**equal**’.

Types:

```
Key1 = Key2 = any()
equal = greater = less = atom()
```

Returns the atom ‘**less**’ if Key1 > Key2, the atom ‘**greater**’
if Key1 < Key2 and otherwise the atom ‘**equal**’.

Types:

```
Key = any()
B-Tree1 = B-Tree2 = b_tree()
```

Removes the node with key Key from b-tree B-Tree1 and returns the new b-tree B-Tree2. Assumes that key Key is present in b-tree B-Tree1, crashes otherwise.

Types:

```
B-Tree1 = B-Tree2 = b_tree()
Key = any()
Value = any()
```

Returns tuple {Key, Value, B-Tree2}, where Key is the largest key in b-tree B-Tree1, Value is the value associated with this key, and b-tree B-Tree2 is this b-tree with the corresponding key value pair deleted. Assumes that b-tree B-Tree1 is not empty.

Types:

```
B-Tree1 = B-Tree2 = b_tree()
Key = any()
Value = any()
```

Returns tuple {Key, Value, B-Tree2}, where Key is the smallest key in b-tree B-Tree1, Value is the value associated with this key, and b-tree B-Tree2 is this b-tree with the corresponding key value pair deleted. Assumes that b-tree B-Tree1 is not empty.

Types:

```
B-Tree = b_tree()
Key = any()
Value = any()
```

Converts b-tree B-Tree into an ordered list of key value tuples.

Types:

```
Key = any()
Value = any()
B-Tree1 = B-Tree2 = b_tree()
```

Updates key Key to value Value in b-tree B-Tree1 and returns the new b-tree B-Tree2. Assumes that key Key is present in b-tree B-Tree1.

Types:

```
B-Tree = b_tree()
Value = any()
```

Returns the values in b-tree B-Tree as an ordered list, sorted by their corresponding keys. Duplicates are not removed.

```
{StateTarget, DeleteFunction, InsertFunction, LookupFunction}
StateTarget = any()
DeleteFunction(StateTarget, delete, Key) -> true
InsertFunction(StateTarget, insert, Subtrees) -> Key
LookupFunction(StateTarget, lookup, Key) -> Subtrees
```

Examples for state targets are a Dets table or a Mnesia table. The delete
function takes a state target, the atom `delete`

and a key as arguments
and returns the atom `true`

if successful. The insert function takes a
state target, the atom `insert`

and a subtrees data structure as arguments
and returns a key if successful. The lookup function takes a state target,
the atom `lookup`

and a key as arguments and returns a subtrees data
structure if successful.

The following examples are based on Mnesia.

```
persistence_by_mnesia(_, delete, SubtreesKey)
when is_list(SubtreesKey) ->
true;
persistence_by_mnesia(StateTarget, delete, SubtreesKey) ->
F = fun() ->
ok = mnesia:delete({StateTarget, SubtreesKey}),
true
end,
mnesia:activity(transaction, F);
persistence_by_mnesia(_, insert, []) ->
[];
persistence_by_mnesia(StateTarget, insert,
[{_, _, [{Key, _} | _], _} | _] = Subtrees) ->
SubtreesKey = list_to_binary(Key),
F = fun() ->
ok = mnesia:write(StateTarget,
#subtrees{subtreesKey = SubtreesKey,
subtrees = Subtrees}, write),
SubtreesKey
end,
mnesia:activity(transaction, F);
persistence_by_mnesia(_, lookup, SubtreesKey)
when is_list(SubtreesKey) ->
SubtreesKey;
persistence_by_mnesia(StateTarget, lookup, SubtreesKey) ->
F = fun() ->
[{subtrees, SubtreesKey, Subtrees}] = mnesia:read(StateTarget,
SubtreesKey),
Subtrees
end,
mnesia:activity(transaction, F).
```

Creating the Mnesia table:

```
-record(subtrees, {subtreesKey, subtrees}).
{atomic, ok} = mnesia:create_table(StateTargetName, [{record_name,
subtrees}]),
```

Creating the b-tree:

```
BTree1 = b_trees:empty(500),
BTree2 = b_trees:set_parameter(BTree1, state,
{StateTargetName,
fun persistence_by_mnesia/3,
fun persistence_by_mnesia/3,
fun persistence_by_mnesia/3}),
```

```
FunctionName(Key1, Key2) -> equal | greater | less
Key1 = Key2 = any()
```

The sort function takes two keys as arguments and returns the atom `less`

if Key1 < Key2, the atom `greater`

if Key1 > Key2 and otherwise the
atom `equal`

.

```
-spec sort_descending(key(), key()) -> sort_result().
sort_descending(Key_1, Key_2) ->
if
Key_1 < Key_2 -> greater;
Key_1 > Key_2 -> less;
true -> equal
end.
```

```
BTree1 = b_trees:empty(500),
BTree2 = b_trees:set_parameter(BTree1, sort, fun sort_descending/2),
```

B-trees are self-balancing tree data structures that keep data sorted and allow searches, sequential access, insertions, and deletions in logarithmic time. B-trees are a generalization of a binary search trees in that a node can have more than two children. Unlike self-balancing binary search trees, the b-tree is optimized for systems that read and write large blocks of data. B-trees are a good example of a data structure for external memory.

The functional design of the module b_trees is based on the module gb_trees:

```
b_trees | gb_trees
------------------|---------
n/a | balance/1
copy/2 | n/a
delete/2 | delete/2
delete_any/2 | delete_any/2
empty/1 | empty/0
enter/3 | enter/3
from_dict/2 | from_orddict/1
get/2 | get/2
height/1 | n/a
insert/3 | insert/3
is_defined/2 | is_defined/2
is_empty/1 | is_empty/1
iterator/1 | iterator/1
iterator_from/2 | iterator_from/2
keys/1 | keys/1
largest/1 | largest/1
lookup/2 | lookup/2
map/2 | map/2
next/1 | next/1
set_parameter/3 | n/a
size_key_values/1| size/1
size_nodes/1 | n/a
smallest/1 | smallest/1
sort_ascending/2 | n/a
sort_descending/2| n/a
take/2 | take/2
take_any/2 | take_any/2
take_largest/1 | take_largest/1
take_smallest/1 | take_smallest/1
to_list/1 | to_list/1
update/3 | update/3
values/1 | values/1
```

The functions `delete/2`

and `insert/3`

are implementations of the algorithms
of Cormen, Thomas; Leiserson, Charles; Rivest, Ronald; Stein, Clifford (2009),
Introduction to Algorithms (Third ed.), MIT Press and McGraw-Hill, pp. 484-504,
ISBN 0-262-03384-4. Chapter 18: B-Trees.

No issues - except module name collisions.

The reference implementation can be fetched from Github:

```
https://github.com/walter-weinmann/b_trees
```