-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathdesign.tex
More file actions
167 lines (141 loc) · 7.61 KB
/
Copy pathdesign.tex
File metadata and controls
167 lines (141 loc) · 7.61 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{bytefield}
\usepackage{float}
\title{Haskey - Design Document}
\author{Henri Verroken}
\date{August 2017}
\setlength{\parskip}{5pt}
\begin{document}
\maketitle
\newcommand{\hex}{\texttt}
\section{Binary format}
This section describes the on-disk binary format. Figure~\ref{packet:generic-page} shows the binary format of a generic page. The binary format of each page type starts with the xxHash 64-bit checksum\footnote{Seeded with a zero word.} of all the data after the checksum field. It is followed by a field that indicates the page type. The binary format of the remaining data is page type specific. The type field can be one of the following values:
\begin{itemize}
\item \textbf{Empty (\hex{0x00}):} An empty, unused page. The remaining data is garbage.
\item \textbf{Meta (\hex{0x20}):} A meta-page of the database (Section~\ref{sec:binary-format-meta-pages}).
\item \textbf{Overflow (\hex{0x40}):} An overflow page, containing overflow data (Section~\ref{sec:binary-format-overflow-pages}).
\item \textbf{Leaf node (\hex{0x60}):} A leaf node page, containing actual key-value pairs (Section~\ref{sec:binary-format-leaf-nodes}).
\item \textbf{Index node (\hex{0x80}):} An index node page, containing a branch node of the tree (Section~\ref{sec:binary-format-index-nodes}).
\end{itemize}
\begin{figure}[H]
\centering
\begin{bytefield}{32}
\bitheader{0-31} \\
\bitbox[lrt]{32}{Checksum (BE)} \\
\bitbox[lrb]{32}{} \\
\bitbox{8}{Type}\bitbox[lrt]{24}{} \\
\wordbox[lrb]{3}{Remaining data}
\end{bytefield}
\caption{Binary format of a generic page, of a certain type.}
\label{packet:generic-page}
\end{figure}
The remaining data in Figure~\ref{packet:generic-page} can be compressed using lz4. In that case Figure~\ref{packet:generic-compressed-page} shows the binary format of a generic packet. The first field is the xxHash 64-bit checksum\footnote{Seeded with the zero word.} of all (partially compressed) data after the checksum field. The type field indicates the page type, which is \texttt{OR}ed with \hex{0x01} to indicate that the page contents are compressed. The data is compressed using the \texttt{lz4} library.
\begin{figure}[H]
\centering
\begin{bytefield}{32}
\bitheader{0-31} \\
\bitbox[lrt]{32}{Checksum (BE)} \\
\bitbox[lrb]{32}{} \\
\bitbox{8}{Type \texttt{OR} \hex{0x1}}\bitbox[lrt]{24}{} \\
\wordbox[lrb]{3}{Compressed remaining data}
\end{bytefield}
\caption{Binary format of a generic page with compressed data, of a certain type.}
\label{packet:generic-compressed-page}
\end{figure}
\subsection{Meta pages}\label{sec:binary-format-meta-pages}
A meta-page is used to store meta-data about the database. Figure~\ref{packet:meta-page} shows the binary format of a meta-page. The remaining data field is encoded using the standard \texttt{Generic}-derived \texttt{Binary} instance of a \texttt{ConcurrentMeta} record, and contains the following data. No further optimizations were made to the binary format.
\begin{itemize}
\item The transaction id of the most recent commited transaction
\item The number of pages in both the data file and the index file
\item A pointer to the root of the main tree, along with its height.
\item Pointers to the roots of the free trees of both the data file and the index file, along with their heights.
\item A pointer to the root of the overflow tree.
\item Two collections of page IDs, one for the data file and one for the index file, that are immediately ready for reuse by the next transaction.
\end{itemize}
\begin{figure}[H]
\centering
\begin{bytefield}{32}
\bitheader{0-31} \\
\bitbox[lrt]{32}{Checksum (BE)} \\
\bitbox[lrb]{32}{} \\
\bitbox{8}{Type = \hex{0x20}}\bitbox[lrt]{24}{} \\
\wordbox[lrb]{3}{\texttt{ConcurrentMeta}}
\end{bytefield}
\caption{Binary format of a meta page, the binary format of the \texttt{ConcurrentMeta} record is not optimized.}
\label{packet:meta-page}
\end{figure}
\subsection{Overflow pages}\label{sec:binary-format-overflow-pages}
An overflow page is used to store data that is too big to store in a leaf node. Figure~\ref{packet:overflow-page} shows the binary format of an overflow page. The remaining data field is simply the encoded version of the value using its user-provided \texttt{Binary} instance.
\begin{figure}[H]
\centering
\begin{bytefield}{32}
\bitheader{0-31} \\
\bitbox[lrt]{32}{Checksum (BE)} \\
\bitbox[lrb]{32}{} \\
\bitbox{8}{Type = \hex{0x40}}\bitbox[lrt]{24}{} \\
\wordbox[lrb]{3}{Value}
\end{bytefield}
\caption{Binary format of an overflow page, the value field is simply the encoded version of the value using its user-provided \texttt{Binary} instance.}
\label{packet:overflow-page}
\end{figure}
\subsection{Leaf nodes}\label{sec:binary-format-leaf-nodes}
A leaf node contains actual key-value pairs or key-overflow pairs. Figure~\ref{packet:leaf-node-page} shows the binary format of a leaf node page. It contains a 3-byte value $N$ indicating the number of key-value pairs. The (key, value) tuple is encoded using its \texttt{Binary}-instance, but the value is preceded by a byte indicating whether or not it is an overflow value, as shown in Figure~\ref{packet:non-overflow-value} and~\ref{packet:overflow-value}.
The 3-byte $N$ value means that we need at least 3-byte keys to get $2^{24} - 1$ unique keys. An extra byte to encode the overflow information, means every leaf entry requires at least 4 bytes to fill up the 3-byte counter. This sets an upper bound on the page size of $2^{24} \times 4\, \textrm{bytes} = 67\, \textrm{MB}$, which is deemed plenty.
\begin{figure}[H]
\centering
\begin{bytefield}{32}
\bitheader{0-31} \\
\bitbox[lrt]{32}{Checksum (BE)} \\
\bitbox[lrb]{32}{} \\
\bitbox{8}{Type = \hex{0x60}}\bitbox{24}{$N$ (BE)} \\
\wordbox[lrt]{1}{$N$ (key, value/overflow) tuples} \\
\skippedwords \\
\wordbox[lrb]{1}{}
\end{bytefield}
\caption{Binary format of a leaf node page.}
\label{packet:leaf-node-page}
\end{figure}
\begin{figure}[H]
\centering
\begin{bytefield}{32}
\bitheader{0-31} \\
\bitbox{8}{Overflow = \hex{0x0}}
\bitbox{24}{\texttt{Binary}-encoded value}
\end{bytefield}
\caption{Binary format of a non-overflow value.}
\label{packet:non-overflow-value}
\end{figure}
\begin{figure}[H]
\centering
\begin{bytefield}{40}
\bitheader{0-39} \\
\bitbox{8}{Overflow = \hex{0x1}}
\bitbox{32}{32-bit overflow counter}
\end{bytefield}
\caption{Binary format of an overflow value.}
\label{packet:overflow-value}
\end{figure}
\subsection{Index nodes}\label{sec:binary-format-index-nodes}
An index node contains key-page pairs. Figure~\ref{packet:index-node-page} shows the binary format of an index node page. It contains a byte indicating the height of the branch node, and a 3-byte $N$ value indicating the amount of keys in the index node.
The 3-byte $N$ value means that we need at least 3-byte keys to get $2^{24} - 1$ unique keys, along with $2^{24}$ 8-byte page IDs. This sets an upper bound on the page size of $2^{24} \times 11\,\textrm{bytes} = 180\,\textrm{MB}$, which is deemed plenty.
The 1-byte height value, means the maximum amount of leaf nodes is $2^{255}$, which is also plentiful.
\begin{figure}[H]
\centering
\begin{bytefield}{32}
\bitheader{0-31} \\
\bitbox[lrt]{32}{Checksum (BE)} \\
\bitbox[lrb]{32}{} \\
\bitbox{8}{Type = \hex{0x80}}\bitbox{8}{Height}\bitbox{16}{$N$ (MSB)} \\
\bitbox{8}{$N$ (LSB)}\bitbox[lrt]{24}{} \\
\wordbox[lr]{1}{$N$ keys} \\
\skippedwords \\
\wordbox[lrb]{1}{} \\
\wordbox[lrt]{1}{$(N + 1)$ 8-byte page IDs} \\
\skippedwords \\
\wordbox[lrb]{1}{}
\end{bytefield}
\caption{Binary format of an index node page.}
\label{packet:index-node-page}
\end{figure}
\end{document}