forked from jmscott/blobio
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME
300 lines (221 loc) · 10.7 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
Synopsis:
A trivial protocol to manage immutable blobs over a network.
Description:
The blobio environment implements a simple client/server protocol for
associating crypto hash digests as keys for immutable, binary large
objects stored on a network. The blobs are referenced with a uri
(uniform resource identifier) called a uniform digest, shortened to
'udig', with a syntax like
algorithm:digest
where 'algorithm' is the hash algorithm, like 'sha' for example, and
'digest' is the ascii hash value of the blob. The colon character
separates the algorithm and the digest. The algorithm matches the
perl5 regex
^[a-z][a-z0-9]{0,7}$
and the digest matches the perl5 regex
^[[:graph:]]{32,128}$
^[[:ascii:]{32,128}$
An example udig is sha:cd50d19784897085a8d0e3e413f8612b097c03f1
which describes the string "hello, world\n".
echo 'hello, world' | shasum
cd50d19784897085a8d0e3e413f8612b097c03f1 -
The maximum size of an algorithm is 8 characters. The maximum size
of a digest is 128 characters, making the maximum size of a udig =
8 + 128 + 1 (colon).
The network protocol implements 7 verbs
get <udig> get a blob with a particular digest
put <udig> write a blob with a digest
take <udig> get a blob that the source may forget
give <udig> put a blob that the source may forget
eat <udig> untrusted digest of a blob by server
wrap bundle current unwrapped blob traffic
logs into a single blob and return
udig of the set of all blobs wrapped
since the previous roll. the wrapped
<udig>, returned to client, will
be in the next wrap, which allows
perpetual chaining.
roll <udig> forget all wrapped blob traffic logs in
the udig set described by <udig>.
subsequent wrap/rolls will probably
never see these blob sets again.
however, the blobs themselves will
exist and probably be gettable. the
<udig> of the rolled udig will be in
the next wrap set as a "roll" blob
request record.
After the initial request by the client, all request replies are just
'ok' or 'no', where 'no' is always the last reply. See below for the
exact protocol.
On the blob server, named "biod", for each correct client request a
single traffic record describing the request is written to the file
server:$BLOBIO_ROOT/spool/biod.brr
Those traffic records are named "Blob Request Records" and abbreviated
as BRR. The BRR record format is modeled after the Call Detail Record
from the telecommunications industry:
https://en.wikipedia.org/wiki/Call_detail_record
The format of a single brr record is a line of tab separated fields
i.e, the typical unixy ascii record:
start_time # YYYY-MM-DDThh:mm:ss.ns[+-]HH:MM
netflow # [a-z][a-z0-9]{0,7}~[[:graph:]]{1,128}
verb # get/put/take/give/eat/wrap/roll
algorithm:digest # udig of the blob in request
chat_history # ok/no handshake between server&client
blob_size # unsigned 64 bit long <= 2^63
wall_duration # request wall duration in sec.ns>=0
All characters are ascii. Most important is that each client request
is associated with a request for a single blob described by a udig.
Also, BRR records are always syntactally correct as described above.
In other words, a request for a syntactically incorrect udig or an
unknown verb will not generate a BRR record.
The minimum size of a brr record is 95 bytes.
(10+1+8+1+9+1+5) + 1 == start_time
(1 + 1 + 1) + 1 >= netflow
(2) + 1 >= verb
(1 + 1 + 32) + 1 >= udig
(2) + 1 >= chat history
(1) + 1 >= blobio size
(1+1+9) + 1 >= wall duration
where nanoseconds in start_time and wall_duration is always 9 digits.
Seconds in wall duration is <= 10 digits (2^31).
The maximum size of a blob request record is 370 bytes.
(10+1+8+1+9+1+5) + 1 + = start request time
(8+1+128) + 1 + >= netflow
(8) + 1 + >= verb
(8+1+128) + 1 + >= algorithm:digest
(8) + 1 + >= chat history: max=ok,ok,ok
(19) + 1 + >= blob size
(10+1+9) >= wall duration
<= 370 bytes
not including a trailing new-line or terminating null character.
Since the length is <= 370 bytes, a BR Record can be broadcast in an
UDP packet. The BRR record format is cast in stone.
Protocol Flow:
>get udig\n # request for blob by udig
<ok\n[bytes][close] # server sends bytes of blob
<no\n[close] # server can not honor request
>put udig\n # request to put blog matching a udig
<ok\n # server ready for blob bytes
>[bytes] # send bytes to server
<ok\n # accepted bytes for blob
<no\n # rejects bytes for blob
<no\n # server can not honor request
>take udig\n # request for blob by udig
<ok\n[bytes] # server sends blob bytes
>ok\n # client has the blob
<ok[close] # server forgets the blob
<no[close] # server may not forget the blob
>no[close] # client rejects blob
<no\n[close] # server can not honor request
>give udig\n # request to put blog matching a udig
<ok\n # server ready for blob bytes
>[bytes] # send digested bytes to server
<ok\n # server accepts the bytes
>ok[close] # client probably forgets blob
>no[close] # client might remember the blob
<no\n[close] # server rejects blob give request
>eat udig\n # client requests server to verify blob
<ok\n[close] # blob exists and has been verified
<no\n[close] # server was unable to digest the blob
>wrap # request udig set of traffic logs
<ok\nudig\n[close] # udig of set of traffic logs
<no\n[close] # no logs available
>roll udig # udig set of traffic logs to forget
<ok\n[close] # logs forgotten by server. typically
# a wrap set.
<no\n[close] # not all logs in set forgotten
Warning: the protocol is ambiguous about how the end of the blob stream
is physically interpreted. This may turn out to be a serious design
flaw, particulary with regard to denial-of-service attacks. Essentially
the only indication that the end of the blob has been reached is that
the socket has closed; nothing in the protocol explicity indicates
"END of BLOB", other than the sequence of bytes matching the digest
signature. In other words, only the digest presented by the request
determines the correctness of the blob.
As blobio evolves, I (jmscott) expect a client friendly, RESTful and
inherently more complex caching protocol to be built in front of the
blobio protocol described above. In other words, a biod server never
talks to untrusted code, due to the denial of service susceptibility
described above.
Protocol Questions:
- should netflow be renamed service flow?
- should "put" wait for response ok/no response from server before
sending the blob?
- should wrap send the algorithm?
- what about forbidding ':' as a digest character?
- think about a "prove" command that allows the server to formally prove
the existence of a set of blobs, thus verifying the server is not
lying about storage. the obvious technique would be for the client
to send the udig of a randomly selected set of blobs for which the
server is expected to digest the cancatenation of the blobs.
- reimplement wrap as
wrap <udig>
the idea is that the final record of the wrap set will always contain
<udig>, creating a kind of audit trail. also insures no wrap set is
empty and sequential wraps could always return different udigs if the
given udig was unique. in shell parlance
test $(blobio wrap <udigA>) = $(blobio wrap <udigB>)
is always false, even when udigA = udigB.
Billing Model Questions:
- what to measure per billing cycle:
total count of blobs stored
total bytes stored
total bytes transfered on public interface
avg transfer bytes/duration > X
for blob size > Y
total number of eat requests/proof of retrievability
should a small numbers of plans exist or should a formula
determine cycle charges?
what about overages? a mulligan is nice for customer
satisfaction but rolling to next higher plan is draconian.
- duration of storage: per blob or per plan?
- should the blob request record include a "duration to first byte",
which measures the difference between start_time and time after the
first read or write to the client. the time to first byte can be
observed on the network, sort of. the sender will see a slightly
different time to first byte than the receiver, as with the
start_time.
To Do:
- casting text to udig_sha fails when prefixed by sha:
-- OK
select '44e946a2889b295eeb0b9d95efe34625cc830dde'::udig_sha;
-- FAILS
select 'sha:44e946a2889b295eeb0b9d95efe34625cc830dde'::udig_sha;
why not cast both versions?
- move log/*.[fx]dr files to spool and, after wrapping, to data/
- Should the variable BLOBIO_ROOT be renamed BLOBIO_INSTALL or
BLOBIO_DIST? Currently BLOBIO_ROOT means the actual install
directory and not the parent, which is the correct root.
- add option to biod to limit size of accepted blob
- investigate linux splice() system call.
- think about separate biod processes sharing the same data/ file system
but otherwise separate. or, explore union file systems.
- make the postgresql udig data type be an extension
- prove that every brr log contains a successful "wrap" other than the
the first brr log. critical to prove this.
- two quick wraps in a row could generate the same wrap set?
- should BRR log files always have unique digests, modulo other servers.
perhaps the first put request in the wrapped brr is a blob uniquely
summarizing the state of the server and the current wrap,request.
- contemplate running the server such that the empty blob always exists
- think about adding fadvise to file system blob reads.
seems natural since every open of blob file will read the entire blob.
- cleanup the code in server/sha_fs.c
- create an is_empty() function in pgsql udig datatype.
- add support for unix domain socket, std{in,out}, inetd, and libevent
- upon server start up on mac os x test for case sensitive file system
see http://www.mail-archive.com/[email protected]/msg66830.html
another idea under macosx would be to convert blob file path to lower
case.
- rewrite makefile for client only install, without golang requirement
- should the postgres datatype include a function is_zero_size(udig)?
- the postgres udig data type ought to be in the blobio schema.
- merge fdr2sql & cron-fdr2pg into flowd
- enable/disable specific commands get/put/give/take per server/per
subnet
- named pipes to flowd as another tail source
- signal handling needs to be pushed to main listen loop or cleaned
up with sigaction().
Note:
This simple protocol is officially named bio4. That needs to be
clarified in this document.