README

Synopsis:
	A trivial protocol to manage immutable blobs over a network.
Description:
	The blobio environment implements a simple client/server protocol for 
	associating crypto hash digests as keys for immutable, binary large
	objects stored on a network.  The blobs are referenced with a uri
	(uniform resource identifier) called a uniform digest, shortened to
	'udig', with a syntax like

		algorithm:digest

	where 'algorithm' is the hash algorithm, like 'sha' for example, and
	'digest' is the ascii hash value of the blob.  The colon character
	separates the algorithm and the digest.  The algorithm matches the
	perl5 regex
	
		^[a-z][a-z0-9]{0,7}$

	and the digest matches the perl5 regex

		^[[:graph:]]{32,128}$
		^[[:ascii:]{32,128}$

	An example udig is sha:cd50d19784897085a8d0e3e413f8612b097c03f1
	which describes the string "hello, world\n".

		echo 'hello, world' | shasum
		cd50d19784897085a8d0e3e413f8612b097c03f1  -

	The maximum size of an algorithm is 8 characters.  The maximum size
	of a digest is 128 characters, making the maximum size of a udig =
	8 + 128 + 1 (colon).

	The network protocol implements 7 verbs
	
		get <udig>		get a blob with a particular digest
		put <udig>		write a blob with a digest
		take <udig>		get a blob that the source may forget
		give <udig>		put a blob that the source may forget
		eat <udig>		untrusted digest of a blob by server

		wrap			bundle current unwrapped blob traffic
					logs into a single blob and return
					udig of the set of all blobs wrapped
					since the previous roll. the wrapped
					<udig>, returned to client, will
					be in the next wrap, which allows
					perpetual chaining.

		roll <udig>		forget all wrapped blob traffic logs in
					the udig set described by <udig>.
					subsequent wrap/rolls will probably
					never see these blob sets again.
					however, the blobs themselves will
					exist and probably be gettable.  the
					<udig> of the rolled udig will be in
					the next wrap set as a "roll" blob
					request record.

	After the initial request by the client, all request replies are just
	'ok' or 'no', where 'no' is always the last reply. See below for the
	exact protocol.

	On the blob server, named "biod", for each correct client request a
	single traffic record describing the request is written to the file

		server:$BLOBIO_ROOT/spool/biod.brr

	Those traffic records are named "Blob Request Records" and abbreviated
	as BRR.  The BRR record format is modeled after the Call Detail Record
	from the telecommunications industry:

		https://en.wikipedia.org/wiki/Call_detail_record

	The format of a single brr record is a line of tab separated fields
	i.e, the typical unixy ascii record:

		start_time		#  YYYY-MM-DDThh:mm:ss.ns[+-]HH:MM
		netflow			#  [a-z][a-z0-9]{0,7}~[[:graph:]]{1,128}
		verb			#  get/put/take/give/eat/wrap/roll
		algorithm:digest	#  udig of the blob in request
		chat_history		#  ok/no handshake between server&client
		blob_size		#  unsigned 64 bit long <= 2^63
		wall_duration		#  request wall duration in sec.ns>=0

	All characters are ascii.  Most important is that each client request
	is associated with a request for a single blob described by a udig.
	Also, BRR records are always syntactally correct as described above.
	In other words, a request for a syntactically incorrect udig or an
	unknown verb will not generate a BRR record.

	The minimum size of a brr record is 95 bytes.

		(10+1+8+1+9+1+5) + 1	==  start_time
		(1 + 1 + 1) + 1		>=  netflow
		(2) + 1			>=  verb
		(1 + 1 + 32) + 1	>=  udig
		(2) + 1			>=  chat history 
		(1) + 1			>=  blobio size
		(1+1+9) + 1		>=  wall duration

	where nanoseconds in start_time and wall_duration is always 9 digits.
	Seconds in wall duration is <= 10 digits (2^31).

	The maximum size of a blob request record is 370 bytes.
	
		(10+1+8+1+9+1+5) + 1 +	=   start request time
		(8+1+128) + 1 +		>=  netflow
		(8) + 1 +		>=  verb
		(8+1+128) + 1 +		>=  algorithm:digest
		(8) + 1 + 		>=  chat history: max=ok,ok,ok
		(19) + 1 +		>=  blob size
		(10+1+9)		>=  wall duration

		<= 370 bytes

	not including a trailing new-line or terminating null character.
	Since the length is <= 370 bytes, a BR Record can be broadcast in an
	UDP packet.  The BRR record format is cast in stone.

Protocol Flow:
 	>get udig\n		# request for blob by udig
	  <ok\n[bytes][close]	#   server sends bytes of blob
	  <no\n[close]		#   server can not honor request

 	>put udig\n		# request to put blog matching a udig
	  <ok\n			#   server ready for blob bytes
	    >[bytes]		#     send bytes to server
	      <ok\n		#       accepted bytes for blob
	      <no\n		#       rejects bytes for blob
	  <no\n			#   server can not honor request

  	>take udig\n		# request for blob by udig
  	    <ok\n[bytes]	#   server sends blob bytes
  		>ok\n		#     client has the blob
  		    <ok[close]	#       server forgets the blob
  		    <no[close]	#       server may not forget the blob
  		>no[close]	#     client rejects blob
  	    <no\n[close]	#   server can not honor request

 	>give udig\n		# request to put blog matching a udig
	  <ok\n			#   server ready for blob bytes
	    >[bytes]		#     send digested bytes to server
  	      <ok\n		#   server accepts the bytes
  	        >ok[close]	#     client probably forgets blob
  		>no[close]	#     client might remember the blob
  	  <no\n[close]		#   server rejects blob give request

	>eat udig\n		# client requests server to verify blob
	    <ok\n[close]	#   blob exists and has been verified
	    <no\n[close]	#   server was unable to digest the blob

	>wrap			# request udig set of traffic logs
	    <ok\nudig\n[close]	#   udig of set of traffic logs
	    <no\n[close]	#   no logs available

	>roll udig		# udig set of traffic logs to forget
	    <ok\n[close]	# logs forgotten by server.  typically
	    			# a wrap set.
	    <no\n[close]	# not all logs in set forgotten

	Warning: the protocol is ambiguous about how the end of the blob stream
	is physically interpreted.  This may turn out to be a serious design
	flaw, particulary with regard to denial-of-service attacks.  Essentially
	the only indication that the end of the blob has been reached is that 
	the socket has closed;  nothing in the protocol explicity indicates
	"END of BLOB", other than the sequence of bytes matching the digest
	signature.  In other words, only the digest presented by the request
	determines the correctness of the blob.

	As blobio evolves, I (jmscott) expect a client friendly, RESTful and
	inherently more complex caching protocol to be built in front of the
	blobio protocol described above.  In other words, a biod server never
	talks to untrusted code, due to the denial of service susceptibility
	described above.

Protocol Questions:
	-  should netflow be renamed service flow?

	-  should "put" wait for response ok/no response from server before
	   sending the blob?

	- should wrap send the algorithm?

	- what about forbidding ':' as a digest character?

        - think about a "prove" command that allows the server to formally prove
	  the existence of a set of blobs, thus verifying the server is not
	  lying about storage.  the obvious technique would be for the client
	  to send the udig of a randomly selected set of blobs for which the
	  server is expected to digest the cancatenation of the blobs.

	- reimplement wrap as

		wrap <udig>

	  the idea is that the final record of the wrap set will always contain
	  <udig>, creating a kind of audit trail.  also insures no wrap set is
	  empty and sequential wraps could always return different udigs if the
	  given udig was unique. in shell parlance

	  	test $(blobio wrap <udigA>) = $(blobio wrap <udigB>)

	  is always false, even when udigA = udigB.

Billing Model Questions:
	- what to measure per billing cycle:
		total count of blobs stored
		total bytes stored
		total bytes transfered on public interface
		avg transfer bytes/duration > X
			for blob size > Y
		total number of eat requests/proof of retrievability

		should a small numbers of plans exist or should a formula
		determine cycle charges?

		what about overages? a mulligan is nice for customer
		satisfaction but rolling to next higher plan is draconian.

	- duration of storage: per blob or per plan?

	- should the blob request record include a "duration to first byte",
	  which measures the difference between start_time and time after the
	  first read or write to the client.  the time to first byte can be
	  observed on the network, sort of.  the sender will see a slightly
	  different time to first byte than the receiver, as with the
	  start_time.

To Do:
	- casting text to udig_sha fails when prefixed by sha:

		--  OK
		select '44e946a2889b295eeb0b9d95efe34625cc830dde'::udig_sha;

		-- FAILS
		select 'sha:44e946a2889b295eeb0b9d95efe34625cc830dde'::udig_sha;
	
	  why not cast both versions?

	- move log/*.[fx]dr files to spool and, after wrapping, to data/

	- Should the variable BLOBIO_ROOT be renamed BLOBIO_INSTALL or
	  BLOBIO_DIST?  Currently BLOBIO_ROOT means the actual install 
	  directory and not the parent, which is the correct root.

	- add option to biod to limit size of accepted blob

	- investigate linux splice() system call.

	- think about separate biod processes sharing the same data/ file system
	  but otherwise separate.  or, explore union file systems.

	- make the postgresql udig data type be an extension

	- prove that every brr log contains a successful "wrap" other than the
	  the first brr log.  critical to prove this.

	- two quick wraps in a row could generate the same wrap set?

	- should BRR log files always have unique digests, modulo other servers.
	  perhaps the first put request in the wrapped brr is a blob uniquely
	  summarizing the state of the server and the current wrap,request.

	- contemplate running the server such that the empty blob always exists

	- think about adding fadvise to file system blob reads.
          seems natural since every open of blob file will read the entire blob.

	- cleanup the code in server/sha_fs.c

	- create an is_empty() function in pgsql udig datatype.

	- add support for unix domain socket, std{in,out}, inetd, and libevent

	- upon server start up on mac os x test for case sensitive file system

	     see http://www.mail-archive.com/wine-devel@winehq.org/msg66830.html

	  another idea under macosx would be to convert blob file path to lower
	  case.

	- rewrite makefile for client only install, without golang requirement

	- should the postgres datatype include a function is_zero_size(udig)?

	- the postgres udig data type ought to be in the blobio schema.

	- merge fdr2sql & cron-fdr2pg into flowd

	- enable/disable specific commands get/put/give/take per server/per
	  subnet

	- named pipes to flowd as another tail source

	- signal handling needs to be pushed to main listen loop or cleaned
 	  up with sigaction().
Note:
	This simple protocol is officially named bio4.  That needs to be
	clarified in this document.