Simplify file system model #527

jdegoes · 2013-10-05T23:37:42Z

This ticket is to dramatically simplify the file system model, making it easier to maintain and friendlier to other tickets and to future directions for Precog.

The capabilities of the file system interface must be the intersection of the capabilities of a local file system and of HDFS, the file system used by Hadoop.

Ingest should leverage a write-oriented view of a file system to store data, while query should leverage a read-oriented view of a file system to read data.

The file system model does not need to support any versioning, nor does it need to support atomicity beyond that provided by the intersection of capabilities noted above. The surface area of the file system model should be minimal and limited to whatever is needed to implement the functionality we need now.

Although it's not necessary to formalize this completely at this point, in general a distributed file system will be able to execute a subset of the operations expressible in the DAG that describes a query. This subset will depend on the exact nature of the file system (e.g. HDFS, Tachyon, Ceph), as well as on the path at which the data is accessed (for file systems that support mounting). Even a local file system that uses a compact encoding for some file types might support pushing down operations such as "projection" (for a column-oriented file format) or "filtering" (for an indexed view of the data). In fact, we could implement a layered file system in which a NIHDB-encoded file is handled by a NIHDB file system capable of efficiently executing the operations for which acceleration is possible.
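The pushdown idea can be illustrated with a small sketch. Everything in it is hypothetical rather than part of Precog: the operation types `Project` and `Filter`, the `can_push_down` capability check, and both file system classes. It also assumes a query plan is a linear list of operations instead of a full DAG. The engine hands the file system the longest prefix of the plan it accepts and evaluates the remainder itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple, Union

Row = Dict[str, object]


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class Filter:
    column: str
    value: object


Op = Union[Project, Filter]


def apply(op: Op, rows: List[Row]) -> List[Row]:
    # Reference evaluation of one operation, used by the engine for anything
    # the file system could not accelerate.
    if isinstance(op, Project):
        return [{c: r[c] for c in op.columns if c in r} for r in rows]
    return [r for r in rows if r.get(op.column) == op.value]


class PlainFileSystem:
    """Hypothetical: can only stream whole files, so nothing is pushed down."""

    def __init__(self, files: Dict[str, List[Row]]):
        self.files = files

    def can_push_down(self, op: Op) -> bool:
        return False

    def read(self, path: str, pushed: List[Op]) -> Iterable[Row]:
        assert not pushed
        return iter(self.files[path])


class ColumnarFileSystem(PlainFileSystem):
    """Hypothetical column-oriented layer that can apply projections itself."""

    def can_push_down(self, op: Op) -> bool:
        return isinstance(op, Project)

    def read(self, path: str, pushed: List[Op]) -> Iterable[Row]:
        rows = self.files[path]
        for op in pushed:
            # Stands in for reading only the requested columns from disk.
            rows = apply(op, rows)
        return iter(rows)


def execute(fs: PlainFileSystem, path: str, plan: List[Op]) -> Tuple[List[Row], int]:
    # Push down only a prefix so operation order is preserved (a projection
    # may drop a column that a later filter needs).
    split = 0
    while split < len(plan) and fs.can_push_down(plan[split]):
        split += 1
    rows = list(fs.read(path, plan[:split]))
    for op in plan[split:]:
        rows = apply(op, rows)
    return rows, split
```

Both file systems return the same rows for the same plan. Only the number of operations executed inside the file system differs, which is the property a layered approach would need to guarantee.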

Read View

read file
list children
retrieve size

Write View

create file (empty)
create file (with contents)
append file
delete file
rename file
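Below is a minimal sketch of the two views as Python abstract base classes, with a local-disk implementation. The names (`ReadView`, `WriteView`, `LocalFileSystem`, `list_children`, etc.) are hypothetical. The two "create file" operations are merged into one method whose contents default to empty. The edge-case behavior is an assumption that the ticket does not specify: `create` refuses to overwrite, `append` requires an existing file, and `rename` refuses an existing destination.

```python
import os
from abc import ABC, abstractmethod
from typing import List


class ReadView(ABC):
    """Read-oriented view, as used by query (hypothetical names)."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def list_children(self, path: str) -> List[str]: ...

    @abstractmethod
    def size(self, path: str) -> int: ...


class WriteView(ABC):
    """Write-oriented view, as used by ingest (hypothetical names)."""

    @abstractmethod
    def create(self, path: str, contents: bytes = b"") -> None: ...

    @abstractmethod
    def append(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...

    @abstractmethod
    def rename(self, src: str, dst: str) -> None: ...


class LocalFileSystem(ReadView, WriteView):
    """Both views backed by a directory on local disk."""

    def __init__(self, root: str):
        self.root = root

    def _full(self, path: str) -> str:
        # Map a model path such as "/data/a.json" to a location under root.
        return os.path.join(self.root, path.lstrip("/"))

    def read(self, path):
        with open(self._full(path), "rb") as f:
            return f.read()

    def list_children(self, path):
        return sorted(os.listdir(self._full(path)))

    def size(self, path):
        return os.path.getsize(self._full(path))

    def create(self, path, contents=b""):
        full = self._full(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        # Mode "x" raises FileExistsError instead of overwriting.
        with open(full, "xb") as f:
            f.write(contents)

    def append(self, path, data):
        # Mode "r+b" raises FileNotFoundError if the file does not exist.
        with open(self._full(path), "r+b") as f:
            f.seek(0, os.SEEK_END)
            f.write(data)

    def delete(self, path):
        os.remove(self._full(path))

    def rename(self, src, dst):
        full_dst = self._full(dst)
        # Check-then-rename is not atomic. It only avoids silently replacing
        # dst, which os.rename does on POSIX.
        if os.path.exists(full_dst):
            raise FileExistsError(dst)
        os.makedirs(os.path.dirname(full_dst), exist_ok=True)
        os.rename(self._full(src), full_dst)
```

Ingest code would depend only on `WriteView` and query code only on `ReadView`. This keeps each side's surface area minimal even when one class implements both.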

Care should be taken when defining these interfaces so that good implementations are possible on MongoDB and other systems to which Precog might be ported in the future.
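To check that these operations do not assume a hierarchical, POSIX-style store, here is a sketch of a backend over a flat key-to-bytes mapping. A plain `dict` stands in for something like a document collection keyed by full path. `KeyValueFileSystem` is hypothetical, and the sketch makes no claims about any particular MongoDB API.

```python
from typing import Dict, List, Optional


class KeyValueFileSystem:
    """Hypothetical backend over a flat path -> bytes store.

    Directories are never stored; they exist only as key prefixes.
    """

    def __init__(self, store: Optional[Dict[str, bytes]] = None):
        self.store = {} if store is None else store

    # Read view
    def read(self, path: str) -> bytes:
        return self.store[path]

    def size(self, path: str) -> int:
        return len(self.store[path])

    def list_children(self, path: str) -> List[str]:
        # A prefix scan over all keys, keeping the first path segment after
        # the prefix. A real store would want an index on the path key.
        prefix = path.rstrip("/") + "/"
        children = set()
        for key in self.store:
            if key.startswith(prefix):
                children.add(key[len(prefix):].split("/", 1)[0])
        return sorted(children)

    # Write view
    def create(self, path: str, contents: bytes = b"") -> None:
        if path in self.store:
            raise FileExistsError(path)
        self.store[path] = contents

    def append(self, path: str, data: bytes) -> None:
        # Raises KeyError if the file does not exist.
        self.store[path] = self.store[path] + data

    def delete(self, path: str) -> None:
        del self.store[path]

    def rename(self, src: str, dst: str) -> None:
        if dst in self.store:
            raise FileExistsError(dst)
        # Here a pop plus an insert. In a real document store this would be
        # two writes, not atomic unless the store offers transactions.
        self.store[dst] = self.store.pop(src)
```

Because directories are implicit, an empty directory cannot exist in this model. That is consistent with the write view above, which only operates on files.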

See the following for Hadoop's file system:

http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html

And the following for Apache Commons VFS:

http://commons.apache.org/proper/commons-vfs/

This ticket will be considered complete when the internal file system model has been simplified to resemble HDFS / Apache Commons VFS, when the REST API for the file system supports the semantics of the file system model, and when there is ample documentation and tests for all of the above.

See here for more documentation on the REST API for the exposed file system: https://docs.google.com/document/d/1j43rvBNPvV7sDpO5l9vUXqtO9IPO-oWT2_tF8fMJEt0/edit?usp=sharing

Comment on the ticket for clarification.

ghost assigned jdegoes Dec 5, 2013