This repository has been archived by the owner on Feb 25, 2020. It is now read-only.

Simplify file system model #527

Open
jdegoes opened this issue Oct 5, 2013 · 0 comments

jdegoes commented Oct 5, 2013

This ticket is to dramatically simplify the file system model so that it is easier to maintain and fits well with other tickets and future directions for Precog.

The capabilities of the file system interface must be the intersection of capabilities in a local file system and the HDFS file system used in Hadoop.

Ingest should leverage a write-oriented view of a file system to store data, while query should leverage a read-oriented view of a file system to read data.

The file system model does not need to support any versioning, nor does it need to support atomicity beyond that provided by the intersection of capabilities noted above. The surface area of the file system model should be minimal and limited to whatever is needed to implement the functionality we need now.

Although it is not necessary to formalize this completely at this point, a distributed file system will in general be capable of executing a subset of the operations expressible in the DAG that describes a query. This subset will depend on the exact nature of the file system (e.g. HDFS, Tachyon, Ceph), as well as on the path at which the data is accessed (in the case of file systems that support mounting). Even a local file system that has a compact encoding for some file types might support pushing down operations such as "projection" (for a column-oriented file format) or "filtering" (for an indexed view of data). In fact, we could implement a layered file system approach in which a NIHDB-encoded file is handled by a NIHDB file system capable of efficiently handling the operations for which acceleration is possible.
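
To make the pushdown idea concrete, here is a rough Python sketch. It is not part of Precog. It simplifies the query DAG to a linear list of operations, and every name in it (`Op`, `PlainFileSystem`, `ColumnarFileSystem`, `split_plan`, the `/columnar/` mount) is hypothetical. Each file system reports which operation kinds it can accelerate at a given path, and the planner pushes down the longest leading run of supported operations.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Op:
    kind: str  # e.g. "projection" or "filtering"
    arg: str   # opaque description of the operation


class PlainFileSystem:
    """Accelerates nothing; every operation runs in the query engine."""

    def supported(self, path: str) -> FrozenSet[str]:
        return frozenset()


class ColumnarFileSystem:
    """Hypothetical layer for a column-oriented format under one mount point."""

    def __init__(self, mount: str) -> None:
        self.mount = mount

    def supported(self, path: str) -> FrozenSet[str]:
        # Capabilities depend on the path, as with mounted file systems.
        if path.startswith(self.mount):
            return frozenset({"projection"})
        return frozenset()


def split_plan(fs, path: str, plan: List[Op]) -> Tuple[List[Op], List[Op]]:
    """Return (pushed_down, remaining): the longest supported prefix and the rest."""
    capable = fs.supported(path)
    cut = 0
    while cut < len(plan) and plan[cut].kind in capable:
        cut += 1
    return plan[:cut], plan[cut:]


plan = [Op("projection", "name"), Op("filtering", "age > 30")]
pushed, remaining = split_plan(ColumnarFileSystem("/columnar/"), "/columnar/users", plan)
# pushed holds the projection; remaining holds the filter for the query engine.
```

A real planner would work on the DAG itself rather than on a list, but the split between what the file system executes and what the query engine executes would look similar.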

Read View

  • read file
  • list children
  • retrieve size
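
As a rough illustration only, the three read operations above could be captured in an interface like the following Python sketch, with a local-disk implementation. The names `ReadView`, `LocalReadView`, and the method names are hypothetical and not taken from the Precog code base. Path sanitization (e.g. rejecting `..`) is deliberately left out.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class ReadView(ABC):
    """Hypothetical read-oriented view: only the three operations listed above."""

    @abstractmethod
    def read_file(self, path: str) -> bytes: ...

    @abstractmethod
    def list_children(self, path: str) -> List[str]: ...

    @abstractmethod
    def size(self, path: str) -> int: ...


class LocalReadView(ReadView):
    """Backs the read view with a directory on the local disk."""

    def __init__(self, root: str) -> None:
        self._root = Path(root)

    def _resolve(self, path: str) -> Path:
        # Treat every path as relative to the root (no ".." checks in this sketch).
        return self._root / path.lstrip("/")

    def read_file(self, path: str) -> bytes:
        return self._resolve(path).read_bytes()

    def list_children(self, path: str) -> List[str]:
        # Sorted so that results are deterministic across platforms.
        return sorted(child.name for child in self._resolve(path).iterdir())

    def size(self, path: str) -> int:
        return self._resolve(path).stat().st_size
```

An HDFS-backed class would implement the same three methods, which is what keeps the surface area down to the intersection of the two systems.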

Write View

  • create file (empty)
  • create file (with contents)
  • append file
  • delete file
  • rename file
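
The write operations above can be sketched in the same spirit. The following in-memory Python example is an assumption-laden illustration, not the Precog implementation: `InMemoryWriteView` and its method names are hypothetical, and the error behaviour (failing on an existing target, failing on a missing source) is one possible choice that the ticket does not specify.

```python
from typing import Dict


class InMemoryWriteView:
    """Hypothetical write-oriented view backed by a dict of path -> bytes."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def create(self, path: str, contents: bytes = b"") -> None:
        # Covers both "create file (empty)" and "create file (with contents)".
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = contents

    def append(self, path: str, data: bytes) -> None:
        if path not in self.files:
            raise FileNotFoundError(path)
        self.files[path] += data

    def delete(self, path: str) -> None:
        if path not in self.files:
            raise FileNotFoundError(path)
        del self.files[path]

    def rename(self, src: str, dst: str) -> None:
        if src not in self.files:
            raise FileNotFoundError(src)
        if dst in self.files:
            raise FileExistsError(dst)
        self.files[dst] = self.files.pop(src)


# Ingest-style usage: write to a temporary name, then rename into place.
fs = InMemoryWriteView()
fs.create("/ingest/batch.tmp")
fs.append("/ingest/batch.tmp", b'{"a": 1}\n')
fs.rename("/ingest/batch.tmp", "/ingest/batch.json")
```

Writing under a temporary name and renaming afterwards is a common way to get by without any atomicity guarantee beyond rename itself.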

Care should be taken when defining these interfaces so that good implementations are possible on MongoDB and other systems to which Precog might be ported in the future.

See the following for Hadoop's file system:

And the following for Apache Commons VFS:

  • http://commons.apache.org/proper/commons-vfs/

This ticket will be considered complete when the internal file system model has been simplified to resemble HDFS / Apache Commons VFS, when the REST API for the file system supports the semantics of the file system model, and when all of the above is covered by ample documentation and tests.

See here for more documentation on the REST API for the exposed file system: https://docs.google.com/document/d/1j43rvBNPvV7sDpO5l9vUXqtO9IPO-oWT2_tF8fMJEt0/edit?usp=sharing

Comment on this ticket if anything needs clarification.

@ghost assigned jdegoes Dec 5, 2013