This repository has been archived by the owner on Feb 25, 2020. It is now read-only.
This ticket is to dramatically simplify the file system so that it is easier to maintain and friendly to other tickets and to future directions for Precog.
The capabilities of the file system interface must be the intersection of the capabilities of a local file system and of HDFS, the file system used by Hadoop.
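One way to read the "intersection of capabilities" rule is as a set intersection over each backend's supported operations. The sketch below does exactly that. The capability labels and the contents of each set are illustrative assumptions, not an audit of either system. The one difference shown, writing bytes at an arbitrary offset, is something a local file system allows and HDFS (whose files are write-once, append-only) does not, so the rule keeps it out of the interface.

```python
# Illustrative capability sets; the labels are hypothetical, not API names.
LOCAL_FS = frozenset({
    "read", "list_children", "size",
    "create", "append", "delete", "rename",
    "random_write",  # overwrite bytes at an arbitrary offset
})

HDFS = frozenset({
    "read", "list_children", "size",
    "create", "append", "delete", "rename",
    # No random_write: HDFS files can only be appended to.
})


def supported_operations(*backends: frozenset) -> frozenset:
    """Operations the common interface may expose: those every backend supports."""
    if not backends:
        return frozenset()
    return frozenset.intersection(*backends)


print(sorted(supported_operations(LOCAL_FS, HDFS)))
# → ['append', 'create', 'delete', 'list_children', 'read', 'rename', 'size']
```

Adding another backend (for example a future MongoDB port) only shrinks or preserves this set, which is why the interface should stay minimal.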
Ingest should leverage a write-oriented view of a file system to store data, while query should leverage a read-oriented view of a file system to read data.
The file system model does not need to support any versioning, nor does it need to support atomicity beyond that provided by the intersection of capabilities noted above. The surface area of the file system model should be minimal and limited to whatever is needed to implement the functionality we need now.
Although it need not be fully formalized at this point, a distributed file system will in general be able to execute a subset of the operations expressible in the DAG that describes a query. This subset depends on the exact nature of the file system (e.g. HDFS, Tachyon, Ceph), as well as on the path at which the data is accessed (for file systems that support mounting). Even a local file system that stores some file types in a compact encoding might support pushing down operations such as "projection" (for a column-oriented file format) or "filtering" (for an indexed view of the data). In fact, we could implement a layered file system in which a NIHDB-encoded file is handled by a NIHDB file system that efficiently executes the operations it is able to accelerate.
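A minimal sketch of the layered idea, limited to projection. Everything here is hypothetical: `GenericFileSystem` stands for a layer that can only return whole records, `ColumnarFileSystem` stands in for a format-aware layer such as a NIHDB file system, and records are modeled as Python dicts. A layer signals that it cannot accelerate an operation by returning `None`, and the caller then falls back to reading everything and projecting in memory.

```python
from typing import Dict, Iterable, List, Optional, Protocol

Record = Dict[str, object]


class ProjectingFileSystem(Protocol):
    def read(self, path: str) -> List[Record]: ...
    def project(self, path: str, columns: Iterable[str]) -> Optional[List[Record]]: ...


class GenericFileSystem:
    """Baseline layer: returns whole records and accelerates nothing."""

    def __init__(self, files: Dict[str, List[Record]]) -> None:
        self._files = files

    def read(self, path: str) -> List[Record]:
        return list(self._files[path])

    def project(self, path: str, columns: Iterable[str]) -> Optional[List[Record]]:
        return None  # None means "this layer cannot push the projection down"


class ColumnarFileSystem:
    """Hypothetical format-aware layer (a stand-in for a NIHDB file system).

    Each file is stored column by column, so a projection touches only the
    requested columns. Assumes all columns of a file have the same length.
    """

    def __init__(self, files: Dict[str, Dict[str, list]]) -> None:
        self._files = files  # path -> column name -> values

    def _row_count(self, path: str) -> int:
        cols = self._files[path]
        return len(next(iter(cols.values()))) if cols else 0

    @staticmethod
    def _assemble(cols: Dict[str, list], n_rows: int) -> List[Record]:
        return [{name: values[i] for name, values in cols.items()} for i in range(n_rows)]

    def read(self, path: str) -> List[Record]:
        return self._assemble(self._files[path], self._row_count(path))

    def project(self, path: str, columns: Iterable[str]) -> Optional[List[Record]]:
        stored = self._files[path]
        wanted = {c: stored[c] for c in columns if c in stored}
        return self._assemble(wanted, self._row_count(path))


def run_projection(fs: ProjectingFileSystem, path: str, columns: List[str]) -> List[Record]:
    """Push the projection down if the layer accepts it, else project in memory."""
    pushed = fs.project(path, columns)
    if pushed is not None:
        return pushed
    wanted = set(columns)
    return [{k: v for k, v in row.items() if k in wanted} for row in fs.read(path)]
```

Both layers give the same answer; only the amount of data read differs. Filtering could be offered the same way, with an indexed layer answering it directly and the caller falling back otherwise.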
Read View:
- read file
- list children
- retrieve size

Write View:
- create file (empty)
- create file (with contents)
- append file
- delete file
- rename file
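A minimal sketch of the two views as Python abstract base classes, with a local-disk implementation of both. The method names are hypothetical mappings of the operations listed above (`create` with the default `contents=b""` covers both "create file" entries), not Precog's actual API. In `LocalFileSystem`, the rules that `create` refuses to overwrite, `append` only extends existing files, and `rename` refuses to replace an existing destination are assumed conservative choices, not decisions stated in this ticket.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class ReadView(ABC):
    """Read-oriented view, used by query."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def list_children(self, path: str) -> List[str]: ...

    @abstractmethod
    def size(self, path: str) -> int: ...


class WriteView(ABC):
    """Write-oriented view, used by ingest."""

    @abstractmethod
    def create(self, path: str, contents: bytes = b"") -> None: ...

    @abstractmethod
    def append(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...

    @abstractmethod
    def rename(self, src: str, dst: str) -> None: ...


class LocalFileSystem(ReadView, WriteView):
    """Both views over a directory on local disk (illustrative only)."""

    def __init__(self, root: str) -> None:
        self._root = Path(root)

    def _resolve(self, path: str) -> Path:
        # Paths are relative to the root; this sketch does not guard against "..".
        return self._root / path.lstrip("/")

    def read(self, path: str) -> bytes:
        return self._resolve(path).read_bytes()

    def list_children(self, path: str) -> List[str]:
        return sorted(p.name for p in self._resolve(path).iterdir())

    def size(self, path: str) -> int:
        return self._resolve(path).stat().st_size

    def create(self, path: str, contents: bytes = b"") -> None:
        target = self._resolve(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        with open(target, "xb") as f:  # "x" fails if the file already exists
            f.write(contents)

    def append(self, path: str, data: bytes) -> None:
        target = self._resolve(path)
        if not target.is_file():  # assumed rule: append never creates a file
            raise FileNotFoundError(path)
        with open(target, "ab") as f:
            f.write(data)

    def delete(self, path: str) -> None:
        self._resolve(path).unlink()

    def rename(self, src: str, dst: str) -> None:
        target = self._resolve(dst)
        if target.exists():  # refuse to overwrite; check-then-rename is not atomic
            raise FileExistsError(dst)
        target.parent.mkdir(parents=True, exist_ok=True)
        os.rename(self._resolve(src), target)


# Usage: ingest depends only on WriteView, query only on ReadView.
with tempfile.TemporaryDirectory() as tmp:
    fs = LocalFileSystem(tmp)
    ingest: WriteView = fs
    query: ReadView = fs
    ingest.create("/data/events.json", b'{"x": 1}\n')
    ingest.append("/data/events.json", b'{"x": 2}\n')
    ingest.rename("/data/events.json", "/data/events-0.json")
    print(query.list_children("/data"), query.size("/data/events-0.json"))
    # → ['events-0.json'] 18
```

Typing ingest code against `WriteView` and query code against `ReadView` keeps each side limited to the operations it needs, even though one object implements both.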
Care should be taken when defining these interfaces so that good implementations are possible on MongoDB and other systems to which Precog might be ported in the future.
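As a quick check that the listed operations do not assume a hierarchical disk, here is a sketch of both views over a flat map from full path to contents, roughly how a document store such as MongoDB might hold one document per file keyed by its path. The in-memory dict stands in for such a collection; nothing here uses the MongoDB API, and the prefix scan in `list_children` is an assumption about how listing would work (in a document store it would likely become an indexed prefix query on the path key).

```python
from typing import Dict, List


class KeyValueFileSystem:
    """Read and write views over a flat map from full path to file contents.

    The dict stands in for a hypothetical collection with one entry per file.
    Directories are implicit: they exist only as prefixes of file paths.
    """

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def _get(self, path: str) -> bytes:
        try:
            return self._files[path]
        except KeyError:
            raise FileNotFoundError(path) from None

    # Read view
    def read(self, path: str) -> bytes:
        return self._get(path)

    def size(self, path: str) -> int:
        return len(self._get(path))

    def list_children(self, path: str) -> List[str]:
        prefix = path.rstrip("/") + "/"
        children = set()
        for key in self._files:
            if key.startswith(prefix):
                # Keep only the next segment, so nested files appear as a child directory.
                children.add(key[len(prefix):].split("/", 1)[0])
        return sorted(children)

    # Write view
    def create(self, path: str, contents: bytes = b"") -> None:
        if path in self._files:
            raise FileExistsError(path)
        self._files[path] = contents

    def append(self, path: str, data: bytes) -> None:
        self._files[path] = self._get(path) + data

    def delete(self, path: str) -> None:
        self._get(path)
        del self._files[path]

    def rename(self, src: str, dst: str) -> None:
        data = self._get(src)
        if dst in self._files:
            raise FileExistsError(dst)
        self._files[dst] = data
        del self._files[src]
```

With implicit directories, renaming a directory would mean rewriting every key under its prefix, which is one reason to keep rename limited to single files as the Write View does.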
This ticket will be considered complete when the internal file system model has been simplified to resemble HDFS / Apache Commons VFS, when the REST API for the file system supports the semantics of the file system model, and when all of the above is covered by ample documentation and tests.
See the following for Hadoop's file system:
And the following for Apache Commons VFS:
- http://commons.apache.org/proper/commons-vfs/
See here for more documentation on the REST API for the exposed file system: https://docs.google.com/document/d/1j43rvBNPvV7sDpO5l9vUXqtO9IPO-oWT2_tF8fMJEt0/edit?usp=sharing
Comment on the ticket for clarification.