COUCHDB-769: Store attachments in the external storage. #33
base: master
…mplementation, supports OpenStack Swift and SoftLayer Object store
    swift_delete_container(DbName).

externalize_att(Db) ->
    Res = config:get("swift","attachments_offload","false"),
This option looks like a boolean, but you use the value "external" elsewhere in place of "true". If it is meant to be a flag, use config:get_boolean. Otherwise, pick a less confusing name for the default value.
Sorry, I was confused. So it's a flag. Definitely use config:get_boolean and true/false atoms elsewhere.
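To illustrate the point about atoms, here is a minimal, self-contained sketch. It does not call CouchDB's config module; `to_boolean/2` is a hypothetical stand-in for a boolean config lookup, and the module name and return atoms are assumptions made only for this example.

```erlang
-module(offload_flag).
-export([to_boolean/2, externalize_att/1]).

%% Hypothetical stand-in for a boolean config lookup: maps the raw
%% string value to a true/false atom and falls back to Default for
%% anything else.
to_boolean("true", _Default) -> true;
to_boolean("false", _Default) -> false;
to_boolean(_Other, Default) -> Default.

%% Branch on true/false atoms instead of comparing against strings
%% such as "external". The result atoms are illustrative only.
externalize_att(RawValue) ->
    case to_boolean(RawValue, false) of
        true -> offload;
        false -> keep_local
    end.
```

With a flag that is always an atom, every caller can match on `true` or `false` directly, and an unexpected value degrades to the default rather than silently enabling a third state.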
Please fix:
    NewAtt = couch_att:store(data,N1,Att),
    couch_log:debug("Swift. testing store in original length ~p~n",[AttLen]),
    {ObjectSize,EtagMD5} = swift_head_object(DbName, Name),
    NewAtt1 = couch_att:store([{att_external_size,ObjectSize},{att_external,"external"},{att_external_md5,EtagMD5}],NewAtt),
Why do you store "external" as a list rather than as an atom, or at least as a binary?
I also think that the "external" value is not the best choice. Since the field already carries the word "external" in its key, it is better to specify exactly which external store is meant here. Today it's only Swift; tomorrow it could also be S3/NFS/whatever, each of which will need different integration logic.
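A rough sketch of what naming the backend could look like, assuming the attachment field stores an atom such as `swift`. The module, the function name, and the returned tuple shapes are hypothetical and are not part of this patch.

```erlang
-module(att_backend).
-export([store_fun/1]).

%% Hypothetical registry: each backend atom selects its own integration
%% logic, so adding S3 or NFS later means adding a clause here.
store_fun(swift) ->
    {ok, fun(Name, _Data) -> {stored, swift, Name} end};
store_fun(Other) ->
    {error, {unsupported_backend, Other}}.
```

An unknown backend is reported explicitly instead of being treated as the single generic "external" case.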
It would be nice to see some tests for this to prove that it all works.
@@ -0,0 +1,333 @@
%% @author gilv
General note: we don't include attributions in the code.
Oh, right. And we need the ASF license header here.
I'm not going to have time to review this in detail, but I agree it needs tests before it goes in, as well as considerable cleanup. Noting here that this should not be merged before 2.0 is released, whatever happens.
@kxepal @rnewson
        ++ " ~p in the container: ~n",[FileName, ContentLen,
        ContainerName]),
    %TO-DO: No chunk reader. All kept in the memory. Should be fixed.
    Data = couch_httpd:recv(Req, ContentLen),
I think this should be in the Swift backend module, at least. And it certainly should not use couch_httpd (even if the two do the same thing), just chttpd.
@kxepal Thanks for reviewing it.
fabric_at_handler is a layer between CouchDB and the backend store. It is not aware of how the backend store works or what it does internally. I completely agree with your other comments; for example, "201" is something that belongs to the backend store and should be there.
Data = couch_httpd:recv(Req, ContentLen) is supposed to read data from the HTTP request using the same mechanism CouchDB uses, so I call the same function to read the data. Otherwise I would need to copy-paste the same code here.
Why is it different from the function att_processor(DbName, Att)? That function is also "aware" of CouchDB's internal functions; for example, it uses the couch_att module to update attachments.
The reason why it's different is simple: couch_httpd/chttpd are frontend modules that provide an HTTP API for internal stuff like couch_att.
The reason why couch_httpd is better avoided here is that this module serves the backdoor interface, while chttpd, which serves the front, clustered one, may eventually start to work by different rules.
The reason why chttpd/couch_httpd shouldn't be here is that fabric is a layer that runs cluster-wide operations after the HTTP request has been processed.
Also, have you checked fabric:att_receiver/2? It looks like exactly what you need here.
        {ok, Container} ->
            couch_log:debug("Container ~p created", [Container]);
        {error,_} ->
            couch_log:debug("Container ~p creation failed", [DbNameSuffix])
You don't handle an error by logging it. Surely this is fatal?
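One possible shape for treating the failure as fatal, sketched under assumptions: `CreateFun` is a hypothetical callback standing in for the Swift container-creation call, and the error term is invented for this example.

```erlang
-module(container_create).
-export([ensure_container/2]).

%% CreateFun is a hypothetical fun(Name) -> {ok, Container} | {error, Reason}
%% standing in for the real container-creation call.
ensure_container(CreateFun, DbName) ->
    case CreateFun(DbName) of
        {ok, Container} ->
            {ok, Container};
        {error, Reason} ->
            %% Fail loudly so the caller cannot proceed without a container.
            erlang:error({container_creation_failed, DbName, Reason})
    end.
```

Returning `{error, Reason}` to the caller would be an equally reasonable choice; the point is only that the failure must not be swallowed by a debug log line.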
There's a lot to review here. The huge number of :debug calls seems inappropriate, and many of them appear to be placeholders or reminders for future work (like error handling). I note also that the indentation rules are still not followed (4 spaces, no 'pretty' alignment).
            {error, DbName}
    end;
container(get, DbName) ->
    couch_log:debug("Get container ~p~n", [DbName]);
??
Given the nature of the patch, this cannot merge without tests, assuming all style and other issues are resolved.
@rnewson What style convention should I use? For example, there was a comment that my code should use 80 characters per line, but there is a bunch of existing code with lines longer than 80.
@gilv |
@kxepal @rnewson
@kxepal Can you please review the recent code?
@gilv i think |
@gilv I can read the code and leave a few more comments about styling, naming and other not-very-important-but-annoying bits, but I think it would be much better to see some tests and a way to try this at home. That's more important than yet another spacing issue.
@lazedo I think there is no practical need: these options could be fetched within att_store/2, transparently for the API user, during driver resolution and without abstraction leaks. Assume we have per-db driver configuration. Where will such options be stored? First case: in config. Then we do
@kxepal I'm thinking of a way to have a different provider per attachment/document. The use case is a document with two or more attachments, where one could be stored in a publicly accessible manner (dropbox, gdrive) and others that are more sensitive or private could be stored in another, like evernote.
@kxepal passing the |
@lazedo Per-attachment configuration involves the custom attachment stub fields feature, so they will all be available in the passed Doc argument (in any case, it is reasonable to store them within the document itself). But then there is the question of how to store private information there and how it should be replicated.
@kxepal Maybe I'm missing the objective of this feature. As I see it, an external attachment would still create the stub information about how to reach the
@lazedo I think I'm missing the use case as well. Anyway, to give a proper answer we need to implement a few more drivers against backends with very different APIs. Then we can shape our API more or less right. For now it's more of a proof of concept, but we need to start from something.
Initial implementation that allows CouchDB to store attachments outside of the database file.
This implementation supports OpenStack Swift and SoftLayer Object Store.