Dry up append/delete on draftsets (and probably other routes too) #215

Closed
RickMoynihan opened this issue Oct 27, 2017 · 0 comments

DRYing up append/delete should be applied throughout the whole request stack, as the duplication and maintenance burden on this code is getting pretty bad. Additionally, in production it appears that some large delete jobs are dying partway through without reporting an error back, leaving the PMD lock screen up. As we don't usually see this for appends, I suspect it might be due to differences between append/delete which should be tidied up and removed.

In theory the primary difference between an append and a delete should just be whether we send a `DELETE DATA { ... }` or an `INSERT DATA { ... }` update at the bottom. As an aside, grafter.rdf has the functions `append-batched` and `delete-batched` which could possibly be used here too, though they also need to be DRYed up (Swirrl/grafter#115).

Some examples of the lack of DRYness here:

```clojure
(make-route :delete "/draftset/:id/data"
            (as-draftset-owner
             (require-rdf-content-type
              (require-graph-for-triples-rdf-format
               (temp-file-body
                (fn [{{draftset-id :draftset-id
                       graph :graph
                       rdf-format :rdf-format} :params body :body :as request}]
                  (let [delete-job (if (is-quads-format? rdf-format)
                                     (dsmgmt/delete-quads-from-draftset-job backend draftset-id body rdf-format)
                                     (dsmgmt/delete-triples-from-draftset-job backend draftset-id graph body rdf-format))]
                    (submit-async-job! delete-job))))))))
```

Is almost identical to:

```clojure
(make-route :put "/draftset/:id/data"
            (as-draftset-owner
             (require-rdf-content-type
              (require-graph-for-triples-rdf-format
               (temp-file-body
                (fn [{{draftset-id :draftset-id
                       rdf-format :rdf-format
                       graph :graph} :params body :body :as request}]
                  (if (is-quads-format? rdf-format)
                    (let [append-job (dsmgmt/append-data-to-draftset-job backend draftset-id body rdf-format)]
                      (submit-async-job! append-job))
                    (let [append-job (dsmgmt/append-triples-to-draftset-job backend draftset-id body rdf-format graph)]
                      (submit-async-job! append-job)))))))))
```
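A possible shape for factoring these out — note that `draftset-data-route`, `quads-job-fn` and `triples-job-fn` are hypothetical names for illustration, not existing drafter functions:

```clojure
;; Sketch only: `draftset-data-route` is a hypothetical shared route builder,
;; parameterised by HTTP method and the two job-constructor functions.
(defn- draftset-data-route
  [method quads-job-fn triples-job-fn]
  (make-route method "/draftset/:id/data"
              (as-draftset-owner
               (require-rdf-content-type
                (require-graph-for-triples-rdf-format
                 (temp-file-body
                  (fn [{{draftset-id :draftset-id
                         graph :graph
                         rdf-format :rdf-format} :params body :body}]
                    (submit-async-job!
                     (if (is-quads-format? rdf-format)
                       (quads-job-fn backend draftset-id body rdf-format)
                       (triples-job-fn backend draftset-id graph body rdf-format))))))))))

;; The two routes above then collapse to (wrapper fns paper over the
;; current inconsistent argument orders of the triples job constructors):
(draftset-data-route :delete
                     dsmgmt/delete-quads-from-draftset-job
                     dsmgmt/delete-triples-from-draftset-job)
(draftset-data-route :put
                     dsmgmt/append-data-to-draftset-job
                     (fn [backend id graph body fmt]
                       (dsmgmt/append-triples-to-draftset-job backend id body fmt graph)))
```

Unifying the argument order of the `*-triples-*` job constructors would remove the need for the wrapper fn entirely.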

Then lower down, the code is duplicated extensively with arbitrary differences (only a few of which are necessary, and ideally almost none would be): e.g.

This:

```clojure
(defn- append-draftset-quads [backend draftset-ref live->draft quad-batches {:keys [op job-started-at] :as state} job]
  (case op
    :append
    (append-draftset-quads* quad-batches live->draft backend job-started-at job draftset-ref state)

    :copy-graph
    (copy-graph-for-append* state draftset-ref backend live->draft quad-batches job)))

(defn- append-quads-to-draftset-job [backend draftset-ref quads]
  (ajobs/create-job :background-write
                    (fn [job]
                      (let [graph-map (get-draftset-graph-mapping backend draftset-ref)
                            quad-batches (util/batch-partition-by quads context jobs/batched-write-size)
                            now (java.util.Date.)]
                        (append-draftset-quads backend draftset-ref graph-map quad-batches {:op :append :job-started-at now} job)))))

(defn append-data-to-draftset-job [backend draftset-ref tempfile rdf-format]
  (append-quads-to-draftset-job backend draftset-ref (read-statements tempfile rdf-format)))

(defn append-triples-to-draftset-job [backend draftset-ref tempfile rdf-format graph]
  (let [triples (read-statements tempfile rdf-format)
        quads (map (comp map->Quad #(assoc % :c graph)) triples)]
    (append-quads-to-draftset-job backend draftset-ref quads)))
```

and this:

```clojure
(defn- batch-and-delete-quads-from-draftset [backend quads draftset-ref live->draft job]
  (let [quad-batches (util/batch-partition-by quads context jobs/batched-write-size)
        now (java.util.Date.)]
    (delete-quads-from-draftset backend quad-batches draftset-ref live->draft {:op :delete :job-started-at now} job)))

(defn delete-quads-from-draftset-job [backend draftset-ref serialised rdf-format]
  (jobs/make-job :background-write [job]
    (let [backend (get-draftset-executor {:backend backend :draftset-ref draftset-ref :union-with-live? false})
          quads (read-statements serialised rdf-format)
          graph-mapping (get-draftset-graph-mapping backend draftset-ref)]
      (batch-and-delete-quads-from-draftset backend quads draftset-ref graph-mapping job))))

(defn delete-triples-from-draftset-job [backend draftset-ref graph serialised rdf-format]
  (jobs/make-job :background-write [job]
    (let [backend (get-draftset-executor {:backend backend :draftset-ref draftset-ref :union-with-live? false})
          triples (read-statements serialised rdf-format)
          quads (map #(util/make-quad-statement % graph) triples)
          graph-mapping (get-draftset-graph-mapping backend draftset-ref)]
      (batch-and-delete-quads-from-draftset backend quads draftset-ref graph-mapping job))))
```
are almost the same.
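These could plausibly collapse into one constructor parameterised by the operation. In this sketch, `quad-batch-job` and `process-draftset-quads` are hypothetical names (the latter standing for a merged batch-processing loop), and the delete-specific `get-draftset-executor` wrapping is omitted for brevity:

```clojure
;; Sketch only: `quad-batch-job` is a hypothetical unified job constructor;
;; `process-draftset-quads` is a hypothetical merged batch loop.
(defn- quad-batch-job
  "Creates a :background-write job that batches `quads` and processes them
  under `op` (:append or :delete)."
  [op backend draftset-ref quads]
  (ajobs/create-job :background-write
                    (fn [job]
                      (let [graph-map (get-draftset-graph-mapping backend draftset-ref)
                            quad-batches (util/batch-partition-by quads context jobs/batched-write-size)
                            state {:op op :job-started-at (java.util.Date.)}]
                        (process-draftset-quads backend draftset-ref graph-map quad-batches state job)))))

;; The public entry points then reduce to thin wrappers:
(defn append-data-to-draftset-job [backend draftset-ref tempfile rdf-format]
  (quad-batch-job :append backend draftset-ref (read-statements tempfile rdf-format)))

(defn delete-quads-from-draftset-job [backend draftset-ref serialised rdf-format]
  (quad-batch-job :delete backend draftset-ref (read-statements serialised rdf-format)))
```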

Then slightly lower still, we have a code-tidying attempt that was applied to append but not to delete:

```clojure
(declare append-draftset-quads)

(defn append-data-batch!
  "Appends a sequence of triples to the given draft graph."
  [conn graph-uri triple-batch]
  ;;NOTE: The remote sesame client throws an exception if an empty transaction is committed
  ;;so only create one if there is data in the batch
  (when-not (empty? triple-batch)
    ;;WARNING: This assumes the backend is a sesame backend which is
    ;;true for all current backends.
    (sparql/add conn graph-uri triple-batch)))

(defn- append-draftset-quads*
  [quad-batches live->draft backend job-started-at job draftset-ref state]
  (if-let [batch (first quad-batches)]
    (let [{:keys [graph-uri triples]} (quad-batch->graph-triples batch)]
      (if-let [draft-graph-uri (get live->draft graph-uri)]
        (do
          (append-data-batch! backend draft-graph-uri triples)
          (mgmt/set-modifed-at-on-draft-graph! backend draft-graph-uri job-started-at)
          (let [next-job (ajobs/create-child-job
                          job
                          (partial append-draftset-quads backend draftset-ref live->draft (rest quad-batches) (merge state {:op :append})))]
            (writes/queue-job! next-job)))
        ;;NOTE: do this immediately instead of scheduling a
        ;;continuation since we haven't done any real work yet
        (append-draftset-quads backend draftset-ref live->draft quad-batches (merge state {:op :copy-graph :graph graph-uri}) job)))
    (jobs/job-succeeded! job)))

(defn- copy-graph-for-append*
  [state draftset-ref backend live->draft quad-batches job]
  (let [live-graph-uri (:graph state)
        ds-uri (str (ds/->draftset-uri draftset-ref))
        {:keys [draft-graph-uri graph-map]} (mgmt/ensure-draft-exists-for backend live-graph-uri live->draft ds-uri)]
    (lock-writes-and-copy-graph backend live-graph-uri draft-graph-uri {:silent true})
    ;; Now resume appending the batch
    (append-draftset-quads backend draftset-ref graph-map quad-batches (merge state {:op :append}) job)))

(defn- append-draftset-quads [backend draftset-ref live->draft quad-batches {:keys [op job-started-at] :as state} job]
  (case op
    :append
    (append-draftset-quads* quad-batches live->draft backend job-started-at job draftset-ref state)

    :copy-graph
    (copy-graph-for-append* state draftset-ref backend live->draft quad-batches job)))
```

```clojure
(defn- delete-quads-from-draftset [backend quad-batches draftset-ref live->draft {:keys [op job-started-at] :as state} job]
  (case op
    :delete
    (if-let [batch (first quad-batches)]
      (let [live-graph (context (first batch))]
        (if (mgmt/is-graph-managed? backend live-graph)
          (if-let [draft-graph-uri (get live->draft live-graph)]
            (do
              (with-open [conn (repo/->connection (->sesame-repo backend))]
                (mgmt/set-modifed-at-on-draft-graph! conn draft-graph-uri job-started-at)
                (let [rewritten-statements (map #(rewrite-statement live->draft %) batch)
                      sesame-statements (map IStatement->sesame-statement rewritten-statements)
                      graph-array (into-array Resource (map util/uri->sesame-uri (vals live->draft)))]
                  (.remove conn sesame-statements graph-array)))
              (let [next-job (ajobs/create-child-job
                              job
                              (partial delete-quads-from-draftset backend (rest quad-batches) draftset-ref live->draft state))]
                (writes/queue-job! next-job)))
            ;;NOTE: Do this immediately as we haven't done any real work yet
            (recur backend quad-batches draftset-ref live->draft (merge state {:op :copy-graph :live-graph live-graph}) job))
          ;;live graph does not exist so do not create a draft graph
          ;;NOTE: This is the same behaviour as deleting a live graph
          ;;which does not exist in live
          (recur backend (rest quad-batches) draftset-ref live->draft state job)))
      (let [draftset-info (get-draftset-info backend draftset-ref)]
        (jobs/job-succeeded! job {:draftset draftset-info})))

    :copy-graph
    (let [{:keys [live-graph]} state
          ds-uri (str (ds/->draftset-uri draftset-ref))
          {:keys [draft-graph-uri graph-map]} (mgmt/ensure-draft-exists-for backend live-graph live->draft ds-uri)]
      (lock-writes-and-copy-graph backend live-graph draft-graph-uri {:silent true})
      ;; Now resume appending the batch
      (recur backend
             quad-batches
             draftset-ref
             (assoc live->draft live-graph draft-graph-uri)
             (merge state {:op :delete})
             job))))
```
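The two loops differ only in the write step (insert vs. remove) and a couple of delete-only cases. One hedged sketch of a merge — `process-draftset-quads`, `write-batch!` and `delete-data-batch!` are hypothetical names, and the delete-specific "live graph not managed" branch and the richer success payload are elided for brevity:

```clojure
;; Sketch only: a hypothetical merged loop replacing both
;; append-draftset-quads and delete-quads-from-draftset.
(defn- write-batch!
  "The single operation-specific step: effectively INSERT DATA for :append,
  DELETE DATA for :delete. `delete-data-batch!` is hypothetical."
  [op backend draft-graph-uri live->draft batch]
  (case op
    :append (append-data-batch! backend draft-graph-uri
                                (:triples (quad-batch->graph-triples batch)))
    :delete (delete-data-batch! backend draft-graph-uri live->draft batch)))

(defn- process-draftset-quads
  [backend draftset-ref live->draft quad-batches {:keys [op job-started-at] :as state} job]
  (if-let [batch (first quad-batches)]
    (let [live-graph (context (first batch))]
      (if-let [draft-graph-uri (get live->draft live-graph)]
        (do (write-batch! op backend draft-graph-uri live->draft batch)
            (mgmt/set-modifed-at-on-draft-graph! backend draft-graph-uri job-started-at)
            (writes/queue-job!
             (ajobs/create-child-job
              job
              (partial process-draftset-quads backend draftset-ref live->draft
                       (rest quad-batches) state))))
        ;; No draft graph yet: copy the live graph, then resume immediately
        ;; since we haven't done any real work for this batch.
        (let [ds-uri (str (ds/->draftset-uri draftset-ref))
              {:keys [draft-graph-uri graph-map]}
              (mgmt/ensure-draft-exists-for backend live-graph live->draft ds-uri)]
          (lock-writes-and-copy-graph backend live-graph draft-graph-uri {:silent true})
          (recur backend draftset-ref graph-map quad-batches state job))))
    (jobs/job-succeeded! job {:draftset (get-draftset-info backend draftset-ref)})))
```

Collapsing the `:append`/`:copy-graph` state machine into the loop itself like this would also remove the `declare`/`partial` indirection, though the continuation-via-child-job structure would need checking against the write lock semantics.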
