Skip to content

Incantations

Adam Hooper edited this page Feb 15, 2018 · 19 revisions

Here are some collected command lines and SQL snippets that have proven useful during development of Overview

Command line recipies

Connect to the dev database

Afrer running ./dev to start the development server, run auto/psql.sh overview-dev to get a PSQL prompt connected to the development database

Run only specific tests

./sbt "testOnly *DocumentSetFileControllerSpec"

This will work for the main server (aka overview-web). To do this for another project, e.g. worker, common, do

./sbt "; project common; testOnly *ContentDispositionSpec"

More.

Really, really clean

Run the auto/clean-fully.sh script. Cleans up "everything SBT ever did."

See how much disk is being used

For an overview-local installation, make sure the server is running, then:

To check postgres: docker exec -it overview-database du -h -d 1 /var/lib/postgresql/data

To check blob storage (raw uploads, and PDFs for display): docker exec -it overview-worker du -h -d 1 /var/lib/overview/blob-storage/

To check elastic search: docker exec -it overview-searchindex du -h -d 1 /usr/share/elasticsearch/data

Vaccuum Postgres

Postgres should use substantially less disk than blob storage. If this is not the case, Postgres may need its large object table vacuumed. This is where uploads are stored, temporarily, before processing, and large uploads may fragment the table and make it unnecessarily large. To fix this,

docker exec -it overview-database psql -h localhost -U overview -c "vacuum full verbose pg_largeobject"

Dump a document set, like we do for clustering

First, run ./sbt worker/stage once.

Then something like: DATABASE_PORT=9010 DATABASE_NAME=overview-dev worker/target/universal/stage/bin/worker -main com.overviewdocs.clustering.DumpDocuments 1 > dump

Run an integration test

integration-test/run spec/file_upload_spec.rb

Notes:

  • ls integration-test/spec to see the full list of test files
  • It's always spec/. The path is relative to run, not relative to the current working directory.
  • Use run-browser to watch the test occur. (You must have vncviewer in your PATH.)
  • See overview-integration-tester docs for a more complete explanation

View logs

For overview-local, run dump-logs or tail-logs

For Overview on AWS, do overview-manage show-logs then browse to http://localhost:9292. To view document set conversion times click the "Load" icon in the upper-right, then choose "Conversion". You should see how many conversions we've had recently, and whether they've been quick, slow, very slow or ridiculously slow.

Useful SQL queries

Node IDs to their child node IDs:

SELECT node.parent_id AS id, node.id AS child_id, node.description AS child_description
FROM node
WHERE parent_id IN (2,91,171,799)
ORDER BY node.id;

Notes:

  • Replace 1,91,171,799 with the node IDs in the above query.
  • The ORDER BY keeps the tree ordering consistent, but it's not strictly necessary either, if the caller is going to sort the lists by other criteria.

Node IDs to document counts and document ID slices

SELECT
    node_id,
    document_count,
    document_id
FROM (
    SELECT
        nd.node_id,
        COUNT(nd.document_id) OVER (PARTITION BY nd.node_id) AS document_count,
        nd.document_id,
        RANK() OVER (PARTITION BY nd.node_id ORDER BY nd.document_id) AS pos
    FROM node_document nd
    WHERE nd.node_id IN (1,2,4,5,6,10,12,17,18,19,24,30,...)
    ORDER BY nd.document_id
    ) ss
WHERE ss.pos < 11

Example output

 node_id | document_count |              document_ids               
---------+----------------+-----------------------------------------
       1 |           2200 |           1
       1 |           2200 |           2
       1 |           2200 |           3
       1 |           2200 |           4
       1 |           2200 |           5
       1 |           2200 |           6
       1 |           2200 |           7
       1 |           2200 |           8
       1 |           2200 |           9
       1 |           2200 |          10
       2 |            172 |          16
       2 |            172 |          17
       2 |            172 |          42
       2 |            172 |          53
       2 |            172 |          85
       2 |            172 |         117
...

Notes:

  • Replace 1,2,... with the node IDs.
  • This iterates over all documents in all nodes. If the root node is included in the query (which is common), that means it iterates over all documents in the document set. It also sorts all document IDs from all nodes; we may want to change this to sort by document title, which would mean another table join and a more complex sort. We can avoid all these problems by pre-computing values and storing them in the node table.

DocumentSet Id to all tags, their document counts and the top 10 documents in each

SELECT
    tag_id,
    tag_name,
    document_count,
    document_id
FROM (
    SELECT
        t.id AS tag_id,
        t.name AS tag_name,
        COUNT(dt.document_id) OVER (PARTITION BY dt.tag_id) AS document_count,
        dt.document_id,
        RANK() OVER (PARTITION BY dt.tag_id ORDER BY dt.document_id) AS pos
    FROM tag t
    INNER JOIN document_tag dt ON t.id = dt.tag_id
    WHERE t.document_set_id = {document_set_id}
    ORDER BY t.name, dt.document_id
    ) ss
WHERE ss.pos < 11

Notes:

  • When a Tag has no Documents, this query will return one row for it. tag_id and tag_name will be what they should be; document_count will be 0 and document_id will be NULL.

Document Ids from Selection(nodes, tags, documents)

SELECT id, title, text_url, view_url FROM document
WHERE document.id IN (SELECT document_id FROM node_document WHERE node_id IN ({nodeIdList}) AND
WHERE document.id IN (SELECT document_id FROM document_tag WHERE tag_id IN ({tagList}) AND
WHERE document.id IN ({documentIdList})
ORDER BY id
LIMIT {end - start} OFFSET {start}

Notes:

  • If nodes, tags, or documents are not specified in selection, associated WHERE clause is removed.
  • Removed sorting by title
  • 0 <= start <= end
  • start == 0 means first row is included

Count of documents in Selection(nodes, tags, documents)

SELECT COUNT(*) FROM document
WHERE document.id IN (SELECT document_id FROM node_document WHERE node_id IN ({nodeIdList}) AND
WHERE document.id IN (SELECT document_id FROM document_tag WHERE tag_id IN ({tagList}) AND
WHERE document.id IN ({documentIdList})

Nodes to tags and tag document counts

SELECT nd.node_id, dt.tag_id, COUNT(*)
FROM node_document nd
INNER JOIN document_tag dt ON nd.document_id = dt.document_id
WHERE ...

Notes:

  • This query will be slow with large document sets. Perhaps we should cache the results in a new node_tag table?
  • If no documents in a node match a given tag, that tag will not show up in the results.

Nodes and a single tag to document counts

SELECT nd.node_id, COUNT(dt.tag_id)
FROM node_document nd
LEFT JOIN document_tag dt ON nd.document_id = dt.document_id AND dt.tag_id = {tagid}
WHERE nd.node_id IN (...)
GROUP BY nd.node_id

Notes:

  • Returns 0 for all nodes which do not include the given tag
  • Omits nodes which do not contain documents (which should never happen)

Find all tag IDs for given document IDs

SELECT dt.document_id, dt.tag_id
FROM document_tag dt
INNER JOIN tag t ON dt.tag_id = t.id
WHERE dt.document_id IN ([subselect])
ORDER BY dt.document_id, t.name

Notes:

  • the tag table is only included to give the proper ORDER.

Turn tooltips back on for a particular use

UPDATE "user" SET tree_tooltips_enabled = TRUE WHERE email = '[email protected]';

Clone this wiki locally