-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
surprising performance issue #443
Comments
@campoy could you send the result of the queries but adding at the beginning |
With current release these queries take around 1 minute in my machine. Have you been using indexes to go that fast? The last two queries return 0 rows as the string returned by |
SELECT t.tree_entry_name AS NAME, b.blob_content AS CODE
FROM refs r
JOIN commits c ON r.commit_hash = c.commit_hash
JOIN commit_trees ct ON c.commit_hash = ct.commit_hash
JOIN tree_entries t ON ct.tree_hash = t.tree_hash
JOIN blobs b ON t.blob_hash = b.blob_hash
WHERE LANGUAGE(t.tree_entry_name, b.blob_content) = 'Go'; |
There is no index involved, because so far we cannot create an index across multiple columns from different tables and |
I have executed all three queries and the results in terms of time are quite similar:
Most likely, what happened is that you saw no output because every single row was being filtered due to I think for the sake of user friendliness we should either output lowercased language names or specify in the documentation the correct names for all languages. WDYT @ajnavarro? Correct second query: SELECT
t.tree_entry_name as name,
b.blob_content as code
FROM refs r
JOIN commits c ON r.commit_hash = c.commit_hash
JOIN commit_trees ct ON c.commit_hash = ct.commit_hash
JOIN tree_entries t ON ct.tree_hash = t.tree_hash
JOIN blobs b ON t.blob_hash = b.blob_hash
WHERE LANGUAGE(t.tree_entry_name, b.blob_content) = 'go' Correct third query: SELECT name, code
FROM
(
SELECT
LANGUAGE(t.tree_entry_name, b.blob_content) as lang,
t.tree_entry_name as name,
b.blob_content as code
FROM refs r
JOIN commits c ON r.commit_hash = c.commit_hash
JOIN commit_trees ct ON c.commit_hash = ct.commit_hash
JOIN tree_entries t ON ct.tree_hash = t.tree_hash
JOIN blobs b ON t.blob_hash = b.blob_hash
) as blobs
WHERE lang = 'Go' That said, if what you were aiming for was getting all Go code from the repo, this is a more concise and less expensive query to to that: SELECT tree_entry_name as name, blob_content as code
FROM tree_entries NATURAL JOIN blobs
WHERE LANGUAGE(tree_entry_name, blob_content) = 'Go' |
personally I think LANGUAGE should return all lowercases. There is no point why we need to capitalise lang's name. |
@kuba-- totally agree |
@erizocosmico @kuba-- I cannot think in any other downside than old queries that are using correct Enry strings will fail. Maybe is totally OK. WDYT @smola ? |
UAST might break if you change LANGUAGE to lowercase. You should check that. Other than that, it should be ok. |
Another option is adding support for |
@smola UAST already lowercases everything, afaik. But lower and upper + adding to the docs the list of languages returned by enry sounds good too. |
👍 to adding a list of languages by enry. bblfsh has the same issue. We could document that list in src-d/enry README.md and link it from both projects. |
Opening new issue to handle the real problem: #658 |
I just ran this query on top of github.com/golang/go:
This finishes in 0.65s 🎉
Unfortunately this other request takes forever:
Trying to see whether I could find a workaround I wrote this second query:
Both of these requests take too long for me to wait.
The text was updated successfully, but these errors were encountered: