Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial nix language support #1911

Draft
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

fidgetingbits
Copy link
Collaborator

@fidgetingbits fidgetingbits commented Sep 24, 2023

EDIT: I removed the original PR text, as no longer relevant and will have made it harder to review.

This PR adds nix language support.

Checklist

  • Recorded tests for the new language
  • Used "change" / "clear" instead of "take" for selection tests to make recorded tests easier to read
  • Added a few specific tests that use "chuck" instead of "change" to test removal behaviour when it's interesting, especially:
    • "chuck arg" with single argument in list
    • "chuck arg" with multiple arguments in list
    • "chuck item" with single argument in list
    • "chuck item" with multiple arguments in list
  • Added @textFragment captures. Usually you want to put these on comment and string nodes. This enables "take round" to work within comments and strings.
  • Added a test for "change round" inside a string, eg "hello (there)"
  • [-] Supported "type" both for type annotations (eg foo: string) and declarations (eg interface Foo {}) (and added tests for this behaviour 😊)
  • Supported "item" both for map pairs and list entries (with tests of course)
Supported Tested Term Capture Definition Comment
list @list List type equivalent -
inside list @list.interior Inside of a list -
map @map Dictionary type equivalent -
inside map @map.interior Inside of a dictionary -
key @collectionKey Dictionary key equivalent -
funk @namedFunction A named function declaration -
inside funk @namedFunction.interior The inside of a lambda declaration -
funk name @functionName Name of declared function -
lambda @anonymousFunction A lambda declaration -
inside lambda @anonymousFunction.interior The inside of a lambda declaration -
name @name Variable name -
value @value Right-hand-side value of an assignment -
_ _ value @value Value returned from a function -
value @value Value of a key-value pair -
state @statement Any single coded statement -
if state @ifStatement An if conditional block -
condition @condition Condition of an if block -
_ _ condition @condition Condition of a while loop -
_ _ condition @condition Condition of a do while style loop -
_ _ condition @condition Condition of a for loop -
_ _ condition @condition Condition of a ternary expression -
branch @branch The resulting code associated with a conditional expression -
inside branch @branch.interior The statements associated with the body of a conditional expression -
comment @comment Single line code comment -
comment @comment Multi-line code comment -
string @string Single line strings -
string @string Multi-line strings -
- @textFragment Used to capture string-type nodes (strings and comments) -
call @functionCall A function call (not a function definition) -
callee @functionCallee Name of the function being called -
arg @argumentOrParameter Arguments to function definition -
arg @argumentOrParameter Arguments to function calls -
arg @argumentOrParameter Interpolated variables in strings -
_ _ class @class Class or structure declaration -
_ _ inside class @class.interior The inside of a class declaration -
_ _ class name @className Name of class or structure declaration -
_ _ type @type Type declarations -

Checklist

@fidgetingbits fidgetingbits marked this pull request as draft September 27, 2023 01:42
Copy link
Member

@pokey pokey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you're making some good progress! I would advise recording tests while you're developing rather than at the end. If something works like you want it to, record a test. That way you won't break it later. It's ok if a scope doesn't work as you'd like it to in some ways: I would still advise recording tests to capture the ways in which it works like you do want it to.
Recording tests is absolutely trivial once you're set up; takes about as long as manually trying a command. In addition, it's really hard for me to review this code without tests, so I won't be able to give you early feedback, because I'm not intimately familiar with tree-sitter-nix 😅

@pokey
Copy link
Member

pokey commented Sep 28, 2023

Ok thanks for the tests, though we prefer "change" instead of "take" because it makes the tests much easier to read. See the PR template. Also, "chuck" tests are only necessary in cases where "chuck" is interesting, eg removing commas, etc

If you don't want to re-record the "take" tests, you can just find and replace setSelection to clearAndSetSelection in your recorded tests, and then run the "update test fixtures" launch config, which should handle the rest for you

@auscompgeek
Copy link
Member

I'd be happy to try this out on some real files once the corresponding parse-tree PR is merged.

@fidgetingbits
Copy link
Collaborator Author

fidgetingbits commented Oct 2, 2023

One thing I can't entirely decide is if take lambda from the right most lambda should capture all curried lambdas as one highlight or should individually select each (the way it works currently). Taking all curried lambdas would make it feel more like other languages when dealing with functions, which seems nice.

I have no functional programming background, so someone with experience might think differently?

To be clear, in nix technically x = a: b: a + b is two lambda expressions. So if the cursor is in a + b I say take lambda I get:

image

But if the cursor is in the a: part I say take lambda I get:

image

This also gets a bit cumbersome if your cursor is in the a + b and say take every arg and expect it to highlight both a and b. Currently it would only select b. I think I'd rather it select all.

I think one possible exception to this should be if one of the lambdas is explicitly wrapped in () which to me implies some sort of explicit separation vs a regular multi-argument lambda?

Maybe this was already discussed/decided for other languages, but I didn't check yet.

Worth noting that atm I also made it so above works as a single "named function" where x is the function name, so if you say take funk you get:

image

and take funk name:

image

Not sure what to think of that yet, or how it currently works in other languages. But my thinking is because nix is only lambda functions, meaning named functions won't be used elsewhere, this might be nice to use.

@pokey
Copy link
Member

pokey commented Oct 2, 2023

huh interesting. so is currying the only way to define a lambda with multiple parameters? If so, then I think I'd lean towards treating a: b: a+b as lambda, and having "every arg" iterate over a and b. Lmk if you need help getting that to work

@auscompgeek @josharian @AndreasArvidsson curious to hear your thoughts tho

Copy link
Member

@pokey pokey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really good! Excellent work. Left a bunch of small comments

queries/nix.scm Show resolved Hide resolved
queries/nix.scm Show resolved Hide resolved
queries/nix.scm Outdated Show resolved Hide resolved
queries/nix.scm Outdated Show resolved Hide resolved
queries/nix.scm Outdated Show resolved Hide resolved
queries/nix.scm Outdated Show resolved Hide resolved
@fidgetingbits fidgetingbits mentioned this pull request Oct 28, 2023
10 tasks
@fidgetingbits
Copy link
Collaborator Author

I'd be happy to try this out on some real files once the corresponding parse-tree PR is merged.

@auscompgeek fyi nix support merged into parse-tree a little while ago in case you didn't see.

(#not-parent-type? @functionCall apply_expression)
)

;; This is gross, but not sure how to do fuzzy matching against an unknown number of
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if someone is aware of a better way to do this? I couldn't see anything in other .scm files, the tree sitter query documentation, or chatgpt. Assuming there isn't something I missed, it almost seems like something a special predicate could handle more elegantly?

queries/nix.scm Outdated
"assert" @statement.start
condition: (_)
";" @statement.end
body: (_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok a couple surprising things when exploring the assert flie:

"call"

image

To be honest, I wasn't expecting asserts to match "call" at all. It feels more like a statement to me than calling an "assert" function. But I could see that going either way cc/ @auscompgeek

The more surprising thing is that while the range seems reasonable (assuming we want asserts to be calls), the domain was quite surprising. It seems to contain everything after the assert statement. Hence the increasingly light color of each subsequent assert in the above screenshot.

I'm guessing that has something to do with this whole thing being pure functional, so it's kinda like how let x in ... technically includes the ... in langs like clojure. But I'm not sure it's desirable behaviour for the "call" scope

This "call" in particular confused me:

image

Should it not include "a"? If not, should it really include the ;? Is that assert statement even valid? I made it up 😅

"state"

image

Agreed that assert should be a statement, and I'm happy with your ranges here. However, as with "call" above, it seems to be exhibiting the same surprising nested domain behaviour

And agreed that it shouldn't be a statement if it's the rhs of an assignment. Tbh I have no idea what that's supposed to do, but it doesn't look like a statement to me 😅

"branch"

image

This one was also surprising to me. I guess it makes more sense in a one-line assert? cc/ @auscompgeek

"callee"

image

As above, the fact this is a "callee", as well as the domain, were surprising to me

"condition"

image

This one makes the most sense of any to me, even the domain. I guess if people see assert statements this way, then maybe the other domains make sense as well? Still not totally sure bout "call" tho

Tests

Looks like there are no tests for assert

  • Assert as statement
  • Assert condition
  • Assert as call, if we decide to keep that
  • Assert callee, if we decide to keep that
  • Assert branch, if we decide to keep that
  • Negative test for assert not being statement when on rhs of assignment (say "cursorless record error" and then do "change state")

@pokey
Copy link
Member

pokey commented Nov 8, 2023

"item"

image

I was really expecting a lot more items here 😅. In particular, for anything that you're considering a "map", any contained key-value pairs should be items

@pokey
Copy link
Member

pokey commented Nov 8, 2023

Hmm nix really does blur the line between statement and item, doesn't it?

image

My mind kind of expects the single-line ones not to be statements 😅. But that doesn't feel like a good distinction to draw. Idk cc/ @auscompgeek . Looks like there's no difference between a "statement block" and a "map"?

@pokey
Copy link
Member

pokey commented Nov 8, 2023

"call"

image

I did not expect this to be a call, especially since there is no "callee". I think a good rule of thumb is that if there's a "call", there should be a "callee"

@pokey
Copy link
Member

pokey commented Nov 8, 2023

"callee"

image

This callee where the same callee range has two nested domains was surprising. I also didn't expect an "import" to be a "call" / "callee" at all, but if people see it that way I won't object too strongly 😊

Copy link
Member

@pokey pokey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok left some comments. Haven't done a thorough review, but given it's marked draft, I assume you're still just looking for early feedback?

queries/nix.scm Outdated Show resolved Hide resolved
@pokey pokey mentioned this pull request Nov 8, 2023
@pokey
Copy link
Member

pokey commented Nov 8, 2023

Also, would be good to make sure you're formatting with @AndreasArvidsson 's formatter; I'm getting a big autoformat when I save one of your files locally

@fidgetingbits
Copy link
Collaborator Author

"item"

image

I was really expecting a lot more items here 😅. In particular, for anything that you're considering a "map", any contained key-value pairs should be items

This is fixed now for attrsets.

@fidgetingbits
Copy link
Collaborator Author

Hmm nix really does blur the line between statement and item, doesn't it?
image

My mind kind of expects the single-line ones not to be statements 😅. But that doesn't feel like a good distinction to draw. Idk cc/ @auscompgeek . Looks like there's no difference between a "statement block" and a "map"?

Ya. Really there isn't much similarity to what you'd normally think of as a statement in other (non functional) languages at least. Originally I was leaning towards statements because there are so many complex expressions and stuff when setting key/value pairs, but I think using items feels more natural after playing with it for awhile. I've disabled most statement matching for now to test more, and also added better item support.

@pokey
Copy link
Member

pokey commented Nov 14, 2023

Thanks, ya that was it it seems. Installed now. I am using your pre-commit hooks as well, so kinda surprised that doesn't automatically fix it either.

Yeah scm autoformat doesn't run as a hook yet as it's just implemented in that VSCode extension. We need to turn it into a cli

@fidgetingbits
Copy link
Collaborator Author

This one was also surprising to me. I guess it makes more sense in a one-line assert? 

Things seem a lot nicer with most of these assert issues addressed (not pushed yet). I like the idea of having branch apply only if it's a one-liner, but from searching the repo I don't see how its possible to actually have that as a query predicate? Is there some existing example of where you've already done this or would I have to add a new predicate?

@pokey
Copy link
Member

pokey commented Nov 15, 2023

I like the idea of having branch apply only if it's a one-liner, but from searching the repo I don't see how its possible to actually have that as a query predicate? Is there some existing example of where you've already done this or would I have to add a new predicate?

Oh I wasn't suggesting actually changing definition of "branch" depending on whether it's one or multiple lines. I think Python has taught us that making whitespace significant is a deeply problematic notion 😅. I think it would be quite surprising if a scope behaved differently if it was on one or multiple lines. Unless nix devs see them as significantly different constructs? cc/ @auscompgeek

Lmk if you want me to take another look, or to drop into a meet-up to discuss some of this stuff

@fidgetingbits
Copy link
Collaborator Author

Fair enough. As much as I hate python whitespace, I actually think there are maybe a couple cases in nix where it could sort of make sense to be whitespace dependent to feel "natural", but I don't have time to explain right atm.

I'm new to coding nix so I'd rather hold off implementing weird things I assume maybe make sense and instead just push something that has the basics covered and then wait until more people (hopefully more experienced devs) get to play with it.

You can feel free to take another look now ya.

There are two issues that I think might require new query predicates but aren't show stoppers (maybe... will see how much you groan at how I handled call arguments in apply/select expressions lol), so happy to solve them as follow up. Will explain in more detail in a comment tomorrow.

I do hope to actually make it to a meetup at some point :D

@fidgetingbits
Copy link
Collaborator Author

So to elaborate on a few changes and issues:

The statement stuff has changed to only be lines within the first ... part of let ... in ... statements, which feels a lot better.

There are I think 3 things could be done that I'm not sure how without possibly building new query predicates, so I'll just list them here.

  1. The call argument handling nesting is gross, but works. I suspect there could be a predicate introduced that repeats some node match nested up to N deep or something. I'd probably rather do that as a follow up. I suppose worth noting I also had tried to use #any-of to match both an apply and select expression in the same big query, but ran into some odd errors which I couldn't figure out, which may be a bug I guess. #any-of would only fix the query duplication, not the huge gross query.

  2. Probably we just won't use assert branch scope at all. But just touching on the idea of certain scopes only working depending on white space Like inherit n; on a single line could almost be thought of as a statement (maybe I'm too general with the way I think of statement). So for example you often see short assignments that might look like this:

  pkgs = import pkgsLibPath { inherit lib; pkgs = null; };

In this case intuitively I don't think the inherit would be make sense to be its own statement. But then you often also see longer assignments that are laid out like this:

  pkgs = import pkgsLibPath {
    inherit lib;
    pkgs = null;
  };

In the latter I find it feels natural to want to target the inherit line if it were a statement. Of course can just use line as the target, but just clarifying why in my brain it feels sort of natural to think of it differently depending on the white space.

There is a similar case with the with were you can have like maintainers = with maintainers; [ onny ] ++ teams.php.members; and:

{ config, lib, pkgs, ... }:

with lib;

let

  cfg = config.security.sudo;

Anyway, I think you are right that avoiding whitespace-based differences, but I wanted to explain anyway.

  1. I took a stab at adding inside comment support and ran into some issues. There doesn't seem to currently be a way to use a predicate to query the contents of the matched node. If I do something like:
  (
    (comment) @comment.interior
    (#shrink-to-match! @comment.interior "# ?(?<keep>.*)$")
  )

It works-ish, but the problem is that there are two types of comment both which match as (comment) nodes. So by having the above requiring #strink-to-match then I lose the ability to match on the multi-line comments. I tried something naive like:

[
  (
    (comment) @comment.interior
    (#shrink-to-match! @comment.interior "# ?(?<keep>.*)$")
  )
  (
    (comment) @comment.interior
    ;; This would need to be a multi-line match, which I'm not sure shrink-to-match can do
    (#shrink-to-match! @comment.interior "/\*(?<keep>.*)\*/$")
  )
] @comment

But this doesn't work since #shrink-to-match isn't actually a match predicate per se, it's just a modifier on what was already matched, so if you try to match on the multiline variant, since (comment) still matches in the first case, #shrink-to-match will error because it doesn't find the #.

So this made me think we probably want something like a #starts-with or even just a #regex predicate where we could just have regex("^#") for single line and #regex("^/\*") for double line, and then have the #strink-to-match after.

I think that's it. Definitely understand why you like lots of small PRs, as it gets hard to track all these different issues (even for me actively working on it, let alone you lol).

@fidgetingbits fidgetingbits marked this pull request as ready for review November 16, 2023 01:16
@pokey
Copy link
Member

pokey commented Feb 2, 2024

So to elaborate on a few changes and issues:

The statement stuff has changed to only be lines within the first ... part of let ... in ... statements, which feels a lot better.

There are I think 3 things could be done that I'm not sure how without possibly building new query predicates, so I'll just list them here.

1. The call argument handling nesting is gross, but works. I suspect there could be a predicate introduced that repeats some node match nested up to N deep or something. I'd probably rather do that as a follow up. I suppose worth noting I also had tried to use `#any-of` to match both an apply and select expression in the same big query, but ran into some odd errors which I couldn't figure out, which may be a bug I guess. `#any-of` would only fix the query duplication, not the huge gross query.

Here is a proposal for a new query predicate operator: fidgetingbits#1. I think it's actually fairly general purpose

2. Probably we just won't use `assert` branch scope at all. But just touching on the idea of certain scopes only working depending on white space Like `inherit n;` on a single line could almost be thought of as a statement (maybe I'm too general with the way I think of statement). So for example you often see short assignments that might look like this:
  pkgs = import pkgsLibPath { inherit lib; pkgs = null; };

In this case intuitively I don't think the inherit would be make sense to be its own statement. But then you often also see longer assignments that are laid out like this:

  pkgs = import pkgsLibPath {
    inherit lib;
    pkgs = null;
  };

In the latter I find it feels natural to want to target the inherit line if it were a statement. Of course can just use line as the target, but just clarifying why in my brain it feels sort of natural to think of it differently depending on the white space.

There is a similar case with the with were you can have like maintainers = with maintainers; [ onny ] ++ teams.php.members; and:

{ config, lib, pkgs, ... }:

with lib;

let

  cfg = config.security.sudo;

Anyway, I think you are right that avoiding whitespace-based differences, but I wanted to explain anyway.

I would be tempted to just err on the side of more scopes. So if it makes sense multi-line, let's include it always. Can always "grand state" your way out of it or target part of the scope not trapped

3. I took a stab at adding `inside comment` support and ran into some issues. There doesn't seem to currently be a way to use a predicate to query the contents of the matched node. If I do something like:
  (
    (comment) @comment.interior
    (#shrink-to-match! @comment.interior "# ?(?<keep>.*)$")
  )

It works-ish, but the problem is that there are two types of comment both which match as (comment) nodes. So by having the above requiring #strink-to-match then I lose the ability to match on the multi-line comments. I tried something naive like:

[
  (
    (comment) @comment.interior
    (#shrink-to-match! @comment.interior "# ?(?<keep>.*)$")
  )
  (
    (comment) @comment.interior
    ;; This would need to be a multi-line match, which I'm not sure shrink-to-match can do
    (#shrink-to-match! @comment.interior "/\*(?<keep>.*)\*/$")
  )
] @comment

But this doesn't work since #shrink-to-match isn't actually a match predicate per se, it's just a modifier on what was already matched, so if you try to match on the multiline variant, since (comment) still matches in the first case, #shrink-to-match will error because it doesn't find the #.

So this made me think we probably want something like a #starts-with or even just a #regex predicate where we could just have regex("^#") for single line and #regex("^/\*") for double line, and then have the #strink-to-match after.

I would just include the # . Does that make your life easier?

I think that's it. Definitely understand why you like lots of small PRs, as it gets hard to track all these different issues (even for me actively working on it, let alone you lol).

Ha yeah 😅

Also, looks like this PR has some of the same leading / trailing / removal issues I fixed in #1962, so would be awesome if you could take a crack at fixing them. Basically:

  • Leading and trailing is designed for delimiters. We only remove one of them; not both. Think of a comma-separated list, where deleting two commas would smash the siblings together. If you need to remove something before and after, just make a removal range
  • You no longer need to specify @leading.end; it's implied to be the scope itself
  • Instead of @leading.start.endOf, you can now just say @leading.endOf; start is now implied
  • See my tweaks to Initial lua support #1962 if you're not sure what I mean; hopefully that will make things clear

@pokey
Copy link
Member

pokey commented Feb 6, 2024

Here is a proposal for a new query predicate operator: fidgetingbits#1. I think it's actually fairly general purpose

We discussed this one in a meet-up and decided it was simpler to just introduce a query predicate operator called #recursively-expand-to-parent-of-type! where you give it a node and a type and it recursively expands the range to direct parents of the given type. So in this case it woudl be (#recursively-expand-to-parent-of-type! @functionCallee.domain "apply_expression"). Make sense @fidgetingbits? Lmk if you want help implementing that

@fidgetingbits
Copy link
Collaborator Author

Sounds good. I do really need to get more familar with typescript, so will try to give it a whirl when I find the time.

@pokey
Copy link
Member

pokey commented Apr 22, 2024

As proposed above, I don't think we'll end up going the route of fidgetingbits#1, but I captured it in a tag just in case we want it in the future, and the PR disappears from @fidgetingbits's repo for whatever reason (not sure how things work with forks). Here's the commit 52ce5b1

github-merge-queue bot pushed a commit that referenced this pull request Jun 13, 2024
This is me breaking out some functionality that was part of PR #1911
since that PR has lots of work to be done and it may still benefit other
languages that get in sooner. I've applied the suggested changes from
pokey in that PR's code review.

I've also added a placeholder map for lua which uses `[[ ]]` for
multiline support as noted in PR #1962, however I realized as I went to
add it that it's not as simple as the way I did it for nix because I
can't just reuse the existing `singleQuote` entry since lua actually
uses single quotes. So this may actually be a bit harder, and reminds me
of [this
discussion](#1992 (comment)).
So before I go randomly hacking stuff I'm curious what you think the
best approach here is.

These are the comments I had left in the Nix PR about these changes:

* I called returning delimiterToText as getSimpleDelimiterMap to kind of
mirror complexDelimiterMap.

* I'm not sure is if you still want a delimiterMap.ts standalone file,
and then a getSimpleDelimiterMap.ts only for that function. Now
delimiterToText isn't referenced anywhere else, so seemed maybe okay to
keep them together.

* I also thought about adding getComplexDelimiterMap(), but atm because
complexDelimiterMap only ever uses the keys from the delimiterToText it
won't change even if the language has different values, so seemed
unnecessary. These are all things I could guess at your preferences, but
may as well ask instead.

* I also noticed leftToRightMap in that file isn't actually used
anywhere so can be deleted I think, but also not sure about doing that
as part of a totally unrelated code change, to keep commits clean.


## Checklist

- [x] I have added
[tests](https://www.cursorless.org/docs/contributing/test-case-recorder/)
- [-] I have updated the
[docs](https://github.com/cursorless-dev/cursorless/tree/main/docs) and
[cheatsheet](https://github.com/cursorless-dev/cursorless/tree/main/cursorless-talon/src/cheatsheet)
- [x] I have not broken the cheatsheet

---------

Co-authored-by: fidgetingbits <[email protected]>
Co-authored-by: Pokey Rule <[email protected]>
@fidgetingbits fidgetingbits requested a review from a team as a code owner August 5, 2024 01:20
@AndreasArvidsson
Copy link
Member

AndreasArvidsson commented Jan 14, 2025

@fidgetingbits I've tried to read through the history of this one, but I think it's better if I just ask you. Is this one ready? Are you happy with the current state?

@fidgetingbits
Copy link
Collaborator Author

fidgetingbits commented Jan 14, 2025

@fidgetingbits I've tried to read through the history of this one, but I think it's better if I just ask you. Is this one ready? Are you happy with the current state?

No it's got some bugs and some decisions I made early on I think need to change now that I'm much more familiar with the language. I will come back to it eventually but I haven't been able to actually work on it in months, although I do use it every day.

I guess one question for you is, given what a mess this PR became, would it be better/easier to just close it and make smaller PRs over time?

@AndreasArvidsson
Copy link
Member

Okay I will convert it to draft in the meantime.

You can decide if you want to continue to work on this pull request or close it and make multiple smaller ones. I unfortunately don't have the time to review languages I don't know in the kind of detail that Pokey did. I will more or less entrust the behavior to you. ie when you say that it's ready and you are happy with it we will merged unless I find something I definitely don't like. Sounds good?

@AndreasArvidsson AndreasArvidsson marked this pull request as draft January 15, 2025 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants