Added codeAction (extract subSchema to defs) #133

Open
wants to merge 9 commits into
base: main

Conversation


@arpitkuriyal arpitkuriyal commented Feb 20, 2025

#132
Points to note:

  1. Child nodes of $defs are added from the top
    • New definitions are inserted at the top of $defs instead of being appended at the bottom.
    • This avoids inconsistencies in the offset + textLength calculation caused by whitespace and brackets.
Screen.Recording.2025-02-20.at.10.47.01.PM.mov

Collaborator

@jdesrosiers jdesrosiers left a comment

This is great progress! But, it looks like there's still some work to do.

Adding the definitions to the top isn't good enough. Also, the sloppy formatting of the generated code isn't acceptable either. The biggest motivating factor for this project is to encourage best practices and good style. Two of those rules are that $schema and $id always come first and that $defs always goes last. It's important that the definitions are in the right place and reasonably formatted when moved. You might have to somehow run the moved code through a formatter once it's in its new location. jsonc-parser is used internally to parse the schema. It has a formatting feature that you might be able to use somehow.
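
The two ordering rules mentioned here ($schema and $id first, $defs last) can be written down as a small check. This is only a sketch: `keywordOrderProblems` is a hypothetical helper, not something in this repository, and it assumes the 2019-09/2020-12 keyword names.

```javascript
// Hypothetical helper: reports violations of "$schema and $id first, $defs last".
// Assumes 2019-09/2020-12 keyword names; older drafts use "id" and "definitions".
function keywordOrderProblems(schema) {
  const keys = Object.keys(schema);
  const problems = [];

  // Whichever of $schema and $id are present should occupy the leading positions, in that order.
  const leading = ["$schema", "$id"].filter((keyword) => keys.includes(keyword));
  leading.forEach((keyword, position) => {
    if (keys[position] !== keyword) {
      problems.push(`${keyword} should be at position ${position}`);
    }
  });

  // $defs, if present, should be the final keyword.
  if (keys.includes("$defs") && keys[keys.length - 1] !== "$defs") {
    problems.push("$defs should be the last keyword");
  }

  return problems;
}
```

For a code action that creates $defs when it does not exist yet, the same rule suggests adding it as the last property of the schema object rather than the first.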

I noticed that the schema you're using in your demo video has a problem. The "shipping_address" property has a property "address" that then has the actual schema in it. That subschema will do nothing at all. "address" is not a JSON Schema keyword, so it gets ignored. You can tell something's wrong because the schema in "address" doesn't have the expected syntax highlighting. JSON Schema keywords should be highlighted differently than plain JSON properties.

I see you're using the property name as the definition name. That's a nice default, but that's not always going to work. Subschemas can be in quite a few places other than properties values. For example, this wouldn't work for the items keyword, where there is no property name to reuse. I think what needs to happen is that the user needs to provide the definition name. Have a look at the if/then completion provider. There's a special syntax that allows you to set placeholders that the user will be prompted to fill in. That approach may also work here.
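
To illustrate the placeholder idea: in the LSP snippet syntax, `${1:def1}` marks a tab stop with default text that the user is prompted to overwrite. A minimal sketch of building such a replacement follows. `escapeSnippetText` and `buildRefSnippet` are hypothetical names, not functions from this repository. One detail worth noting is that a literal `$` has to be escaped, otherwise `$ref` would be read as a snippet variable.

```javascript
// Escape the characters that have special meaning in snippet text: "$", "}" and "\".
function escapeSnippetText(text) {
  return text.replace(/[\\$}]/g, "\\$&");
}

// Hypothetical helper: builds the text that replaces the extracted subschema.
// `definitionsKeyword` is "$defs" or "definitions", depending on the dialect.
function buildRefSnippet(definitionsKeyword, defaultName) {
  const pointer = escapeSnippetText("#/" + definitionsKeyword + "/");
  // The first tab stop, pre-filled with the default definition name.
  const placeholder = "${1:" + escapeSnippetText(defaultName) + "}";
  return "{ \"" + escapeSnippetText("$ref") + "\": \"" + pointer + placeholder + "\" }";
}
```

Calling `buildRefSnippet("$defs", "def1")` produces `{ "\$ref": "#/\$defs/${1:def1}" }`. The new definition's key under $defs would need the same name, which this sketch does not attempt to keep in sync.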

You'll need to add tests for this feature that cover whatever edge cases you can think of.

I appreciate the clean and well written code so far. I suggest installing the ESLint plugin in your editor to get linting feedback while developing. It will help avoid getting build errors in your PRs if you forget to run the linter before pushing.

@arpitkuriyal
Author

Thank you for the detailed feedback. However, I encountered an issue while looking into what you mentioned:

I think what needs to happen is that the user needs to provide the definition name.
There's a special syntax that allows you to set placeholders that the user will be prompted to fill in. That approach may also work here.

I found that InsertTextFormat: InsertTextFormat.Snippet is coming in LSP 3.18 (see here). I’ll try using the command feature as an alternative and see how it goes.

@jdesrosiers
Collaborator

Thanks for figuring that out. Yes, it looks like SnippetTextEdit is what we need. Although 3.18 isn't released yet, it looks like vscode-languageserver-node already supports it. That means that vscode's language server client probably supports it too. Other clients may not support it yet, but that's ok.

@arpitkuriyal
Author

arpitkuriyal commented Feb 21, 2025

I think it's not supported by vscode-languageserver-node yet.

As you can see in the screenshots, the latest available version is still 3.17.5, and this feature is introduced in 3.18. I also checked the node_modules directory and couldn't find any trace of this feature there.
Could you please confirm this on your end as well?

Version screenshot:
Screenshot 2025-02-22 at 1 34 48 AM

node_modules screenshot:
Screenshot 2025-02-22 at 1 20 05 AM
PR screenshot that you mentioned here:

Although 3.18 isn't released yet, it looks like vscode-languageserver-node already supports it.

Screenshot 2025-02-22 at 1 21 54 AM

@jdesrosiers
Collaborator

Yes, you're right. Although the code was merged over a year ago, it hasn't made it to an official release yet. It's planned for the next major release (v10) and that only has pre-releases published so far. We could change the dependency to `"vscode-languageserver": "^10.0.0-next"` to use the pre-release. But, that's only the server. It probably wouldn't work in vscode yet.

I guess that means we have to hold off on that detail. I was hoping this would give us a way to avoid having to come up with a more robust way to generate definition names, but it looks like we're going to have to find a temporary solution until this feature is released.

@arpitkuriyal
Author

I think, for now, we should keep it simple and implement a basic defCounter that generates definition names like def1, def2, and so on.

Do you have any other solutions in mind? Please let me know your thoughts.

@jdesrosiers
Collaborator

Number-based naming can work, but there are some edge cases.

Extract a schema and you'd get def1. Then restart the language server and extract another schema and you get def1 again. Ideally people will rename after extraction and this won't come up often, but it could be a problem.

Also, if you do refactorings in one schema and then go to another schema, it would be weird for it to generate def5 or something when there isn't def1 - def4 yet.

One thing that I think would work is to generate number-based names starting at def1 and check if the name already exists. If it does, increment and check again until you have a unique name.
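
A minimal sketch of that increment-and-check idea, assuming the existing definition names are available as a list of strings (`nextAvailableDefName` is a hypothetical name):

```javascript
// Hypothetical helper: returns the first "defN" name that is not already taken.
// `existingNames` is assumed to be the property names currently under $defs.
function nextAvailableDefName(existingNames) {
  const taken = new Set(existingNames);
  let counter = 1;
  while (taken.has(`def${counter}`)) {
    counter += 1;
  }
  return `def${counter}`;
}
```

Because the counter is derived from the schema on every call, nothing is lost when the language server restarts and nothing carries over from one schema to another.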

Or, you could inspect all the definition names looking for the def{number} pattern and increment starting from the number you find or 1 if the pattern isn't found.

Or, you could generate a UUID and use that as the name and not have to check anything. But, a bulky UUID might not make for as good a user experience.

Any of those options would be ok.

@arpitkuriyal
Author

Alright, I’ll work on it and let you know once it’s done.

@arpitkuriyal
Author

Screen.Recording.2025-02-24.at.12.48.49.AM.mov

Please review it. If everything looks good, I will start writing the test cases.

@arpitkuriyal
Author

arpitkuriyal commented Feb 23, 2025

When switching the dialect URI of the schema, we need to change $defs to definitions for older drafts. Therefore, I added a condition: if the dialect is 2020-12 or 2019-09, use $defs; otherwise, use definitions. Is this correct, or is there anything else I should do?

@jdesrosiers
Copy link
Collaborator

Is this correct, or is there anything else I should do?

Use the getKeywordName function from @hyperjump/json-schema/experimental. It takes a keyword URI and a dialect URI and returns the right label for the dialect.

@arpitkuriyal
Author

Got it! I'll use getKeywordName from @hyperjump/json-schema/experimental for the right label.

@arpitkuriyal
Author

I just made the change; you can check it now. I'll write the test cases as soon as possible.

Collaborator

@jdesrosiers jdesrosiers left a comment

Have another try at the JSON formatting part. The other things I mentioned should be small and easy to address. I included a few code style suggestions that aren't caught by the linter. In general, I thought the code made better use of whitespace the last time I reviewed. Things are more compact now making the code harder to read.

Comment on lines 27 to 36
// Helper function to format new def using jsonc-parser
const formatNewDef = (/** @type {string} */ newDefText) => {
  try {
    /** @type {unknown} */
    const parsedDef = jsoncParser.parse(newDefText);
    return JSON.stringify(parsedDef, null, 2).replace(/\n/g, "\n ");
  } catch {
    return newDefText;
  }
};
Collaborator

Pull this out of the constructor. Either make it a private function or a utility function outside of the class. I'm sure other refactorings will need to use a function like this as well, so maybe it belongs in util.js?

Comment on lines +31 to +32
const parsedDef = jsoncParser.parse(newDefText);
return JSON.stringify(parsedDef, null, 2).replace(/\n/g, "\n ");
Collaborator

This isn't what I meant when I suggested jsonc-parser. You're not using it for anything you couldn't have used JSON.parse for. Look for the format function from jsonc-parser. This solution using JSON.stringify to format isn't going to work for embedded schemas.

{
  "$defs": {
    "this-is-an-embedded-schema-because-it-has-$id": {
      "$id": "my-embedded-schema",
      "$defs": {
        "def1": {
          "$comment": "This definition will need more indentation because it's nested"
        }
      }
    }
  }
}

jsoncParser.format should help get around that problem because you can give it the whole document with the replaced text and tell it to format just the replaced text.
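
To make the nesting problem concrete, here is a hand-rolled stand-in. It is not jsonc-parser's formatter and not the approach recommended above; it only shows that the indentation of the inserted text depends on where it lands in the document. Both function names are hypothetical.

```javascript
// Returns the leading whitespace of the line that contains `offset`.
function leadingWhitespaceAt(documentText, offset) {
  const lineStart = documentText.slice(0, offset).lastIndexOf("\n") + 1;
  return /^[ \t]*/.exec(documentText.slice(lineStart))[0];
}

// Serializes `value` with the document's indent unit, then shifts every line
// after the first by the indentation found at the insertion point.
function indentDefinition(value, baseIndent, indentUnit) {
  return JSON.stringify(value, null, indentUnit).replace(/\n/g, "\n" + baseIndent);
}
```

For a definition inserted next to "def1" in the embedded schema above, `leadingWhitespaceAt` returns eight spaces, so a fixed one-level shift would leave the new definition under-indented. Formatting a range of the whole document avoids having to compute this by hand.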

try {
/** @type {unknown} */
const parsedDef = jsoncParser.parse(newDefText);
return JSON.stringify(parsedDef, null, 2).replace(/\n/g, "\n ");
Collaborator

Hardcoding the indentation strategy to two spaces isn't going to work. You're going to need to determine what indentation strategy the client is using and match that. You should be able to get the information from the configuration service, but it's currently only configured to retrieve this server's configs. I think there's another "section" you'd have to request, but you'll have to figure out what that is.

Author

I have made all the changes, but I am stuck on the formatting part. It always takes the default tab size of four instead of the current tab size. Even though I am fetching the editor settings, it doesn't seem to reflect the actual tab size used in the document. Do you have any suggestions on how to correctly retrieve the active document's indentation settings?
Screenshot 2025-02-26 at 1 38 00 AM

Collaborator

Try sending the document's URI (schemaDocument.textDocument.uri) in the scopeUri parameter in the configuration request. That should return the settings for that file, instead of the settings for the workspace. That's my best guess.
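
A sketch of what such a scoped request could look like. `connection` is assumed to be the vscode-languageserver connection object, `requestEditorSettings` is a hypothetical name, and the "editor" section is an assumption based on VS Code's setting names (editor.tabSize, editor.insertSpaces, editor.detectIndentation).

```javascript
// Sketch only: asks the client for the "editor" settings that apply to one document.
// With a real connection this returns a Promise that resolves to the settings object.
function requestEditorSettings(connection, documentUri) {
  return connection.workspace.getConfiguration({
    scopeUri: documentUri, // scope the lookup to this file rather than the workspace
    section: "editor"      // assumed section name, see the note above
  });
}
```

Other clients may expose indentation settings under a different section, so the section name probably should not be hard-coded in the long run.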

I think it should be ok to add an optional parameter to the get function so you can pass in the document URI.

Author

I also tried this approach, but when I log it to the console, it still always shows 4. Not sure why it's not picking up the actual tab size. Any other suggestions?

Collaborator

@jdesrosiers jdesrosiers Feb 25, 2025

I don't have any more guesses. I'll try to find some time tonight to try some things and see if I can figure anything out.

Collaborator

Here's what I figured out.

I think getting 4 is technically correct. Your editor is configured for 4-space indentation by default. However, vscode also has a setting called "Detect Indentation". If that is set, it ignores the tabSize setting and figures out the indentation based on the content of the file. I haven't found any way to get vscode to tell us the detected indentation. There might not be a way. I think that all we can do is check for the detectIndentation config and if it's true, do our own indentation detection on the server.
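
One possible shape for that server-side fallback, as a rough sketch. The heuristic below is an assumption and is not how vscode itself detects indentation: it takes the first indented line of the document as the indent unit. Both function names are hypothetical.

```javascript
// Hypothetical heuristic: guess the indentation from the first indented line.
function detectIndentation(text, fallback = { insertSpaces: true, tabSize: 4 }) {
  for (const line of text.split(/\r?\n/)) {
    const match = /^([ \t]+)\S/.exec(line);
    if (!match) {
      continue; // blank or unindented line
    }
    if (match[1].startsWith("\t")) {
      return { insertSpaces: false, tabSize: fallback.tabSize };
    }
    return { insertSpaces: true, tabSize: match[1].length };
  }
  return fallback; // nothing indented, keep the configured values
}

// Uses the configured values unless detectIndentation is enabled.
function resolveFormattingOptions(editorSettings, documentText) {
  const configured = {
    insertSpaces: editorSettings.insertSpaces,
    tabSize: editorSettings.tabSize
  };
  return editorSettings.detectIndentation
    ? detectIndentation(documentText, configured)
    : configured;
}
```

This works for JSON that is already formatted consistently, where the first indented line sits one level deep. A more robust detector would look at the differences in indentation between consecutive lines.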

Author

Okay, then I'll set detectIndentation: true and handle the indentation detection. Thanks for your help!

Comment on lines +60 to +67
const defsContent = schemaDocument.textDocument.getText().slice(
  definitionsNode.offset,
  definitionsNode.offset + definitionsNode.textLength
);
const defMatches = [...defsContent.matchAll(/"def(\d+)":/g)];
defMatches.forEach((match) =>
  highestDefNumber = Math.max(highestDefNumber, parseInt(match[1], 10))
);
Collaborator

This isn't a good approach. Use keys from schema-node.js to loop over all the property names of the definitionsNode. Then you can use the regex on those values to determine the highestDefNumber.
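
A sketch of the name-based version. `propertyNames` stands in for whatever `keys(definitionsNode)` yields and is assumed here to be an iterable of plain strings; the helper name is hypothetical.

```javascript
// Hypothetical helper: finds the highest N among names of the form "defN".
// Returns 0 when no name matches the pattern.
function highestDefNumber(propertyNames) {
  let highest = 0;
  for (const name of propertyNames) {
    // Anchored so that only whole names like "def12" match, not "def2x" or "undef3".
    const match = /^def(\d+)$/.exec(name);
    if (match) {
      highest = Math.max(highest, parseInt(match[1], 10));
    }
  }
  return highest;
}
```

Matching against the raw text of the definitions node would also pick up a "def3" key nested deeper inside one of the definitions, which is one reason to work from the property names instead.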

arpitkuriyal and others added 5 commits February 26, 2025 02:49
Co-authored-by: Jason Desrosiers <[email protected]>

2 participants