Reorganize custom dictionaries, better spell checking infra (#36255)
* Reorganize custom dictionaries, better spell checking infra

* Update scripts/sort_and_unique_file_lines.js

* Reorg files

* Updates

* Apply suggestions from code review

Co-authored-by: Onkar Khadangale <[email protected]>

* Typo

* Fix action

* Fix checkout

* Update scripts/sort_and_unique_file_lines.js

Co-authored-by: Onkar Khadangale <[email protected]>

* Update .vscode/cspell.json

Co-authored-by: Onkar Khadangale <[email protected]>

* Add docs

* Update files/en-us/mdn/writing_guidelines/writing_style_guide/index.md

Co-authored-by: Onkar Khadangale <[email protected]>

---------

Co-authored-by: Onkar Khadangale <[email protected]>
2 people authored and fiji-flo committed Oct 31, 2024
1 parent c75956f commit b833f70
Showing 17 changed files with 2,942 additions and 5,959 deletions.
3 changes: 1 addition & 2 deletions .github/workflows/auto-cleanup-bot.yml
@@ -31,8 +31,7 @@ jobs:
  yarn content fix-flaws
  yarn fix:md
  yarn fix:fm
- node scripts/sort_and_unique_file_lines.js .vscode/ignore-list.txt
- node scripts/sort_and_unique_file_lines.js .vscode/terms-abbreviations.txt
+ node scripts/sort_and_unique_file_lines.js .vscode/dictionaries
  - name: Create PR with only fixable issues
  if: success()
9 changes: 3 additions & 6 deletions .github/workflows/pr-check_cspell_lists.yml
@@ -5,8 +5,7 @@ on:
  branches:
  - main
  paths:
- - .vscode/ignore-list.txt
- - .vscode/terms-abbreviations.txt
+ - .vscode/dictionaries/*

  jobs:
  docs:
@@ -16,8 +15,7 @@ jobs:
  with:
  sparse-checkout-cone-mode: false
  sparse-checkout: |
- .vscode/ignore-list.txt
- .vscode/terms-abbreviations.txt
+ .vscode/dictionaries/*
  .nvmrc
  package.json
  scripts/sort_and_unique_file_lines.js
@@ -29,5 +27,4 @@ jobs:

  - name: Check if cSpell word lists are in correct order
  run: |
- node scripts/sort_and_unique_file_lines.js .vscode/ignore-list.txt --check
- node scripts/sort_and_unique_file_lines.js .vscode/terms-abbreviations.txt --check
+ node scripts/sort_and_unique_file_lines.js --check .vscode/dictionaries
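The workflow above calls `scripts/sort_and_unique_file_lines.js` with a `--check` flag, but the script's source is not part of the visible diff. The following is only a minimal sketch of what a "sort and de-duplicate, or just verify" routine could look like. The function names, the blank-line handling, and the case-insensitive ordering are all assumptions, not the repository's actual behavior.

```javascript
// Hypothetical sketch; NOT the contents of scripts/sort_and_unique_file_lines.js.

// Return the text with lines trimmed, de-duplicated, and sorted.
function sortAndUniqueLines(text) {
  const lines = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0); // assumption: blank lines are dropped
  const unique = [...new Set(lines)];
  // Assumed ordering: case-insensitive first, then code-unit order as a tie-breaker.
  unique.sort((a, b) => {
    const la = a.toLowerCase();
    const lb = b.toLowerCase();
    if (la !== lb) return la < lb ? -1 : 1;
    return a < b ? -1 : a > b ? 1 : 0;
  });
  return unique.join("\n") + "\n";
}

// "Check" mode: nothing is rewritten; only report whether the text is already normalized.
function isSortedAndUnique(text) {
  return sortAndUniqueLines(text) === text;
}
```

In a CI job, a check of this kind would typically exit with a non-zero status when `isSortedAndUnique` returns `false`, so the pull request fails instead of the file being rewritten silently.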
2 changes: 1 addition & 1 deletion .github/workflows/spelling-check-bot.yml
@@ -51,4 +51,4 @@ jobs:
  ${{ env.OUTPUT }}
  > [!TIP]
- > To exclude words from the spellchecker, you can add valid words (web technology terms or abbreviations) to the [terms-abbreviations.txt](https://github.com/mdn/content/blob/main/.vscode/terms-abbreviations.txt) dictionary for IDE autocompletion. To ignore strings that are not words (\`AABBCC\` in code, for instance), you can add them to [ignore-list.txt](https://github.com/mdn/content/blob/main/.vscode/ignore-list.txt).
+ > If the word is actually valid or it is required to be ignored, consider adding it to one of the dictionaries under [`.vscode/dictionaries`](https://github.com/mdn/content/tree/main/.vscode/dictionaries).
7 changes: 2 additions & 5 deletions .lintstagedrc.js
@@ -17,10 +17,7 @@ export default {
  `yarn filecheck ${filenames.join(" ")}`,
  ],
  "*": (filenames) => [`node scripts/log-url-issues.js`],
- ".vscode/ignore-list.txt": (filenames) => [
- `node scripts/sort_and_unique_file_lines.js .vscode/ignore-list.txt`,
- ],
- ".vscode/terms-abbreviations.txt": (filenames) => [
- `node scripts/sort_and_unique_file_lines.js .vscode/terms-abbreviations.txt`,
+ ".vscode/dictionaries/*.txt": (filenames) => [
+ `node scripts/sort_and_unique_file_lines.js ${filenames.join(" ")}`,
  ],
  };
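The new lint-staged entry replaces two hard-coded paths with a single glob whose handler builds the command from whichever dictionary files are staged. The sketch below copies the handler's shape from the diff and runs it on two hypothetical file names to show the resulting command. lint-staged normally passes absolute paths; they are shortened here for readability.

```javascript
// Same shape as the new ".vscode/dictionaries/*.txt" handler in .lintstagedrc.js.
const sortDictionaries = (filenames) => [
  `node scripts/sort_and_unique_file_lines.js ${filenames.join(" ")}`,
];

// Hypothetical staged files, for illustration only.
const commands = sortDictionaries([
  ".vscode/dictionaries/ignore-list.txt",
  ".vscode/dictionaries/proper-names.txt",
]);

// One command is produced, with every staged dictionary passed as an argument.
console.log(commands[0]);
```

Because the paths are joined with plain spaces, this approach assumes that dictionary file names contain no whitespace.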
101 changes: 86 additions & 15 deletions .vscode/cspell.json
@@ -6,6 +6,10 @@
"useGitignore": true,
"dictionaries": [
"terms-abbreviations",
"cultural-words",
"proper-names",
"non-english",
"code-entities",
"ignore-list",
"bash",
"css",
@@ -32,16 +36,31 @@
],
"ignoreRegExpList": [
// macros
"{{\\s?\\w*\\(",
"{{EmbedInteractiveExample\\(.*\\)}}",
"{{EmbedLiveSample\\(.*\\)}}",
"{{EmbedYouTube\\(\"[\\w-]*\"\\)}}",
// TODO - add some details what these match
"\\(#\\w*\\)",
"{{\\s?\\w*",
"{{\\s*EmbedInteractiveExample\\(.*\\)\\s*}}",
"{{\\s*EmbedLiveSample\\(.*\\)\\s*}}",
"{{\\s*EmbedYouTube\\(.*\\)\\s*}}",
"{{\\s*EmbedGHLiveSample\\(.*\\)\\s*}}",
// Markdown links
"\\]\\(\\S*\\)",
"\\*\\*\\w\\*\\*\\w*",
"\\*\\w\\*\\w*",
// Website references
"[\\w\\-.]+\\.(com|net|org|ac\\.uk)\\b",
// Things like "**J**ava**S**cript"
"\\*\\*\\w+\\*\\*\\w*",
"\\*\\w+\\*\\w*",
"#[À-ž\\w-]*",
// Old Firefox interfaces
"nsIDOM\\w+",
// Don't check other scripts
"[\\u0370-\\u03FF]+", // Greek
"[\\u0400-\\u04FF]+", // Cyrillic
"[\\u0590-\\u05FF]+", // Hebrew
"[\\u0600-\\u06FF]+", // Arabic
"(\\uD835[\\uDC00-\\uDFFF])+", // Mathematical Alphanumeric Symbols
"(\\uD83A[\\uDD00-\\uDD5F])+", // Adlam script
// Percent-encoding
"[A-Za-z]*%[A-F0-9]{2}[A-Za-z]*",
// Various HTML attributes that often have non-word values
"aria-activedescendant=\"(?:[^\\\"]+|\\.)*\"",
"aria-controls=\"(?:[^\\\"]+|\\.)*\"",
"aria-describedby=\"(?:[^\\\"]+|\\.)*\"",
@@ -50,29 +69,81 @@
"aria-flowto=\"(?:[^\\\"]+|\\.)*\"",
"aria-labelledby=\"(?:[^\\\"]+|\\.)*\"",
"aria-owns=\"(?:[^\\\"]+|\\.)*\"",
"Base64",
"class=\"(?:[^\\\"]+|\\.)*\"",
"data-test-id=\"(?:[^\\\"]+|\\.)*\"",
"for=\"(?:[^\\\"]+|\\.)*\"",
"HexValues",
"pattern=\"(?:[^\\\"]+|\\.)*\"",
"href=\"(?:[^\\\"]+|\\.)*\"",
"(?<=id)=\"(?:[^\\\"]+|\\.)*\"",
"(?<!\\w)id=\"(?:[^\\\"]+|\\.)*\"",
"lang=\".*\">.*</",
"src=\"(?:[^\\\"]+|\\.)*\"",
"HexValues",
"Base64",
// Any base64 in data URLs, even those shorter than 40 chars (which don't match Base64 regex)
"data:[^\\s;]+;base64,[a-zA-Z0-9/+=…]*",
"[Ee][Tt]ag: ([\\w-]+|\"[\\w-]+\")",
// Note: we don't add other headers that may contain base64 data, because
// they often contain other meaningful directives that we want to spell
// check too
"url\\(\"data\\:image/svg\\+xml.*\"\\)[,;]",
"Urls",
"favourite-colour",
"nonce-\\w+",
"sessionid=\\w+",
"csrftoken=\\w+",
"csrfmiddlewaretoken=\\w+",
"widget_session=\\w+",
"ucaf:.*\""
],
"dictionaryDefinitions": [
// Note: when adding words to these lists, be as specific and contextualized
// as possible, to avoid typos being masked elsewhere. For example, all FF
// preferences should include prefixes: `dom.abortablepromise` instead of
// just `abortablepromise`, which may be missing a space in other contexts.
{
"name": "terms-abbreviations",
"path": "./terms-abbreviations.txt",
"path": "./dictionaries/terms-abbreviations.txt",
"description": "Anything that may be used throughout the content: compound words, abbreviations, etc. They are considered as real words and will be suggested.",
"addWords": true
},
{
"name": "cultural-words",
"path": "./dictionaries/cultural-words.txt",
"description": "Culture-specific names: currencies, calendars, languages, big cities, countries, etc.",
"addWords": true
},
// Dictionaries below will not be suggested.
// We are not dogmatic about where to put a word: for example,
// sometimes proper names are in terms-abbreviations dictionary because you
// are likely to use them.
// There's no difference between these dictionaries; they only provide rough
// divisions for easier management. For example, a proper name can be
// non-English, and non-English words may be code entities.
// We recommend assessing applicability in the order in which the
// dictionaries are listed.
{
"name": "proper-names",
"path": "./dictionaries/proper-names.txt",
"description": "Proper names: people, small towns, companies, products, fonts, online platform handles.",
"addWords": false,
"noSuggest": true
},
{
"name": "non-english",
"path": "./dictionaries/non-english.txt",
"description": "Non-English words. Note that some non-English words denote well-known concepts, such as \"Adlam script\" or \"Addis Ababa\", in which case they should be placed in cultural-words. This dictionary is intended for entire scripts for demonstrating non-English languages.",
"addWords": false,
"noSuggest": true
},
{
"name": "code-entities",
"path": "./dictionaries/code-entities.txt",
"description": "This list contains compound words that aren't properly capitalized or obscure abbreviations that are only utilized by particular web APIs (e.g. HTML attributes, language codes, event names, etc.). Only include entities defined by standards or libraries here; variable names, strings, etc. that are created by MDN code examples should be added to ignore-list.",
"addWords": false,
"noSuggest": true
},
{
"name": "ignore-list",
"path": "./ignore-list.txt",
"path": "./dictionaries/ignore-list.txt",
"description": "Other gibberish words that are used for specific purposes. For example, placeholder identifiers, random strings, URLs (all lowercase), hashes, filler texts, etc. For purposefully misspelled words and/or words that are likely typos in other contexts, consider using cSpell:ignore instead.",
"addWords": false,
"noSuggest": true
}
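To get a feel for what some of the `ignoreRegExpList` entries skip, a few of them can be tried as plain JavaScript regular expressions. This is a rough sketch: the patterns are copied from the diff with the JSON string escaping removed, the global flag is an assumption about how the spell checker applies them, and the helper `ignoredSpans` and the sample strings are made up for illustration.

```javascript
// Patterns taken from ignoreRegExpList, written as regex literals.
// The "g" flag is an assumption, used here to collect every match.
const patterns = {
  percentEncoding: /[A-Za-z]*%[A-F0-9]{2}[A-Za-z]*/g,
  boldLetters: /\*\*\w+\*\*\w*/g,
  idAttribute: /(?<!\w)id="(?:[^\"]+|\.)*"/g,
};

// Hypothetical helper: list the substrings a given pattern would exclude from checking.
function ignoredSpans(name, text) {
  return text.match(patterns[name]) ?? [];
}

console.log(ignoredSpans("percentEncoding", "see hello%20world"));
// → [ 'hello%20world' ]
console.log(ignoredSpans("boldLetters", "**J**ava**S**cript"));
// → [ '**J**ava', '**S**cript' ]
console.log(ignoredSpans("idAttribute", '<div id="mainNav">'));
// → [ 'id="mainNav"' ]
console.log(ignoredSpans("idAttribute", '<p valid="qwxz">'));
// → []
```

The last call illustrates the effect of the `(?<!\w)` lookbehind in the new `id` pattern: an attribute whose name merely ends in "id" (the made-up `valid` here) is no longer skipped, so its value is still spell-checked.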