Scribe-Data v3.3.0

@andrewtavis andrewtavis released this 09 Jun 12:51
· 1319 commits to main since this release

✨ Features

  • The translation process has been updated to allow for translations from non-English languages (#72, #73, #74, #75, #76, #77, #78, #79).

πŸ“ Documentation

  • The documentation has been given a new layout with the logo in the top left (#90).
  • The documentation now has links to the code at the top of each page (#91).

🐞 Bug Fixes

  • Annotation bugs such as repeated or empty values were removed.
  • Perfect tenses of Portuguese verbs were fixed by finding the appropriate Wikidata property ID (PID) (#68).
    • Note that the most common past perfect property is not the standard one, so this will need to be fixed.

♻️ Code Refactoring

  • pre-commit has been added to the repo to improve the development experience (#137).
  • Code formatting was shifted from black to Ruff.
  • A Ruff-based GitHub workflow was added to check the code formatting and lint the codebase on each pull request (#109).
  • The _update_files directory was renamed update_files as these files are used in non-internal manners now (#57).
  • A common function has been created to map Wikidata ids to noun genders (#69).
  • The project is now installed locally for development and command line usage, so usages of sys.path have been removed from files (#122).
  • The directory structure has been dramatically streamlined and includes folders for future projects where language data could come from other sources like Wiktionary (#139).
    • Translation files are moved to their own directory.
    • The extract_transform directory has been removed and all files within it have been moved one level up.
    • The languages directory has been renamed language_data_extraction.
    • All files within wikidata/_resources have been moved to the resources directory.
    • The gender and case annotations for data formatting have now been commonly defined.
    • All formatted_data files in the language directories have now been moved to the scribe_data_json_export directory, in preparation for outputs being directed to a directory outside of the package.
    • Path computing has been refactored throughout the codebase, and unneeded functions for data transfers have been removed.
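
For readers unfamiliar with the common gender mapping mentioned above (#69), the idea can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `map_gender`, the dictionary name, and the single-letter return codes are hypothetical, and the Wikidata ids shown are assumed to be the items for the respective grammatical genders and should be verified against Wikidata before use.

```python
# Hypothetical sketch: names and return codes are illustrative, not the project's API.
from typing import Dict

# Wikidata item ids assumed to denote grammatical genders (verify before relying on them).
GENDER_QID_TO_ANNOTATION: Dict[str, str] = {
    "Q499327": "M",   # masculine
    "Q1775415": "F",  # feminine
    "Q1775461": "N",  # neuter
    "Q1305037": "C",  # common gender
}


def map_gender(wikidata_gender: str) -> str:
    """Return a short gender annotation for a Wikidata id or entity URI.

    Unknown or empty values map to an empty string.
    """
    # Query results often carry full entity URIs, so keep only the trailing id.
    qid = wikidata_gender.rsplit("/", 1)[-1].strip()
    return GENDER_QID_TO_ANNOTATION.get(qid, "")
```

Keeping the mapping in one place means each language's formatting script can call the same helper instead of repeating its own id checks.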