Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CMS Forms Flexible Imports #419

Merged
merged 12 commits into from
Feb 19, 2025

Conversation

DevChima
Copy link
Contributor

@DevChima DevChima commented Feb 17, 2025

Purpose

There is need for CMS forms to have a more robust import mechanism.
Also, we need the error handling to be more intuitive so there's better guidance for the user when there's an import failure

Solution

This pr includes:
A header validator function which validates the header before writing to rows
A list of mandatory headers that each import should have
A snakecase function which converts the headers to snake case so the import is more flexible
What this means is that if a user enters Generic Error instead of generic_error as a header name, it will convert and pass.

Checklist

  • Added or updated unit tests
  • Added to release notes
  • Updated readme/documentation (if necessary)

Copy link
Contributor

@erikh360 erikh360 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! few comments

@@ -425,6 +425,41 @@ def test_single_assessment(self, impexp: ImportExport) -> None:
imported = impexp.get_assessment_json()
assert imported == orig

def test_snake_case_assessments(self, csv_impexp: ImportExport) -> None:
"""
Importing a csv with spaces in header names and uppercase text should be coverted to snake case
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: "coverted"

model_fields = [field.name for field in assessment_model._meta.get_fields()]

expected_fields = {
"title": True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the True value here not redundant since they're all True? Removing this would make your for loop below less complicated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually that's true.

row_iterator = parse_file(self.file_content, self.file_type)
rows = [row for _, row in row_iterator]

if rows:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look like we have a test covering rows being empty?

Copy link
Contributor Author

@DevChima DevChima Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added it in but I also had to modify the helper function here:

first_row = next(rows)

It would throw a runtime stop-iteration error becuase it will try use next(rows) on an empty iterator since the file is empty.
So I put it in a try-except block to catch the exception if the iterator is empty.
Then in the parse_file function, I added an if not rows to catch empty rows.

@erikh360

@@ -185,6 +214,21 @@ def create_shadow_assessment_from_row(
)
assessment.questions.append(question)

def validate_headers(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to pass in MANDATORY_HEADERS here? validate_headers is in the same class where it is called and can also access the variable directly.

Comment on lines 118 to 126
original_headers = rows[0].keys()
headers_mapping = {
header: self.to_snake_case(header) for header in original_headers
}
snake_case_headers = list(headers_mapping.values())
self.validate_headers(snake_case_headers, MANDATORY_HEADERS, row_num=1)
transformed_rows = [
{headers_mapping[key]: value for key, value in row.items()} for row in rows
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should some of this not maybe live inside the parse_file function? I see we have a function in there called fix_rows?

I'm not very familiar with all the import code so I'm just asking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can move some there so it's also more accessible to other apps that will share the helper functions. There we can check if there are rows. We could perhaps handle snake_case conversion there too. I'll take a look.

… apps for file imports.

- Check empty rows() now actioned for assessment, content pages imports and any other cms imports that uses the parse_file function.
- We are not able to move snake_case into fix_rows() in import_helpers.py because contentset uses PascalCase for headers.
If we snake_cased every import, there will be key errors in contentsets.
row_num=row_num,
)

def to_snake_case(self, s: str) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this being used?

Comment on lines 186 to 190
def to_snake_case(s: str) -> str:
"""
Converts string to snake_case.
"""
return re.sub(r"[\W_]+", "_", s).lower().strip("_")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather have a specific set of characters that we convert, perhaps just spaces and dashes. Anything too far off should be an error to avoid confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do it that way then making it reusable across all imports could be more cumbersome since they all have their various headers?

Comment on lines 218 to 222
if any(char in s for char in INVALID_CHARACTERS):
raise ImportException(
f"Invalid header: '{s}' contains invalid characters.", row_num=row_num
)
return re.sub(r"[\W_]+", "_", s).strip("_")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than maintaining a long list of invalid non-word characters (some of which we may potentially want to use in the future) and replacing everything else with _, we can replace just the _-equivalent characters:

Suggested change
if any(char in s for char in INVALID_CHARACTERS):
raise ImportException(
f"Invalid header: '{s}' contains invalid characters.", row_num=row_num
)
return re.sub(r"[\W_]+", "_", s).strip("_")
return s.replace(" ", "_").replace("-", "_").strip("_")

(I typed this directly into the PR comment, so check that it works before accepting the suggestion.)

Comment on lines 403 to 414
@classmethod
def check_missing_fields(cls, row: dict[str, str], row_num: int) -> None:
"""
Checks for missing required fields in the row and raises an exception if any is missing.
"""
missing_fields = [field for field in MANDATORY_HEADERS if field not in row]
if missing_fields:
raise ImportAssessmentException(
f"The import file is missing required fields: {', '.join(missing_fields)}",
row_num,
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this already covered by AssessmentImporter.validate_headers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, validate_headers checks for missing headers before writing to rows. Check_missing_fields then checks that the mandatory headers do not have missing fields in its rows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All rows will have exactly the same set of keys, so if the headers aren't missing there they won't be missing here. If the intent is to check for missing values, we should change the code to do that instead of just checking the keys.

(Unless we're removing empty fields somewhere along the way and I missed it?)

raise ImportAssessmentException(
"The import file is missing some required fields."
f"The import file is missing required fields: {', '.join(missing_fields)}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps "Row missing values for required fields: {...}" instead?

@DevChima DevChima merged commit 9cca1d2 into main Feb 19, 2025
2 checks passed
@DevChima DevChima deleted the assessment-flexible-import-misisng-error-fields branch February 19, 2025 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants