Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pc 33771/clean ean inside offer titles books cd vinyles only #15878

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

jbaudet-pass
Copy link
Contributor

@jbaudet-pass jbaudet-pass commented Jan 16, 2025

But de la pull request

Ticket Jira (ou description si BSR) : https://passculture.atlassian.net/browse/PC-33771

Première étape visant à corriger les offres avec des EANs dans le titre : livres, CDs et vinyles.

Si l'offre a un EAN dans le titre et que celui-ci est correspond à un produit en base de données, alors son titre et ses extraData sont mis à jour avec les informations du produit.

Si l'EAN est inconnu ou si le produit est incompatible avec les CGU, alors l'offre est rejetée.

Au passage

Fix pour la fonction get_chunks qui boucle indéfiniment si chunk_size < 1 parce que len([]) vaut 0 et ne sera donc jamais strictement inférieure à 0.

@jbaudet-pass jbaudet-pass added the python Pull requests that update Python code label Jan 16, 2025
@jbaudet-pass
Copy link
Contributor Author

Je ne sais pas s'il y a des cas avec des offres qui ont un EAN dans le titre et un EAN renseigné dans extraData, et éventuellement deux EANs différents. A priori non, mais c'est à vérifier. Voire s'il y a des offres avec un nombre dans le titre qui ressemblerait à un EAN mais qui n'en est pas un. Pareil, normalement non mais c'est à confirmer.

Copy link
Contributor

github-actions bot commented Jan 16, 2025

Visit the preview URL for this PR (updated for commit be62564):

https://pc-pro-testing--pr15878-pc-33771-clean-ean-i-pykqm215.web.app

(expires Sat, 25 Jan 2025 16:41:02 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Sign: 032d233ee67e1c50d6af12e29c936c7076770eb1

@jbaudet-pass jbaudet-pass force-pushed the PC-33771/clean_ean_inside_offer_titles_books_cd_vinyles_only branch 2 times, most recently from 2c77634 to c9fcac0 Compare January 16, 2025 08:42
@jbaudet-pass jbaudet-pass force-pushed the PC-33771/clean_ean_inside_offer_titles_books_cd_vinyles_only branch from c9fcac0 to f33a27f Compare January 16, 2025 13:18
Copy link
Contributor

@tcoudray-pass tcoudray-pass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Je pense qu'il est important de simplifier le script pour s'assurer que tout fonctionne bien.

api/src/pcapi/scripts/clean_offer_titles_with_eans/main.py Outdated Show resolved Hide resolved
api/src/pcapi/scripts/clean_offer_titles_with_eans/main.py Outdated Show resolved Hide resolved
api/src/pcapi/scripts/clean_offer_titles_with_eans/main.py Outdated Show resolved Hide resolved
api/src/pcapi/scripts/clean_offer_titles_with_eans/main.py Outdated Show resolved Hide resolved


def parse_offers(rows: Collection[OfferEanQueryRow]) -> None:
for chunk in get_chunks(rows, chunk_size=2_000):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Je ne suis pas sûr de saisir l'intérêt de faire un get_chunks ici. Est-ce qu'il ne vaudrait pas mieux just limité la requête SQL à 2k lignes ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C'est pour éviter de traiter les offres une par une. La requête SQL de base ne donne qu'un identifiant d'offre, j'ai donc préféré les regrouper par paquets de 2000 afin de ne pas avoir à les charger au fur et à mesure.

api/src/pcapi/scripts/clean_offer_titles_with_eans/main.py Outdated Show resolved Hide resolved
api/src/pcapi/scripts/clean_offer_titles_with_eans/main.py Outdated Show resolved Hide resolved
api/src/pcapi/scripts/clean_offer_titles_with_eans/main.py Outdated Show resolved Hide resolved
@jbaudet-pass jbaudet-pass force-pushed the PC-33771/clean_ean_inside_offer_titles_books_cd_vinyles_only branch 4 times, most recently from d3ab10a to 4b48f6b Compare January 20, 2025 15:24
Copy link
Contributor

@tcoudray-pass tcoudray-pass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Il y a un petit problème de linting.

api/src/pcapi/scripts/clean_offer_titles_with_eans/main.py Outdated Show resolved Hide resolved
api/src/pcapi/utils/chunks.py Outdated Show resolved Hide resolved
api/src/pcapi/scripts/clean_offer_titles_with_eans/main.py Outdated Show resolved Hide resolved
@jbaudet-pass jbaudet-pass force-pushed the PC-33771/clean_ean_inside_offer_titles_books_cd_vinyles_only branch 4 times, most recently from 97df1f1 to 6c4eccc Compare January 21, 2025 14:09
Copy link
Contributor

@tcoudray-pass tcoudray-pass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Il faut tester en staging

Bug: if chunk_size was 0, the while loop would never stop since the
len([]) can never be < 0.
First step: clean books, cds and vinyles. If the offer has its EAN
inside its name, find the product and update its information from it.
If the EAN is unknown or if the product is not allowed by the GCU,
reject it.
@jbaudet-pass jbaudet-pass force-pushed the PC-33771/clean_ean_inside_offer_titles_books_cd_vinyles_only branch from 32f77bf to be62564 Compare January 23, 2025 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
python Pull requests that update Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants