Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using non-unique or partial keys as foreign keys in a CREATE TABLE or ALTER TABLE statement is deprecated -> Bug when creating genes table #392

Closed
northwestwitch opened this issue Dec 17, 2024 · 4 comments · Fixed by #394
Assignees
Labels
Bug Something isn't working Effort M Effort medium Refactor Urgency L Urgency large

Comments

@northwestwitch
Copy link
Member

I've started seeing this error when trying to create/populate the genes table:

    self.dialect.do_execute(
  File "/opt/homebrew/Caskroom/miniconda/base/envs/chanjo_report/lib/python3.11/site-packages/sqlalchemy/engine/default.py", line 941, in do_execute
    cursor.execute(statement, parameters)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/chanjo_report/lib/python3.11/site-packages/MySQLdb/cursors.py", line 179, in execute
    res = self._query(mogrified_query)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/chanjo_report/lib/python3.11/site-packages/MySQLdb/cursors.py", line 330, in _query
    db.query(q)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/chanjo_report/lib/python3.11/site-packages/MySQLdb/connections.py", line 265, in query
    _mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (MySQLdb.OperationalError) (6125, "Failed to add the foreign key constraint. Missing unique key for constraint 'transcripts_ibfk_1' in the referenced table 'genes'")
[SQL:
CREATE TABLE transcripts (
	id INTEGER NOT NULL AUTO_INCREMENT,
	chromosome VARCHAR(6) NOT NULL,
	start INTEGER NOT NULL,
	stop INTEGER NOT NULL,
	ensembl_id VARCHAR(24) NOT NULL,
	refseq_mrna VARCHAR(24),
	refseq_mrna_pred VARCHAR(24),
	refseq_ncrna VARCHAR(24),
	refseq_mane_select VARCHAR(24),
	refseq_mane_plus_clinical VARCHAR(24),
	ensembl_gene_id VARCHAR(24) NOT NULL,
	build ENUM('GRCh37','GRCh38'),
	PRIMARY KEY (id),
	FOREIGN KEY(ensembl_gene_id) REFERENCES genes (ensembl_id)
)

@northwestwitch northwestwitch added the Bug Something isn't working label Dec 17, 2024
@northwestwitch
Copy link
Member Author

@northwestwitch
Copy link
Member Author

northwestwitch commented Dec 17, 2024

The problem is that if I add the unique constraint to the ensembl id field of the gene, here and then load the genes I get the following error:

 (MySQLdb.IntegrityError) (1062, "Duplicate entry 'ENSG00000230417' for key 'genes.ensembl_id'")
[SQL: INSERT INTO genes (chromosome, start, stop, ensembl_id, hgnc_id, hgnc_symbol, build) VALUES (%s, %s, %s, %s, %s, %s, %s)]
[parameters: [('10', 110868890, 110872366, 'ENSG00000203497', 27425, 'PDCD4-AS1', 'GRCh38'), ('12', 75257635, 75258634, 'ENSG00000258112', 50853, 'CCNG2P1', 'GRCh38'), ('5', 94618669, 94739436, 'ENSG00000133302', 25408, 'SLF1', 'GRCh38'), ('8', 145052465, 145066685, 'ENSG00000182307', 26104, 'C8orf33', 'GRCh38'), ('12', 48978322, 48982620, 'ENSG00000125084', 12774, 'WNT1', 'GRCh38'), ('10', 78179167, 78675109, 'ENSG00000230417', 31430, 'LINC00595', 'GRCh38'), ('10', 78179167, 78675109, 'ENSG00000230417', 45111, 'LINC00856', 'GRCh38'), ('12', 94306449, 94459988, 'ENSG00000173588', 17966, 'CEP83', 'GRCh38')  ... displaying 10 of 11 total bound parameter sets ...  ('7', 44748581, 44769881, 'ENSG00000122515', 22229, 'ZMIZ2', 'GRCh38'), ('10', 44793119, 44959733, 'ENSG00000224812', 27349, 'TMEM72-AS1', 'GRCh38')]]
(Background on this error at: https://sqlalche.me/e/20/gkpj)

Due to the fact that 2 genes apparently share the same Ensembl ID:

(schug) chiararasi@n145-p183 schug_downloads % grep ENSG00000230417  genes_GRCh38_test.txt
10	78179167	78675109	ENSG00000230417	LINC00595	HGNC:31430
10	78179167	78675109	ENSG00000230417	LINC00856	HGNC:45111

@northwestwitch northwestwitch self-assigned this Dec 19, 2024
@northwestwitch
Copy link
Member Author

Temporary fix is disabling this check in the database:

mysql> SHOW VARIABLES LIKE '%fk%';
+---------------------------------+-------+
| Variable_name                   | Value |
+---------------------------------+-------+
| restrict_fk_on_non_standard_key | ON    |
+---------------------------------+-------+
1 row in set (0,00 sec)

mysql> SET GLOBAL restrict_fk_on_non_standard_key = OFF;
Query OK, 0 rows affected, 1 warning (0,00 sec)

@northwestwitch
Copy link
Member Author

I'll work on this after the holidays

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working Effort M Effort medium Refactor Urgency L Urgency large
Projects
None yet
1 participant