
Enhancement step: updating light_utils.py to run with current Pandas & Numpy when there are no junctions #26

Open
fomightez opened this issue Jul 25, 2024 · 0 comments

fomightez commented Jul 25, 2024

Current Pandas (v2.2.2) and Numpy (v2.0.1) don't seem very compatible with the current light_utils.py, at least when there appear to be no splice junctions in a tiny aligned dataset. (Or maybe it is a problem even when the input data has some junctions; I'm still trying to sort that out.)

Running the workflow on a tiny dataset is a valid use case: you may want to check that you have all the dependencies installed, and so want to be able to step quickly through the entire process. This matters all the more because the test data and associated annotation files that apparently were available before may no longer be accessible (see here?).

I have tried updating it. Here is what I have. You can see it in action by going here and clicking the 'launch Sicilian in JupyterLab' badge. When the session comes up, step through running all the cells. The final cell of that Jupyter Notebook currently runs the modify_refnames() function from light_utils.py. If you swap in the original light_utils.py, you will see it error out at line 38 in modify_refnames() with AttributeError: Can only use .str accessor with string values!
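A minimal sketch of how that AttributeError can arise, assuming (not confirmed) that with no junctions a gene column in CI_new ends up holding only missing values, which pandas stores as float64 rather than as strings. The DataFrame and the column name geneR1A here are hypothetical stand-ins, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for CI_new when no junctions are found: a column that
# holds only NaN gets a float64 dtype, so the .str accessor is not available.
CI_new = pd.DataFrame({"geneR1A": [np.nan, np.nan]})

message = None
try:
    CI_new["geneR1A"].str.contains(",")
except AttributeError as err:
    message = str(err)
print(message)  # Can only use .str accessor with string values!

# Casting explicitly to str first makes the .str methods usable again.
# Caveat: with pandas 2.x, NaN becomes the literal string "nan" after the cast.
as_text = CI_new["geneR1A"].astype(str)
has_comma = as_text.str.contains(",")
```

The cast trades the crash for "nan" placeholder strings, so any later logic that tests for missing values may need to account for them.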

The differences are shown here.
All but a couple of the changes explicitly cast to string type before chaining in string methods or building a string. The remaining couple of changes avoid setting values on a copy by re-assigning the column back to CI_new instead of using inplace=True.
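The second kind of change can be sketched as follows. The column name and the use of fillna() are illustrative assumptions only; the actual method called with inplace=True in light_utils.py may differ.

```python
import pandas as pd

# Hypothetical stand-in for CI_new; the column name is illustrative only.
CI_new = pd.DataFrame({"geneR1A": ["ABC", None, "XYZ"]})

# Pattern being avoided: an inplace method on a selected column. Under pandas
# Copy-on-Write this may not update CI_new, and pandas 2.2 warns about it:
# CI_new["geneR1A"].fillna("", inplace=True)

# Re-assign the result back to the DataFrame column instead.
CI_new["geneR1A"] = CI_new["geneR1A"].fillna("")
print(CI_new["geneR1A"].tolist())  # ['ABC', '', 'XYZ']
```

Explicit re-assignment behaves the same with and without Copy-on-Write, which is why it is the safer choice across pandas versions.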

I admit I am very unsure about my changes to what corresponds to lines 118 and beyond in the original light_utils.py. Unfortunately, I haven't yet been able to set up a good situation to fully compare the behavior of the original and my modified lines on real data. But even the earlier lines needed updating, so I thought starting a discussion at this point was still worthwhile.

Even with earlier Pandas and Numpy (v1.5.1 & 1.23.5, respectively), I was seeing issues when I ran my simplistic test dataset with the original light_utils.py.

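The script as I edited it also runs without error with those older versions of Pandas & Numpy. It only shows a deprecation warning about the default value of the regex argument. The first .str.replace() call on the flagged line specifies regex=True, but the second chained .str.replace() call does not, so that second call is likely what triggers the warning. Here is that part of the run isolated to show the warning it gives: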

started modify 0.007201433181762695
/home/jovyan/SICILIAN-binder/scripts/light_utils.py:53: FutureWarning: The default value of regex will change from True to False in a future version.
  CI_new.loc[ind,"geneR1" + suff] = CI_new.loc[ind,"geneR1" + suff].astype(str).str.replace("{}[^,]*[,]".format(weird_gene),"",regex=True).str.replace(",{}.*".format(weird_gene),"")  # added cast to str based on https://stackoverflow.com/a/52065957/8508004
ended modify 0.22089719772338867
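A sketch of the flagged line with regex=True passed to both chained calls. In pandas 2.0 the default for regex in Series.str.replace changed to False, so a call that omits it would treat ",{}.*" as a literal string rather than a pattern. The value of weird_gene and the sample gene strings below are hypothetical placeholders, and the .loc[ind, ...] selection from the original line is left out for brevity.

```python
import pandas as pd

weird_gene = "RP11"  # hypothetical placeholder for the value used in light_utils.py
genes = pd.Series(["RP11-1.1,ABC", "ABC,RP11-2.2", "XYZ"])

# Both chained calls pass regex=True, so the patterns are treated as regular
# expressions in pandas 1.5 (no FutureWarning) and in pandas 2.x alike.
cleaned = (
    genes.astype(str)
    .str.replace("{}[^,]*[,]".format(weird_gene), "", regex=True)
    .str.replace(",{}.*".format(weird_gene), "", regex=True)
)
print(cleaned.tolist())  # ['ABC', 'ABC', 'XYZ']

# With regex=False (the pandas 2.x default), the second pattern is matched
# literally, so nothing is removed from any entry.
literal = genes.str.replace(",{}.*".format(weird_gene), "", regex=False)
print(literal.tolist())  # ['RP11-1.1,ABC', 'ABC,RP11-2.2', 'XYZ']
```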

(Practical reminder note to myself: I was able to test the original light_utils.py with older Pandas and Numpy, in conjunction with the simplistic demo, by launching a Jupyter session from here, cloning in my repo, and then replacing the altered light_utils.py with the original. [I also had to install pyarrow and pysam, and specify STAR aligner 2.7.11 with %conda install -y bioconda::star=2.7.11.] That Binder example repo currently has older Pandas and Numpy, v1.5.1 & 1.23.5, respectively.)
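When switching between these environments, a quick check of which versions are actually installed can help. This is a hypothetical helper (report_versions is not part of the SICILIAN code); it uses only the standard library importlib.metadata and reports None for packages that are not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("pandas", "numpy", "pysam", "pyarrow")):
    """Return a dict mapping each package name to its installed version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


print(report_versions())
```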

fomightez changed the title from "Enhancement request: update light_utils.py to run with current Pandas & Numpy" to "Enhancement step: updating light_utils.py to run with current Pandas & Numpy" on Jul 26, 2024
fomightez changed the title from "Enhancement step: updating light_utils.py to run with current Pandas & Numpy" to "Enhancement step: updating light_utils.py to run with current Pandas & Numpy when there are no junctions" on Jul 31, 2024