Hangs when used with large files #14

Open
baranberkay96 opened this issue May 28, 2021 · 0 comments

  • occupationcoder version: 0.2.0
  • Python version: Python 3.9.5
  • Operating System: macOS Big Sur Version 11.2.3

Description

Output of pip3 freeze:

alabaster==0.7.12
appdirs==1.4.4
Babel==2.9.1
beautifulsoup4==4.9.3
bleach==3.3.0
bump2version==1.0.1
certifi==2020.12.5
chardet==4.0.0
click==8.0.1
cloudpickle==1.6.0
colorama==0.4.4
coverage==5.5
dask==2021.5.0
distlib==0.3.1
docutils==0.16
filelock==3.0.12
flake8==3.9.0
fsspec==2021.5.0
idna==2.10
imagesize==1.2.0
importlib-metadata==4.3.0
Jinja2==3.0.1
joblib==1.0.1
keyring==23.0.1
locket==0.2.1
MarkupSafe==2.0.1
mccabe==0.6.1
nltk==3.6.2
numpy==1.20.3
occupationcoder @ file:///Users/baranberkaybarakcin/Documents/learning/occupation-coder/occupationcoder/dist/occupationcoder-0.2.0.tar.gz
packaging==20.9
pandas==1.2.4
partd==1.2.0
pkginfo==1.7.0
pluggy==0.13.1
py==1.10.0
pycodestyle==2.7.0
pyflakes==2.3.1
Pygments==2.9.0
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2021.1
PyYAML==5.4.1
readme-renderer==29.0
regex==2021.4.4
requests==2.25.1
requests-toolbelt==0.9.1
rfc3986==1.5.0
scikit-learn==0.24.2
scipy==1.6.3
six==1.16.0
snowballstemmer==2.1.0
soupsieve==2.2.1
Sphinx==3.5.4
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
threadpoolctl==2.1.0
toml==0.10.2
toolz==0.11.1
tox==3.23.0
tqdm==4.61.0
twine==3.4.1
urllib3==1.26.5
virtualenv==20.4.7
watchdog==2.0.2
webencodings==0.5.1
zipp==3.4.1

We run this snippet:

import pandas as pd
from occupationcoder.coder import coder
myCoder = coder.Coder()

if __name__ == '__main__':

    df = pd.read_csv('construction.csv')
    df['job_sector'] = "Construction & Property"
    df = myCoder.codedataframe(df)
    print(df.head())  # a bare df.head() prints nothing when run as a script

construction.csv is a relatively large file, with approximately 40K rows.

When we run the code on 'construction.csv', it hangs and never finishes. I suspect this is related to the number of threads dask uses, but I couldn't find a solution. I'd be glad if you could help. Have a nice day :)
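
One way to narrow this down is to check whether the call truly hangs or is just very slow on 40K rows. The sketch below is a hypothetical helper, not part of occupationcoder: `time_batches` and its `batch_size` parameter are names invented here for illustration. It feeds the input to a processing function in fixed-size batches and prints the elapsed time for each batch, so a stall shows up at a specific row range.

```python
import time


def time_batches(rows, process, batch_size=1000):
    """Run `process` on successive batches of `rows` and time each batch.

    `rows` can be anything that supports len() and positional slicing,
    such as a list or a pandas DataFrame. Hypothetical helper for
    diagnosis only; it is not part of occupationcoder.
    """
    results = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        t0 = time.perf_counter()
        results.append(process(batch))
        elapsed = time.perf_counter() - t0
        # Progress line: a batch that never prints is where the stall is.
        print(f"rows {start}-{start + len(batch) - 1}: {elapsed:.2f}s")
    return results
```

With the snippet above, this could be called as `parts = time_batches(df, myCoder.codedataframe, batch_size=1000)` and the pieces recombined with `pd.concat(parts)`. To test the threading suspicion, calling `dask.config.set(scheduler="synchronous")` before coding forces dask's single-threaded scheduler; whether occupationcoder honours that global setting, rather than choosing its own scheduler, is an assumption that has not been verified here.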
