
Main exits with IOErrors #1

Open
simNN7 opened this issue Nov 10, 2014 · 11 comments
@simNN7

simNN7 commented Nov 10, 2014

Hi Lars,
thanks for the toolbox. I am having a hard time getting it to run, though. Is main.py supposed to work 'as is' or only with modifications? I downloaded the data set, changed all formats to .txt, but running it (on an iMac with 10.9.5) returns

Traceback (most recent call last):
File "main.py", line 67, in
run_simulation('input/20news-bydate/20news-bydate-train','input/20news-bydate/20news-bydate-test',epochs = 50,attributes=2000,evaluation_points=[1,3,7,15,31,63],binary_output=True)
File "main.py", line 46, in run_simulation
dat_proc_train = data_processing.DataProcessing(train_paths,words_count=attributes,trainingset_size=1.0,acceptance_lst_path="input/acceptance_lst_stemmed.txt")
File "/Users/admin/Desktop/Deep-Belief-Nets-for-Topic-Modeling-master/DataPreparation/data_processing.py", line 42, in init
self.acceptance_lst = open(acceptance_lst_path).read().replace(" ","").split("\n")
IOError: [Errno 2] No such file or directory: 'input/acceptance_lst_stemmed.txt'

Removing the acceptance_lst_path argument from the dat_proc_train = data_processing.DataProcessing(...) call results in

Traceback (most recent call last):
File "main.py", line 67, in
run_simulation('input/20news-bydate/20news-bydate-train','input/20news-bydate/20news-bydate-test',epochs = 50,attributes=2000,evaluation_points=[1,3,7,15,31,63],binary_output=True)
File "main.py", line 52, in run_simulation
dat_proc_test = data_processing.DataProcessing(test_paths,trainingset_size=0.0, trainingset_attributes=data_processing.get_attributes())
File "/Users/admin/Desktop/Deep-Belief-Nets-for-Topic-Modeling-master/DataPreparation/data_processing.py", line 437, in get_attributes
return s.load( open( env_paths.get_attributes_path(training), "rb" ) )
IOError: [Errno 2] No such file or directory: 'output/train/BOWs/attributes.p'

@larsmaaloee
Owner

Hello gents.
Thank you very much for your interest in the toolbox. There are two cases in which errors occur:

  1. You are missing an acceptance list (a whitelist of words). If you do not want to use one, simply omit the acceptance_lst_path argument.
  2. In your parameters you have set trainingset_size to 0.0, which means the toolbox is trying to generate the testing set. This is not possible, though, since the training set has not been generated yet. So please set the training set size to an appropriate level, such as 0.7 (70 %) if all documents are collected in one set; see the sketch below.
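
For illustration, a minimal sketch of such a call, with the argument names taken from the traceback in the first comment; treating the omission of acceptance_lst_path as disabling the whitelist is an assumption, not verified against the current code:

    # Hypothetical call based on the traceback above; leaving out
    # acceptance_lst_path is assumed to disable the acceptance list.
    dat_proc_train = data_processing.DataProcessing(
        train_paths,
        words_count=2000,      # number of BOW attributes
        trainingset_size=0.7,  # 70 % train / 30 % test split
    )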

During the next days I will take a look at making the training easier to understand. I can see why this confuses you. But the way it is set up now leaves many ways of generating the dataset, which is very handy when doing scientific analysis of different DBNs. Let me know if this helps you get started with the toolbox.

Best regards
Lars

@karenkua

karenkua commented Jan 6, 2015

Hi Lars,

I'm having the same issue as Vamsi-lg. Could you help? Thanks!

@larsmaaloee
Owner

Hello Karenkua and Vamsi-Ig. The attribute list (saved as the serialised file attributes.p) must be generated in the data preparation by the method __set_attributes as part of generating the training set. This is the list of words for the BOW; see the sketch below. Please let me know how that works.
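
A sketch of the implied ordering, inferred from the tracebacks and calls quoted earlier in this thread (the method and argument names come from there; treat this as an assumption, not the toolbox's exact API):

    # Generate the training set first; this is assumed to run __set_attributes
    # and write output/train/BOWs/attributes.p.
    dat_proc_train = data_processing.DataProcessing(
        train_paths, words_count=2000, trainingset_size=1.0)
    dat_proc_train.generate_bows()

    # Only then can the test set reuse the pickled attribute list.
    dat_proc_test = data_processing.DataProcessing(
        test_paths, trainingset_size=0.0,
        trainingset_attributes=data_processing.get_attributes())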

@karenkua

karenkua commented Jan 7, 2015

Hi Lars, thanks for the prompt reply, greatly appreciated. I tried the following steps, and different problems arose.

  1. In main.py, I uncommented dat_proc_train.generate_bows() to generate the BOW.
  2. In __read_docs_from_filesystem of data_processing.py there is an if check (shown below) testing whether filenames end with .p. I commented it out, since the data files downloaded from the 20 Newsgroups website are not in .p format:

for doc in docs:
    # if doc.endswith('.p'):

  3. However, under the "print 'Reading and saving docs from file system'" section of data_processing, docs_list = False for all files.

I assume that has something to do with the .p files in step 2. Could you kindly advise whether the input files have to be in .p format (and, if so, any code for converting them or a source for downloading them)? I got mine from http://qwone.com/~jason/20Newsgroups/ as mentioned in the README file. Or how could I get around this problem? Thanks again!

@larsmaaloee
Owner

Hi again. I have now made various amendments to the toolbox so that it should be much clearer what needs to be done. Please read the README.md file and follow the 3 examples. That should get you up and running with the toolbox.

@jyb002

jyb002 commented Feb 26, 2015

Hi Lars,

Thanks for your toolbox. I am trying to run your code, but it emits a warning: "deep-belief-nets-for-topic-modeling/DBN/dbn.py:241: RuntimeWarning: overflow encountered in exp return 1. / (1 + exp(-x))".

I googled solutions such as replacing the sigmoid function in dbn.py:241 with "return expit(x)" or "return .5 * (1 + tanh(.5 * x))". But neither of these changes works.
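
Written out, those two replacements look like this (a sketch using scipy and numpy, not the repository's own code; both are mathematically equivalent to the original sigmoid but avoid the overflow):

    import numpy as np
    from scipy.special import expit  # computes 1 / (1 + exp(-x)) stably

    def sigmoid_expit(x):
        return expit(x)

    def sigmoid_tanh(x):
        # Identity: 1 / (1 + exp(-x)) == 0.5 * (1 + tanh(0.5 * x));
        # tanh saturates instead of overflowing.
        return 0.5 * (1.0 + np.tanh(0.5 * x))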

Do you have the same issue when you run the toolbox? And do you have any idea how to solve it? Thanks.

The details of the warnings are shown below:

Pre Training
Visible units: 2000 Hidden units: 500
/deep-belief-nets-for-topic-modeling/DBN/dbn.py:241: RuntimeWarning: overflow encountered in exp
  return 1. / (1 + exp(-x))
/deep-belief-nets-for-topic-modeling/DBN/pretraining.py:140: RuntimeWarning: divide by zero encountered in log
  perplexity = nansum(vis * log(softmax_value))
/deep-belief-nets-for-topic-modeling/DBN/pretraining.py:140: RuntimeWarning: invalid value encountered in multiply
  perplexity = nansum(vis * log(softmax_value))
Bottom units: 500 Top units: 500
Epoch[ 1]: Error = 1.7385879
Bottom units: 500 Output units: 128
Epoch[ 1]: Error = 32.1944861
Time  71.8855669498
Fine Tuning
Backprop: Epoch 1
Large batch: 1 of 36
/deep-belief-nets-for-topic-modeling/DBN/dbn.py:241: RuntimeWarning: overflow encountered in exp
  return 1. / (1 + exp(-x))
/deep-belief-nets-for-topic-modeling/DBN/dbn.py:241: RuntimeWarning: overflow encountered in exp
  return 1. / (1 + exp(-x))
/deep-belief-nets-for-topic-modeling/DBN/dbn.py:241: RuntimeWarning: overflow encountered in exp
  return 1. / (1 + exp(-x))

@larsmaaloee
Owner

Hi,

This happens because some of the sigmoid inputs are very small (large negative numbers), so exp(-x) overflows. You'll need to scale the data accordingly. But most overflow warnings don't have a real influence on the training: the sigmoid simply saturates at 0, which is the correct limit; see the sketch below.
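
Since the warning is benign in that case, one option is simply to silence it where the sigmoid is evaluated; a sketch assuming NumPy arrays, not the toolbox's own code:

    import numpy as np

    def sigmoid(x):
        # exp overflows to inf for large negative x, but 1/(1 + inf) == 0.0
        # is exactly the saturated sigmoid value, so the warning is harmless.
        with np.errstate(over='ignore'):
            return 1.0 / (1.0 + np.exp(-x))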

Let me know if you have any more questions.

Best regards
Lars

@AmrAzzam

Hi Dr. Lars,
I would like to thank you for publishing your code. I am trying to run it and am facing two issues:

  1. There is a problem with unpickling: some of the dataset files do not work with it, so I removed those files from the data set.
  2. The parallel stemming does not work. It creates the files, but if I try to open them I just find an array of boolean False values. I am using Windows 7 64-bit:

[False, False, False, False, False, False, False, <type 'exceptions.StopIteration'>, False, False, <type 'exceptions.StopIteration'>, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, <type 'exceptions.StopIteration'>, <type 'exceptions.StopIteration'>, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, <type 'exceptions.StopIteration'>, False, False, False, False, False, False, <type 'exceptions.StopIteration'>, False]

@alexminnaar

There are definitely some problems with importing the newsgroup training data:

  1. Your code looks for files that end in ".p" however the newsgroup files are ".txt" files.
  2. When you change the code to look for ".txt" files, there are still some pickling errors that occur with some files.
  3. When you get rid of the files with pickling errors, the docs_list list in __set_attributes() contains all False values.

Have you tested this? Didn't you run into the ".p" problem?

@larsmaaloee
Owner

Hi Alex,

Thanks for your interest in the toolbox.

The code is a little outdated, but there are no problems in running it. The pickled files are temporary lists of words used for later BOW creation. You should not change the code to look for the .txt files. I believe what you are missing is the stemming. Please stem the files and then create the BOW, as in the example code; see the sketch below.
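
For context, a minimal sketch of what the stemming step is assumed to produce (pickled lists of stemmed words, i.e. the ".p" files), using NLTK's PorterStemmer; the paths and file layout here are illustrative, not the toolbox's exact code:

    import os
    import pickle
    from nltk.stem.porter import PorterStemmer  # requires nltk to be installed

    stemmer = PorterStemmer()
    src_dir = "input/20news-bydate/20news-bydate-train/sci.space"  # hypothetical path
    out_dir = "output/stemmed"
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)

    for name in os.listdir(src_dir):
        with open(os.path.join(src_dir, name)) as f:
            stemmed = [stemmer.stem(w.lower()) for w in f.read().split()]
        # Each document becomes a pickled list of stemmed words (a ".p" file).
        with open(os.path.join(out_dir, name + ".p"), "wb") as out:
            pickle.dump(stemmed, out)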

Let me know how it works. :)

Best regards
Lars

@alexminnaar

Apologies. The problem was that I did not have nltk installed for the stemming. Strangely, the error did not say that nltk was missing; instead the code seemed to just skip the stemming altogether, which is what caused the error about no ".p" files being created. It seems to be working now. Thanks!
