Main exits with IOErrors #1
Hello gents.
Over the next few days I will take a look at making the training easier to understand. I can see why this confuses you. But the way it has been set up now leaves many ways of generating the dataset, which is very handy when doing scientific analysis of different DBNs. Let me know if this helps you get started using the toolbox. Best regards
Hi Lars, Having the same issue as Vamsi-lg, could you help? Thanks!
Hello Karenkua and Vamsi-Ig. The attribute list (saved as the serialised file attributes.p) is generated during data preparation by the method "__set_attributes", as part of generating the training set. It is the list of words used for the BOW. Please let me know how that works.
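For reference, here is a minimal, hypothetical sketch of what such an attribute list amounts to: the most frequent stemmed words across the training documents, pickled so the BOW step can load them later. The function name and paths are illustrative and not the toolbox's actual "__set_attributes" code.

from collections import Counter
import pickle

def build_attributes(stemmed_docs, words_count=2000,
                     out_path="output/train/BOWs/attributes.p"):
    # Collect the most frequent words over all stemmed training documents
    # and serialise them so the later BOW step can reuse the same vocabulary.
    counts = Counter()
    for doc in stemmed_docs:          # each doc is a list of stemmed tokens
        counts.update(doc)
    attributes = [word for word, _ in counts.most_common(words_count)]
    with open(out_path, "wb") as f:   # the serialised list read back later
        pickle.dump(attributes, f)
    return attributes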
Hi Lars, thanks for the prompt reply, greatly appreciated. I tried the following steps and different problems arise.
for doc in docs:
I assume that has something to do with the .p files in step 2. Could you kindly advise whether the input files have to be in .p format (and whether there is any code for converting them, or a source for downloading them)? I got mine from http://qwone.com/~jason/20Newsgroups/ as mentioned in the README file. Or how could I get around this problem? Thanks again!
Hi again. So now I have made various amendments to the toolbox so that it should be much clearer what needs to be done. Please read the README.md file and follow the 3 examples. That should get you up and running with the toolbox.
Hi Lars, Thanks for your toolbox. I am trying to run your code, but it raises this warning: "deep-belief-nets-for-topic-modeling/DBN/dbn.py:241: RuntimeWarning: overflow encountered in exp return 1. / (1 + exp(-x))". I googled solutions to fix it, such as replacing the sigmoid function in dbn.py:241 with "return expit(x)" or "return .5 * (1 + tanh(.5 * x))", but neither of these changes works. Do you have the same issue when you run the toolbox? And do you have any idea how to solve it? Thanks. The details of the warning are shown as follows: Pre Training
Hi, This happens because of numbers being too small. You'll need to scale the data accordingly. But most overflow warnings don't have a real influence on the training. Let me know if you have any more questions. Best regards Lars Maaløe Email: [email protected], [email protected]
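For anyone hitting the same warning, below is a small, hedged sketch (not the toolbox's own code) of a numerically stable logistic function. It returns the same values as scipy.special.expit and as the 0.5 * (1 + tanh(0.5 * x)) form mentioned above, but avoids calling exp on large positive arguments.

import numpy as np

def stable_sigmoid(x):
    # Evaluate 1 / (1 + exp(-x)) without overflowing exp for very negative x.
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))   # exp of non-positive values: safe
    exp_x = np.exp(x[~pos])                    # exp of negative values: safe
    out[~pos] = exp_x / (1.0 + exp_x)
    return out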
Hi Dr Lars, [False, False, False, False, False, False, False, <type 'exceptions.StopIteration'>, False, False, <type 'exceptions.StopIteration'>, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, <type 'exceptions.StopIteration'>, <type 'exceptions.StopIteration'>, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, <type 'exceptions.StopIteration'>, False, False, False, False, False, False, <type 'exceptions.StopIteration'>, False] |
There are definitely some problems with importing the newsgroup training data.
Have you tested this? Didn't you run into the ".p" problem?
Hi Alex, Thanks for your interest in the toolbox. The code is a little outdated, but there are no problems in running it. The pickled files are temporary lists of words used for later BOW creation. You should not change the code to look for the .txt files. I believe what you are missing is the stemming. Please stem the files and then create the BOW, as in the example code. Let me know how it works. :) Best regards Lars Maaløe Email: [email protected], [email protected]
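To make the order of operations concrete, here is an illustrative Python 3 sketch (assumed names and paths, not the toolbox's exact functions) of the flow described above: stem the raw .txt documents with NLTK's PorterStemmer, pickle the resulting word lists, and only then build the BOW from those pickled files.

import os
import pickle
from nltk.stem.porter import PorterStemmer

def stem_and_pickle(txt_dir, out_dir):
    # Stem each .txt document and dump its token list as a temporary ".p" file.
    stemmer = PorterStemmer()
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    for name in os.listdir(txt_dir):
        if not name.endswith(".txt"):
            continue
        with open(os.path.join(txt_dir, name), errors="ignore") as f:
            tokens = f.read().lower().split()
        stemmed = [stemmer.stem(t) for t in tokens]
        with open(os.path.join(out_dir, name.replace(".txt", ".p")), "wb") as f:
            pickle.dump(stemmed, f)   # the temporary word list later used for BOW creation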
Apologies. The problem was that I did not have nltk installed for the stemming. Strangely, the error did not say that nltk was missing; instead it seemed to skip stemming altogether, which is what created the error about no ".p" files being created. It seems to be working now. Thanks!
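For reference, a guard like the following hypothetical snippet would have surfaced the missing dependency immediately instead of silently skipping the stemming step (assuming the stemming uses NLTK's PorterStemmer, as the example code suggests):

try:
    from nltk.stem.porter import PorterStemmer  # needed for the stemming step
except ImportError as err:
    # Fail loudly so the user knows why no ".p" word lists are produced.
    raise SystemExit("NLTK is required for stemming (pip install nltk): %s" % err)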
Hi Lars,
thanks for the toolbox. I am having a hard time getting it to run, though. Is main.py supposed to work "as is", or only with modifications? I downloaded the data set, changed all formats to .txt, but running it (on an iMac with OS X 10.9.5) returns
Traceback (most recent call last):
File "main.py", line 67, in
run_simulation('input/20news-bydate/20news-bydate-train','input/20news-bydate/20news-bydate-test',epochs = 50,attributes=2000,evaluation_points=[1,3,7,15,31,63],binary_output=True)
File "main.py", line 46, in run_simulation
dat_proc_train = data_processing.DataProcessing(train_paths,words_count=attributes,trainingset_size=1.0,acceptance_lst_path="input/acceptance_lst_stemmed.txt")
File "/Users/admin/Desktop/Deep-Belief-Nets-for-Topic-Modeling-master/DataPreparation/data_processing.py", line 42, in init
self.acceptance_lst = open(acceptance_lst_path).read().replace(" ","").split("\n")
IOError: [Errno 2] No such file or directory: 'input/acceptance_lst_stemmed.txt'
Removing the 'acceptance_lst_path' from `dat_proc_train = data_processing.DataProcessing...' (as in ) results in
Traceback (most recent call last):
File "main.py", line 67, in
run_simulation('input/20news-bydate/20news-bydate-train','input/20news-bydate/20news-bydate-test',epochs = 50,attributes=2000,evaluation_points=[1,3,7,15,31,63],binary_output=True)
File "main.py", line 52, in run_simulation
dat_proc_test = data_processing.DataProcessing(test_paths,trainingset_size=0.0, trainingset_attributes=data_processing.get_attributes())
File "/Users/admin/Desktop/Deep-Belief-Nets-for-Topic-Modeling-master/DataPreparation/data_processing.py", line 437, in get_attributes
return s.load( open( env_paths.get_attributes_path(training), "rb" ) )
IOError: [Errno 2] No such file or directory: 'output/train/BOWs/attributes.p'
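For reference, both IOErrors above come from files being opened at runtime: the acceptance list is read directly, and attributes.p is only written once the training set has been processed. A hedged pre-flight check along these lines (not part of the toolbox) makes the missing pieces visible before run_simulation is called:

import os

expected_inputs = ["input/acceptance_lst_stemmed.txt"]           # provided input file
generated_by_training_prep = ["output/train/BOWs/attributes.p"]  # written during data preparation

for path in expected_inputs:
    if not os.path.exists(path):
        print("Missing input file: %s (opened directly by DataProcessing)" % path)

for path in generated_by_training_prep:
    if not os.path.exists(path):
        print("Not generated yet: %s (created while processing the training set)" % path)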