SpareNTM is an open-source Python package implementing the algorithm proposed in the paper (Chen and Wang etc, ECML PKDD 2023), created by Jiayao Chen. For more details, please refer to this paper.
If you use this package, please cite the paper: Jiayao Chen, Rui Wang, Jueying He and Mark Junjie Li. Encouraging Sparsity in Neural Topic Modeling with Non-Mean-Field Inference. In Proceedings of ECML PKDD 2023, pp. 142-158.
If you have any questions or bug reports, please contact Jiayao Chen ([email protected]).
- python==3.6
- tensorflow-gpu==1.13.1
- numpy
- gensim
Note: the data in the path ./data has been preprocessed with tokenization, filtering non-Latin characters, etc before, from Scholar and the Search Snipppets.
python SpareNTM.py --learning_rate 0.0001 --dir_prior 0.02 --bern_prior 0.05 --bs 200 --n_topic 50 --warm_up_period 100 --data_dir ./data/20ng/ --data_name 20ng -output_dir ./output/
--learning_rate
: Specify the learning rate.
--dir_prior
: Specify the value of the Dirichlet prior.
--bern_prior
: Specify the value of the Bernoulli prior.
--bs
: Specify the number of the batch size.
--n_topic
: Specify the number of topics. The default value is 50.
--warm_up_period
:
--data_dir
: Specify the path to the input corpus file.
--data_name
: Specify the name of the input corpus file.
--output_dir
: Specify the path to the output directory.
topic coherence: NPMI.
topic diversity: TU.
topic quality: NPMI x TU
For computing the NPMI, the external data should be downloaded to the current directory ./
as ./ref_corpus
bash run_computeQuality.sh
If you want to use our code, please cite as
@inproceedings{DBLP:conf/pkdd/ChenWHL23,
author = {Jiayao Chen and
Rui Wang and
Jueying He and
Mark Junjie Li},
title = {Encouraging Sparsity in Neural Topic Modeling with Non-Mean-Field
Inference},
booktitle = {Machine Learning and Knowledge Discovery in Databases: Research Track
- European Conference, {ECML} {PKDD} 2023, Turin, Italy, September
18-22, 2023, Proceedings, Part {IV}},
series = {Lecture Notes in Computer Science},
volume = {14172},
pages = {142--158},
publisher = {Springer},
year = {2023},
url = {https://doi.org/10.1007/978-3-031-43421-1\_9},
doi = {10.1007/978-3-031-43421-1\_9},
}