- `word_proportions()`: after careful testing, we decided to deprecate this function, mainly because it lacked support for high-dimensional `dfm` inputs. All the functions that require word proportions now rely on the `dfm` class from quanteda.
- `sim_dfm()`: a new function to easily simulate a document-feature matrix from an LDA specification. This is useful for simulating corpora for testing.
- Document matching between the input LDAs and `weighted_dfm` has been improved. The check no longer relies on a specific `docvar` but on the internal document naming convention defined in the `corpus`.
- Slightly faster computation for certain operations.
- Improved documentation.
- If the optimal model is the last in the list of LDA objects passed to `optimal_topic()`, all the other functions devoted to testing now return `NULL` with a message, because there is nothing to test in terms of topic stability above the optimal model.
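To illustrate what simulating a document-feature matrix from an LDA specification involves, here is a minimal base-R sketch of the standard LDA generative process: draw one word distribution per topic from a Dirichlet with parameter `eta`, draw each document's topic mixture from a Dirichlet with parameter `alpha`, then sample a topic and a word for every token. The helpers `rdirichlet1()` and `simulate_lda_dfm()`, their arguments, and the `text1`, `text2`, ... document names are hypothetical; this is not the code behind `sim_dfm()`.

```r
# Minimal sketch of the LDA generative process in base R.
# Hypothetical helpers: this is NOT the package's sim_dfm() implementation.
set.seed(42)

# One draw from a Dirichlet distribution via normalized Gamma variates
rdirichlet1 <- function(alpha) {
  g <- rgamma(length(alpha), shape = alpha, rate = 1)
  g / sum(g)
}

simulate_lda_dfm <- function(n_docs, n_topics, vocab_size, doc_length,
                             alpha = 0.1, eta = 0.1) {
  # Topic-word distributions: one row per topic, each row sums to 1
  beta <- t(replicate(n_topics, rdirichlet1(rep(eta, vocab_size))))
  counts <- matrix(0L, nrow = n_docs, ncol = vocab_size,
                   dimnames = list(paste0("text", seq_len(n_docs)),
                                   paste0("w", seq_len(vocab_size))))
  for (d in seq_len(n_docs)) {
    theta <- rdirichlet1(rep(alpha, n_topics))  # document-topic mixture
    # Topic assignment for every token in the document
    z <- sample.int(n_topics, doc_length, replace = TRUE, prob = theta)
    # Word drawn from the assigned topic's word distribution
    w <- vapply(z, function(k) sample.int(vocab_size, 1, prob = beta[k, ]), integer(1))
    counts[d, ] <- tabulate(w, nbins = vocab_size)  # bag-of-words counts
  }
  counts
}

m <- simulate_lda_dfm(n_docs = 5, n_topics = 3, vocab_size = 20, doc_length = 50)
dim(m)      # 5 documents x 20 features
rowSums(m)  # every document contains 50 tokens
```

The result is a plain integer matrix; with quanteda installed, `quanteda::as.dfm(m)` should turn it into a `dfm`. A fixed `doc_length` keeps the sketch short, while a Poisson-distributed length is a common alternative.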
`word_proportions()` is much more general now and can carry out complex preprocessing routines.

`optimal_topic()` is much faster now thanks to a matrix implementation. It now also checks that documents are present in both the `corpus`/`dfm` and the estimates from `LDA()`.
- The argument `remove_documents` in `word_proportions()` now defaults to `FALSE`. This automatically triggers the document check in `optimal_topic()`.
- Preprocessing in `word_proportions()` is handled by the new general argument `...`.
- `word_proportions()` can now work on `corpus` and `dfm` objects as defined in quanteda.
- Improved documentation.
- Minor fixes.
- Just kidding! No more data here... we'll figure something out later on...
- A set of pre-run LDA models is now available.
- `agg_document_stability()` has been fully implemented. The function returns both the Aggregated Document Stability test and the F-test on informative and uninformative components.
- General improvements in plots.
- Better documentation.
- Slightly faster functions.
- `agg_topic_stability()` can now compute smoothed tests and plot the results accordingly.
- Support for converting the final output to a `tibble` has been extended to all functions.
- All eligible functions get better plots.
- `optimal_topic()` gains the parameter `q`, which selects the quantile of the cumulative probability of word weights to be considered relevant.
- `optimal_topic()` now finds the optimal number of topics either by significance level or by forcing the algorithm to reach the global minimum. This is controlled by the new parameter `alpha`.
- `optimal_topic()` drops both `threshold` and `q_type`.
- In `optimal_topic()`, `convert` now supports the `tibble` structure.
- The function `agg_topic_stability()` has been substantially improved.
- All functions that return a test gain the new argument `do_plot`, which plots the test statistic as a function of the number of topics.
- The argument `test` has been removed from `topic_stability()`, which now returns only the aggregate statistic.
- The argument `compute_res` has finally been removed from `topic_stability()`.
- `topic_stability()` now returns either a data.frame or a data.table with the LDA specification associated with each statistic (i.e., column `topic`).
- Improved documentation for some functions.
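The changelog does not spell out how `q` is applied inside `optimal_topic()`. One plausible reading, sketched here as an assumption rather than the package's code, is to rank a topic's word weights, accumulate their probabilities, and keep the smallest set of top words whose cumulative probability reaches `q`. The function `relevant_words()` and the toy weights are hypothetical.

```r
# Hypothetical helper: one possible reading of a cumulative-probability cutoff `q`.
# It keeps the smallest set of top-weighted words whose cumulative probability reaches q.
relevant_words <- function(word_weights, q = 0.8) {
  stopifnot(is.numeric(word_weights), !is.null(names(word_weights)), q > 0, q <= 1)
  p <- sort(word_weights / sum(word_weights), decreasing = TRUE)  # normalize, then rank
  cum_p <- cumsum(p)
  n_keep <- which(cum_p >= q)[1]          # first rank at which the cumulative mass reaches q
  if (is.na(n_keep)) n_keep <- length(p)  # guard against rounding shortfall when q = 1
  names(p)[seq_len(n_keep)]
}

# Toy word weights for a single topic (made up for illustration)
weights <- c(model = 0.40, topic = 0.25, word = 0.15,
             stable = 0.10, test = 0.06, noise = 0.04)
relevant_words(weights, q = 0.75)
# returns c("model", "topic", "word")
```

A lower `q` keeps fewer, more dominant words; `q = 1` keeps the whole vocabulary of the topic.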
Since we have two new functions, I feel this deserves a bump in the package version.
- `topic_match()`: detects and extracts informative and uninformative components.
- `agg_topic_stability()`: implements Test 4 from the methodological paper [Lewis and Grossetti (2019)].
- Improved documentation for some functions.
- `get_topic_models()`: a handy function to immediately get, from a specified environment, the list of topic models the user wants to process.
- `topic_stability()`: implements Tests 2 and 3 from the methodological paper [Lewis and Grossetti (2019)].
- Formal declaration of `LDA_VEM` objects as function inputs.
- All functions now have more detailed and better documentation.
- Added continuous integration with Travis CI and AppVeyor.
- Choice of quantile algorithms is now fully supported.
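Assuming the quantile algorithms in question are the nine sample-quantile definitions exposed by the `type` argument of base R's `quantile()` (the entry itself does not say so), this sketch shows why the choice matters: the same data and probability can give different answers.

```r
# Base R implements nine sample-quantile algorithms, selected with quantile(type = ...).
x <- c(1, 2, 3, 4, 10)

# Same data and probability, two different algorithms
q_type1 <- unname(quantile(x, probs = 0.9, type = 1))  # 10: inverse of the empirical CDF
q_type7 <- unname(quantile(x, probs = 0.9, type = 7))  # 7.6: linear interpolation (R's default)

# The 90th percentile under every algorithm, for comparison
q_all <- vapply(1:9, function(t) unname(quantile(x, probs = 0.9, type = t)), numeric(1))
```

With small samples or skewed weights the gap between algorithms can be large, which is why exposing the choice to the user can be useful.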
First version!
- `word_proportions()`: computes word proportions from a `corpus` object created by quanteda [Benoit et al. (2018)].
- `optimal_topic()`: implements Test 1 from the methodological paper [Lewis and Grossetti (2019)].