API Documentation
class topsbm.TopSBM(n_init=1, min_groups=None, max_groups=None, weighted_edges=True, random_state=None)
A Scikit-learn compatible transformer for hSBM topic models
Parameters: - n_init : int, default=1
Number of random initialisations to perform in order to avoid a local minimum of MDL. The minimum MDL solution is chosen.
- min_groups : int, default=None
The minimum number of word and document groups to infer. This is also a lower bound on the number of topics.
- max_groups : int, default=None
The maximum number of word and document groups to infer. This is also an upper bound on the number of topics.
- weighted_edges : bool, default=True
When True, each document-word pair gets a single edge weighted by its count, instead of one duplicate edge per occurrence.
- random_state : None, int or np.random.RandomState
Controls randomization. See Scikit-learn’s glossary.
Note that if this is set, the global random state of libcore will be affected, and the global random state of numpy will be temporarily affected.
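The n_init behaviour described above (several random initialisations, keeping the solution with the smallest MDL) follows a generic best-of-n pattern. The sketch below illustrates that pattern only; toy_fit_once and best_of_n are hypothetical names, and the fake "description length" stands in for real hSBM inference, which this example does not perform.

```python
import random


def toy_fit_once(rng):
    """Hypothetical stand-in for one inference run: returns (mdl, state)."""
    state = [rng.randint(0, 2) for _ in range(6)]  # fake group assignment
    mdl = float(sum(state)) + rng.random()         # fake description length
    return mdl, state


def best_of_n(n_init, random_state=None):
    """Run n_init independent initialisations and keep the minimum-MDL one."""
    rng = random.Random(random_state)
    runs = [toy_fit_once(rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[0])


mdl_1, _ = best_of_n(n_init=1, random_state=0)
mdl_5, _ = best_of_n(n_init=5, random_state=0)
print(mdl_1, mdl_5)
```

With the same seed, the first of the five runs is identical to the single run, so the best of five can never have a larger MDL than the single run. The cost is that run time grows roughly linearly with n_init.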
References
Martin Gerlach, Tiago P. Peixoto, and Eduardo G. Altmann, “A network approach to topic models,” Science Advances (2018).
Attributes: - graph_ : graph_tool.Graph
Bipartite graph between samples (the first n_samples_ vertices) and features (the remaining vertices)
- state_
Inference state from graph_tool
- n_levels_ : int
The number of levels in the inferred hierarchy of groups.
- groups_ : dict
Results of group membership from inference. Key is an integer, indicating the level of grouping (starting from 0). Value is a dict of information about the grouping which contains:
- B_d : int
number of doc-groups
- B_w : int
number of word-groups
- p_tw_d : array of shape (B_w, n_samples)
doc-topic mixtures: probability of word-group tw in doc d: P(tw | d)
- p_td_d : array of shape (B_d, n_samples)
doc-group membership: probability that doc-node d belongs to doc-group td: P(td | d)
- p_tw_w : array of shape (B_w, n_features)
word-group membership: probability that word-node w belongs to word-group tw: P(tw | w)
- p_w_tw : array of shape (n_features, B_w)
topic distribution: probability of word w given topic tw: P(w | tw)
Here “d”/document refers to samples; “w”/word refers to features.
- mdl_
Minimum description length (MDL) of the inferred state
- n_features_ : int
- n_samples_ : int
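To make the orientation of the arrays in groups_ concrete, the sketch below computes toy versions of p_tw_d and p_w_tw from a small count matrix. It assumes a hard, hand-picked assignment of words to word-groups (the word_group list is invented for illustration); the library derives its values from the inferred state, so this shows only what the probabilities mean and how they are normalised, not how TopSBM computes them.

```python
# Toy counts: X[d][w] = occurrences of word w in document d (2 docs, 4 words).
X = [
    [2, 1, 0, 1],
    [0, 1, 3, 0],
]
# Assumed hard assignment of each word to one of B_w = 2 word-groups.
word_group = [0, 0, 1, 1]
B_w = 2
n_samples, n_features = len(X), len(X[0])

# p_tw_d[tw][d]: share of document d's tokens that fall in word-group tw.
p_tw_d = [[0.0] * n_samples for _ in range(B_w)]
for d, row in enumerate(X):
    total = sum(row)
    for w, count in enumerate(row):
        p_tw_d[word_group[w]][d] += count / total

# p_w_tw[w][tw]: share of word-group tw's tokens that are word w.
word_total = [sum(row[w] for row in X) for w in range(n_features)]
group_total = [0] * B_w
for w, count in enumerate(word_total):
    group_total[word_group[w]] += count
p_w_tw = [[0.0] * B_w for _ in range(n_features)]
for w, count in enumerate(word_total):
    tw = word_group[w]
    p_w_tw[w][tw] = count / group_total[tw]

print(p_tw_d)  # one row per word-group, one column per document
print(p_w_tw)  # one row per word, one column per word-group
```

Each column of p_tw_d sums to 1 over word-groups, because it is a distribution for one document. Each column of p_w_tw sums to 1 over words, because it is a distribution for one topic.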
Methods
- fit(X[, y]) : Fit the hSBM topic model
- fit_transform(X[, y]) : Fit the hSBM topic model
- get_params([deep]) : Get parameters for this estimator.
- plot_graph([filename, n_edges]) : Plots arcs from documents to words coloured by inferred group
- set_params(**params) : Set the parameters of this estimator.
fit(X, y=None)
Fit the hSBM topic model
Constructs a graph representation of X and infers clustering.
Parameters: - X : ndarray or sparse matrix of shape (n_samples, n_features)
Word frequencies for each document, represented as non-negative integers.
- y : ignored
Returns: - self
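The graph representation of X can be pictured as a bipartite edge list in which documents occupy the first n_samples vertex indices and words the remaining ones, as described for graph_. The sketch below is an illustration under that assumption, not the library's implementation; bipartite_edges is a hypothetical helper, and it returns plain tuples rather than a graph_tool.Graph.

```python
def bipartite_edges(X, weighted_edges=True):
    """Return (doc_vertex, word_vertex, weight) tuples for a count matrix X."""
    n_samples = len(X)
    edges = []
    for d, row in enumerate(X):
        for w, count in enumerate(row):
            if count == 0:
                continue  # no edge when the word is absent from the document
            word_vertex = n_samples + w  # word vertices follow the documents
            if weighted_edges:
                # One edge per (document, word) pair, weighted by the count.
                edges.append((d, word_vertex, count))
            else:
                # One unit-weight edge per occurrence.
                edges.extend([(d, word_vertex, 1)] * count)
    return edges


X = [[2, 0, 1],
     [0, 3, 0]]
weighted = bipartite_edges(X, weighted_edges=True)
duplicated = bipartite_edges(X, weighted_edges=False)
print(weighted)
print(len(duplicated))
```

Both forms carry the same total edge weight. The weighted form stores one entry per non-zero cell of X, which keeps the edge list small for documents that repeat words often.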
fit_transform(X, y=None)
Fit the hSBM topic model
Constructs a graph representation of X, infers clustering, and reports the cluster probability for each sample in X.
Parameters: - X : ndarray or sparse matrix of shape (n_samples, n_features)
Word frequencies for each document, represented as non-negative integers.
- y : ignored
Returns: - Xt : ndarray of shape (n_samples, n_components)
The cluster probability for each sample in X
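A minimal usage sketch follows. It builds a word-frequency matrix of non-negative integers from a toy tokenised corpus using only the standard library, then calls fit_transform if topsbm and numpy can be imported. The corpus, the vocabulary ordering and the chosen parameter values are assumptions made for illustration, and a corpus this small is not expected to give meaningful topics.

```python
from collections import Counter

# Toy tokenised corpus (assumed for illustration).
docs = [
    ["graph", "topic", "graph"],
    ["topic", "model", "model", "model"],
]

# Column j of X corresponds to vocab[j].
vocab = sorted({word for doc in docs for word in doc})
index = {word: j for j, word in enumerate(vocab)}

# X[i][j] = number of times vocab[j] occurs in docs[i].
X = [[0] * len(vocab) for _ in docs]
for i, doc in enumerate(docs):
    for word, count in Counter(doc).items():
        X[i][index[word]] = count

try:
    import numpy as np
    from topsbm import TopSBM
except ImportError:
    TopSBM = None  # topsbm (and graph_tool) not installed; skip the fit

if TopSBM is not None:
    model = TopSBM(n_init=1, random_state=0)
    Xt = model.fit_transform(np.asarray(X))
    print(Xt.shape)  # (n_samples, n_components)
```

In practice X would more commonly come from a vectorizer that produces a sparse count matrix, which the parameter description above also allows.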