API Reference
reddwarf.implementations.polis
reddwarf.implementations.polis.run_clustering(votes, mod_out_statement_ids=[], min_user_vote_threshold=7, keep_participant_ids=[], init_centers=None, max_group_count=5, force_group_count=None, random_state=None)
An essentially feature-complete implementation of the Polis clustering algorithm.
Still missing
- base-cluster calculations (so can't match output of conversations larger than 100 participants),
- k-smoothing, which holds back k-value (group count) until re-calculated 3 consecutive times,
- some advanced participant filtering that involves past state (you can use keep_participant_ids to mimic manually).
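Since k-smoothing is not implemented here, the following is only an illustrative sketch of the idea described in the list above: a newly calculated group count replaces the reported one only after it has been seen three times in a row. The function name `smooth_k` and its parameters are hypothetical and are not part of reddwarf or Polis.

```python
def smooth_k(current_k, candidate_history, required_repeats=3):
    """Return the group count to report after a stream of recalculated k-values.

    current_k: the k-value reported before any recalculation (hypothetical input).
    candidate_history: freshly calculated k-values, oldest first.
    """
    reported = current_k
    streak_value, streak_len = None, 0
    for k in candidate_history:
        # Track how many times in a row the same candidate has appeared.
        if k == streak_value:
            streak_len += 1
        else:
            streak_value, streak_len = k, 1
        # Switch only once a different value has repeated often enough.
        if streak_value != reported and streak_len >= required_repeats:
            reported = streak_value
    return reported


# A candidate seen only twice in a row is held back.
held_back = smooth_k(2, [3, 3])
# A candidate seen three times in a row is adopted.
adopted = smooth_k(2, [3, 3, 3])
```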
Source code in reddwarf/implementations/polis.py
reddwarf.implementations.agora
reddwarf.implementations.agora.run_clustering_v1(conversation, options={})
A minimal Polis-based clustering algorithm suitable for use by Agora Citizen Network.
This does the following:
- builds a vote matrix (includes any statement with at least 1 participant vote),
- filters out any participants with fewer than 7 votes,
- runs PCA and projects active participants into 2D coordinates,
- scales the projected participants out from center when they have a low number of votes,
- tests 2-5 groups for best k-means fit via silhouette scores (random state set for reproducibility),
- returns a list of clusters, each with a list of participant members and their projected 2D coordinates.
Warning
This will technically function without PASS votes, but scaling factors will not be effective in compensating for missing votes, and so participant projections will be bunched up closer to the origin.
Source code in reddwarf/implementations/agora.py
reddwarf.utils.matrix
reddwarf.utils.matrix.generate_raw_matrix(votes, cutoff=None)
Generates a raw vote matrix from a list of vote records.
See filter_votes method for details of cutoff arg.
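As a rough illustration of what "raw vote matrix" means here, the sketch below pivots vote records into a participant-by-statement structure using plain dictionaries. The record keys (`participant_id`, `statement_id`, `vote`, `modified`) and the function name are assumptions made for the example, and the cutoff handling is left out; the real function may use different field names and returns its own matrix type.

```python
def build_vote_matrix(votes):
    """Pivot vote records into {participant_id: {statement_id: vote}}.

    Cells with no vote are None. Record keys are assumed for illustration.
    """
    ordered = sorted(votes, key=lambda v: v["modified"])
    statement_ids = sorted({v["statement_id"] for v in votes})
    participant_ids = sorted({v["participant_id"] for v in votes})
    matrix = {p: {s: None for s in statement_ids} for p in participant_ids}
    for v in ordered:
        # Votes are applied in time order, so a later vote on the same
        # statement overwrites an earlier one.
        matrix[v["participant_id"]][v["statement_id"]] = v["vote"]
    return matrix


example_votes = [
    {"participant_id": 1, "statement_id": 10, "vote": 1, "modified": 1},
    {"participant_id": 1, "statement_id": 10, "vote": -1, "modified": 2},
    {"participant_id": 2, "statement_id": 11, "vote": 0, "modified": 3},
]
example_matrix = build_vote_matrix(example_votes)
```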
Source code in reddwarf/utils/matrix.py
reddwarf.utils.matrix.simple_filter_matrix(vote_matrix, mod_out_statement_ids=[])
The simple filter on the vote_matrix that is used by Polis prior to running PCA.
Source code in reddwarf/utils/matrix.py
reddwarf.utils.matrix.get_participant_ids(vote_matrix, vote_threshold)
Find participant IDs that meet a vote threshold in a vote_matrix.
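A minimal sketch of the thresholding idea, using a dictionary-of-dictionaries stand-in for the vote matrix. It assumes that "meeting" the threshold means casting at least `vote_threshold` votes, that a missing vote is represented as `None`, and that a pass (0) still counts as a vote; none of these details are confirmed by the description above, and the function name is hypothetical.

```python
def participants_meeting_threshold(vote_matrix, vote_threshold):
    """Return IDs of participants who cast at least vote_threshold votes."""
    return [
        participant_id
        for participant_id, row in vote_matrix.items()
        # Count every cell that holds a vote, including 0 (pass).
        if sum(vote is not None for vote in row.values()) >= vote_threshold
    ]


toy_matrix = {
    "a": {1: 1, 2: -1, 3: 0},
    "b": {1: None, 2: 1, 3: None},
    "c": {1: None, 2: None, 3: None},
}
active_ids = participants_meeting_threshold(toy_matrix, vote_threshold=2)
```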
Source code in reddwarf/utils/matrix.py
reddwarf.utils.pca
reddwarf.utils.pca.run_pca(vote_matrix, n_components=2)
Impute a prepared vote matrix and return projected participant data, as well as eigenvectors and eigenvalues.
The vote matrix should not yet be imputed, as this will happen within the method.
Source code in reddwarf/utils/pca.py
reddwarf.utils.clustering
reddwarf.utils.clustering.find_optimal_k(projected_data, max_group_count=5, init_centers=None, random_state=None, debug=False)
Use silhouette scores to find the number of clusters k that best fits the data.
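To make the selection criterion concrete, here is a pure-Python sketch of a mean silhouette score and of picking the k whose labelling scores highest. It assumes candidate labellings for each k are already available (for example from a k-means run); the names `silhouette_score` and `pick_best_k` are local to this example and are not the library's internals.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette over all points; requires at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def pick_best_k(points, labelings):
    """labelings maps each candidate k to a list of labels for points."""
    scores = {k: silhouette_score(points, labels) for k, labels in labelings.items()}
    best_k = max(sorted(scores), key=lambda k: scores[k])
    return best_k, scores


blobs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 2, 2, 2]}
best_k, scores = pick_best_k(blobs, candidates)
```

On ties, this sketch prefers the smaller k, which is a design choice of the example rather than documented behaviour.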
Source code in reddwarf/utils/clustering.py
reddwarf.utils.clustering.run_kmeans(dataframe, n_clusters=2, init_centers=None, random_state=None)
Runs K-Means clustering on a 2D DataFrame of xy points, for a specific K, and returns labels for each row and cluster centers. Optionally accepts guesses on cluster centers, and a random_state for reproducibility.
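The sketch below shows how initial center guesses and a random seed can interact in a basic K-Means loop (Lloyd's algorithm) written with the standard library only. It is not the library's implementation, and it takes a list of (x, y) tuples instead of a DataFrame; `kmeans_sketch` and `max_iter` are names invented for the example.

```python
import random
from math import dist


def kmeans_sketch(points, n_clusters=2, init_centers=None, random_state=None, max_iter=100):
    """Return (labels, centers) for a list of (x, y) points."""
    if init_centers is not None:
        # Use the supplied guesses as starting centers.
        centers = [tuple(c) for c in init_centers[:n_clusters]]
    else:
        # Otherwise seed a private RNG so that runs are repeatable.
        centers = random.Random(random_state).sample(points, n_clusters)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(old)  # keep a center that lost all members
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


xy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
guided_labels, guided_centers = kmeans_sketch(xy, n_clusters=2, init_centers=[(0, 0), (10, 10)])
```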
Source code in reddwarf/utils/clustering.py
reddwarf.utils
(These are in the process of being either moved or deprecated.)
reddwarf.utils.filter_votes(votes, cutoff=None)
Filters a list of votes.
If a cutoff is provided, votes are filtered based on either:
- An int representing unix timestamp (ms), keeping only votes before or at that time.
  - Any int above 13_000_000_000 is considered a timestamp.
- Any other positive or negative int is considered an index, reflecting where to trim the time-sorted vote list.
  - positive: filters in votes that many indices from start
  - negative: filters out votes that many indices from end
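The cutoff rules above can be sketched as follows. This is an illustration only: it assumes each vote record carries its timestamp under a `modified` key (a name chosen for the example), and it does not address a cutoff of exactly 0, which the rules above leave unspecified.

```python
def filter_votes_sketch(votes, cutoff=None):
    """Apply the timestamp-or-index cutoff rule to a list of vote records."""
    ordered = sorted(votes, key=lambda v: v["modified"])
    if cutoff is None:
        return ordered
    if cutoff > 13_000_000_000:
        # Large ints are treated as unix timestamps in milliseconds.
        return [v for v in ordered if v["modified"] <= cutoff]
    # Otherwise the cutoff is an index into the time-sorted list:
    # positive keeps that many votes from the start,
    # negative drops that many votes from the end.
    return ordered[:cutoff]


base = 1_700_000_000_000
sample_votes = [{"vote": 1, "modified": base + i * 1000} for i in (3, 0, 4, 1, 2)]

up_to_time = filter_votes_sketch(sample_votes, cutoff=base + 2000)
first_two = filter_votes_sketch(sample_votes, cutoff=2)
without_last_two = filter_votes_sketch(sample_votes, cutoff=-2)
```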
Source code in reddwarf/utils/matrix.py
reddwarf.utils.filter_matrix(vote_matrix, min_user_vote_threshold=7, active_statement_ids=[], keep_participant_ids=[], unvoted_filter_type='drop')
Generates a filtered vote matrix from a raw matrix and filter config.
Source code in reddwarf/utils/matrix.py
reddwarf.utils.impute_missing_votes(vote_matrix)
Imputes missing votes in a voting matrix using column-wise mean. All columns must have at least one vote.
Reference
Small, C. (2021). "Polis: Scaling Deliberation by Mapping High Dimensional Opinion Spaces." Specific highlight: https://hyp.is/8zUyWM5fEe-uIO-J34vbkg/gwern.net/doc/sociology/2021-small.pdf
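Column-wise mean imputation can be sketched in plain Python as below, using a list of rows with `None` marking a missing vote. The representation and the function name are assumptions for the example; the library operates on its own vote matrix type.

```python
def impute_column_mean(matrix):
    """Replace None in each column with the mean of that column's votes."""
    n_cols = len(matrix[0])
    means = []
    for j in range(n_cols):
        seen = [row[j] for row in matrix if row[j] is not None]
        if not seen:
            # Mirrors the stated requirement that every column has a vote.
            raise ValueError(f"column {j} has no votes to average")
        means.append(sum(seen) / len(seen))
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in matrix
    ]


sparse = [
    [1, None],
    [-1, 1],
    [None, 0],
]
imputed = impute_column_mean(sparse)
```

Here the first column's votes (1 and -1) average to 0.0 and the second column's votes (1 and 0) average to 0.5, so those values fill the gaps.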
Source code in reddwarf/utils/matrix.py
reddwarf.utils.scale_projected_data(projected_data, vote_matrix)
Scale projected participant xy points based on vote matrix, to account for any small number of votes by a participant and prevent those participants from bunching up in the center.
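One way to picture this kind of scaling is a per-participant factor that grows as the participant's vote count shrinks. The sketch below assumes a square-root factor, sqrt(total statements / votes cast); that formula is an assumption made for illustration, and the factor reddwarf actually applies is not stated in this reference. The function name and the dictionary-based inputs are likewise hypothetical.

```python
from math import sqrt


def scale_projection(projected, vote_matrix):
    """Push sparse voters outward from the origin.

    projected: {participant_id: (x, y)}
    vote_matrix: {participant_id: {statement_id: vote or None}}
    """
    scaled = {}
    for participant_id, (x, y) in projected.items():
        row = vote_matrix[participant_id]
        n_statements = len(row)
        n_votes = sum(vote is not None for vote in row.values())
        # Assumed factor: fewer votes cast gives a larger multiplier.
        factor = sqrt(n_statements / max(n_votes, 1))
        scaled[participant_id] = (x * factor, y * factor)
    return scaled


toy_votes = {
    1: {10: 1, 11: None, 12: None, 13: None},  # voted on 1 of 4 statements
    2: {10: 1, 11: -1, 12: 0, 13: 1},          # voted on all 4 statements
}
toy_projection = {1: (0.5, -0.25), 2: (1.0, 2.0)}
scaled_projection = scale_projection(toy_projection, toy_votes)
```

Under this assumed factor, the participant who voted on one of four statements is moved twice as far from the origin, while the participant who voted on everything stays in place.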
Source code in reddwarf/utils/pca.py
reddwarf.utils.get_unvoted_statement_ids(vote_matrix)
A method intended to be piped into a VoteMatrix DataFrame, returning a list of unvoted statement IDs.
See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pipe.html
Example:
unused_statement_ids = vote_matrix.pipe(get_unvoted_statement_ids)
Source code in reddwarf/utils/matrix.py
reddwarf.data_presenter
reddwarf.data_presenter.generate_figure(coord_dataframe, labels=None)
Generates a matplotlib scatterplot with optional bounded clusters.
The plot is drawn from a dataframe of xy values, each point labelled by index participant_id.
When a list of labels is supplied (corresponding to each row), concave hulls are drawn around them.
Source code in reddwarf/data_presenter.py
Types
reddwarf.types.agora.Conversation
Bases: TypedDict
Source code in reddwarf/types/agora.py
reddwarf.types.agora.Vote
Bases: TypedDict
Source code in reddwarf/types/agora.py
reddwarf.types.agora.VoteValueEnum
Bases: IntEnum
Source code in reddwarf/types/agora.py
reddwarf.types.agora.Identifier = int | str
module-attribute
reddwarf.types.agora.ClusteringOptions
Bases: TypedDict
Source code in reddwarf/types/agora.py
reddwarf.types.agora.ClusteringResult
Bases: TypedDict
Source code in reddwarf/types/agora.py
reddwarf.types.agora.Cluster
Bases: TypedDict
Source code in reddwarf/types/agora.py
reddwarf.types.agora.ClusteredParticipant
Bases: TypedDict
Source code in reddwarf/types/agora.py