Unsupervised Graph Learning

Evaluation utils for unsupervised graph learning tasks.

the_utils.evaluation.unsupervised_graph_learning.generate_split(num_nodes: int, train_ratio: int, valid_ratio: int) → Tuple[ndarray]

Generate the training, validation, and test sets.

Parameters:

num_nodes (int) – number of nodes.
train_ratio (int) – ratio of nodes assigned to the training set.
valid_ratio (int) – ratio of nodes assigned to the validation set.

Returns:

[train set node ids, val set node ids, test set node ids]

Return type:

Tuple[np.ndarray]
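
The kind of split described above can be sketched without any dependencies. This is a hypothetical stand-in, not the library's implementation: it assumes the integer ratios are percentages of num_nodes (suggested by defaults such as (10, 20, 30, 40) elsewhere in this module), that the test set is whatever remains, and it returns plain lists rather than np.ndarray. The name sketch_generate_split and the seed parameter are invented for illustration.

```python
import random
from typing import List, Tuple


def sketch_generate_split(
    num_nodes: int, train_ratio: int, valid_ratio: int, seed: int = 0
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical stand-in: shuffle node ids and cut them by percentage."""
    if not 0 <= train_ratio + valid_ratio <= 100:
        raise ValueError("train_ratio + valid_ratio must lie in [0, 100]")
    ids = list(range(num_nodes))
    random.Random(seed).shuffle(ids)  # seeded so the sketch is reproducible
    n_train = num_nodes * train_ratio // 100  # assumption: ratios are percentages
    n_valid = num_nodes * valid_ratio // 100
    train = ids[:n_train]
    valid = ids[n_train:n_train + n_valid]
    test = ids[n_train + n_valid:]  # everything left over
    return train, valid, test


train_ids, valid_ids, test_ids = sketch_generate_split(100, 10, 20)
print(len(train_ids), len(valid_ids), len(test_ids))
```

The three id lists are disjoint and together cover every node exactly once, which is the property any such split should preserve.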

the_utils.evaluation.unsupervised_graph_learning.split_train_test_nodes(num_nodes: int, train_ratio: int, valid_ratio: int, data_name: str, split_id: int = 0, split_times: int = 10, fixed_split: bool = True, split_save_dir: str = './data') → Tuple[ndarray, ndarray, ndarray]

Split the nodes into training, validation, and test sets.

Parameters:

num_nodes (int) – number of nodes.
train_ratio (int) – training ratio.
valid_ratio (int) – validation ratio.
data_name (str) – dataset name.
split_id (int, optional) – index of the split. Defaults to 0.
split_times (int, optional) – number of random splits. Defaults to 10.
fixed_split (bool, optional) – whether to save the split after the first run and load the saved split on later runs. Defaults to True.
split_save_dir (str, optional) – directory for saving the fixed split. Defaults to ‘./data’.

Returns:

[train_idx, val_idx, test_idx]

Return type:

Tuple[np.ndarray, np.ndarray, np.ndarray]
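
The fixed_split behaviour (create a split once, then reload it) can be illustrated with a small caching sketch. Everything here is an assumption made for illustration: the function name sketch_fixed_split, the JSON file format, the file name pattern, and the percentage reading of the ratios are not taken from the library.

```python
import json
import random
import tempfile
from pathlib import Path
from typing import List, Tuple

Split = Tuple[List[int], List[int], List[int]]


def sketch_fixed_split(
    num_nodes: int,
    train_ratio: int,
    valid_ratio: int,
    data_name: str,
    split_id: int = 0,
    split_save_dir: str = "./data",
) -> Split:
    """Hypothetical stand-in: reuse a saved split if present, else create and save it."""
    # The file name pattern is invented for this sketch.
    path = Path(split_save_dir) / f"{data_name}_{train_ratio}_{valid_ratio}_{split_id}.json"
    if path.exists():
        train, valid, test = json.loads(path.read_text())
        return train, valid, test

    ids = list(range(num_nodes))
    random.Random(split_id).shuffle(ids)
    n_train = num_nodes * train_ratio // 100  # assumption: ratios are percentages
    n_valid = num_nodes * valid_ratio // 100
    split = (ids[:n_train], ids[n_train:n_train + n_valid], ids[n_train + n_valid:])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(split))
    return split


with tempfile.TemporaryDirectory() as tmp:
    first = sketch_fixed_split(50, 10, 10, "toy", split_save_dir=tmp)
    second = sketch_fixed_split(50, 10, 10, "toy", split_save_dir=tmp)  # loaded from disk
    same_split = first == second
print(same_split)
```

Keying the cache on the dataset name, ratios, and split index keeps results comparable across runs, because every run evaluates on the same nodes.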

the_utils.evaluation.unsupervised_graph_learning.cluster_eval(y_true: Tensor | ndarray, y_pred: Tensor | ndarray) → Tuple[float]

Evaluate a clustering against ground-truth labels.

Parameters:

y_true (Union[torch.Tensor, np.ndarray]) – ground-truth labels.
y_pred (Union[torch.Tensor, np.ndarray]) – predicted cluster labels.

Returns:

[ACC, NMI, AMI, ARI, MacroF1, Purity]

Return type:

Tuple[float]
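
Two of the listed metrics, Purity and ACC, are easy to misread because cluster ids carry no meaning on their own. The following is a minimal dependency-free sketch of their usual definitions, not the library's code; the function names are invented, and the brute-force matching assumes only a handful of clusters and no more clusters than true labels.

```python
from collections import Counter
from itertools import permutations
from typing import Sequence


def purity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of nodes that carry the majority true label of their cluster."""
    total = 0
    for cluster in set(y_pred):
        members = [t for t, p in zip(y_true, y_pred) if p == cluster]
        total += Counter(members).most_common(1)[0][1]
    return total / len(y_true)


def clustering_acc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Accuracy under the best one-to-one mapping of cluster ids to labels.

    Brute force over permutations, so only suitable for a few clusters.
    Assumes there are no more clusters than true labels.
    """
    labels = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    best = 0
    for perm in permutations(labels, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, hits)
    return best / len(y_true)


y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0]  # cluster ids are arbitrary, so 1 may mean "class 0"
print(purity(y_true, y_pred), clustering_acc(y_true, y_pred))
```

In this toy case five of the six nodes agree with the best relabelling, so both metrics come out at 5/6. For larger problems the matching step is normally solved with the Hungarian algorithm rather than by enumeration.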

the_utils.evaluation.unsupervised_graph_learning.kmeans_test(X: Tensor | ndarray, y: Tensor | ndarray, n_clusters: int, repeat: int = 1) → Tuple[float]

Evaluate embeddings with k-means clustering.

Parameters:

X (Union[torch.Tensor, np.ndarray]) – embeddings.
y (Union[torch.Tensor, np.ndarray]) – ground-truth labels.
n_clusters (int) – number of clusters.
repeat (int, optional) – number of k-means repetitions. Defaults to 1.

Returns:

[acc_mean, acc_std, nmi_mean, nmi_std, ami_mean, ami_std, ari_mean, ari_std, f1_mean, f1_std, purity_mean, purity_std]

Return type:

Tuple[float]
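
The alternating mean/std layout of the return value can be reproduced with a short aggregation sketch. This only illustrates the layout: the helper name summarize_runs is invented, the run scores are made-up inputs, and the use of the population standard deviation is an assumption of the sketch rather than a documented property of kmeans_test.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def summarize_runs(runs: Sequence[Sequence[float]]) -> List[float]:
    """Flatten per-run metric tuples into [m1_mean, m1_std, m2_mean, m2_std, ...]."""
    flat: List[float] = []
    for values in zip(*runs):  # one column per metric
        flat.extend([mean(values), pstdev(values)])
    return flat


# Three hypothetical k-means runs, each reporting (ACC, NMI).
runs = [(0.70, 0.50), (0.80, 0.60), (0.90, 0.70)]
acc_mean, acc_std, nmi_mean, nmi_std = summarize_runs(runs)
print(round(acc_mean, 3), round(acc_std, 3), round(nmi_mean, 3), round(nmi_std, 3))
```

With repeat left at 1 every standard deviation in such a summary is zero, so a larger repeat is needed before the std entries say anything about stability.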

the_utils.evaluation.unsupervised_graph_learning.svm_test(num_nodes: int, data_name: str, embeddings: tensor, labels: ndarray, train_ratios: Tuple[int] = (10, 20, 30, 40), valid_ratios: Tuple[int] = (10, 20, 30, 40), repeat: int = 3, fixed_split: bool = True, split_save_dir: str = './data') → Tuple[List[Tuple[float]]]

Node classification using an SVM trained on the embeddings.

Parameters:

num_nodes (int) – number of nodes.
data_name (str) – dataset name.
embeddings (torch.Tensor) – node embeddings.
labels (np.ndarray) – ground truth labels.
train_ratios (tuple, optional) – split ratio of training set. Defaults to (10, 20, 30, 40).
valid_ratios (tuple, optional) – split ratio of validation set. Defaults to (10, 20, 30, 40).
repeat (int, optional) – number of SVM repetitions. Defaults to 3.
fixed_split (bool, optional) – save the split after splitting once and load the fixed split next time. Defaults to True.
split_save_dir (str, optional) – the dir for saving the fixed split. Defaults to ‘./data’.

Returns:

([(mean, std), (mean, std), …] of macro_f1 for all train_ratios, [(mean, std), (mean, std), …] of micro_f1 for all train_ratios)

Return type:

Tuple[List[Tuple[float]]]
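
The two scores reported per training ratio differ only in how per-class results are averaged. The sketch below spells out the standard definitions in plain Python; it is not the library's code, and the function name macro_micro_f1 is invented for illustration.

```python
from typing import List, Sequence, Tuple


def macro_micro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (macro F1, micro F1) for single-label multi-class predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class_f1: List[float] = []
    tp_sum = fp_sum = fn_sum = 0
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    macro = sum(per_class_f1) / len(per_class_f1)  # every class counts equally
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0  # every node counts equally
    return macro, micro


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
macro, micro = macro_micro_f1(y_true, y_pred)
print(round(macro, 4), round(micro, 4))
```

For single-label classification micro F1 equals plain accuracy, so a gap between the macro and micro lists usually points to weak performance on small classes.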

the_utils.evaluation.unsupervised_graph_learning.evaluate_clf_cls(labels: ndarray, num_classes: int, num_nodes: int, data_name: str, embeddings: Tensor, quiet: bool = True, method: str = 'both', clf_repeat: int = 3, cls_repeat: int = 1, train_ratios: Tuple[int] = (10, 20, 30, 40), valid_ratios: Tuple[int] = (10, 20, 30, 40), fixed_split: bool = True, split_save_dir: str = './data') → Tuple[List[Tuple[float]], float]

Evaluate embeddings on node classification and node clustering.

Parameters:

labels (np.ndarray) – Labels.
num_classes (int) – Num of classes.
num_nodes (int) – Num of nodes.
data_name (str) – Dataset name.
embeddings (torch.Tensor) – Node embedding matrix.
quiet (bool, optional) – Whether to print info. Defaults to True.
method (str, optional) – evaluation method: “clf” for node classification, “cls” for node clustering, “both” for both. Defaults to “both”.
clf_repeat (int, optional) – node classification repeat times. Defaults to 3.
cls_repeat (int, optional) – node clustering repeat times. Defaults to 1.
train_ratios (tuple, optional) – split ratio of training set for node classification. Defaults to (10, 20, 30, 40).
valid_ratios (tuple, optional) – split ratio of validation set. Defaults to (10, 20, 30, 40).
fixed_split (bool, optional) – save the split after splitting once and load the fixed split next time. Defaults to True.
split_save_dir (str, optional) – the dir for saving the fixed split. Defaults to ‘./data’.

Returns:

[svm_macro_f1_list, svm_micro_f1_list, acc_mean, acc_std, nmi_mean, nmi_std, ami_mean, ami_std, ari_mean, ari_std, f1_mean, f1_std, purity_mean, purity_std]

Return type:

Tuple[List[Tuple[float]], float]
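
The role of the method argument can be pictured as a small dispatcher that runs one or both evaluations. This is a hypothetical sketch: sketch_dispatch, run_clf, and run_cls are invented names, and returning None for a skipped evaluation is an assumption, since the documentation does not say what the real function returns in that case.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def sketch_dispatch(
    method: str,
    run_clf: Callable[[], A],
    run_cls: Callable[[], B],
) -> Tuple[Optional[A], Optional[B]]:
    """Run classification, clustering, or both, depending on `method`."""
    if method not in ("clf", "cls", "both"):
        raise ValueError(f"unknown method: {method!r}")
    # Assumption of this sketch: a skipped evaluation yields None.
    clf_result = run_clf() if method in ("clf", "both") else None
    cls_result = run_cls() if method in ("cls", "both") else None
    return clf_result, cls_result


calls: List[str] = []
result = sketch_dispatch(
    "cls",
    run_clf=lambda: calls.append("clf") or "clf scores",
    run_cls=lambda: calls.append("cls") or "cls scores",
)
print(result, calls)
```

Choosing “cls” skips the SVM runs entirely, which is the cheaper option when only clustering quality matters.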

the_utils.evaluation.unsupervised_graph_learning.evaluate_from_embeddings(labels: ndarray, num_classes: int, num_nodes: int, data_name: str, embeddings: Tensor, quiet: bool = True, method: str = 'both', clf_repeat: int = 3, cls_repeat: int = 1, train_ratios: Tuple[int] = (10, 20, 30, 40), valid_ratios: Tuple[int] = (10, 20, 30, 40), fixed_split: bool = True, split_save_dir: str = './data') → Tuple[Dict, Dict]

Evaluate embeddings with node classification and node clustering.

Parameters:

labels (np.ndarray) – labels.
num_classes (int) – number of classes.
num_nodes (int) – number of nodes.
data_name (str) – dataset name.
embeddings (torch.Tensor) – embeddings.
quiet (bool, optional) – whether to print info. Defaults to True.
method (str, optional) – evaluation method: “clf” for node classification, “cls” for node clustering, “both” for both. Defaults to “both”.
clf_repeat (int, optional) – node classification repeat times. Defaults to 3.
cls_repeat (int, optional) – node clustering repeat times. Defaults to 1.
train_ratios (tuple, optional) – split ratio of training set for node classification. Defaults to (10, 20, 30, 40).
valid_ratios (tuple, optional) – split ratio of validation set. Defaults to (10, 20, 30, 40).
fixed_split (bool, optional) – save the split after splitting once and load the fixed split next time. Defaults to True.
split_save_dir (str, optional) – the dir for saving the fixed split. Defaults to ‘./data’.

Returns:

(clustering_results, classification_results)

Return type:

Tuple[Dict, Dict]