Unsupervised Graph Learning

Evaluation utils for unsupervised graph learning tasks.

the_utils.evaluation.unsupervised_graph_learning.generate_split(num_nodes: int, train_ratio: int, valid_ratio: int) → Tuple[ndarray]

Generate the training, validation, and test sets.

Parameters:
  • num_nodes (int) – number of nodes.

  • train_ratio (int) – ratio of nodes in the training set.

  • valid_ratio (int) – ratio of nodes in the validation set.

Returns:

[train set node ids, val set node ids, test set node ids]

Return type:

Tuple[np.ndarray]
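
The split described above can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name sketch_generate_split and the seed argument are hypothetical, and the ratios are assumed to be integer percentages (which the integer defaults such as (10, 20, 30, 40) elsewhere in this module suggest but do not confirm).

```python
import numpy as np


def sketch_generate_split(num_nodes, train_ratio, valid_ratio, seed=0):
    """Hypothetical re-implementation; ratios are assumed to be percentages."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)          # shuffled node ids 0..num_nodes-1
    n_train = num_nodes * train_ratio // 100
    n_valid = num_nodes * valid_ratio // 100
    train_idx = perm[:n_train]
    valid_idx = perm[n_train:n_train + n_valid]
    test_idx = perm[n_train + n_valid:]        # every node not used above
    return train_idx, valid_idx, test_idx


train_idx, valid_idx, test_idx = sketch_generate_split(100, 10, 10)
print(len(train_idx), len(valid_idx), len(test_idx))  # 10 10 80
```

Slicing one permutation keeps the three sets disjoint and makes them cover every node id exactly once.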

the_utils.evaluation.unsupervised_graph_learning.split_train_test_nodes(num_nodes: int, train_ratio: int, valid_ratio: int, data_name: str, split_id: int = 0, split_times: int = 10, fixed_split: bool = True, split_save_dir: str = './data') → Tuple[ndarray, ndarray, ndarray]

Split the nodes into training, validation, and test sets.

Parameters:
  • num_nodes (int) – number of nodes.

  • train_ratio (int) – training ratio.

  • valid_ratio (int) – validation ratio.

  • data_name (str) – dataset name.

  • split_id (int, optional) – index of the split. Defaults to 0.

  • split_times (int, optional) – number of random splits. Defaults to 10.

  • fixed_split (bool, optional) – if True, save the split the first time it is generated and load the saved split on later calls. Defaults to True.

  • split_save_dir (str, optional) – directory in which the fixed split is saved. Defaults to ‘./data’.

Returns:

[train_idx, val_idx, test_idx]

Return type:

Tuple[np.ndarray, np.ndarray, np.ndarray]
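
The fixed_split behaviour (save once, reload afterwards) might look roughly like the sketch below. Everything here is an assumption made for illustration: the function name sketch_fixed_split, the .npz file format, the file name pattern, and the treatment of ratios as percentages are not taken from the library.

```python
import os
import tempfile

import numpy as np


def sketch_fixed_split(num_nodes, train_ratio, valid_ratio, data_name,
                       split_id=0, fixed_split=True, split_save_dir="./data"):
    """Hypothetical caching scheme; the file name pattern is an assumption."""
    path = os.path.join(
        split_save_dir,
        f"{data_name}_{train_ratio}_{valid_ratio}_{split_id}.npz",
    )
    if fixed_split and os.path.exists(path):
        with np.load(path) as saved:           # reuse the split saved earlier
            return saved["train"], saved["valid"], saved["test"]

    perm = np.random.default_rng().permutation(num_nodes)
    n_train = num_nodes * train_ratio // 100   # ratios assumed to be percentages
    n_valid = num_nodes * valid_ratio // 100
    train, valid, test = np.split(perm, [n_train, n_train + n_valid])
    if fixed_split:
        os.makedirs(split_save_dir, exist_ok=True)
        np.savez(path, train=train, valid=valid, test=test)
    return train, valid, test


with tempfile.TemporaryDirectory() as tmp:
    first = sketch_fixed_split(50, 20, 10, "toy", split_save_dir=tmp)
    second = sketch_fixed_split(50, 20, 10, "toy", split_save_dir=tmp)
    same = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same)  # True
```

The second call finds the saved file and returns the same indices, which is what makes results comparable across runs.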

the_utils.evaluation.unsupervised_graph_learning.cluster_eval(y_true: Tensor | ndarray, y_pred: Tensor | ndarray) → Tuple[float]

Evaluate clustering results.

Parameters:
  • y_true (Union[torch.Tensor, np.ndarray]) – ground truth labels.

  • y_pred (Union[torch.Tensor, np.ndarray]) – predicted labels.

Returns:

[ACC, NMI, AMI, ARI, MacroF1, Purity]

Return type:

Tuple[float]
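
As an illustration of one of the six metrics, purity can be computed with NumPy alone. This is a sketch of the standard definition; sketch_purity is a hypothetical name, and the library's own computation may differ in detail.

```python
import numpy as np


def sketch_purity(y_true, y_pred):
    """Purity: each predicted cluster votes for its most common true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    correct = 0
    for cluster in np.unique(y_pred):
        members = y_true[y_pred == cluster]    # true labels inside this cluster
        _, counts = np.unique(members, return_counts=True)
        correct += int(counts.max())           # size of the majority label
    return correct / len(y_true)


y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0]
purity = sketch_purity(y_true, y_pred)
print(round(purity, 3))  # 0.833
```

Purity does not depend on how cluster ids are numbered, so y_pred needs no alignment with y_true beforehand.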

the_utils.evaluation.unsupervised_graph_learning.kmeans_test(X: Tensor | ndarray, y: Tensor | ndarray, n_clusters: int, repeat: int = 1) → Tuple[float]

Evaluate embeddings with k-means clustering.

Parameters:
  • X (Union[torch.Tensor, np.ndarray]) – embeddings.

  • y (Union[torch.Tensor, np.ndarray]) – ground truth labels.

  • n_clusters (int) – number of clusters.

  • repeat (int, optional) – number of k-means repetitions. Defaults to 1.

Returns:

[acc_mean, acc_std, nmi_mean, nmi_std, ami_mean, ami_std, ari_mean, ari_std, f1_mean, f1_std, purity_mean, purity_std]

Return type:

Tuple[float]
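
The *_mean and *_std pairs in the return value summarize the metrics over the repeat runs. The sketch below shows one way to produce such a flat tuple from per-run scores. The helper name sketch_aggregate is hypothetical, and the use of the population standard deviation (ddof=0) is an assumption.

```python
import numpy as np


def sketch_aggregate(runs):
    """Collapse per-run metric tuples into a flat (mean, std, mean, std, ...) tuple."""
    scores = np.asarray(runs, dtype=float)     # shape: (repeat, num_metrics)
    means = scores.mean(axis=0)
    stds = scores.std(axis=0)                  # population std (ddof=0), an assumption
    flat = []
    for mean, std in zip(means, stds):
        flat.extend([float(mean), float(std)])
    return tuple(flat)


# Two hypothetical runs, each reporting (ACC, NMI).
result = sketch_aggregate([(0.5, 0.25), (0.75, 0.25)])
print(result)  # (0.625, 0.125, 0.25, 0.0)
```

With repeat=1 every standard deviation is 0.0, since a single run has no spread.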

the_utils.evaluation.unsupervised_graph_learning.svm_test(num_nodes: int, data_name: str, embeddings: tensor, labels: ndarray, train_ratios: Tuple[int] = (10, 20, 30, 40), valid_ratios: Tuple[int] = (10, 20, 30, 40), repeat: int = 3, fixed_split: bool = True, split_save_dir: str = './data') → Tuple[List[Tuple[float]]]

Node classification using an SVM on the embeddings.

Parameters:
  • num_nodes (int) – number of nodes.

  • data_name (str) – dataset name.

  • embeddings (torch.Tensor) – node embeddings.

  • labels (np.ndarray) – ground truth labels.

  • train_ratios (tuple, optional) – split ratio of training set. Defaults to (10, 20, 30, 40).

  • valid_ratios (tuple, optional) – split ratio of validation set. Defaults to (10, 20, 30, 40).

  • repeat (int, optional) – number of SVM repetitions. Defaults to 3.

  • fixed_split (bool, optional) – if True, save the split the first time it is generated and load the saved split on later calls. Defaults to True.

  • split_save_dir (str, optional) – the dir for saving the fixed split. Defaults to ‘./data’.

Returns:

([(mean, std), (mean, std), …] of macro_f1 for all train_ratios, [(mean, std), (mean, std), …] of micro_f1 for all train_ratios)

Return type:

Tuple[List[Tuple[float]]]
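
The nested return value pairs up position by position with train_ratios. A small sketch of reading it, using made-up placeholder scores and a hypothetical helper named sketch_report:

```python
def sketch_report(train_ratios, macro_f1, micro_f1):
    """Pair each train ratio with its (mean, std) entries and format one row each."""
    rows = []
    for ratio, (ma_mean, ma_std), (mi_mean, mi_std) in zip(train_ratios, macro_f1, micro_f1):
        rows.append(
            f"train_ratio={ratio}: "
            f"macro-F1 {ma_mean:.3f}+/-{ma_std:.3f}, "
            f"micro-F1 {mi_mean:.3f}+/-{mi_std:.3f}"
        )
    return rows


# Placeholder numbers shaped like the documented return value; not real results.
macro_f1 = [(0.70, 0.01), (0.75, 0.02)]
micro_f1 = [(0.80, 0.01), (0.82, 0.03)]
rows = sketch_report((10, 20), macro_f1, micro_f1)
print("\n".join(rows))
```

Each list holds one (mean, std) tuple per entry of train_ratios, in the same order.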

the_utils.evaluation.unsupervised_graph_learning.evaluate_clf_cls(labels: ndarray, num_classes: int, num_nodes: int, data_name: str, embeddings: Tensor, quiet: bool = True, method: str = 'both', clf_repeat: int = 3, cls_repeat: int = 1, train_ratios: Tuple[int] = (10, 20, 30, 40), valid_ratios: Tuple[int] = (10, 20, 30, 40), fixed_split: bool = True, split_save_dir: str = './data') → Tuple[List[Tuple[float]], float]

Evaluation of node classification (SVM) and clustering.

Parameters:
  • labels (np.ndarray) – Labels.

  • num_classes (int) – Number of classes.

  • num_nodes (int) – Number of nodes.

  • data_name (str) – Dataset name.

  • embeddings (torch.Tensor) – Node embedding matrix.

  • quiet (bool, optional) – Whether to print info. Defaults to True.

  • method (str, optional) – method for evaluation: “clf” for node classification, “cls” for node clustering, “both” for both. Defaults to “both”.

  • clf_repeat (int, optional) – node classification repeat times. Defaults to 3.

  • cls_repeat (int, optional) – node clustering repeat times. Defaults to 1.

  • train_ratios (tuple, optional) – split ratio of training set for node classification. Defaults to (10, 20, 30, 40).

  • valid_ratios (tuple, optional) – split ratio of validation set. Defaults to (10, 20, 30, 40).

  • split_times (int, optional) – number of random splits. Defaults to 10.

  • fixed_split (bool, optional) – if True, save the split the first time it is generated and load the saved split on later calls. Defaults to True.

  • split_save_dir (str, optional) – the dir for saving the fixed split. Defaults to ‘./data’.

Returns:

[svm_macro_f1_list, svm_micro_f1_list, acc_mean, acc_std, nmi_mean, nmi_std, ami_mean, ami_std, ari_mean, ari_std, f1_mean, f1_std, purity_mean, purity_std]

Return type:

Tuple[List[Tuple[float]], float]
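
Because the return value is a flat sequence of 14 entries, it can help to attach names before using it. The sketch below does this with a hypothetical helper, sketch_to_dict, and the field order listed under Returns. The input values are placeholders, not real results.

```python
FIELDS = (
    "svm_macro_f1_list", "svm_micro_f1_list",
    "acc_mean", "acc_std", "nmi_mean", "nmi_std",
    "ami_mean", "ami_std", "ari_mean", "ari_std",
    "f1_mean", "f1_std", "purity_mean", "purity_std",
)


def sketch_to_dict(result):
    """Map a flat result sequence onto the documented field names."""
    if len(result) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} entries, got {len(result)}")
    return dict(zip(FIELDS, result))


# Placeholder result: two (mean, std) lists followed by twelve floats.
placeholder = ([(0.70, 0.01)], [(0.80, 0.01)]) + tuple(0.5 for _ in range(12))
named = sketch_to_dict(placeholder)
print(named["svm_macro_f1_list"], named["purity_std"])  # [(0.7, 0.01)] 0.5
```

Checking the length up front turns a silent mislabeling into an explicit error if the number of returned entries ever changes.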

the_utils.evaluation.unsupervised_graph_learning.evaluate_from_embeddings(labels: ndarray, num_classes: int, num_nodes: int, data_name: str, embeddings: Tensor, quiet: bool = True, method: str = 'both', clf_repeat: int = 3, cls_repeat: int = 1, train_ratios: Tuple[int] = (10, 20, 30, 40), valid_ratios: Tuple[int] = (10, 20, 30, 40), fixed_split: bool = True, split_save_dir: str = './data') → Tuple[Dict, Dict]

Evaluate embeddings with node classification (SVM) and clustering.

Parameters:
  • labels (np.ndarray) – labels.

  • num_classes (int) – number of classes.

  • num_nodes (int) – number of nodes.

  • data_name (str) – name of the dataset.

  • embeddings (torch.Tensor) – embeddings.

  • quiet (bool, optional) – whether to print info. Defaults to True.

  • method (str, optional) – method for evaluation: “clf” for node classification, “cls” for node clustering, “both” for both. Defaults to “both”.

  • clf_repeat (int, optional) – node classification repeat times. Defaults to 3.

  • cls_repeat (int, optional) – node clustering repeat times. Defaults to 1.

  • train_ratios (tuple, optional) – split ratio of training set for node classification. Defaults to (10, 20, 30, 40).

  • valid_ratios (tuple, optional) – split ratio of validation set. Defaults to (10, 20, 30, 40).

  • fixed_split (bool, optional) – if True, save the split the first time it is generated and load the saved split on later calls. Defaults to True.

  • split_save_dir (str, optional) – the dir for saving the fixed split. Defaults to ‘./data’.

Returns:

(clustering_results, classification_results)

Return type:

Tuple[Dict, Dict]