gnnwr.datasets module

gnnwr.datasets.BasicDistance(x, y)[source]

Calculate the pairwise distances between two sets of points

Parameters:
  • x – coordinates of the input points

  • y – coordinates of the target points

Returns:

distance matrix
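To illustrate what a distance matrix between two point sets looks like, here is a minimal NumPy sketch. It assumes Euclidean distance and inputs of shape (n, d) and (m, d); this page does not state which metric BasicDistance uses, and `pairwise_euclidean` is a hypothetical helper, not part of gnnwr.

```python
import numpy as np


def pairwise_euclidean(x, y):
    """Return an (n, m) matrix of Euclidean distances between rows of x and y.

    Hypothetical helper for illustration; not the gnnwr implementation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Broadcast to (n, m, d) coordinate differences, then reduce over d.
    diff = x[:, np.newaxis, :] - y[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


points = np.array([[0.0, 0.0], [3.0, 4.0]])
targets = np.array([[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]])
dist = pairwise_euclidean(points, targets)  # one row per point, one column per target
```

Entry `dist[i, j]` is the distance from `points[i]` to `targets[j]`, so the result has shape (2, 3) here.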

gnnwr.datasets.Manhattan_distance(x, y)[source]

Calculate the pairwise Manhattan distances between two sets of points

Parameters:
  • x – coordinates of the input points

  • y – coordinates of the target points

Returns:

distance matrix
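The Manhattan distance between two points is the sum of the absolute differences of their coordinates. A minimal NumPy sketch of the pairwise version follows, assuming inputs of shape (n, d) and (m, d); `pairwise_manhattan` is a hypothetical helper and may differ from the gnnwr implementation in shapes and dtypes.

```python
import numpy as np


def pairwise_manhattan(x, y):
    """Return an (n, m) matrix of Manhattan (L1) distances between rows of x and y.

    Hypothetical helper for illustration; not the gnnwr implementation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Sum of absolute coordinate differences for every (row of x, row of y) pair.
    return np.abs(x[:, np.newaxis, :] - y[np.newaxis, :, :]).sum(axis=2)


# One-dimensional example: time stamps compared against reference times.
times = np.array([[0.0], [2.0], [5.0]])
ref_times = np.array([[1.0], [4.0]])
gaps = pairwise_manhattan(times, ref_times)
```

With a single coordinate, as in this time-stamp example, the Manhattan distance reduces to the absolute difference.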

class gnnwr.datasets.baseDataset(data=None, x_column: list | None = None, y_column: list | None = None, id_column=None, is_need_STNN=False)[source]

Bases: Dataset

baseDataset is the base dataset class. It stores the data and related information, and provides data scaling, saving, and loading.

Parameters:
  • data – dataset containing the columns named in x_column and y_column

  • x_column – names of the independent variable columns

  • y_column – names of the dependent variable columns

  • id_column – id column name

  • is_need_STNN – whether to use STNN

getScaledDataframe()[source]

Get the scaled dataframe

read(dirname)[source]

Read the dataset from a directory

Parameters:

dirname – directory to read from

rescale(x, y)[source]

Rescale the independent and dependent variable data

Parameters:
  • x – Input independent variable data

  • y – Input dependent variable data

Returns:

rescaled x and y

save(dirname)[source]

Save the dataset to a directory

Parameters:

dirname – directory to save to

scale(scale_fn=None, scale_params=None)[source]

Scale the data with MinMaxScaler or StandardScaler

Parameters:
  • scale_fn – scale function

  • scale_params – scale parameters, such as a MinMaxScaler or StandardScaler

scale2(scale_fn, scale_params)[source]

Scale the data with the given scale function and scale parameters

Parameters:
  • scale_fn – scale function

  • scale_params – scale parameters, such as the max and min values
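The relationship between scaling with max and min parameters and rescaling can be sketched as a round trip. This is a generic min-max illustration, assuming that rescale inverts the scaling, which this page does not state; `minmax_scale` and `minmax_rescale` are hypothetical helpers, not methods of baseDataset.

```python
import numpy as np


def minmax_scale(values, col_min, col_max):
    """Map each column to [0, 1] using its min and max (hypothetical helper)."""
    # Note: a constant column (max == min) would divide by zero here.
    return (values - col_min) / (col_max - col_min)


def minmax_rescale(scaled, col_min, col_max):
    """Invert minmax_scale, returning values in the original units."""
    return scaled * (col_max - col_min) + col_min


x = np.array([[10.0, 1.0], [20.0, 3.0], [30.0, 5.0]])
x_min, x_max = x.min(axis=0), x.max(axis=0)

x_scaled = minmax_scale(x, x_min, x_max)        # each column now spans 0..1
x_back = minmax_rescale(x_scaled, x_min, x_max)  # recovers the original values
```

Keeping the min and max alongside the data is what makes the inverse step possible, for example to report predictions in original units.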

gnnwr.datasets.init_dataset(data, test_ratio, valid_ratio, x_column, y_column, spatial_column=None, temp_column=None, id_column=None, sample_seed=100, process_fn='minmax_scale', batch_size=32, shuffle=True, use_class=<class 'gnnwr.datasets.baseDataset'>, spatial_fun=<function BasicDistance>, temporal_fun=<function Manhattan_distance>, max_val_size=-1, max_test_size=-1, from_for_cv=0, is_need_STNN=False, Reference=None, simple_distance=True)[source]

Initialize the dataset and return the training set, validation set and test set for the model

Parameters:
  • data – input dataset

  • test_ratio – proportion of the data assigned to the test set

  • valid_ratio – proportion of the data assigned to the validation set

  • x_column – input attribute column names

  • y_column – output attribute column names

  • spatial_column – spatial attribute column name

  • temp_column – temporal attribute column name

  • id_column – id column name

  • sample_seed – random seed

  • process_fn – data pre-processing function

  • batch_size – batch size

  • shuffle – whether to shuffle the data

  • use_class – dataset class

  • spatial_fun – function used to calculate spatial distances

  • temporal_fun – function used to calculate temporal distances

  • max_val_size – maximum validation data size in one injection

  • max_test_size – maximum test data size in one injection

  • from_for_cv – start index of the data for cross-validation

  • is_need_STNN – whether to use STNN

  • Reference – reference points used to calculate the distances

  • simple_distance – whether to use the simple distance function to calculate the distances

Returns:

train dataset, valid dataset, test dataset
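The roles of test_ratio, valid_ratio, and sample_seed can be illustrated with a generic seeded split. This is only a sketch of the idea, assuming the ratios are proportions of the whole dataset; `split_indices` is a hypothetical helper, and the way init_dataset actually samples rows may differ.

```python
import random


def split_indices(n_samples, test_ratio, valid_ratio, seed=100):
    """Shuffle row indices with a fixed seed and cut them into three parts.

    Hypothetical helper for illustration; not the gnnwr implementation.
    """
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)  # same seed, same order

    n_test = round(n_samples * test_ratio)
    n_valid = round(n_samples * valid_ratio)

    test_idx = order[:n_test]
    valid_idx = order[n_test:n_test + n_valid]
    train_idx = order[n_test + n_valid:]
    return train_idx, valid_idx, test_idx


train_idx, valid_idx, test_idx = split_indices(10, test_ratio=0.2, valid_ratio=0.1)
```

Fixing the seed makes the split reproducible, which matters when comparing models trained on the same partition.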

gnnwr.datasets.init_dataset_cv(data, test_ratio, k_fold, x_column, y_column, spatial_column=None, temp_column=None, id_column=None, sample_seed=100, process_fn='minmax_scale', batch_size=32, shuffle=True, use_class=<class 'gnnwr.datasets.baseDataset'>, spatial_fun=<function BasicDistance>, temporal_fun=<function Manhattan_distance>, max_val_size=-1, max_test_size=-1, is_need_STNN=False, Reference=None, simple_distance=True)[source]

Initialize the datasets for cross-validation

Parameters:
  • data – input data

  • test_ratio – proportion of the data assigned to the test set

  • k_fold – number of folds (the k in k-fold)

  • x_column – attribute column names

  • y_column – label column names

  • spatial_column – spatial attribute column name

  • temp_column – temporal attribute column name

  • id_column – id column name

  • sample_seed – random seed

  • process_fn – data pre-processing function

  • batch_size – batch size

  • shuffle – whether to shuffle the data

  • use_class – dataset class

  • spatial_fun – function used to calculate spatial distances

  • temporal_fun – function used to calculate temporal distances

  • max_val_size – maximum validation data size in one injection

  • max_test_size – maximum test data size in one injection

  • is_need_STNN – whether to use STNN

  • Reference – reference points used to calculate the distances

  • simple_distance – whether to use the simple distance function to calculate the distances

Returns:

cv_data_set, test_dataset
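As a rough picture of k-fold cross-validation with a held-out test set, the sketch below first sets aside the test indices and then deals the remainder into k folds, each fold serving once as the validation part. The structure of cv_data_set is not described on this page, so the list of (train, valid) pairs used here is an assumption, and `kfold_indices` is a hypothetical helper.

```python
import random


def kfold_indices(n_samples, test_ratio, k_fold, seed=100):
    """Return ([(train_idx, valid_idx), ...], test_idx) for k-fold CV.

    Hypothetical helper for illustration; not the gnnwr implementation.
    """
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)

    # Hold out the test indices before building the folds.
    n_test = round(n_samples * test_ratio)
    test_idx, rest = order[:n_test], order[n_test:]

    # Deal the remaining indices into k folds of near-equal size.
    folds = [rest[i::k_fold] for i in range(k_fold)]

    cv_splits = []
    for i, valid_idx in enumerate(folds):
        # Train on every fold except the one used for validation.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        cv_splits.append((train_idx, valid_idx))
    return cv_splits, test_idx


cv_splits, test_idx = kfold_indices(12, test_ratio=0.25, k_fold=3)
```

Every non-test sample appears in exactly one validation fold, and the test indices never enter any fold.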

gnnwr.datasets.init_predict_dataset(data, train_dataset, x_column, spatial_column=None, temp_column=None, process_fn='minmax_scale', scale_sync=True, use_class=<class 'gnnwr.datasets.predictDataset'>, spatial_fun=<function BasicDistance>, temporal_fun=<function Manhattan_distance>, max_size=-1, is_need_STNN=False)[source]

Initialize the prediction dataset

Parameters:
  • data – input data

  • train_dataset – training dataset

  • x_column – attribute column names

  • spatial_column – spatial attribute column name

  • temp_column – temporal attribute column name

  • process_fn – data pre-processing function

  • scale_sync – whether to synchronize the scaling with the training dataset

  • use_class – dataset class

  • spatial_fun – function used to calculate spatial distances

  • temporal_fun – function used to calculate temporal distances

  • max_size – maximum size of the prediction dataset

  • is_need_STNN – whether to use STNN

Returns:

predict_dataset

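gnnwr.datasets.load_dataset(directory, use_class=<class 'gnnwr.datasets.baseDataset'>)[source]

Load a dataset from a directory

Parameters:
  • directory – directory to load from

  • use_class – dataset class
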
class gnnwr.datasets.predictDataset(data, x_column, process_fn='minmax_scale', scale_info=[], is_need_STNN=False)[source]

Bases: Dataset

predictDataset holds the data for which the dependent variable is to be predicted.

Parameters:
  • data – dataframe

  • x_column – independent variable column names

  • process_fn – name of the pre-processing function

  • scale_info – parameters of the pre-processing function

  • is_need_STNN – whether to use STNN

minmax_scaler(x, min=[], max=[])[source]

Apply min-max scaling to the attribute data

Parameters:
  • x – Input attribute data

  • min – minimum value of each attribute

  • max – maximum value of each attribute

Returns:

Output attribute data
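When min and max are supplied from outside, for example from training data, new data is scaled on the same footing as the data the model was fitted on. The sketch below uses the standard formula (x - min) / (max - min); whether minmax_scaler clips values or treats the empty defaults specially is not stated on this page, and the variable names are illustrative.

```python
import numpy as np

# Per-attribute statistics taken from (illustrative) training data.
train_x = np.array([[0.0, 10.0], [4.0, 30.0]])
train_min, train_max = train_x.min(axis=0), train_x.max(axis=0)

# New data is scaled with the training min/max, not its own.
new_x = np.array([[2.0, 20.0], [6.0, 10.0]])
new_scaled = (new_x - train_min) / (train_max - train_min)
```

Values outside the training range land outside [0, 1], as with the 6.0 in the first attribute here, which is expected when the statistics come from a different sample.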

rescale(x)[source]

Rescale the attribute data

Parameters:

x – Input attribute data

Returns:

rescaled attribute data

standard_scaler(x, mean=[], std=[])[source]

Apply standard scaling to the attribute data

Parameters:
  • x – Input attribute data

  • mean – mean value of each attribute

  • std – standard deviation of each attribute

Returns:

Output attribute data
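Standard scaling subtracts the per-attribute mean and divides by the per-attribute standard deviation. A minimal sketch follows, assuming the population standard deviation (NumPy's default, ddof=0); `standard_scale` is a hypothetical helper, and the behaviour of standard_scaler with its empty defaults is not described here.

```python
import numpy as np


def standard_scale(x, mean, std):
    """Return (x - mean) / std per attribute (hypothetical helper)."""
    # Note: an attribute with std == 0 would divide by zero here.
    return (np.asarray(x, dtype=float) - mean) / std


# Statistics from (illustrative) training data.
train_x = np.array([[1.0, 10.0], [3.0, 30.0]])
mean, std = train_x.mean(axis=0), train_x.std(axis=0)

new_x = np.array([[2.0, 20.0], [4.0, 0.0]])
new_scaled = standard_scale(new_x, mean, std)

# Multiplying by std and adding the mean recovers the original values.
new_back = new_scaled * std + mean
```

A scaled value of 0 means the input equals the training mean, and ±1 means one standard deviation away from it.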