gnnwr.datasets module
- gnnwr.datasets.BasicDistance(x, y)
Calculate the distance between two points
- Parameters:
x – Input point coordinate data
y – Input target point coordinate data
- Returns:
distance matrix
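The entry above does not say which metric is used, so the following is only a minimal sketch of what a pairwise distance matrix looks like, assuming plain Euclidean distance. The helper name `pairwise_euclidean` is hypothetical and is not part of gnnwr.

```python
import math

def pairwise_euclidean(x, y):
    """Return a len(x) by len(y) matrix of Euclidean distances.

    Assumption: Euclidean distance; the documentation above does not
    name the metric that BasicDistance actually uses.
    """
    return [[math.dist(p, q) for q in y] for p in x]

points = [(0.0, 0.0), (3.0, 4.0)]    # input point coordinates
targets = [(0.0, 0.0), (6.0, 8.0)]   # target point coordinates
dist = pairwise_euclidean(points, targets)
# dist[i][j] is the distance from points[i] to targets[j]
```

Each row corresponds to an input point and each column to a target point, which is the shape implied by "distance matrix".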
- gnnwr.datasets.Manhattan_distance(x, y)
Calculate the Manhattan distance between two points
- Parameters:
x – Input point coordinate data
y – Input target point coordinate data
- Returns:
distance matrix
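For comparison, a Manhattan (L1) distance matrix sums the absolute coordinate differences. This is an illustrative sketch, not the library's code; `pairwise_manhattan` is a hypothetical name.

```python
def pairwise_manhattan(x, y):
    """Return a len(x) by len(y) matrix of Manhattan (L1) distances."""
    return [
        [sum(abs(a - b) for a, b in zip(p, q)) for q in y]
        for p in x
    ]

# One-dimensional example, e.g. time stamps used as temporal coordinates
times = [(0.0,), (2.0,)]
target_times = [(1.0,), (5.0,)]
tdist = pairwise_manhattan(times, target_times)
```

In one dimension this reduces to the absolute difference between values, which is presumably why it is the default `temporal_fun` in the dataset initializers.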
- class gnnwr.datasets.baseDataset(data=None, x_column: list | None = None, y_column: list | None = None, id_column=None, is_need_STNN=False)
Bases:
Dataset
baseDataset is the base class for datasets. It stores the data and related information, and provides data scaling, saving, and loading.
- Parameters:
data – dataset containing the columns named in x_column and y_column
x_column – independent variable column names
y_column – dependent variable column names
id_column – id column name
is_need_STNN – whether to use STNN
- rescale(x, y)
- Parameters:
x – Input independent variable data
y – Input dependent variable data
- Returns:
rescaled x and y
- gnnwr.datasets.init_dataset(data, test_ratio, valid_ratio, x_column, y_column, spatial_column=None, temp_column=None, id_column=None, sample_seed=100, process_fn='minmax_scale', batch_size=32, shuffle=True, use_class=<class 'gnnwr.datasets.baseDataset'>, spatial_fun=<function BasicDistance>, temporal_fun=<function Manhattan_distance>, max_val_size=-1, max_test_size=-1, from_for_cv=0, is_need_STNN=False, Reference=None, simple_distance=True)
Initialize the dataset and return the training set, validation set and test set for the model
- Parameters:
data – dataset
test_ratio – test data ratio
valid_ratio – valid data ratio
x_column – input attribute column name
y_column – output attribute column name
spatial_column – spatial attribute column name
temp_column – temporal attribute column name
id_column – id column name
sample_seed – random seed
process_fn – data pre-process function
batch_size – batch size
max_val_size – maximum number of validation samples fed to the model at one time
max_test_size – maximum number of test samples fed to the model at one time
shuffle – shuffle data
use_class – dataset class
spatial_fun – spatial distance calculate function
temporal_fun – temporal distance calculate function
from_for_cv – the start index of the data for cross validation
is_need_STNN – whether to use STNN
Reference – reference points to calculate the distance
simple_distance – whether to use simple distance function to calculate the distance
- Returns:
train dataset, valid dataset, test dataset
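The roles of `test_ratio`, `valid_ratio`, and `sample_seed` can be sketched as a seeded shuffle followed by two cuts. This is a simplified illustration under assumed rounding and ordering rules, not the actual body of `init_dataset`; `split_indices` is a hypothetical helper.

```python
import random

def split_indices(n, test_ratio, valid_ratio, seed=100):
    """Shuffle range(n) with a fixed seed and cut it into
    train, valid, and test index lists.

    Assumptions: sizes are truncated with int(), and the test
    block is taken first; the library may do either differently.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)   # reproducible shuffle
    n_test = int(n * test_ratio)
    n_valid = int(n * valid_ratio)
    test = idx[:n_test]
    valid = idx[n_test:n_test + n_valid]
    train = idx[n_test + n_valid:]
    return train, valid, test

train_idx, valid_idx, test_idx = split_indices(
    100, test_ratio=0.2, valid_ratio=0.1
)
```

Fixing the seed makes the split reproducible across runs, which matters when comparing models trained on the same data.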
- gnnwr.datasets.init_dataset_cv(data, test_ratio, k_fold, x_column, y_column, spatial_column=None, temp_column=None, id_column=None, sample_seed=100, process_fn='minmax_scale', batch_size=32, shuffle=True, use_class=<class 'gnnwr.datasets.baseDataset'>, spatial_fun=<function BasicDistance>, temporal_fun=<function Manhattan_distance>, max_val_size=-1, max_test_size=-1, is_need_STNN=False, Reference=None, simple_distance=True)
Initialize the dataset for cross-validation
- Parameters:
data – input data
test_ratio – test set ratio
k_fold – number of folds (k) in k-fold cross-validation
x_column – attribute column name
y_column – label column name
spatial_column – spatial distance column name
temp_column – temporal distance column name
id_column – id column name
sample_seed – random seed
process_fn – data process function
batch_size – batch size
shuffle – whether to shuffle the data
use_class – dataset class
spatial_fun – spatial distance calculate function
temporal_fun – temporal distance calculate function
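max_val_size – maximum number of validation samples fed to the model at one time
max_test_size – maximum number of test samples fed to the model at one time
is_need_STNN – whether to use STNN
Reference – reference points to calculate the distance
simple_distance – whether to use simple distance function to calculate the distance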
- Returns:
cv_data_set, test_dataset
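The return value pairs a set of cross-validation folds with a single held-out test set. A rough sketch of that structure follows, assuming the test set is cut off first and the remainder is divided into `k_fold` validation folds. How the library actually assigns samples to folds is not stated here, so the strided assignment below is purely illustrative and `k_fold_indices` is a hypothetical helper.

```python
import random

def k_fold_indices(n, test_ratio, k_fold, seed=100):
    """Return ([(train, valid), ...], test) index lists.

    Assumption: every k_fold-th remaining sample goes to the same
    validation fold; the library may use contiguous blocks instead.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_ratio)
    test, rest = idx[:n_test], idx[n_test:]

    folds = []
    for i in range(k_fold):
        valid = rest[i::k_fold]                  # fold i
        valid_set = set(valid)
        train = [j for j in rest if j not in valid_set]
        folds.append((train, valid))
    return folds, test

folds, held_out = k_fold_indices(n=50, test_ratio=0.2, k_fold=4)
```

Every non-test sample appears in exactly one validation fold, and the test indices never appear in any fold.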
- gnnwr.datasets.init_predict_dataset(data, train_dataset, x_column, spatial_column=None, temp_column=None, process_fn='minmax_scale', scale_sync=True, use_class=<class 'gnnwr.datasets.predictDataset'>, spatial_fun=<function BasicDistance>, temporal_fun=<function Manhattan_distance>, max_size=-1, is_need_STNN=False)
Initialize the prediction dataset
- Parameters:
data – input data
train_dataset – training dataset
x_column – attribute column name
spatial_column – spatial distance column name
temp_column – temporal distance column name
process_fn – data process function
scale_sync – whether to synchronize scaling with the training dataset
max_size – max size of predict dataset
use_class – dataset class
spatial_fun – spatial distance calculate function
temporal_fun – temporal distance calculate function
is_need_STNN – whether to use STNN
- Returns:
predict_dataset
- class gnnwr.datasets.predictDataset(data, x_column, process_fn='minmax_scale', scale_info=[], is_need_STNN=False)
Bases:
Dataset
predictDataset is the dataset used to predict the dependent variable of the data.
- Parameters:
data – dataframe
x_column – independent variable column name
process_fn – process function name
scale_info – process function parameters
is_need_STNN – whether to use STNN
- minmax_scaler(x, min=[], max=[])
Scale the attribute data with min-max scaling
- Parameters:
x – Input attribute data
min – minimum value of each attribute
max – maximum value of each attribute
- Returns:
Output attribute data
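Min-max scaling maps each attribute to the unit interval using a per-attribute minimum and maximum. The sketch below is a stand-alone illustration, not the method's real implementation. It assumes that empty `min`/`max` arguments mean "compute them from the data", and that constant columns map to 0. The name `minmax_scale_rows` is hypothetical.

```python
def minmax_scale_rows(rows, col_min=None, col_max=None):
    """Scale each column of rows using per-column minima and maxima.

    Assumption: when col_min / col_max are empty or None they are
    derived from rows; otherwise the supplied values are reused.
    """
    cols = list(zip(*rows))
    if not col_min:
        col_min = [min(c) for c in cols]
    if not col_max:
        col_max = [max(c) for c in cols]
    scaled = [
        [(v - lo) / (hi - lo) if hi != lo else 0.0   # guard constant columns
         for v, lo, hi in zip(row, col_min, col_max)]
        for row in rows
    ]
    return scaled, col_min, col_max

# Fit the statistics on training rows, then reuse them on new rows
train_rows = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
train_scaled, lo, hi = minmax_scale_rows(train_rows)
new_scaled, _, _ = minmax_scale_rows([[2.5, 40.0]], lo, hi)
```

Reusing the training minima and maxima keeps prediction inputs on the same scale the model was trained on. Values outside the training range then fall outside [0, 1], as the second column of `new_scaled` does.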