mixed
generalized_gower_dist_matrix
Calculates the Generalized Gower distance matrix for a data matrix.
Parameters (inputs)
----------
X: a pandas/polars DataFrame or a NumPy array representing the data matrix.
p1, p2, p3: numbers of quantitative, binary, and multi-class variables in the data matrix, respectively. Each must be a non-negative integer.
d1: name of the distance to be computed for quantitative variables. Must be a string in ['euclidean', 'minkowski', 'canberra', 'mahalanobis', 'robust_mahalanobis'].
d2: name of the distance to be computed for binary variables. Must be a string in ['sokal', 'jaccard'].
d3: name of the distance to be computed for multi-class variables. Must be a string in ['hamming'].
q: the parameter that defines the Minkowski distance. Must be a positive integer.
robust_method: the method used to compute the robust covariance matrix. Only needed when d1 = 'robust_mahalanobis'.
alpha: a real number in [0,1] used when `robust_method` is 'trimmed' or 'winsorized'. Only needed when d1 = 'robust_mahalanobis'.
epsilon: parameter used by the Devlin transformation; epsilon=0.05 is recommended. Only needed when d1 = 'robust_mahalanobis'.
n_iter: maximum number of iterations run by the Devlin algorithm. Only needed when d1 = 'robust_mahalanobis'.
weights: the sample weights. Only used if provided and d1 = 'robust_mahalanobis'.
Returns (outputs)
-------
D: the Generalized Gower matrix for the data matrix `X`.
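To make the role of the three variable blocks concrete, the following is a minimal NumPy sketch of one way such a matrix could be assembled. It is not the library's implementation. It assumes that the columns of `X` are ordered as quantitative, then binary, then multi-class; that the Euclidean, Jaccard, and Hamming distances are chosen; that each squared block distance is divided by its geometric variability, taken here as the sum of squared distances over 2n^2; and that the final distance is the square root of the sum. The names `geom_var` and `gower_matrix_sketch` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def geom_var(D2):
    """Geometric variability of a squared-distance matrix: sum(D2) / (2 * n^2)."""
    n = D2.shape[0]
    return D2.sum() / (2 * n**2)

def gower_matrix_sketch(X, p1, p2, p3):
    """Hypothetical helper (not the library function); assumes column order
    quantitative -> binary -> multi-class."""
    X = np.asarray(X, dtype=float)
    Q = X[:, :p1]                      # quantitative block
    B = X[:, p1:p1 + p2]               # binary block (0/1 values)
    M = X[:, p1 + p2:p1 + p2 + p3]     # multi-class block (category codes)

    # Squared Euclidean distances between all pairs of rows.
    D2_quant = ((Q[:, None, :] - Q[None, :, :]) ** 2).sum(axis=2)

    # Jaccard dissimilarity; pairs with no 1s at all are given distance 0.
    inter = (B[:, None, :] * B[None, :, :]).sum(axis=2)
    union = np.maximum(B[:, None, :], B[None, :, :]).sum(axis=2)
    safe_union = np.where(union > 0, union, 1.0)
    jaccard = np.where(union > 0, 1.0 - inter / safe_union, 0.0)

    # Hamming distance taken as the count of mismatching categories.
    hamming = (M[:, None, :] != M[None, :, :]).sum(axis=2).astype(float)

    # Standardize each squared block by its geometric variability and add up.
    D2 = np.zeros((X.shape[0], X.shape[0]))
    for D2_block in (D2_quant, jaccard**2, hamming**2):
        V = geom_var(D2_block)
        if V > 0:                      # skip empty or constant blocks
            D2 += D2_block / V
    return np.sqrt(D2)

# Toy data: 2 quantitative, 2 binary, and 2 multi-class columns.
X = np.array([
    [1.0, 2.0, 1, 0, 0, 2],
    [2.0, 0.5, 0, 1, 1, 2],
    [0.0, 1.5, 1, 1, 2, 0],
    [3.0, 3.0, 0, 0, 0, 1],
])
D = gower_matrix_sketch(X, p1=2, p2=2, p3=2)
```

Under these assumptions each standardized block has geometric variability 1, so the three variable types contribute on a comparable scale regardless of their original units.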
generalized_gower_dist
Calculates the Generalized Gower distance between a pair of mixed data vectors.
Parameters (inputs)
----------
xi, xr: 1D array-like objects representing a pair of statistical observations (mixed data vectors).
p1, p2, p3: numbers of quantitative, binary, and multi-class variables in the data vectors, respectively. Each must be a non-negative integer.
d1: name of the distance to be computed for quantitative variables. Must be a string in ['euclidean', 'minkowski', 'canberra', 'mahalanobis', 'robust_mahalanobis'].
d2: name of the distance to be computed for binary variables. Must be a string in ['sokal', 'jaccard'].
d3: name of the distance to be computed for multi-class variables. Must be a string in ['hamming'].
q: the parameter that defines the Minkowski distance. Must be a positive integer.
S: the covariance matrix (standard or robust) to be used. Only needed when d1 is 'mahalanobis' or 'robust_mahalanobis'.
geom_var_1, geom_var_2, geom_var_3: geometric variability of the quantitative, binary, and multi-class distances, respectively. Used to standardize the squared distances.
Returns (outputs)
-------
dist: the Generalized Gower distance between the observations `xi` and `xr`.
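As an illustration of how the geometric variabilities enter the pairwise computation, here is a minimal sketch under stated assumptions, not the library's code. It assumes the entries of `xi` and `xr` are ordered as quantitative, then binary, then multi-class; it fixes the Euclidean, Jaccard, and Hamming distances; and it assumes the result is the square root of the sum of squared block distances, each divided by its geometric variability. The function name `gower_pair_sketch` is hypothetical.

```python
import numpy as np

def gower_pair_sketch(xi, xr, p1, p2, p3, geom_var_1, geom_var_2, geom_var_3):
    """Hypothetical helper (not the library function); assumes entry order
    quantitative -> binary -> multi-class."""
    xi = np.asarray(xi, dtype=float)
    xr = np.asarray(xr, dtype=float)
    s1 = slice(0, p1)                      # quantitative entries
    s2 = slice(p1, p1 + p2)                # binary entries (0/1 values)
    s3 = slice(p1 + p2, p1 + p2 + p3)      # multi-class entries

    # Squared Euclidean distance on the quantitative part.
    d1_sq = np.sum((xi[s1] - xr[s1]) ** 2)

    # Jaccard dissimilarity on the binary part (0 if neither vector has a 1).
    union = np.sum(np.maximum(xi[s2], xr[s2]))
    inter = np.sum(xi[s2] * xr[s2])
    d2 = 1.0 - inter / union if union > 0 else 0.0

    # Hamming distance as the count of mismatching categories.
    d3 = float(np.sum(xi[s3] != xr[s3]))

    # Standardize each squared distance by its geometric variability.
    total = 0.0
    for d_sq, v in ((d1_sq, geom_var_1), (d2**2, geom_var_2), (d3**2, geom_var_3)):
        if v > 0:                          # skip blocks with no variability
            total += d_sq / v
    return float(np.sqrt(total))

xi = [1.0, 2.0, 1, 0, 0, 2]
xr = [2.0, 0.5, 0, 1, 1, 2]
dist = gower_pair_sketch(xi, xr, 2, 2, 2,
                         geom_var_1=3.25, geom_var_2=1.0, geom_var_3=1.0)
```

In practice the geometric variabilities would be computed once from the full data set and then reused for every pair, which is why they appear here as arguments rather than being derived from `xi` and `xr`.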