pd_marginals_to_np

caf.distribute.iterative_proportional_fitting.pd_marginals_to_np(target_marginals, dimension_order, valid_dimension_combos, allow_sparse=False, sparse_value_maps=None)

Convert pandas marginals to numpy format for ipf().

Parameters:
  • target_marginals (list[Series]) – A list of the aggregates to adjust seed_df towards. Aggregates are the target values to aim for when aggregating across one or several other axes. Each item should be a pandas.Series whose index names identify the dimensions to control seed_df to. These index names correspond to target_dimensions in ipf(); see there for more information.

  • dimension_order (dict[str, list[Any]]) – A dictionary of {col_name: col_values} pairs. dimension_order.keys() MUST return the keys in the same order as the axes of the seed matrix for this function to be accurate. The key order is the order in which the keys were added to the dictionary. col_values MUST be in the same order as the values in the dimension they refer to. The values are used to ensure the returned marginals are in the correct order.

  • valid_dimension_combos (DataFrame) – A dataframe defining all the valid combinations of each of the dimension values. This should be taken from the seed matrix. Used internally to ensure the generated numpy marginals are valid with no unexpected missing combinations.

  • allow_sparse (bool) – Whether to allow the resultant marginals to become sparse.COO matrices. Usually used when the corresponding seed matrix is also sparse. If False, the resultant marginals are never sparse and MUST be dense numpy arrays.

  • sparse_value_maps (dict[Any, dict[Any, int]] | None) – A nested dictionary of {col_name: {col_val: coordinate_value}} where col_name is the name of the column in df, col_val is the value in col_name, and coordinate_value is the coordinate value to assign to that value in the sparse array.
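
As a rough illustration of the input shapes described above, the sketch below builds each argument with pandas only; it does not call pd_marginals_to_np() itself. The dimension names ("zone", "mode"), their values, and the position-based coordinate mapping used for sparse_value_maps are all assumptions made for the example, not values taken from the library.

```python
import pandas as pd

# Hypothetical dimensions; key order is assumed to match the seed matrix axes.
dimension_order = {"zone": [1, 2, 3], "mode": ["car", "bus"]}

# One Series per marginal; each index name must be a key of dimension_order.
zone_totals = pd.Series(
    [10.0, 20.0, 30.0],
    index=pd.Index([1, 2, 3], name="zone"),
)
mode_totals = pd.Series(
    [45.0, 15.0],
    index=pd.Index(["bus", "car"], name="mode"),  # deliberately out of order
)
target_marginals = [zone_totals, mode_totals]

# Every valid combination of dimension values, as would be taken from the seed.
valid_dimension_combos = pd.MultiIndex.from_product(
    list(dimension_order.values()), names=list(dimension_order)
).to_frame(index=False)

# Assumption: each value's coordinate is its position within col_values.
sparse_value_maps = {
    col: {val: i for i, val in enumerate(vals)}
    for col, vals in dimension_order.items()
}

# Illustration only: reordering a 1-D marginal to follow dimension_order.
mode_array = mode_totals.reindex(dimension_order["mode"]).to_numpy()
```

The last line shows why the order of col_values matters: the "mode" marginal is supplied as (bus, car) but is reordered to (car, bus) so that it lines up with the corresponding seed matrix axis.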

Returns:

  • target_marginals – A list of the aggregates to adjust matrix towards. Aggregates are the target values to aim for when aggregating across one or several other axes. Directly corresponds to target_dimensions.

  • target_dimensions – A list of target dimensions for each aggregate. Each target dimension lists the axes that should be preserved when calculating the achieved aggregates for the corresponding target_marginals. Another way to look at this is as a list of the numpy axes which should NOT be summed from mat when calculating the achieved marginals.
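
The relationship between target_dimensions and the summed axes can be sketched with plain numpy. The helper achieved_marginal() and the example matrix below are hypothetical and exist only to illustrate the "axes which should NOT be summed" reading; they are not part of the library.

```python
import numpy as np

# Hypothetical 3-D matrix, e.g. with axes (zone, mode, purpose).
mat = np.arange(24, dtype=float).reshape(2, 3, 4)


def achieved_marginal(mat, keep_axes):
    """Sum over every axis of mat that is NOT listed in keep_axes."""
    sum_axes = tuple(ax for ax in range(mat.ndim) if ax not in keep_axes)
    return mat.sum(axis=sum_axes)


# One entry per marginal: the axes that are preserved for that marginal.
target_dimensions = [[0], [1, 2]]
achieved = [achieved_marginal(mat, dims) for dims in target_dimensions]

# The first marginal keeps axis 0 only; the second keeps axes 1 and 2.
shapes = [a.shape for a in achieved]
```

Here shapes is [(2,), (3, 4)]: each achieved marginal has one axis per preserved dimension, which is the shape the matching entry of target_marginals would need to have.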

Raises:
  • ValueError – If any of the marginal index names do not exist in the keys of dimension_order.

  • ValueError – If the passed in marginals do not contain all the valid combinations of dimension_order, as defined in valid_dimension_combos.
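
The two error conditions above can be approximated with a small pre-check before calling the function. The helper check_marginal() below is a hypothetical sketch written for this example; the library's own validation may differ in detail and in its error messages.

```python
import pandas as pd


def check_marginal(marginal, dimension_order, valid_dimension_combos):
    """Hypothetical pre-check mirroring the two documented error conditions."""
    names = list(marginal.index.names)

    # Condition 1: every index name must be a key of dimension_order.
    unknown = [name for name in names if name not in dimension_order]
    if unknown:
        raise ValueError(f"Index names not in dimension_order: {unknown}")

    # Condition 2: the marginal must cover every valid combination of
    # the dimensions it controls.
    expected = set(
        valid_dimension_combos[names]
        .drop_duplicates()
        .itertuples(index=False, name=None)
    )
    present = set(
        marginal.index.to_frame(index=False).itertuples(index=False, name=None)
    )
    missing = expected - present
    if missing:
        raise ValueError(f"Marginal is missing combinations: {sorted(missing)}")


# Example data (assumed names and values).
dimension_order = {"zone": [1, 2], "mode": ["car", "bus"]}
combos = pd.DataFrame(
    {"zone": [1, 1, 2, 2], "mode": ["car", "bus", "car", "bus"]}
)
good = pd.Series([1.0, 2.0], index=pd.Index([1, 2], name="zone"))
check_marginal(good, dimension_order, combos)  # passes silently
```

A marginal indexed by a name outside dimension_order, or one that omits a zone present in combos, would raise ValueError in this sketch.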

Return type:

tuple[list[ndarray] | list[COO], list[list[int]]]