ipf_dataframe

caf.distribute.iterative_proportional_fitting.ipf_dataframe(seed_df, target_marginals, value_col, drop_zeros_return=False, force_sparse=False, **kwargs)

Adjust a matrix iteratively towards targets until convergence is met.

This is a pandas wrapper of ipf(). For background on the method, see https://en.wikipedia.org/wiki/Iterative_proportional_fitting
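To illustrate the idea of adjusting a matrix towards target totals, here is a minimal two-dimensional sketch in plain Python. It is not the library's implementation: the function name ipf_2d, the tol and max_iterations parameters, and the convergence measure (largest absolute difference between a row or column total and its target) are all assumptions made for this example.

```python
def ipf_2d(seed, row_targets, col_targets, tol=1e-9, max_iterations=1000):
    """Fit a 2D list-of-lists matrix to row and column totals.

    Hypothetical helper for illustration only.
    Returns (fitted matrix, completed iterations, largest remaining error).
    """
    fit = [list(row) for row in seed]  # copy so the seed is left untouched
    error = float("inf")
    completed = 0
    for completed in range(1, max_iterations + 1):
        # Scale each row so that its sum matches the row target.
        for i, target in enumerate(row_targets):
            total = sum(fit[i])
            if total > 0:
                factor = target / total
                fit[i] = [value * factor for value in fit[i]]
        # Scale each column so that its sum matches the column target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in fit)
            if total > 0:
                factor = target / total
                for row in fit:
                    row[j] *= factor
        # Measure how far the current totals are from the targets.
        row_error = max(abs(sum(fit[i]) - t) for i, t in enumerate(row_targets))
        col_error = max(
            abs(sum(row[j] for row in fit) - t) for j, t in enumerate(col_targets)
        )
        error = max(row_error, col_error)
        if error < tol:
            break
    return fit, completed, error


# Both sets of targets must share the same grand total (100 here).
fit, completed, error = ipf_2d([[1.0, 2.0], [3.0, 4.0]], [40.0, 60.0], [30.0, 70.0])
```

The three returned values mirror the shape of this function's return (a fitted result, an iteration count, and a convergence value), although the exact convergence measure used by ipf() may differ from the one sketched here.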

Parameters:
  • seed_df (DataFrame) – The starting pandas.DataFrame that should be adjusted.

  • target_marginals (list[Series]) – A list of the aggregates to adjust seed_df towards. Aggregates are the target values to aim for when aggregating across one or several other axes. Each item should be a pandas.Series whose index names identify the dimensions of seed_df to control. These index names correspond to target_dimensions in ipf(); see there for more information.

  • value_col (str) – The column in seed_df that refers to the data. All other columns will be assumed to be dimensional columns.

  • drop_zeros_return (bool) – Whether to drop any rows of the dataframe that contain 0 values on return. If False, the returned dataframe will be in the same order as seed_df; that is, it will be exactly the same as seed_df except in the value_col column.

  • force_sparse (bool) – Whether to force the dataframe into a sparse array without first checking if the dense array would fit into memory.

  • **kwargs – Any other arguments to pass to iterative_proportional_fitting.ipf()

Returns:

  • fit_df – The final fit matrix, converted back to a DataFrame.

  • completed_iterations – The number of completed iterations before exiting.

  • achieved_convergence – The final convergence value achieved by the fit matrix.

Raises:

ValueError – If any of the marginals or dimensions are not valid when passed in.

Return type:

tuple[DataFrame, int, float]

See also

iterative_proportional_fitting.ipf()
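
Examples

The following is a hedged sketch of how the inputs described above might be put together, based only on the parameter descriptions on this page. The column names origin, dest and trips, and all of the values, are hypothetical. The call itself is guarded so that the snippet still runs where caf.distribute is not installed, and it assumes default keyword arguments are acceptable.

```python
import pandas as pd

# Hypothetical seed: one row per (origin, dest) pair, with values in "trips".
seed_df = pd.DataFrame(
    {
        "origin": ["a", "a", "b", "b"],
        "dest": ["x", "y", "x", "y"],
        "trips": [1.0, 2.0, 3.0, 4.0],
    }
)

# Each target is a Series whose index name matches a dimension column of seed_df.
# Both targets share the same grand total (100 here).
origin_target = pd.Series([40.0, 60.0], index=pd.Index(["a", "b"], name="origin"))
dest_target = pd.Series([30.0, 70.0], index=pd.Index(["x", "y"], name="dest"))
target_marginals = [origin_target, dest_target]

# The seed's current aggregate over "origin", for comparison with origin_target.
seed_origin_totals = seed_df.groupby("origin")["trips"].sum()

try:
    from caf.distribute.iterative_proportional_fitting import ipf_dataframe
except ImportError:
    ipf_dataframe = None  # library not available; skip the call below

if ipf_dataframe is not None:
    fit_df, completed_iterations, achieved_convergence = ipf_dataframe(
        seed_df, target_marginals, value_col="trips"
    )
    # If the fit converged, these totals should be close to origin_target.
    print(fit_df.groupby("origin")["trips"].sum())
```

Naming each Series index after a column in seed_df is what ties a target to the dimension it controls; every column other than value_col is treated as a dimension.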