ipf_dataframe
- caf.distribute.iterative_proportional_fitting.ipf_dataframe(seed_df, target_marginals, value_col, drop_zeros_return=False, force_sparse=False, **kwargs)
Adjust a matrix iteratively towards targets until convergence is met.
This is a pandas wrapper around ipf(). For background on iterative proportional fitting, see https://en.wikipedia.org/wiki/Iterative_proportional_fitting
- Parameters:
seed_df (DataFrame) – The starting pandas.DataFrame that should be adjusted.
target_marginals (list[Series]) – A list of the aggregates to adjust seed_df towards. Aggregates are the target values to aim for when aggregating across one or several other axes. Each item should be a pandas.Series whose index names refer to the dimensions that seed_df is controlled to. The index names correspond to target_dimensions in ipf(); see there for more information.
value_col (str) – The column in seed_df that refers to the data. All other columns will be assumed to be dimensional columns.
drop_zeros_return (bool) – Whether to drop any rows of the dataframe that contain 0 values on return. If False, the returned dataframe will be in the same order as seed_df. That is, the return will be exactly the same as seed_df except in the value_col column.
force_sparse (bool) – Whether to force the dataframe into a sparse array without first checking if the dense array would fit into memory.
**kwargs – Any other arguments to pass to iterative_proportional_fitting.ipf()
- Returns:
fit_df – The final fit matrix, converted back to a DataFrame.
completed_iterations – The number of completed iterations before exiting.
achieved_convergence – The final convergence achieved by the fit matrix.
- Raises:
ValueError: – If any of the marginals or dimensions are not valid when passed in.
- Return type:
tuple[DataFrame, int, float]
See also
iterative_proportional_fitting.ipf()
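Example
The idea behind fitting long-format data to marginals can be sketched in pure Python. This is a minimal illustration only, not the caf.distribute implementation: the names ipf_rows and aggregate, the tol and max_iterations parameters, and the convergence measure (largest absolute gap between an achieved and a target aggregate) are all assumptions made for the sketch. Rows stand in for seed_df, and each target pairs a tuple of dimension names with the totals to aim for, playing the role of a pandas.Series in target_marginals.

```python
def aggregate(rows, values, dims):
    """Sum the values over every dimension that is not listed in dims."""
    totals = {}
    for row, value in zip(rows, values):
        key = tuple(row[d] for d in dims)
        totals[key] = totals.get(key, 0.0) + value
    return totals


def ipf_rows(rows, targets, value_col, tol=1e-9, max_iterations=100):
    """Hypothetical long-format IPF; returns (rows, iterations, convergence)."""
    values = [float(row[value_col]) for row in rows]
    completed = 0
    convergence = float("inf")
    while completed < max_iterations and convergence > tol:
        # Scale the values towards each marginal in turn.
        for dims, target in targets:
            totals = aggregate(rows, values, dims)
            for i, row in enumerate(rows):
                key = tuple(row[d] for d in dims)
                if totals[key] > 0:
                    values[i] *= target[key] / totals[key]
        completed += 1
        # Assumed measure: the largest absolute gap to any target value.
        convergence = 0.0
        for dims, target in targets:
            totals = aggregate(rows, values, dims)
            for key, wanted in target.items():
                gap = abs(totals.get(key, 0.0) - wanted)
                convergence = max(convergence, gap)
    # Row order is preserved; only the value column changes.
    fitted = [dict(row, **{value_col: v}) for row, v in zip(rows, values)]
    return fitted, completed, convergence


# A 2 x 2 seed with two dimension columns and one value column.
seed = [
    {"origin": "a", "dest": "x", "trips": 1.0},
    {"origin": "a", "dest": "y", "trips": 1.0},
    {"origin": "b", "dest": "x", "trips": 1.0},
    {"origin": "b", "dest": "y", "trips": 1.0},
]
# Each target: (dimension names, totals keyed by the dimension values).
targets = [
    (("origin",), {("a",): 30.0, ("b",): 70.0}),
    (("dest",), {("x",): 40.0, ("y",): 60.0}),
]

fitted, iterations, convergence = ipf_rows(seed, targets, "trips")
for row in fitted:
    print(row)
print(iterations, convergence)
```

The returned triple mirrors the shape of the documented return value (fitted data, completed iterations, achieved convergence). With a uniform seed the fit settles at the product of the marginals divided by the grand total, so this example converges after a single pass.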