ipf

caf.distribute.iterative_proportional_fitting.ipf(seed_mat: ndarray, target_marginals: list[ndarray], target_dimensions: list[list[int]], convergence_fn: Callable[[Collection[ndarray], Collection[ndarray]], float] | None = None, max_iterations: int = 5000, tol: float = 1e-09, min_tol_rate: float = 1e-09, use_sparse: bool = False, show_pbar: bool = False, pbar_kwargs: dict[str, Any] | None = None) → tuple[ndarray, int, float]
caf.distribute.iterative_proportional_fitting.ipf(seed_mat: COO, target_marginals: list[ndarray | COO], target_dimensions: list[list[int]], convergence_fn: Callable[[Collection[ndarray], Collection[ndarray]], float] | None = None, max_iterations: int = 5000, tol: float = 1e-09, min_tol_rate: float = 1e-09, use_sparse: bool = False, show_pbar: bool = False, pbar_kwargs: dict[str, Any] | None = None) → tuple[COO, int, float]

Adjust a matrix iteratively towards targets until convergence is met.

https://en.wikipedia.org/wiki/Iterative_proportional_fitting

Parameters:
  • seed_mat – The starting matrix that should be adjusted.

  • target_marginals – A list of the aggregates to adjust the matrix towards. Aggregates are the target values to aim for when aggregating across one or several other axes. Each entry corresponds directly to the entry at the same position in target_dimensions.

  • target_dimensions – A list of target dimensions for each aggregate. Each target dimension lists the axes that should be preserved when calculating the achieved aggregates for the corresponding target_marginals. Another way to look at this is as a list of the numpy axes which should NOT be summed over in mat when calculating the achieved marginals.

  • convergence_fn – The function that should be called to calculate the convergence of mat after all target_marginals adjustments have been made. If a callable is given, it must take the form fn(targets: list[np.ndarray], achieved: list[np.ndarray]) and return a float.

  • max_iterations – The maximum number of iterations to complete before exiting.

  • tol – The target convergence to achieve before exiting early. This is one condition which allows exiting before max_iterations is reached. The convergence is calculated via convergence_fn.

  • min_tol_rate – The minimum amount that the convergence must change by between iterations. If the change falls below this value, the function exits early. This is one condition which allows exiting before max_iterations is reached. The convergence is calculated via convergence_fn.

  • use_sparse – Whether to use sparse matrices when doing the ipf calculations. This is useful when the given numpy array is very sparse. If a sparse.COO matrix is given, this argument is ignored.

  • show_pbar – Whether to show a progress bar of the current ipf iterations or not. If pbar_kwargs is not None, then this argument is ignored. If pbar_kwargs is set, and you want to disable it, add {"disable": True} as a kwarg.

  • pbar_kwargs – A dictionary of keyword arguments to pass into a progress bar. This dictionary is passed into tqdm.tqdm(**pbar_kwargs) when building the progress bar.
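
The relationship between target_dimensions and the achieved marginals, and the call shape expected of convergence_fn, can be illustrated with plain numpy. This is a minimal sketch, not the library's code: achieved_marginal and rmse_convergence are hypothetical helpers, and root mean squared error is chosen only for illustration (it is not necessarily the default that ipf uses).

```python
import numpy as np


def achieved_marginal(mat: np.ndarray, keep_axes: list[int]) -> np.ndarray:
    """Sum `mat` over every axis NOT listed in `keep_axes` (hypothetical helper)."""
    sum_axes = tuple(ax for ax in range(mat.ndim) if ax not in keep_axes)
    return mat.sum(axis=sum_axes)


def rmse_convergence(targets, achieved) -> float:
    """Example function with the fn(targets, achieved) -> float shape.

    Assumption: RMSE over all marginal cells; smaller means closer to target.
    """
    squared_diffs = np.concatenate(
        [(np.asarray(t) - np.asarray(a)).ravel() ** 2 for t, a in zip(targets, achieved)]
    )
    return float(np.sqrt(squared_diffs.mean()))


seed = np.arange(1.0, 7.0).reshape(2, 3)  # [[1, 2, 3], [4, 5, 6]]

# [0] keeps axis 0 (row totals); [1] keeps axis 1 (column totals).
target_dimensions = [[0], [1]]
target_marginals = [np.array([12.0, 30.0]), np.array([10.0, 14.0, 18.0])]

achieved = [achieved_marginal(seed, dims) for dims in target_dimensions]
gap = rmse_convergence(target_marginals, achieved)
```

Here achieved holds the row totals and column totals of the seed, and gap measures how far they are from the targets before any adjustment has been made.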

Returns:

  • fit_matrix – The final fit matrix.

  • completed_iterations – The number of completed iterations before exiting.

  • achieved_convergence – The final convergence achieved by fit_matrix.
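
To make the three return values and the early-exit conditions concrete, the following is a rough dense-numpy sketch of iterative proportional fitting. It is not the caf.distribute implementation: ipf_sketch is a hypothetical name, RMSE stands in for convergence_fn, and each entry of target_dimensions is assumed to list its axes in ascending order.

```python
import numpy as np


def ipf_sketch(seed_mat, target_marginals, target_dimensions,
               max_iterations=5000, tol=1e-9, min_tol_rate=1e-9):
    """Illustrative dense IPF; NOT the caf.distribute implementation."""
    fit = np.asarray(seed_mat, dtype=float).copy()
    all_axes = range(fit.ndim)
    convergence = np.inf
    completed = 0

    for completed in range(1, max_iterations + 1):
        # Adjust the matrix towards each marginal in turn.
        for marginal, keep in zip(target_marginals, target_dimensions):
            drop = tuple(ax for ax in all_axes if ax not in keep)
            current = fit.sum(axis=drop, keepdims=True)
            # Assumes `keep` is in ascending order so the reshape lines up.
            target = np.asarray(marginal, dtype=float).reshape(current.shape)
            # Leave cells untouched where the current total is zero.
            factor = np.divide(target, current,
                               out=np.ones_like(current), where=current != 0)
            fit = fit * factor

        # Measure convergence once all marginals have been adjusted (RMSE here).
        squared_diffs = []
        for marginal, keep in zip(target_marginals, target_dimensions):
            drop = tuple(ax for ax in all_axes if ax not in keep)
            diff = np.asarray(marginal, dtype=float) - fit.sum(axis=drop)
            squared_diffs.append(diff.ravel() ** 2)
        previous = convergence
        convergence = float(np.sqrt(np.concatenate(squared_diffs).mean()))

        if convergence < tol:
            break  # target convergence reached
        if abs(previous - convergence) < min_tol_rate:
            break  # convergence has stopped improving

    return fit, completed, convergence


fit_matrix, completed_iterations, achieved_convergence = ipf_sketch(
    seed_mat=np.ones((2, 3)),
    target_marginals=[np.array([30.0, 70.0]), np.array([20.0, 30.0, 50.0])],
    target_dimensions=[[0], [1]],
)
```

The returned tuple mirrors the order documented above: the fitted matrix, the number of completed iterations, and the final convergence value.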

Raises:

ValueError – If any of the marginals or dimensions passed in are not valid.
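
The source does not spell out which checks make a marginal or dimension invalid. As a sketch under that caveat, one plausible consistency check is that each marginal's shape matches the seed matrix's shape along the axes it preserves. check_marginals below is a hypothetical helper and the library's actual validation may differ.

```python
import numpy as np


def check_marginals(seed_mat, target_marginals, target_dimensions):
    """Hypothetical pre-check; the library's real validation rules may differ."""
    if len(target_marginals) != len(target_dimensions):
        raise ValueError("Expected one list of target dimensions per marginal.")

    for i, (marginal, dims) in enumerate(zip(target_marginals, target_dimensions)):
        # Every preserved axis must exist in the seed matrix.
        if any(ax < 0 or ax >= seed_mat.ndim for ax in dims):
            raise ValueError(
                f"Marginal {i}: axes {dims} are out of range for a "
                f"{seed_mat.ndim}-dimensional seed matrix."
            )
        # The marginal must have one entry per cell along the preserved axes.
        expected_shape = tuple(seed_mat.shape[ax] for ax in dims)
        if np.shape(marginal) != expected_shape:
            raise ValueError(
                f"Marginal {i}: shape {np.shape(marginal)} does not match "
                f"expected shape {expected_shape} for axes {dims}."
            )


seed = np.ones((2, 3))
check_marginals(seed, [np.array([30.0, 70.0])], [[0]])  # passes silently
```

Running a check like this before calling ipf can surface shape mismatches with a clearer message than a broadcasting error deep inside the fitting loop.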