DataFrameComparison.summary

DataFrameComparison.summary(
    show_perfect_column_matches: bool = True,
    top_k_column_changes: int = 0,
    sample_k_rows_only: int = 0,
    show_sample_primary_key_per_change: bool = False,
    left_name: str = Side.LEFT,
    right_name: str = Side.RIGHT,
    slim: bool = False,
    hidden_columns: list[str] | None = None,
    metrics: Mapping[str, Metric] | None = None,
) → Summary

Generate a summary of all aspects of the comparison.

Parameters:

show_perfect_column_matches – Whether to include column matches in the summary even if the column match rate is 100%. Setting this to False is useful when comparing very wide data frames.
top_k_column_changes – The maximum number of column value changes to display in the summary for each column with a match rate below 100%. When enabling this feature, make sure that no sensitive data is leaked.
sample_k_rows_only – The number of rows to show in the “Rows left/right only” section of the summary. If 0 (default), no rows are shown. For the rows that are shown, only the primary key is printed. An error will be raised if a positive number is provided and any of the primary key columns is also in hidden_columns.
show_sample_primary_key_per_change – Whether to show a sample primary key per column change in the summary. If False (default), no primary key values are shown. A sample primary key can only be shown if top_k_column_changes is greater than 0, as each sample primary key is linked to a specific column change. An error will be raised if this is True and any of the primary key columns is also in hidden_columns.
left_name – Custom display name for the left data frame.
right_name – Custom display name for the right data frame.
slim – Whether to generate a slim summary. In slim mode, the summary is as concise as possible, only showing sections that contain differences. As the structure of the summary can vary, it should only be used by advanced users who are familiar with the summary format.
hidden_columns – Columns for which no values are printed, e.g. because they contain sensitive information.
metrics – Optional mapping from display label to a metric callable (left_expr, right_expr) -> pl.Expr. Each callable receives two polars.Expr objects referring to the left and right values of a single numerical column across all joined rows, and must return a scalar aggregation expression. See diffly.metrics for presets (mean, median, mean_absolute_deviation, etc.). When None (default), no metrics are computed; presets are not applied automatically. Metrics are only computed for numerical columns. Prefer short labels: the summary has a fixed width, and many or long labels degrade rendering.

Returns:

A summary which can be printed or written to a file.

Examples

>>> import polars as pl
>>> from diffly import compare_frames
>>> left = pl.DataFrame({"id": [1, 2, 3], "status": ["a", "b", "c"]})
>>> right = pl.DataFrame({"id": [1, 2, 3], "status": ["a", "x", "x"]})
>>> comparison = compare_frames(left, right, primary_key="id")
>>> print(comparison.summary())