Reference Guide

Overview
Installation
CLI Commands
Common Input Options
db Subcommand (SQLite)
xlsx Subcommand (Excel)
Python API
ingest_excel_to_sqlite()
ingest_excel_to_excel()
Utility Functions
Fill Modes
Hierarchical (default)
Independent (pandas-style)
Behavior Details
SQLite Output Details
Excel Output Details
Compatibility

Overview

xlfilldown is a Python library and CLI tool that streams an .xlsx (OOXML) sheet into SQLite or a new Excel sheet with forward-fill padding. It preserves original Excel row numbers and computes a stable SHA-256 row hash for audit purposes.

The tool ingests only columns with non-empty headers, stores all values as TEXT strings (numbers and dates are canonicalized to stable text), and streams rows in constant memory using openpyxl's read-only mode.

Key concept: "Fill-down" (or forward-fill) means populating empty cells with the last non-empty value from the same column. This is essential when working with grouped .xlsx exports where parent values appear only once per group.

Installation

From PyPI (recommended)

# Using pip
$ pip install xlfilldown

# Using pipx (isolated environment)
$ pipx install xlfilldown

Requires Python ≥ 3.9. The only dependency is openpyxl ≥ 3.1.0.

From source

$ git clone https://github.com/RexBytes/xlfilldown.git
$ cd xlfilldown
$ pip install .

CLI Commands

The xlfilldown command has two subcommands sharing the same input options:

xlfilldown db - write to a SQLite database
xlfilldown xlsx - write to a new .xlsx file

Common Input Options

These options apply to both db and xlsx subcommands.

Option	Required	Default	Description
`--infile`	Yes	-	Path to the input .xlsx file.
`--insheet`	Yes	-	Sheet name to read from the input file.
`--header-row`	Yes	-	1-based row number containing column headers.
`--fill-cols`	No	-	JSON array of header names to forward-fill. Example: `'["Region","Country"]'`. Mutually exclusive with `--fill-cols-letters`.
`--fill-cols-letters`	No	-	Excel column letters to forward-fill, space-separated. Example: `A C AE`. Resolved to header names via `--header-row`. Mutually exclusive with `--fill-cols`.
`--fill-mode`	No	`hierarchical`	`hierarchical` (higher-tier changes reset lower-tier carries) or `independent` (pandas-style ffill).
`--ingest-mode`	No	`fill`	`fill` applies forward-fill. `raw` skips fill-down entirely.
`--drop-blank-rows`	No	`False`	Drop rows where all fill columns are empty after filling.
`--require-non-null`	No	-	JSON array of headers. Drop row if any are null/blank after fill.
`--require-non-null-letters`	No	-	Excel column letters for require-non-null. Merged with `--require-non-null`.
`--row-hash`	No	`False`	Include a `row_hash` column (SHA-256 hex digest over all columns after filling).
`--excel-row-numbers`	No	`False`	Include an `excel_row` column with the original 1-based row number.
`--if-exists`	No	`fail`	`fail`, `replace`, or `append`.

db Subcommand (SQLite Output)

Option	Required	Default	Description
`--db`	Yes	-	SQLite database file path (created if missing).
`--table`	No	Derived from sheet	SQLite table name.
`--batch-size`	No	`1000`	Rows per `executemany()` batch.

Example: Hierarchical fill to SQLite

$ xlfilldown db \
    --infile sales_report.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols '["Region","Country","City"]' \
    --db sales.db \
    --row-hash \
    --excel-row-numbers

Example: Fill by column letters

$ xlfilldown db \
    --infile data.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols-letters A C AE \
    --db out.db

Example: Append with row filtering

$ xlfilldown db \
    --infile report.xlsx \
    --insheet "Data" \
    --header-row 2 \
    --fill-cols '["Category","Subcategory"]' \
    --drop-blank-rows \
    --require-non-null '["Category"]' \
    --db warehouse.db \
    --table products \
    --if-exists append

Example: Raw ingest (no fill-down)

$ xlfilldown db \
    --infile data.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --ingest-mode raw \
    --db out.db \
    --row-hash \
    --excel-row-numbers \
    --if-exists replace

xlsx Subcommand (Excel Output)

Option	Required	Default	Description
`--outfile`	Yes	-	Output .xlsx file path.
`--outsheet`	No	Derived from input	Output sheet name.

Example: Fill by header names to Excel

$ xlfilldown xlsx \
    --infile data.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols '["Region","Country","City"]' \
    --outfile filled.xlsx \
    --outsheet Processed

Example: Fill by column letters to Excel

$ xlfilldown xlsx \
    --infile data.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols-letters A D \
    --outfile out.xlsx

Example: Raw ingest to Excel

$ xlfilldown xlsx \
    --infile data.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --ingest-mode raw \
    --outfile raw_copy.xlsx \
    --outsheet RawSheet \
    --row-hash \
    --excel-row-numbers \
    --if-exists replace

Python API

Import from xlfilldown.api:

from xlfilldown.api import ingest_excel_to_sqlite, ingest_excel_to_excel

ingest_excel_to_sqlite()

Stream an .xlsx sheet into a SQLite database table with optional forward-fill, row hashing, and row number preservation.

Signature

def ingest_excel_to_sqlite(
    file: str | Path,              # Input .xlsx file path
    sheet: str,                     # Sheet name to read
    header_row: int,                # 1-based header row
    fill_cols: Optional[Sequence[str]] = None,         # Fill by header names
    fill_cols_letters: Optional[Sequence[str]] = None,  # Fill by column letters
    fill_mode: Optional[str] = None,       # "hierarchical" | "independent"
    ingest_mode: str = "fill",            # "fill" | "raw"
    drop_blank_rows: bool = False,
    require_non_null: Optional[Sequence[str]] = None,
    require_non_null_letters: Optional[Sequence[str]] = None,
    row_hash: bool = False,
    excel_row_numbers: bool = False,
    db: str | Path,                  # SQLite database file
    table: Optional[str] = None,    # Table name (default: sheet name)
    batch_size: int = 1000,
    if_exists: str = "fail",         # "fail" | "replace" | "append"
) -> dict

Return value

Key	Type	Description
`table`	str	SQLite table name used.
`columns`	list	List of column names in the table.
`rows_ingested`	int	Number of rows written.
`row_hash`	bool	Whether row hashing was enabled.
`excel_row_numbers`	bool	Whether Excel row numbers were included.

Example

from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="sales_report.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["Region", "Country", "City"],
    fill_mode="hierarchical",
    db="sales.db",
    row_hash=True,
    excel_row_numbers=True,
    drop_blank_rows=True,
    if_exists="replace",
)

print(f"Table: {summary['table']}")
print(f"Rows: {summary['rows_ingested']}")

ingest_excel_to_excel()

Stream an .xlsx sheet into a new Excel sheet with optional forward-fill, row hashing, and row number preservation.

Signature

def ingest_excel_to_excel(
    file: str | Path,
    sheet: str,
    header_row: int,
    fill_cols: Optional[Sequence[str]] = None,
    fill_cols_letters: Optional[Sequence[str]] = None,
    fill_mode: Optional[str] = None,
    ingest_mode: str = "fill",
    drop_blank_rows: bool = False,
    require_non_null: Optional[Sequence[str]] = None,
    require_non_null_letters: Optional[Sequence[str]] = None,
    row_hash: bool = False,
    excel_row_numbers: bool = False,
    outfile: str | Path,
    outsheet: Optional[str] = None,
    if_exists: str = "fail",
) -> dict

Return value

Key	Type	Description
`workbook`	str	Output workbook file path.
`sheet`	str	Output sheet name.
`columns`	list	List of column names written.
`rows_written`	int	Number of rows written.
`row_hash`	bool	Whether row hashing was enabled.
`excel_row_numbers`	bool	Whether Excel row numbers were included.

Example

from xlfilldown.api import ingest_excel_to_excel

summary = ingest_excel_to_excel(
    file="data.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["Region", "Country"],
    fill_mode="independent",
    outfile="filled.xlsx",
    outsheet="Processed",
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
)

print(f"Wrote {summary['rows_written']} rows to {summary['workbook']}")

Utility Functions

Re-exported from xlfilldown.api for advanced use:

Function	Description
`normalize_headers(cells)`	Normalize header cell values: trim whitespace, coerce None/blank/"nan" to empty strings. Returns a list of strings.
`canon_list(values)`	Canonicalize a sequence of values to a JSON-format string. Used internally for hash computation.
`sha256_hex(s)`	Compute the SHA-256 hex digest of a string.
`qident(name)`	Quote a SQLite identifier (table or column name) safely.

Fill Modes

Hierarchical (default)

In hierarchical mode, the order of columns matters. The first column is the highest tier. When a higher-tier column receives a new value, all lower-tier carries reset.

Think of it as: Region > Country > City. When a new Region appears, Country and City reset. When a new Country appears, only City resets.

Input:

Region     Country    City
EMEA       UK         London
                      Manchester
           Germany    Berlin
                      Munich
APAC       Japan      Tokyo

Hierarchical output with --fill-cols '["Region","Country","City"]':

Region     Country    City
EMEA       UK         London
EMEA       UK         Manchester    # Region and Country filled from above
EMEA       Germany    Berlin        # Country changed; City has own value
EMEA       Germany    Munich        # Both filled
APAC       Japan      Tokyo         # Region changed; lower tiers reset

Important: "Reset" means the carry is cleared for that row. If the row has its own value for a lower-tier column (like "Berlin" above), that value is kept. The reset only affects empty cells that would otherwise carry a stale value.

Independent (pandas-style)

Each column fills forward on its own. Column order does not matter. Columns do not reset each other.

Same input, with --fill-mode independent:

Region     Country    City
EMEA       UK         London
EMEA       UK         Manchester    # UK carries independently
EMEA       Germany    Berlin
EMEA       Germany    Munich
APAC       Japan      Tokyo

Behavior Details

Headers

Only columns with non-empty headers on --header-row are ingested.
Normalization: whitespace trimmed, case preserved, "nan" string converted to empty.
Empty or duplicate headers after normalization are rejected with a clear error.
When using --fill-cols-letters, each referenced column must have a non-empty header.

Cell treatment

All values stored as TEXT (including numbers, dates, excel_row).
Whitespace-only cells treated as blank.
None/blank/whitespace normalize to NULL (empty).
Numbers canonicalized to stable text: 1 and 1.0 both become "1". No scientific notation.
Dates converted to ISO format.
Strings stripped of leading/trailing whitespace.

Row hash

SHA-256 hex digest over all ingested columns (in header order) after filling.
For completely empty rows, the hash reflects all-empty values.
Numbers normalized before hashing (same canonicalization as storage).
In SQLite mode, a non-unique index is created on row_hash.

Row filtering

--drop-blank-rows: drops rows where all fill columns are blank after filling.
--require-non-null: drops rows where any specified headers are blank after filling.
Both can be combined. Filters run after fill-down logic.

Completely empty rows

Rows where all columns are blank are treated as spacer rows.
No fill-down is applied to them, but the carry persists past them.
Dropped if --drop-blank-rows is enabled.

Raw ingest mode

Set --ingest-mode raw to skip fill-down entirely.

--fill-cols / --fill-cols-letters ignored (no error raised).
Row filtering (--drop-blank-rows, --require-non-null) still applies.
Audit columns (--row-hash, --excel-row-numbers) still work.
Useful for raw copies with audit columns or baseline comparisons.

SQLite Output Details

Schema

All columns are TEXT type.
Column order: [row_hash] [excel_row] headers...
Indexes created automatically on excel_row and row_hash when enabled.

Append mode

Existing table schema must exactly match the expected column order.
Validated using PRAGMA table_info() before inserting.

Workbook reading

Input opened with read_only=True, data_only=True.
Formulas use cached values (last computed result).

Excel Output Details

Sheet-level --if-exists

Value	Behavior
`fail`	Error if the target sheet already exists.
`replace`	Recreate the target sheet fresh.
`append`	Append below existing rows. Destination header must match expected list.

Optimization

Uses write_only=True for new workbooks (streaming writes).
Regular write mode for existing workbooks (to support append/replace).

Compatibility

Requirement	Version	Notes
Python	≥ 3.9	Tested on 3.9 through 3.13
openpyxl	≥ 3.1.0	Only runtime dependency
Input format	.xlsx (OOXML)	Does not support .xls (legacy binary)
Output (DB)	SQLite 3	Python built-in sqlite3 module
Output (Excel)	.xlsx (OOXML)	Via openpyxl
OS	Any	Windows, macOS, Linux
License	MIT	Free for commercial and personal use

Contents

Overview

Installation

From PyPI (recommended)

From source

CLI Commands

Common Input Options

db Subcommand (SQLite Output)

Example: Hierarchical fill to SQLite

Example: Fill by column letters

Example: Append with row filtering

Example: Raw ingest (no fill-down)

xlsx Subcommand (Excel Output)

Example: Fill by header names to Excel

Example: Fill by column letters to Excel

Example: Raw ingest to Excel

Python API

ingest_excel_to_sqlite()

Signature

Return value

Example

ingest_excel_to_excel()

Signature

Return value

Example

Utility Functions

Fill Modes

Hierarchical (default)

Independent (pandas-style)

Behavior Details

Headers

Cell treatment

Row hash

Row filtering

Completely empty rows

Raw ingest mode

SQLite Output Details

Schema

Append mode

Workbook reading

Excel Output Details

Sheet-level --if-exists

Optimization

Compatibility