Reference Guide

Complete documentation for the xlfilldown CLI and Python API.

Contents

Overview

xlfilldown is a Python library and CLI tool that streams an .xlsx (OOXML) sheet into SQLite or a new Excel sheet with forward-fill padding. It preserves original Excel row numbers and computes a stable SHA-256 row hash for audit purposes.

The tool ingests only columns with non-empty headers, stores all values as TEXT strings (numbers and dates are canonicalized to stable text), and streams rows in constant memory using openpyxl's read-only mode.

Key concept: "Fill-down" (or forward-fill) means populating empty cells with the last non-empty value from the same column. This is essential when working with grouped .xlsx exports where parent values appear only once per group.

Installation

From PyPI (recommended)

# Using pip
$ pip install xlfilldown

# Using pipx (isolated environment)
$ pipx install xlfilldown

Requires Python ≥ 3.9. The only dependency is openpyxl ≥ 3.1.0.

From source

$ git clone https://github.com/RexBytes/xlfilldown.git
$ cd xlfilldown
$ pip install .

CLI Commands

The xlfilldown command has two subcommands sharing the same input options:

Common Input Options

These options apply to both db and xlsx subcommands.

OptionRequiredDefaultDescription
--infileYes-Path to the input .xlsx file.
--insheetYes-Sheet name to read from the input file.
--header-rowYes-1-based row number containing column headers.
--fill-colsNo-JSON array of header names to forward-fill. Example: '["Region","Country"]'. Mutually exclusive with --fill-cols-letters.
--fill-cols-lettersNo-Excel column letters to forward-fill, space-separated. Example: A C AE. Resolved to header names via --header-row. Mutually exclusive with --fill-cols.
--fill-modeNohierarchicalhierarchical (higher-tier changes reset lower-tier carries) or independent (pandas-style ffill).
--ingest-modeNofillfill applies forward-fill. raw skips fill-down entirely.
--drop-blank-rowsNoFalseDrop rows where all fill columns are empty after filling.
--require-non-nullNo-JSON array of headers. Drop row if any are null/blank after fill.
--require-non-null-lettersNo-Excel column letters for require-non-null. Merged with --require-non-null.
--row-hashNoFalseInclude a row_hash column (SHA-256 hex digest over all columns after filling).
--excel-row-numbersNoFalseInclude an excel_row column with the original 1-based row number.
--if-existsNofailfail, replace, or append.

db Subcommand (SQLite Output)

OptionRequiredDefaultDescription
--dbYes-SQLite database file path (created if missing).
--tableNoDerived from sheetSQLite table name.
--batch-sizeNo1000Rows per executemany() batch.

Example: Hierarchical fill to SQLite

$ xlfilldown db \
    --infile sales_report.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols '["Region","Country","City"]' \
    --db sales.db \
    --row-hash \
    --excel-row-numbers

Example: Fill by column letters

$ xlfilldown db \
    --infile data.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols-letters A C AE \
    --db out.db

Example: Append with row filtering

$ xlfilldown db \
    --infile report.xlsx \
    --insheet "Data" \
    --header-row 2 \
    --fill-cols '["Category","Subcategory"]' \
    --drop-blank-rows \
    --require-non-null '["Category"]' \
    --db warehouse.db \
    --table products \
    --if-exists append

Example: Raw ingest (no fill-down)

$ xlfilldown db \
    --infile data.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --ingest-mode raw \
    --db out.db \
    --row-hash \
    --excel-row-numbers \
    --if-exists replace

xlsx Subcommand (Excel Output)

OptionRequiredDefaultDescription
--outfileYes-Output .xlsx file path.
--outsheetNoDerived from inputOutput sheet name.

Example: Fill by header names to Excel

$ xlfilldown xlsx \
    --infile data.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols '["Region","Country","City"]' \
    --outfile filled.xlsx \
    --outsheet Processed

Example: Fill by column letters to Excel

$ xlfilldown xlsx \
    --infile data.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols-letters A D \
    --outfile out.xlsx

Example: Raw ingest to Excel

$ xlfilldown xlsx \
    --infile data.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --ingest-mode raw \
    --outfile raw_copy.xlsx \
    --outsheet RawSheet \
    --row-hash \
    --excel-row-numbers \
    --if-exists replace

Python API

Import from xlfilldown.api:

from xlfilldown.api import ingest_excel_to_sqlite, ingest_excel_to_excel

ingest_excel_to_sqlite()

Stream an .xlsx sheet into a SQLite database table with optional forward-fill, row hashing, and row number preservation.

Signature

def ingest_excel_to_sqlite(
    file: str | Path,              # Input .xlsx file path
    sheet: str,                     # Sheet name to read
    header_row: int,                # 1-based header row
    fill_cols: Optional[Sequence[str]] = None,         # Fill by header names
    fill_cols_letters: Optional[Sequence[str]] = None,  # Fill by column letters
    fill_mode: Optional[str] = None,       # "hierarchical" | "independent"
    ingest_mode: str = "fill",            # "fill" | "raw"
    drop_blank_rows: bool = False,
    require_non_null: Optional[Sequence[str]] = None,
    require_non_null_letters: Optional[Sequence[str]] = None,
    row_hash: bool = False,
    excel_row_numbers: bool = False,
    db: str | Path,                  # SQLite database file
    table: Optional[str] = None,    # Table name (default: sheet name)
    batch_size: int = 1000,
    if_exists: str = "fail",         # "fail" | "replace" | "append"
) -> dict

Return value

KeyTypeDescription
tablestrSQLite table name used.
columnslistList of column names in the table.
rows_ingestedintNumber of rows written.
row_hashboolWhether row hashing was enabled.
excel_row_numbersboolWhether Excel row numbers were included.

Example

from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="sales_report.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["Region", "Country", "City"],
    fill_mode="hierarchical",
    db="sales.db",
    row_hash=True,
    excel_row_numbers=True,
    drop_blank_rows=True,
    if_exists="replace",
)

print(f"Table: {summary['table']}")
print(f"Rows: {summary['rows_ingested']}")

ingest_excel_to_excel()

Stream an .xlsx sheet into a new Excel sheet with optional forward-fill, row hashing, and row number preservation.

Signature

def ingest_excel_to_excel(
    file: str | Path,
    sheet: str,
    header_row: int,
    fill_cols: Optional[Sequence[str]] = None,
    fill_cols_letters: Optional[Sequence[str]] = None,
    fill_mode: Optional[str] = None,
    ingest_mode: str = "fill",
    drop_blank_rows: bool = False,
    require_non_null: Optional[Sequence[str]] = None,
    require_non_null_letters: Optional[Sequence[str]] = None,
    row_hash: bool = False,
    excel_row_numbers: bool = False,
    outfile: str | Path,
    outsheet: Optional[str] = None,
    if_exists: str = "fail",
) -> dict

Return value

KeyTypeDescription
workbookstrOutput workbook file path.
sheetstrOutput sheet name.
columnslistList of column names written.
rows_writtenintNumber of rows written.
row_hashboolWhether row hashing was enabled.
excel_row_numbersboolWhether Excel row numbers were included.

Example

from xlfilldown.api import ingest_excel_to_excel

summary = ingest_excel_to_excel(
    file="data.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["Region", "Country"],
    fill_mode="independent",
    outfile="filled.xlsx",
    outsheet="Processed",
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
)

print(f"Wrote {summary['rows_written']} rows to {summary['workbook']}")

Utility Functions

Re-exported from xlfilldown.api for advanced use:

FunctionDescription
normalize_headers(cells)Normalize header cell values: trim whitespace, coerce None/blank/"nan" to empty strings. Returns a list of strings.
canon_list(values)Canonicalize a sequence of values to a JSON-format string. Used internally for hash computation.
sha256_hex(s)Compute the SHA-256 hex digest of a string.
qident(name)Quote a SQLite identifier (table or column name) safely.

Fill Modes

Hierarchical (default)

In hierarchical mode, the order of columns matters. The first column is the highest tier. When a higher-tier column receives a new value, all lower-tier carries reset.

Think of it as: Region > Country > City. When a new Region appears, Country and City reset. When a new Country appears, only City resets.

Input:

Region     Country    City
EMEA       UK         London
                      Manchester
           Germany    Berlin
                      Munich
APAC       Japan      Tokyo

Hierarchical output with --fill-cols '["Region","Country","City"]':

Region     Country    City
EMEA       UK         London
EMEA       UK         Manchester    # Region and Country filled from above
EMEA       Germany    Berlin        # Country changed; City has own value
EMEA       Germany    Munich        # Both filled
APAC       Japan      Tokyo         # Region changed; lower tiers reset

Important: "Reset" means the carry is cleared for that row. If the row has its own value for a lower-tier column (like "Berlin" above), that value is kept. The reset only affects empty cells that would otherwise carry a stale value.

Independent (pandas-style)

Each column fills forward on its own. Column order does not matter. Columns do not reset each other.

Same input, with --fill-mode independent:

Region     Country    City
EMEA       UK         London
EMEA       UK         Manchester    # UK carries independently
EMEA       Germany    Berlin
EMEA       Germany    Munich
APAC       Japan      Tokyo

Behavior Details

Headers

Cell treatment

Row hash

Row filtering

Completely empty rows

Raw ingest mode

Set --ingest-mode raw to skip fill-down entirely.

SQLite Output Details

Schema

Append mode

Workbook reading

Excel Output Details

Sheet-level --if-exists

ValueBehavior
failError if the target sheet already exists.
replaceRecreate the target sheet fresh.
appendAppend below existing rows. Destination header must match expected list.

Optimization

Compatibility

RequirementVersionNotes
Python≥ 3.9Tested on 3.9 through 3.13
openpyxl≥ 3.1.0Only runtime dependency
Input format.xlsx (OOXML)Does not support .xls (legacy binary)
Output (DB)SQLite 3Python built-in sqlite3 module
Output (Excel).xlsx (OOXML)Via openpyxl
OSAnyWindows, macOS, Linux
LicenseMITFree for commercial and personal use