Contents
- Overview
- Installation
- CLI Commands
- Common Input Options
- db Subcommand (SQLite)
- xlsx Subcommand (Excel)
- Python API
- ingest_excel_to_sqlite()
- ingest_excel_to_excel()
- Utility Functions
- Fill Modes
- Hierarchical (default)
- Independent (pandas-style)
- Behavior Details
- SQLite Output Details
- Excel Output Details
- Compatibility
Overview
xlfilldown is a Python library and CLI tool that streams an .xlsx (OOXML) sheet into SQLite or a new Excel sheet with forward-fill padding. It preserves original Excel row numbers and computes a stable SHA-256 row hash for audit purposes.
The tool ingests only columns with non-empty headers, stores all values as TEXT strings (numbers and dates are canonicalized to stable text), and streams rows in constant memory using openpyxl's read-only mode.
Key concept: "Fill-down" (or forward-fill) means populating empty cells with the last non-empty value from the same column. This is essential when working with grouped .xlsx exports where parent values appear only once per group.
Installation
From PyPI (recommended)
# Using pip $ pip install xlfilldown # Using pipx (isolated environment) $ pipx install xlfilldown
Requires Python ≥ 3.9. The only dependency is openpyxl ≥ 3.1.0.
From source
$ git clone https://github.com/RexBytes/xlfilldown.git $ cd xlfilldown $ pip install .
CLI Commands
The xlfilldown command has two subcommands sharing the same input options:
xlfilldown db- write to a SQLite databasexlfilldown xlsx- write to a new .xlsx file
Common Input Options
These options apply to both db and xlsx subcommands.
| Option | Required | Default | Description |
|---|---|---|---|
--infile | Yes | - | Path to the input .xlsx file. |
--insheet | Yes | - | Sheet name to read from the input file. |
--header-row | Yes | - | 1-based row number containing column headers. |
--fill-cols | No | - | JSON array of header names to forward-fill. Example: '["Region","Country"]'. Mutually exclusive with --fill-cols-letters. |
--fill-cols-letters | No | - | Excel column letters to forward-fill, space-separated. Example: A C AE. Resolved to header names via --header-row. Mutually exclusive with --fill-cols. |
--fill-mode | No | hierarchical | hierarchical (higher-tier changes reset lower-tier carries) or independent (pandas-style ffill). |
--ingest-mode | No | fill | fill applies forward-fill. raw skips fill-down entirely. |
--drop-blank-rows | No | False | Drop rows where all fill columns are empty after filling. |
--require-non-null | No | - | JSON array of headers. Drop row if any are null/blank after fill. |
--require-non-null-letters | No | - | Excel column letters for require-non-null. Merged with --require-non-null. |
--row-hash | No | False | Include a row_hash column (SHA-256 hex digest over all columns after filling). |
--excel-row-numbers | No | False | Include an excel_row column with the original 1-based row number. |
--if-exists | No | fail | fail, replace, or append. |
db Subcommand (SQLite Output)
| Option | Required | Default | Description |
|---|---|---|---|
--db | Yes | - | SQLite database file path (created if missing). |
--table | No | Derived from sheet | SQLite table name. |
--batch-size | No | 1000 | Rows per executemany() batch. |
Example: Hierarchical fill to SQLite
$ xlfilldown db \ --infile sales_report.xlsx \ --insheet "Sheet1" \ --header-row 1 \ --fill-cols '["Region","Country","City"]' \ --db sales.db \ --row-hash \ --excel-row-numbers
Example: Fill by column letters
$ xlfilldown db \ --infile data.xlsx \ --insheet "Sheet1" \ --header-row 1 \ --fill-cols-letters A C AE \ --db out.db
Example: Append with row filtering
$ xlfilldown db \ --infile report.xlsx \ --insheet "Data" \ --header-row 2 \ --fill-cols '["Category","Subcategory"]' \ --drop-blank-rows \ --require-non-null '["Category"]' \ --db warehouse.db \ --table products \ --if-exists append
Example: Raw ingest (no fill-down)
$ xlfilldown db \ --infile data.xlsx \ --insheet "Sheet1" \ --header-row 1 \ --ingest-mode raw \ --db out.db \ --row-hash \ --excel-row-numbers \ --if-exists replace
xlsx Subcommand (Excel Output)
| Option | Required | Default | Description |
|---|---|---|---|
--outfile | Yes | - | Output .xlsx file path. |
--outsheet | No | Derived from input | Output sheet name. |
Example: Fill by header names to Excel
$ xlfilldown xlsx \ --infile data.xlsx \ --insheet "Sheet1" \ --header-row 1 \ --fill-cols '["Region","Country","City"]' \ --outfile filled.xlsx \ --outsheet Processed
Example: Fill by column letters to Excel
$ xlfilldown xlsx \ --infile data.xlsx \ --insheet "Sheet1" \ --header-row 1 \ --fill-cols-letters A D \ --outfile out.xlsx
Example: Raw ingest to Excel
$ xlfilldown xlsx \ --infile data.xlsx \ --insheet "Sheet1" \ --header-row 1 \ --ingest-mode raw \ --outfile raw_copy.xlsx \ --outsheet RawSheet \ --row-hash \ --excel-row-numbers \ --if-exists replace
Python API
Import from xlfilldown.api:
from xlfilldown.api import ingest_excel_to_sqlite, ingest_excel_to_excel
ingest_excel_to_sqlite()
Stream an .xlsx sheet into a SQLite database table with optional forward-fill, row hashing, and row number preservation.
Signature
def ingest_excel_to_sqlite( file: str | Path, # Input .xlsx file path sheet: str, # Sheet name to read header_row: int, # 1-based header row fill_cols: Optional[Sequence[str]] = None, # Fill by header names fill_cols_letters: Optional[Sequence[str]] = None, # Fill by column letters fill_mode: Optional[str] = None, # "hierarchical" | "independent" ingest_mode: str = "fill", # "fill" | "raw" drop_blank_rows: bool = False, require_non_null: Optional[Sequence[str]] = None, require_non_null_letters: Optional[Sequence[str]] = None, row_hash: bool = False, excel_row_numbers: bool = False, db: str | Path, # SQLite database file table: Optional[str] = None, # Table name (default: sheet name) batch_size: int = 1000, if_exists: str = "fail", # "fail" | "replace" | "append" ) -> dict
Return value
| Key | Type | Description |
|---|---|---|
table | str | SQLite table name used. |
columns | list | List of column names in the table. |
rows_ingested | int | Number of rows written. |
row_hash | bool | Whether row hashing was enabled. |
excel_row_numbers | bool | Whether Excel row numbers were included. |
Example
from xlfilldown.api import ingest_excel_to_sqlite summary = ingest_excel_to_sqlite( file="sales_report.xlsx", sheet="Sheet1", header_row=1, fill_cols=["Region", "Country", "City"], fill_mode="hierarchical", db="sales.db", row_hash=True, excel_row_numbers=True, drop_blank_rows=True, if_exists="replace", ) print(f"Table: {summary['table']}") print(f"Rows: {summary['rows_ingested']}")
ingest_excel_to_excel()
Stream an .xlsx sheet into a new Excel sheet with optional forward-fill, row hashing, and row number preservation.
Signature
def ingest_excel_to_excel( file: str | Path, sheet: str, header_row: int, fill_cols: Optional[Sequence[str]] = None, fill_cols_letters: Optional[Sequence[str]] = None, fill_mode: Optional[str] = None, ingest_mode: str = "fill", drop_blank_rows: bool = False, require_non_null: Optional[Sequence[str]] = None, require_non_null_letters: Optional[Sequence[str]] = None, row_hash: bool = False, excel_row_numbers: bool = False, outfile: str | Path, outsheet: Optional[str] = None, if_exists: str = "fail", ) -> dict
Return value
| Key | Type | Description |
|---|---|---|
workbook | str | Output workbook file path. |
sheet | str | Output sheet name. |
columns | list | List of column names written. |
rows_written | int | Number of rows written. |
row_hash | bool | Whether row hashing was enabled. |
excel_row_numbers | bool | Whether Excel row numbers were included. |
Example
from xlfilldown.api import ingest_excel_to_excel summary = ingest_excel_to_excel( file="data.xlsx", sheet="Sheet1", header_row=1, fill_cols=["Region", "Country"], fill_mode="independent", outfile="filled.xlsx", outsheet="Processed", row_hash=True, excel_row_numbers=True, if_exists="replace", ) print(f"Wrote {summary['rows_written']} rows to {summary['workbook']}")
Utility Functions
Re-exported from xlfilldown.api for advanced use:
| Function | Description |
|---|---|
normalize_headers(cells) | Normalize header cell values: trim whitespace, coerce None/blank/"nan" to empty strings. Returns a list of strings. |
canon_list(values) | Canonicalize a sequence of values to a JSON-format string. Used internally for hash computation. |
sha256_hex(s) | Compute the SHA-256 hex digest of a string. |
qident(name) | Quote a SQLite identifier (table or column name) safely. |
Fill Modes
Hierarchical (default)
In hierarchical mode, the order of columns matters. The first column is the highest tier. When a higher-tier column receives a new value, all lower-tier carries reset.
Think of it as: Region > Country > City. When a new Region appears, Country and City reset. When a new Country appears, only City resets.
Input:
Region Country City
EMEA UK London
Manchester
Germany Berlin
Munich
APAC Japan Tokyo
Hierarchical output with --fill-cols '["Region","Country","City"]':
Region Country City EMEA UK London EMEA UK Manchester # Region and Country filled from above EMEA Germany Berlin # Country changed; City has own value EMEA Germany Munich # Both filled APAC Japan Tokyo # Region changed; lower tiers reset
Important: "Reset" means the carry is cleared for that row. If the row has its own value for a lower-tier column (like "Berlin" above), that value is kept. The reset only affects empty cells that would otherwise carry a stale value.
Independent (pandas-style)
Each column fills forward on its own. Column order does not matter. Columns do not reset each other.
Same input, with --fill-mode independent:
Region Country City EMEA UK London EMEA UK Manchester # UK carries independently EMEA Germany Berlin EMEA Germany Munich APAC Japan Tokyo
Behavior Details
Headers
- Only columns with non-empty headers on
--header-roware ingested. - Normalization: whitespace trimmed, case preserved,
"nan"string converted to empty. - Empty or duplicate headers after normalization are rejected with a clear error.
- When using
--fill-cols-letters, each referenced column must have a non-empty header.
Cell treatment
- All values stored as TEXT (including numbers, dates,
excel_row). - Whitespace-only cells treated as blank.
- None/blank/whitespace normalize to NULL (empty).
- Numbers canonicalized to stable text:
1and1.0both become"1". No scientific notation. - Dates converted to ISO format.
- Strings stripped of leading/trailing whitespace.
Row hash
- SHA-256 hex digest over all ingested columns (in header order) after filling.
- For completely empty rows, the hash reflects all-empty values.
- Numbers normalized before hashing (same canonicalization as storage).
- In SQLite mode, a non-unique index is created on
row_hash.
Row filtering
--drop-blank-rows: drops rows where all fill columns are blank after filling.--require-non-null: drops rows where any specified headers are blank after filling.- Both can be combined. Filters run after fill-down logic.
Completely empty rows
- Rows where all columns are blank are treated as spacer rows.
- No fill-down is applied to them, but the carry persists past them.
- Dropped if
--drop-blank-rowsis enabled.
Raw ingest mode
Set --ingest-mode raw to skip fill-down entirely.
--fill-cols/--fill-cols-lettersignored (no error raised).- Row filtering (
--drop-blank-rows,--require-non-null) still applies. - Audit columns (
--row-hash,--excel-row-numbers) still work. - Useful for raw copies with audit columns or baseline comparisons.
SQLite Output Details
Schema
- All columns are TEXT type.
- Column order:
[row_hash] [excel_row] headers... - Indexes created automatically on
excel_rowandrow_hashwhen enabled.
Append mode
- Existing table schema must exactly match the expected column order.
- Validated using
PRAGMA table_info()before inserting.
Workbook reading
- Input opened with
read_only=True, data_only=True. - Formulas use cached values (last computed result).
Excel Output Details
Sheet-level --if-exists
| Value | Behavior |
|---|---|
fail | Error if the target sheet already exists. |
replace | Recreate the target sheet fresh. |
append | Append below existing rows. Destination header must match expected list. |
Optimization
- Uses
write_only=Truefor new workbooks (streaming writes). - Regular write mode for existing workbooks (to support append/replace).
Compatibility
| Requirement | Version | Notes |
|---|---|---|
| Python | ≥ 3.9 | Tested on 3.9 through 3.13 |
| openpyxl | ≥ 3.1.0 | Only runtime dependency |
| Input format | .xlsx (OOXML) | Does not support .xls (legacy binary) |
| Output (DB) | SQLite 3 | Python built-in sqlite3 module |
| Output (Excel) | .xlsx (OOXML) | Via openpyxl |
| OS | Any | Windows, macOS, Linux |
| License | MIT | Free for commercial and personal use |