xlfilldown

A Python library for filling down values in .xlsx spreadsheets (OOXML). Stream into SQLite or a new Excel sheet with forward-fill padding, preserved row numbers, and stable SHA-256 row hashes.

$ pip install xlfilldown

The Problem

Hierarchical .xlsx exports are everywhere - and they are a pain to work with programmatically.

📉

Merged and sparse data

Reporting tools export grouped data where "Region" only appears once for a block of rows. Every row below is blank until the next group. You need flat, filled rows for SQL queries, pandas, or dashboards.

🔧

Manual cleanup is fragile

Opening the file and dragging fill-down by hand works once - but breaks the moment a new export lands. You need something repeatable in a script or data pipeline.

🔄

Pandas overhead for simple ETL

Pulling in pandas just to forward-fill a few columns and write to SQLite is a heavy dependency. xlfilldown does the job with only openpyxl, streaming rows in constant memory.

🔍

No audit trail

When something looks wrong downstream, you need to trace a row back to the original spreadsheet. Without preserved row numbers and stable hashes, debugging is guesswork.

⚠️

Hierarchical resets

When "Region" changes, "Country" and "City" should reset - not carry stale values from the previous group. Most simple ffill solutions get this wrong.

📦

No lightweight CLI tool

You want to run a single command in CI/CD or cron to ingest an .xlsx into SQLite, with fill-down, hashing, and row numbers - without writing a custom script every time.

See It in Action

Watch xlfilldown transform sparse hierarchical data into clean, filled rows - then load it into a database.

Interactive Walkthrough

Step through the fill-down process to see how each stage transforms the data.

Step 1 of 4
sales_report.xlsx - Sheet1
ABCDE
1RegionCountryCityProductRevenue
2EMEAUKLondonWidget A12500
3ManchesterWidget B8300
4GermanyBerlinWidget A15200
5MunichWidget C9100
6APACJapanTokyoWidget B22400
7OsakaWidget A11800
8AustraliaSydneyWidget C17600
xlfilldown
sales_filled.xlsx - Processed
ABCDE
1RegionCountryCityProductRevenue
2EMEAUKLondonWidget A12500
3EMEAUKManchesterWidget B8300
4EMEAGermanyBerlinWidget A15200
5EMEAGermanyMunichWidget C9100
6APACJapanTokyoWidget B22400
7APACJapanOsakaWidget A11800
8APACAustraliaSydneyWidget C17600

One command. Clean data.

xlfilldown reads your .xlsx, forward-fills the sparse columns you specify, and writes to SQLite or a new Excel file. No pandas. No manual steps. Just pipe it into your workflow.

# Python API - two lines to go from messy .xlsx to clean SQLite
from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="report.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["Region", "Country"],
    db="output.db",
    row_hash=True,
    excel_row_numbers=True,
)
print(f"Ingested {summary['rows_ingested']} rows")

Who Is This For?

Anyone dealing with grouped .xlsx exports that need to become flat, queryable data.

Data Analyst

Analysts cleaning monthly reports

You get a grouped .xlsx export from finance, HR, or sales every week. You need flat rows for pivot tables, SQL queries, or BI tools. xlfilldown fills the gaps and gives you a clean sheet or SQLite database in one command.

Data Engineer

Engineers building data pipelines

You need a lightweight, scriptable step in your ETL pipeline to normalize .xlsx inputs before loading into a warehouse. xlfilldown streams rows in constant memory, adds row hashes for deduplication, and needs only openpyxl as a dependency.

DevOps / Automation

Automation in CI/CD or cron

Drop a single CLI command into your pipeline. No Python script to maintain - just xlfilldown db with the right flags. Schedule it with cron, run it in a container, or trigger it from Airflow.

Local User

Running locally on your laptop

Install with pip or pipx, run the command, open the output. No server needed, no cloud uploads. Your data stays on your machine. Perfect for one-off cleanup or quick ad-hoc analysis.

Python Developer

Embedding in your app

Use the Python API directly. Two functions - ingest_excel_to_sqlite() and ingest_excel_to_excel() - with clear return summaries. Build upload endpoints, batch processors, or data quality checks on top.

QA / Audit

Audit trails and data validation

Every row carries its original Excel row number and a SHA-256 hash. Trace any database record back to the source spreadsheet line. Detect duplicates. Verify data integrity after transformation.

Features

Everything you need to go from messy .xlsx to clean, queryable data.

⬇️

Hierarchical fill-down

When a parent column changes, child columns reset automatically. Region > Country > City stays correct across group boundaries.

🔁

Independent fill mode

Pandas-style ffill where each column carries forward on its own. No resets, no dependencies between columns.

💾

SQLite & Excel output

Write to a SQLite database or a new .xlsx file. Same input flags for both. Choose your destination, xlfilldown handles the rest.

🔒

SHA-256 row hashes

Stable, deterministic hashes over all columns after filling. Use for deduplication, change detection, and audit trails.

📋

Excel row numbers

Preserve the original 1-based row number from the source file. Trace any output record back to the exact spreadsheet line.

Streaming & constant memory

Uses openpyxl read-only mode. Rows stream through without loading the entire sheet into memory. Handles large files efficiently.

🚫

Row filtering

Drop blank spacer rows or require specific columns to be non-null after filling. Clean output without post-processing.

📥

Raw ingest mode

Skip fill-down entirely. Just load the sheet as-is with optional audit columns. Useful for raw copies and baseline comparisons.

🛠️

Minimal dependencies

Only requires openpyxl. No pandas, no numpy, no heavy frameworks. Installs fast, stays lean.

Quick Start

From install to clean data in under a minute.

1

Install

Grab it from PyPI with pip or pipx. Python 3.9+ required.

$ pip install xlfilldown
2

Fill down to SQLite

Point at your .xlsx, name the columns to fill, and pick a database.

$ xlfilldown db \
    --infile report.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols '["Region","Country"]' \
    --db output.db \
    --row-hash --excel-row-numbers
3

Or fill down to a new .xlsx

Same flags, different destination. Output a clean, filled spreadsheet.

$ xlfilldown xlsx \
    --infile report.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols-letters A B \
    --outfile filled.xlsx
4

Use the Python API

Two functions, clear return values. Embed in Flask, FastAPI, Airflow, or any script.

from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="report.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["Region", "Country"],
    db="output.db",
)
# Returns: {table, columns, rows_ingested, row_hash, excel_row_numbers}

Compatibility

Tested and supported across Python and openpyxl versions.

3.9+ Python
3.1.0+ openpyxl
.xlsx OOXML format
SQLite 3 Database output
MIT License
v1.0.3 Current release