## CLI Recipes
Common command-line patterns for everyday use.
### Basic hierarchical fill to SQLite
The most common use case: a grouped .xlsx export with Region/Country/City hierarchy. Fill down, add audit columns, and load into SQLite.
```bash
$ xlfilldown db \
  --infile quarterly_sales.xlsx \
  --insheet "Q4 Data" \
  --header-row 1 \
  --fill-cols '["Region","Country","City"]' \
  --db sales.db \
  --row-hash \
  --excel-row-numbers \
  --drop-blank-rows
```
### Independent fill using column letters
When columns are independent (not hierarchical), use independent mode. Reference columns by their Excel letter instead of header names - useful when headers are long or contain special characters.
```bash
$ xlfilldown xlsx \
  --infile inventory.xlsx \
  --insheet "Stock" \
  --header-row 1 \
  --fill-cols-letters A C E \
  --fill-mode independent \
  --outfile inventory_filled.xlsx
```
### Replace an existing table on re-run
When running the same ingest repeatedly (e.g. a weekly report refresh), use `--if-exists replace` to drop and recreate the table each time.
```bash
$ xlfilldown db \
  --infile weekly_report.xlsx \
  --insheet "Data" \
  --header-row 1 \
  --fill-cols '["Department","Team"]' \
  --db reports.db \
  --table weekly_headcount \
  --if-exists replace
```
### Append incremental data
For incremental loads where each run adds new rows (e.g. monthly exports appended to an annual table).
```bash
$ xlfilldown db \
  --infile march_data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols '["Category","Subcategory"]' \
  --db annual.db \
  --table transactions \
  --row-hash \
  --excel-row-numbers \
  --if-exists append
```
**Note:** When appending, the existing table schema must exactly match the expected columns (including `row_hash` and `excel_row` if enabled). xlfilldown validates this before inserting.
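If you want to inspect the target table's schema yourself before an append, SQLite's `PRAGMA table_info` is enough. A minimal sketch using Python's built-in `sqlite3` (the database, table, and column names here are illustrative stand-ins for the recipe above):

```python
import os
import sqlite3
import tempfile

def table_columns(db_path: str, table: str) -> list[str]:
    """Return a table's column names in declaration order ([] if it doesn't exist)."""
    with sqlite3.connect(db_path) as conn:
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [row[1] for row in info]  # row[1] is the column name

# Demo against a throwaway database standing in for annual.db
db_path = os.path.join(tempfile.mkdtemp(), "annual.db")
with sqlite3.connect(db_path) as conn:
    conn.execute(
        'CREATE TABLE transactions ("row_hash" TEXT, "excel_row" INTEGER, "Category" TEXT)'
    )

expected = ["row_hash", "excel_row", "Category"]
if table_columns(db_path, "transactions") != expected:
    raise SystemExit("Schema mismatch - fix the table before appending")
```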
### Raw ingest for audit baseline
Load a sheet exactly as-is (no fill-down) but with audit columns. Useful for keeping a raw copy alongside the filled version for comparison.
```bash
# Raw copy
$ xlfilldown db \
  --infile data.xlsx --insheet "Sheet1" --header-row 1 \
  --ingest-mode raw \
  --db audit.db --table raw_data \
  --row-hash --excel-row-numbers --if-exists replace

# Filled copy in the same database
$ xlfilldown db \
  --infile data.xlsx --insheet "Sheet1" --header-row 1 \
  --fill-cols '["Region","Country"]' \
  --db audit.db --table filled_data \
  --row-hash --excel-row-numbers --if-exists replace
```
### Filter rows with `--require-non-null`
Drop rows where critical columns are blank after filling. Combine with `--drop-blank-rows` to also remove spacer rows.
```bash
$ xlfilldown db \
  --infile report.xlsx \
  --insheet "Financials" \
  --header-row 3 \
  --fill-cols '["Division","BU"]' \
  --drop-blank-rows \
  --require-non-null '["Division","Amount"]' \
  --db clean.db
```
## Python API Recipes
Using xlfilldown programmatically in scripts and applications.
### Basic script usage
The simplest way to use xlfilldown in a Python script: one import and one function call take you from messy .xlsx to clean SQLite.
```python
from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="report.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["Region", "Country"],
    db="output.db",
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
)

print(f"Ingested {summary['rows_ingested']} rows into {summary['table']}")
print(f"Columns: {summary['columns']}")
```
### Excel-to-Excel transformation
Read a messy .xlsx, fill it down, and write a clean .xlsx. No database involved.
```python
from xlfilldown.api import ingest_excel_to_excel

summary = ingest_excel_to_excel(
    file="raw_export.xlsx",
    sheet="Data",
    header_row=1,
    fill_cols=["Department", "Team", "Role"],
    fill_mode="hierarchical",
    outfile="clean_export.xlsx",
    outsheet="Filled",
    drop_blank_rows=True,
    if_exists="replace",
)

print(f"Wrote {summary['rows_written']} rows")
```
### Process multiple sheets from one workbook
Loop through several sheets and load each into its own SQLite table.
```python
from xlfilldown.api import ingest_excel_to_sqlite

sheets = [
    ("North America", ["Country", "State"]),
    ("EMEA", ["Country", "City"]),
    ("APAC", ["Country", "Region"]),
]

for sheet_name, cols in sheets:
    summary = ingest_excel_to_sqlite(
        file="global_sales.xlsx",
        sheet=sheet_name,
        header_row=1,
        fill_cols=cols,
        db="sales.db",
        row_hash=True,
        if_exists="replace",
    )
    print(f"{sheet_name}: {summary['rows_ingested']} rows")
```
### Using column letters in the API
When you know column positions but not header names (common with inconsistent exports), use `fill_cols_letters`.
```python
from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="messy_export.xlsx",
    sheet="Sheet1",
    header_row=2,                       # Headers on row 2
    fill_cols_letters=["A", "B", "C"],  # First three columns
    db="output.db",
    if_exists="replace",
)
```
## Integration Patterns
Embed xlfilldown into web apps, pipelines, and automation workflows.
### Flask upload endpoint
Accept an .xlsx upload, process it with xlfilldown, and return a summary. Users upload via a form or API call.
```python
import os
import tempfile

from flask import Flask, request, jsonify
from xlfilldown.api import ingest_excel_to_sqlite

app = Flask(__name__)

@app.route("/ingest", methods=["POST"])
def ingest():
    uploaded = request.files["file"]
    sheet = request.form.get("sheet", "Sheet1")
    fill_cols = request.form.getlist("fill_cols")

    with tempfile.TemporaryDirectory() as tmp:
        inpath = os.path.join(tmp, "upload.xlsx")
        dbpath = os.path.join(tmp, "result.db")
        uploaded.save(inpath)

        summary = ingest_excel_to_sqlite(
            file=inpath,
            sheet=sheet,
            header_row=1,
            fill_cols=fill_cols,
            db=dbpath,
            row_hash=True,
            excel_row_numbers=True,
            if_exists="replace",
        )
    return jsonify(summary)
```
### FastAPI upload endpoint
Same idea with FastAPI and async file handling.
```python
import json
import tempfile
from pathlib import Path

from fastapi import FastAPI, UploadFile, Form
from xlfilldown.api import ingest_excel_to_sqlite

app = FastAPI()

@app.post("/ingest")
async def ingest(
    file: UploadFile,
    sheet: str = Form("Sheet1"),
    fill_cols: str = Form('["Region","Country"]'),
):
    cols = json.loads(fill_cols)

    with tempfile.NamedTemporaryFile(suffix=".xlsx", delete=False) as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name

    db_path = tmp_path.replace(".xlsx", ".db")
    summary = ingest_excel_to_sqlite(
        file=tmp_path,
        sheet=sheet,
        header_row=1,
        fill_cols=cols,
        db=db_path,
        row_hash=True,
        if_exists="replace",
    )
    Path(tmp_path).unlink(missing_ok=True)
    return summary
```
### Shell script for cron / CI
A simple shell script that runs in cron or a CI pipeline. Downloads a report, processes it, and logs the result.
```bash
#!/bin/bash
# /opt/etl/ingest_weekly.sh
# Runs every Monday at 6am via cron
set -euo pipefail

INFILE="/data/incoming/weekly_report.xlsx"
DB="/data/warehouse/reports.db"
LOG="/var/log/etl/weekly_ingest.log"

echo "[$(date -Iseconds)] Starting ingest" >> "$LOG"

xlfilldown db \
  --infile "$INFILE" \
  --insheet "Data" \
  --header-row 1 \
  --fill-cols '["Region","Country","City"]' \
  --db "$DB" \
  --table weekly_sales \
  --row-hash \
  --excel-row-numbers \
  --drop-blank-rows \
  --if-exists replace \
  >> "$LOG" 2>&1

echo "[$(date -Iseconds)] Done" >> "$LOG"
```
### Docker one-liner
Run xlfilldown in a container without installing anything on the host. Mount your data directory and go.
```bash
$ docker run --rm \
  -v /path/to/data:/data \
  python:3.12-slim \
  sh -c "pip install -q xlfilldown && xlfilldown db \
    --infile /data/report.xlsx \
    --insheet Sheet1 \
    --header-row 1 \
    --fill-cols '[\"Region\",\"Country\"]' \
    --db /data/output.db \
    --row-hash --excel-row-numbers \
    --if-exists replace"
```
### Airflow task
Use xlfilldown as a step in an Airflow DAG. The TaskFlow `@task` decorator shown here runs the Python API directly; a `BashOperator` invoking the CLI works equally well.
```python
from airflow.decorators import task

@task()
def ingest_sales_report(infile: str, db: str):
    from xlfilldown.api import ingest_excel_to_sqlite

    summary = ingest_excel_to_sqlite(
        file=infile,
        sheet="Sheet1",
        header_row=1,
        fill_cols=["Region", "Country"],
        db=db,
        row_hash=True,
        excel_row_numbers=True,
        drop_blank_rows=True,
        if_exists="replace",
    )
    return summary
```
## Data Quality Patterns
Use audit columns and hashing for validation and debugging.
### Deduplication with row hashes
After ingesting with `--row-hash`, query for duplicate rows. Identical content produces identical hashes.
```sql
-- Find duplicate rows
SELECT row_hash, COUNT(*) AS cnt
FROM Sheet1
GROUP BY row_hash
HAVING cnt > 1
ORDER BY cnt DESC;

-- Keep only the first occurrence
DELETE FROM Sheet1
WHERE rowid NOT IN (
    SELECT MIN(rowid) FROM Sheet1 GROUP BY row_hash
);
```
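The principle behind these queries is a content hash over the row's cell values. This plain-Python sketch shows the idea using `json.dumps` plus `hashlib`; xlfilldown's actual canonicalization may differ in detail, so don't expect byte-identical digests to the stored `row_hash` values.

```python
import hashlib
import json

def sketch_row_hash(values):
    # JSON-serialize the cell values, then SHA-256 the bytes.
    canonical = json.dumps(values, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = sketch_row_hash(["West", "USA", "Seattle", "1200"])
b = sketch_row_hash(["West", "USA", "Seattle", "1200"])
c = sketch_row_hash(["West", "USA", "Seattle", "1300"])
print(a == b, a == c)  # True False
print(len(a))          # 64
```

Because the digest depends only on content, two rows that render identically in Excel but differ in a single cell always hash differently.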
### Trace back to the source spreadsheet
With `--excel-row-numbers`, every record carries its original .xlsx row number. When a downstream user reports a problem, you can pinpoint the exact source row.
```sql
-- "This revenue figure looks wrong" - find the source row
SELECT excel_row, Region, Country, City, Product, Revenue
FROM Sheet1
WHERE Revenue = '99999';

-- Result: excel_row = 47
-- Open sales_report.xlsx, go to row 47, verify the value
```
### Compare raw vs filled data
Ingest twice - once raw, once filled - and compare. Useful for validating that the fill logic matches your expectations.
```python
from xlfilldown.api import ingest_excel_to_sqlite

# Raw ingest
ingest_excel_to_sqlite(
    file="data.xlsx", sheet="Sheet1", header_row=1,
    ingest_mode="raw",
    db="compare.db", table="raw",
    excel_row_numbers=True, if_exists="replace",
)

# Filled ingest
ingest_excel_to_sqlite(
    file="data.xlsx", sheet="Sheet1", header_row=1,
    fill_cols=["Region", "Country"],
    db="compare.db", table="filled",
    excel_row_numbers=True, if_exists="replace",
)

# Now query to see what changed
```
```sql
-- Find rows where fill-down changed a value
SELECT
    r.excel_row,
    r.Region  AS raw_region,
    f.Region  AS filled_region,
    r.Country AS raw_country,
    f.Country AS filled_country
FROM raw r
JOIN filled f ON r.excel_row = f.excel_row
WHERE r.Region != f.Region
   OR r.Country != f.Country
   OR (r.Region IS NULL AND f.Region IS NOT NULL)
   OR (r.Country IS NULL AND f.Country IS NOT NULL);
```
## Design Patterns
Architectural approaches for common scenarios.
### Hierarchical vs independent: choosing the right mode
Use hierarchical when columns have a parent-child relationship (Region > Country > City). A change in a parent column should invalidate stale child values.
Use independent when columns are unrelated dimensions that happen to share blank patterns. For example, "Salesperson" and "Product Line" might both be sparse but don't depend on each other.
**Rule of thumb:** If you would use a tree diagram to describe the relationship between columns, use hierarchical. If the columns are flat attributes, use independent.
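To make the difference concrete, here is a toy fill-down in plain Python. This is only an illustration of the two behaviors, not xlfilldown's actual implementation; blank cells are represented as empty strings.

```python
def fill(rows, mode):
    """Fill blank cells downward; mode is 'hierarchical' or 'independent'."""
    last = [None] * len(rows[0])   # most recent non-blank value per column
    out = []
    for row in rows:
        filled = []
        for i, cell in enumerate(row):
            if cell != "":
                if mode == "hierarchical":
                    # A fresh parent value invalidates carried child values.
                    for j in range(i + 1, len(last)):
                        last[j] = None
                last[i] = cell
            filled.append(cell if cell != "" else last[i])
        out.append(filled)
    return out

rows = [
    ["West", "USA", "Seattle"],
    ["",     "",    "Portland"],   # Region/Country carry down in both modes
    ["East", "",    ""],           # new Region: does Country carry over?
]
print(fill(rows, "hierarchical"))  # East row stays ['East', None, None]
print(fill(rows, "independent"))   # East row becomes ['East', 'USA', 'Portland']
```

The last row is the telling case: hierarchically, a new Region means the old Country and City no longer apply; independently, each column just repeats its last value.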
### Header row is not row 1
Many .xlsx exports have title rows, blank rows, or metadata above the actual headers. Use `--header-row` to point at the correct row. Everything above that row is ignored.
```bash
# Headers are on row 3 (rows 1-2 are title/metadata)
$ xlfilldown db \
  --infile finance_report.xlsx \
  --insheet "P&L" \
  --header-row 3 \
  --fill-cols '["Account Group","Account"]' \
  --db finance.db
```
### Batch size tuning for large files
The default batch size of 1000 rows works well for most files. For very large sheets (100K+ rows), a larger batch size reduces SQLite transaction overhead.
```bash
$ xlfilldown db \
  --infile huge_export.xlsx \
  --insheet "Data" \
  --header-row 1 \
  --fill-cols '["Category"]' \
  --db warehouse.db \
  --batch-size 5000
```
### Combining `--require-non-null` with letters
Mix header-name and column-letter approaches for filtering. Both are merged and de-duplicated internally.
```bash
$ xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols-letters A B C \
  --require-non-null '["Amount"]' \
  --require-non-null-letters A \
  --db filtered.db
```
## Quick Reference
At-a-glance summary of the most common options.
| Task | Key Options |
|---|---|
| Fill by header names | `--fill-cols '["Col1","Col2"]'` |
| Fill by column letters | `--fill-cols-letters A B C` |
| Hierarchical fill (default) | `--fill-mode hierarchical` |
| Independent fill | `--fill-mode independent` |
| Skip fill-down | `--ingest-mode raw` |
| Add row hashes | `--row-hash` |
| Add source row numbers | `--excel-row-numbers` |
| Drop spacer rows | `--drop-blank-rows` |
| Require non-null columns | `--require-non-null '["Col"]'` |
| Overwrite on re-run | `--if-exists replace` |
| Append to existing | `--if-exists append` |
| Custom table name | `--table my_table` |
| Tune batch size | `--batch-size 5000` |
## API Reference
Key functions and their return values.
| Function | Output | Returns |
|---|---|---|
| `ingest_excel_to_sqlite()` | SQLite database | `table`, `columns`, `rows_ingested`, `row_hash`, `excel_row_numbers` |
| `ingest_excel_to_excel()` | .xlsx file | `workbook`, `sheet`, `columns`, `rows_written`, `row_hash`, `excel_row_numbers` |
| `normalize_headers()` | `list[str]` | Trimmed, normalized header strings |
| `canon_list()` | `str` | JSON-format canonical string |
| `sha256_hex()` | `str` | 64-char hex digest |
| `qident()` | `str` | Safely quoted SQLite identifier |