Best Practices

Recipes, patterns, and integration guides for getting the most out of xlfilldown.

CLI Recipes

Common command-line patterns for everyday use.


Basic hierarchical fill to SQLite

The most common use case: a grouped .xlsx export with Region/Country/City hierarchy. Fill down, add audit columns, and load into SQLite.

$ xlfilldown db \
    --infile quarterly_sales.xlsx \
    --insheet "Q4 Data" \
    --header-row 1 \
    --fill-cols '["Region","Country","City"]' \
    --db sales.db \
    --row-hash \
    --excel-row-numbers \
    --drop-blank-rows

Independent fill using column letters

When columns are independent (not hierarchical), use independent mode. Reference columns by their Excel letter instead of header name; this is useful when headers are long or contain special characters.

$ xlfilldown xlsx \
    --infile inventory.xlsx \
    --insheet "Stock" \
    --header-row 1 \
    --fill-cols-letters A C E \
    --fill-mode independent \
    --outfile inventory_filled.xlsx

Replace an existing table on re-run

When running the same ingest repeatedly (e.g. weekly report refresh), use --if-exists replace to drop and recreate the table each time.

$ xlfilldown db \
    --infile weekly_report.xlsx \
    --insheet "Data" \
    --header-row 1 \
    --fill-cols '["Department","Team"]' \
    --db reports.db \
    --table weekly_headcount \
    --if-exists replace

Append incremental data

For incremental loads where each run adds new rows (e.g. monthly exports appended to an annual table), use --if-exists append.

$ xlfilldown db \
    --infile march_data.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols '["Category","Subcategory"]' \
    --db annual.db \
    --table transactions \
    --row-hash \
    --excel-row-numbers \
    --if-exists append

Note: When appending, the existing table schema must exactly match the expected columns (including row_hash and excel_row if enabled). xlfilldown validates this before inserting.
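
You can also run the same check as a pre-flight step in your own scripts. A minimal sketch using the stdlib sqlite3 module; the expected column list here is hypothetical and should match your own append target:

```python
import sqlite3

def existing_columns(db_path: str, table: str) -> list[str]:
    """Return the column names of an existing SQLite table ([] if it doesn't exist)."""
    with sqlite3.connect(db_path) as conn:
        info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [row[1] for row in info]  # row[1] holds the column name

# Hypothetical expected schema for the append target
expected = ["row_hash", "excel_row", "Category", "Subcategory", "Amount"]
actual = existing_columns("annual.db", "transactions")
if actual and actual != expected:
    raise SystemExit(f"Schema mismatch: {actual} != {expected}")
```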


Raw ingest for audit baseline

Load a sheet exactly as-is (no fill-down) but with audit columns. Useful for keeping a raw copy alongside the filled version for comparison.

# Raw copy
$ xlfilldown db \
    --infile data.xlsx --insheet "Sheet1" --header-row 1 \
    --ingest-mode raw \
    --db audit.db --table raw_data \
    --row-hash --excel-row-numbers --if-exists replace

# Filled copy in the same database
$ xlfilldown db \
    --infile data.xlsx --insheet "Sheet1" --header-row 1 \
    --fill-cols '["Region","Country"]' \
    --db audit.db --table filled_data \
    --row-hash --excel-row-numbers --if-exists replace

Filter rows with require-non-null

Drop rows where critical columns are blank after filling. Combine with --drop-blank-rows to also remove spacer rows.

$ xlfilldown db \
    --infile report.xlsx \
    --insheet "Financials" \
    --header-row 3 \
    --fill-cols '["Division","BU"]' \
    --drop-blank-rows \
    --require-non-null '["Division","Amount"]' \
    --db clean.db

Python API Recipes

Using xlfilldown programmatically in scripts and applications.


Basic script usage

The simplest way to use xlfilldown in a Python script: one import and one function call take you from messy .xlsx to clean SQLite.

from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="report.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["Region", "Country"],
    db="output.db",
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
)

print(f"Ingested {summary['rows_ingested']} rows into {summary['table']}")
print(f"Columns: {summary['columns']}")

Excel-to-Excel transformation

Read a messy .xlsx, fill it down, and write a clean .xlsx. No database involved.

from xlfilldown.api import ingest_excel_to_excel

summary = ingest_excel_to_excel(
    file="raw_export.xlsx",
    sheet="Data",
    header_row=1,
    fill_cols=["Department", "Team", "Role"],
    fill_mode="hierarchical",
    outfile="clean_export.xlsx",
    outsheet="Filled",
    drop_blank_rows=True,
    if_exists="replace",
)

print(f"Wrote {summary['rows_written']} rows")

Process multiple sheets from one workbook

Loop through several sheets and load each into its own SQLite table.

from xlfilldown.api import ingest_excel_to_sqlite

sheets = [
    ("North America", ["Country", "State"]),
    ("EMEA", ["Country", "City"]),
    ("APAC", ["Country", "Region"]),
]

for sheet_name, cols in sheets:
    summary = ingest_excel_to_sqlite(
        file="global_sales.xlsx",
        sheet=sheet_name,
        header_row=1,
        fill_cols=cols,
        db="sales.db",
        row_hash=True,
        if_exists="replace",
    )
    print(f"{sheet_name}: {summary['rows_ingested']} rows")

Using column letters in the API

When you know column positions but not header names (common with inconsistent exports), use fill_cols_letters.

from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="messy_export.xlsx",
    sheet="Sheet1",
    header_row=2,                   # Headers on row 2
    fill_cols_letters=["A", "B", "C"],  # First three columns
    db="output.db",
    if_exists="replace",
)

Integration Patterns

Embed xlfilldown into web apps, pipelines, and automation workflows.


Flask upload endpoint

Accept an .xlsx upload, process it with xlfilldown, and return a summary. Users upload via a form or API call.

import os
import tempfile
from flask import Flask, request, jsonify
from xlfilldown.api import ingest_excel_to_sqlite

app = Flask(__name__)

@app.route("/ingest", methods=["POST"])
def ingest():
    uploaded = request.files["file"]
    sheet = request.form.get("sheet", "Sheet1")
    fill_cols = request.form.getlist("fill_cols")

    with tempfile.TemporaryDirectory() as tmp:
        inpath = os.path.join(tmp, "upload.xlsx")
        dbpath = os.path.join(tmp, "result.db")
        uploaded.save(inpath)

        summary = ingest_excel_to_sqlite(
            file=inpath,
            sheet=sheet,
            header_row=1,
            fill_cols=fill_cols,
            db=dbpath,
            row_hash=True,
            excel_row_numbers=True,
            if_exists="replace",
        )

    return jsonify(summary)

FastAPI upload endpoint

Same idea with FastAPI and async file handling.

import json
import tempfile
from pathlib import Path
from fastapi import FastAPI, UploadFile, Form
from xlfilldown.api import ingest_excel_to_sqlite

app = FastAPI()

@app.post("/ingest")
async def ingest(
    file: UploadFile,
    sheet: str = Form("Sheet1"),
    fill_cols: str = Form('["Region","Country"]'),
):
    cols = json.loads(fill_cols)

    with tempfile.NamedTemporaryFile(suffix=".xlsx", delete=False) as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name

    db_path = tmp_path.replace(".xlsx", ".db")
    summary = ingest_excel_to_sqlite(
        file=tmp_path,
        sheet=sheet,
        header_row=1,
        fill_cols=cols,
        db=db_path,
        row_hash=True,
        if_exists="replace",
    )

    Path(tmp_path).unlink(missing_ok=True)
    Path(db_path).unlink(missing_ok=True)  # the result db is temporary too; the summary is the response
    return summary

Shell script for cron / CI

A simple shell script that runs in cron or a CI pipeline. Picks up a report from a drop directory, processes it, and logs the result.

#!/bin/bash
# /opt/etl/ingest_weekly.sh
# Runs every Monday at 6am via cron

set -euo pipefail

INFILE="/data/incoming/weekly_report.xlsx"
DB="/data/warehouse/reports.db"
LOG="/var/log/etl/weekly_ingest.log"

echo "[$(date -Iseconds)] Starting ingest" >> "$LOG"

xlfilldown db \
    --infile "$INFILE" \
    --insheet "Data" \
    --header-row 1 \
    --fill-cols '["Region","Country","City"]' \
    --db "$DB" \
    --table weekly_sales \
    --row-hash \
    --excel-row-numbers \
    --drop-blank-rows \
    --if-exists replace \
    >> "$LOG" 2>&1

echo "[$(date -Iseconds)] Done" >> "$LOG"

Docker one-liner

Run xlfilldown in a container without installing anything on the host. Mount your data directory and go.

$ docker run --rm \
    -v /path/to/data:/data \
    python:3.12-slim \
    sh -c "pip install -q xlfilldown && xlfilldown db \
      --infile /data/report.xlsx \
      --insheet Sheet1 \
      --header-row 1 \
      --fill-cols '[\"Region\",\"Country\"]' \
      --db /data/output.db \
      --row-hash --excel-row-numbers \
      --if-exists replace"

Airflow task

Use xlfilldown as a step in an Airflow DAG, here via the TaskFlow API (a classic PythonOperator works the same way).

from airflow.decorators import task

@task()
def ingest_sales_report(infile: str, db: str):
    from xlfilldown.api import ingest_excel_to_sqlite

    summary = ingest_excel_to_sqlite(
        file=infile,
        sheet="Sheet1",
        header_row=1,
        fill_cols=["Region", "Country"],
        db=db,
        row_hash=True,
        excel_row_numbers=True,
        drop_blank_rows=True,
        if_exists="replace",
    )
    return summary

Data Quality Patterns

Use audit columns and hashing for validation and debugging.


Deduplication with row hashes

After ingesting with --row-hash, query for duplicate rows. Identical content produces identical hashes.

-- Find duplicate rows
SELECT row_hash, COUNT(*) AS cnt
FROM Sheet1
GROUP BY row_hash
HAVING cnt > 1
ORDER BY cnt DESC;

-- Keep only the first occurrence
DELETE FROM Sheet1
WHERE rowid NOT IN (
    SELECT MIN(rowid) FROM Sheet1 GROUP BY row_hash
);

Trace back to the source spreadsheet

With --excel-row-numbers, every record carries its original .xlsx row number. When a downstream user reports a problem, you can pinpoint the exact source row.

-- "This revenue figure looks wrong": find the source row
SELECT excel_row, Region, Country, City, Product, Revenue
FROM Sheet1
WHERE Revenue = '99999';

-- Result: excel_row = 47
-- Open sales_report.xlsx, go to row 47, verify the value

Compare raw vs filled data

Ingest twice (once raw, once filled) and compare. Useful for validating that the fill logic matches your expectations.

from xlfilldown.api import ingest_excel_to_sqlite

# Raw ingest
ingest_excel_to_sqlite(
    file="data.xlsx", sheet="Sheet1", header_row=1,
    ingest_mode="raw",
    db="compare.db", table="raw",
    excel_row_numbers=True, if_exists="replace",
)

# Filled ingest
ingest_excel_to_sqlite(
    file="data.xlsx", sheet="Sheet1", header_row=1,
    fill_cols=["Region", "Country"],
    db="compare.db", table="filled",
    excel_row_numbers=True, if_exists="replace",
)

Then compare the two tables in SQL:

-- Find rows where fill-down changed a value
SELECT
    r.excel_row,
    r.Region AS raw_region,
    f.Region AS filled_region,
    r.Country AS raw_country,
    f.Country AS filled_country
FROM raw r
JOIN filled f ON r.excel_row = f.excel_row
WHERE r.Region != f.Region OR r.Country != f.Country
   OR (r.Region IS NULL AND f.Region IS NOT NULL)
   OR (r.Country IS NULL AND f.Country IS NOT NULL);
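
If you'd rather drive the comparison from the same Python script, here's a minimal sketch using the stdlib sqlite3 module. It checks one column at a time and assumes the raw/filled tables created above; the helper name is my own:

```python
import sqlite3

def fill_changes(db_path: str, col: str) -> list[tuple]:
    """Rows where the filled table has a value the raw table left blank."""
    query = f'''
        SELECT r.excel_row, f."{col}"
        FROM raw r
        JOIN filled f ON r.excel_row = f.excel_row
        WHERE r."{col}" IS NULL AND f."{col}" IS NOT NULL
    '''
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query).fetchall()

# e.g. fill_changes("compare.db", "Region")
```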

Design Patterns

Architectural approaches for common scenarios.


Hierarchical vs independent - choosing the right mode

Use hierarchical when columns have a parent-child relationship (Region > Country > City). A change in a parent column should invalidate stale child values.

Use independent when columns are unrelated dimensions that happen to share blank patterns. For example, "Salesperson" and "Product Line" might both be sparse but don't depend on each other.

Rule of thumb: If you would use a tree diagram to describe the relationship between columns, use hierarchical. If the columns are flat attributes, use independent.
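
To make the difference concrete, here is a small self-contained sketch of the two fill semantics. This illustrates the behavior only; it is not xlfilldown's actual implementation:

```python
def fill_hierarchical(rows, cols):
    """Carry values down; a fresh value in a parent column resets the
    carried values of every column to its right (its children)."""
    carried = {c: None for c in cols}
    out = []
    for row in rows:
        row = dict(row)
        for i, col in enumerate(cols):
            if row[col] not in (None, ""):
                carried[col] = row[col]
                for child in cols[i + 1:]:  # parent changed: drop stale children
                    carried[child] = None
            else:
                row[col] = carried[col]
        out.append(row)
    return out

def fill_independent(rows, cols):
    """Carry each column down independently of the others."""
    carried = {c: None for c in cols}
    out = []
    for row in rows:
        row = dict(row)
        for col in cols:
            if row[col] not in (None, ""):
                carried[col] = row[col]
            else:
                row[col] = carried[col]
        out.append(row)
    return out

rows = [
    {"Region": "EMEA", "Country": "France"},
    {"Region": "",     "Country": ""},
    {"Region": "APAC", "Country": ""},
]
# Hierarchical: the fresh "APAC" invalidates the carried Country,
# so row 3's Country stays empty. Independent: "France" is carried in.
```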


Header row is not row 1

Many .xlsx exports have title rows, blank rows, or metadata above the actual headers. Use --header-row to point at the correct row. Everything above that row is ignored.

# Headers are on row 3 (rows 1-2 are title/metadata)
$ xlfilldown db \
    --infile finance_report.xlsx \
    --insheet "P&L" \
    --header-row 3 \
    --fill-cols '["Account Group","Account"]' \
    --db finance.db

Batch size tuning for large files

The default batch size of 1000 rows works well for most files. For very large sheets (100K+ rows), a larger batch size reduces SQLite transaction overhead.

$ xlfilldown db \
    --infile huge_export.xlsx \
    --insheet "Data" \
    --header-row 1 \
    --fill-cols '["Category"]' \
    --db warehouse.db \
    --batch-size 5000

Combining require-non-null with letters

Mix header-name and column-letter approaches for filtering. Both are merged and de-duplicated internally.

$ xlfilldown db \
    --infile data.xlsx \
    --insheet "Sheet1" \
    --header-row 1 \
    --fill-cols-letters A B C \
    --require-non-null '["Amount"]' \
    --require-non-null-letters A \
    --db filtered.db

Quick Reference

At-a-glance summary of the most common options.

Task                          Key Options
----                          -----------
Fill by header names          --fill-cols '["Col1","Col2"]'
Fill by column letters        --fill-cols-letters A B C
Hierarchical fill (default)   --fill-mode hierarchical
Independent fill              --fill-mode independent
Skip fill-down                --ingest-mode raw
Add row hashes                --row-hash
Add source row numbers        --excel-row-numbers
Drop spacer rows              --drop-blank-rows
Require non-null columns      --require-non-null '["Col"]'
Overwrite on re-run           --if-exists replace
Append to existing            --if-exists append
Custom table name             --table my_table
Tune batch size               --batch-size 5000

API Reference

Key functions and their return values.

Function                   Output            Returns
--------                   ------            -------
ingest_excel_to_sqlite()   SQLite database   table, columns, rows_ingested, row_hash, excel_row_numbers
ingest_excel_to_excel()    .xlsx file        workbook, sheet, columns, rows_written, row_hash, excel_row_numbers
normalize_headers()        list[str]         trimmed, normalized header strings
canon_list()               str               JSON-format canonical string
sha256_hex()               str               64-char hex digest
qident()                   str               safely quoted SQLite identifier
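
The canon_list/sha256_hex pair describes the row-hash pipeline: a row's cell values are serialized to a canonical string, then digested. As an illustration only (a stdlib sketch of the same shape; xlfilldown's exact canonicalization may differ), a content hash of this kind looks like:

```python
import hashlib
import json

def row_hash(values: list) -> str:
    """Canonical-JSON-serialize a row's cell values, then SHA-256 hex digest."""
    canon = json.dumps(values, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()

h = row_hash(["EMEA", "France", "Paris", "1234.5"])
# 64 hex characters; identical cell values always produce an identical hash
```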