How to compare two CSV files
The fastest way: drop the old version into File A and the new version into File B at the top of this page. The tool parses both with PapaParse and shows three tabs — Added, Removed, Changed — each with a downloadable CSV.
- Upload or paste File A (the baseline / previous version).
- Upload or paste File B (the new / updated version).
- Pick a match strategy:
- Whole-row equality (default) — rows match when every column is identical. Best for quick “what changed?” between two snapshots.
- Key column — pick a stable identifier (
id,email,sku) and the tool pairs rows across files, surfacing field-level changes in the Changed tab.
- Review the three tabs, then download
added.csv,removed.csv,changed.csv, orsummary.csv(everything in one file with a__diff_status__flag).
Works for product catalog updates, user-list deltas, audit reports — anything where you have “yesterday’s CSV” and “today’s CSV” and need to know the difference.
Whole-row vs key-column matching
This is the most important decision when diffing CSVs. Pick the wrong one and your diff is meaningless.
Whole-row matching
Two rows match only if every column is identical (column order ignored — the tool sorts headers before hashing). A row that exists in both files with one tiny edit looks like:
- 1 row removed (the old version)
- 1 row added (the new version)
That’s correct in a strict set-difference sense, but rarely what you want for human review. Use whole-row matching when:
- You’re verifying that two exports are byte-for-byte equivalent.
- Your rows don’t have a stable identifier.
- You want to see literal duplicates appear/disappear.
Key-column matching
Pick a column that’s unique per row — id, email, order_number, sku. The tool builds a map keyed by that column, then for each key it asks:
- In B but not A? → Added
- In A but not B? → Removed
- In both, but at least one other field differs? → Changed
The Changed tab highlights the specific cells that differ, so you can see at a glance that “row 4172’s status went from pending to shipped” without scanning a hundred columns. Use key-column matching when:
- Your rows have a stable identifier.
- You care about what changed in each row, not just which rows appeared/disappeared.
- The two files might have re-sorted rows (key-column matching is order-independent).
Tip: if your “key” isn’t a single column (e.g.
order_id+line_number), concatenate them in a spreadsheet first into acomposite_keycolumn, then upload.
Comparing CSVs in Excel (VLOOKUP / XLOOKUP)
Excel has no built-in CSV diff, so people improvise. The classic recipes:
1. Side-by-side conditional formatting
Open both CSVs as separate sheets (Sheet1, Sheet2), then in a third sheet write per-cell:
=IF(Sheet1!A1=Sheet2!A1,"","Diff")
Drag across the whole range. Cells say “Diff” wherever values disagree. Brittle: rows must already be in the same order, and any inserted row breaks every cell below it.
2. VLOOKUP for “rows added/removed”
In File A, add a helper column to flag whether each id exists in File B:
=IF(ISNA(VLOOKUP(A2, FileB!A:A, 1, FALSE)), "REMOVED", "")
Mirror it in File B with the lookup pointing at File A’s id column to flag added rows. Slow on large files (O(n²) lookups) and only tells you about presence/absence — not field changes.
3. XLOOKUP (Excel 365 / 2021+)
Cleaner than VLOOKUP and handles both directions. To compare an email field for the same id:
=XLOOKUP(A2, FileB!A:A, FileB!B:B, "MISSING") <> B2
Returns TRUE for changed rows. Combine with FILTER() to extract just the changed rows. Still falls over past ~50K rows.
For anything bigger or more rigorous, the tool above (or the CLI options below) is faster and less error-prone.
Comparing CSVs with diff / git diff / csvkit / pandas
Command-line options for the cases where browser tools or Excel don’t fit.
Plain diff and git diff
diff old.csv new.csv
git diff --no-index old.csv new.csv
These are line-based — they have no notion of rows, columns, or keys. Re-sort either CSV and the diff is total noise. Move a column and every line changes. Useful only when both files are guaranteed to share the exact same row order and column order.
csvdiff (Aswin’s Go tool)
csvdiff is purpose-built for diffing big CSVs by primary key:
csvdiff --primary-key 0 old.csv new.csv
# Or with multiple key columns:
csvdiff --primary-key 0,1 old.csv new.csv
# Get the result as CSV:
csvdiff --primary-key 0 --format rowmark old.csv new.csv > diff.csv
Hash-based, very fast (millions of rows in seconds), single static binary. The closest CLI equivalent of this page.
csvkit
# Set semantics: rows in A not in B
csvjoin -c id --left old.csv new.csv | csvgrep -c id_2 -r "^$"
Works but verbose. csvkit is best for one-off transforms, less so for diffing.
pandas
The Pythonic version — flexible and scriptable:
import pandas as pd
a = pd.read_csv("old.csv")
b = pd.read_csv("new.csv")
# Outer merge with an indicator tells you where each row lives.
merged = a.merge(b, on="id", how="outer", indicator=True, suffixes=("_old", "_new"))
added = merged[merged["_merge"] == "right_only"]
removed = merged[merged["_merge"] == "left_only"]
both = merged[merged["_merge"] == "both"]
# For "changed", compare every non-key column between the _old and _new variants:
non_key = [c for c in a.columns if c != "id"]
changed_mask = False
for col in non_key:
changed_mask = changed_mask | (both[f"{col}_old"].fillna("") != both[f"{col}_new"].fillna(""))
changed = both[changed_mask]
added.to_csv("added.csv", index=False)
removed.to_csv("removed.csv", index=False)
changed.to_csv("changed.csv", index=False)
The indicator=True flag is the trick — it tags each row with left_only, right_only, or both. Once you have that, Added/Removed/Changed fall out naturally. Same model the tool above uses.
Common diff scenarios
Catalog updates
You export your product catalog every morning. Today’s products.csv has 12,400 rows; yesterday’s had 12,388. Did you really add 12 products, or did 5 get added, 3 removed, and 4 prices change? Pick sku as the key column — the answer is in the Changed tab.
User list deltas
Marketing wants “everyone who signed up this week.” You have last Monday’s users.csv and today’s users.csv. Use email (or user_id) as the key column; the Added tab is your weekly signup list. The Changed tab catches anyone who updated their plan or country during the week.
Audit / reconciliation reports
Finance exported transactions from two systems that should match. Diff with key = transaction_id. The Added and Removed tabs are missing-from-one-side records (the bugs); the Changed tab shows mismatched amounts that need investigation.
Verifying a migration or transform
You ran a one-off script over data.csv to clean up phone numbers. Diff input vs output with whole-row matching to confirm only the rows you expected to change actually changed.
Common pitfalls
Column order differences
Two exports of the same data with columns in a different order will look identical to this tool (column order is ignored), but diff and git diff will show every line as changed. If you’re comparing exports from different systems, normalize column order before using line-based diffs.
Whitespace and encoding
Trailing spaces ("Ada " vs "Ada") count as a diff — and they’re invisible. The tool trims surrounding whitespace before comparing, but embedded double-spaces still count. UTF-8 BOMs ( at the start of a file) silently rename your first column to id. If a single header looks “off,” check for a BOM. Fix broken CSVs here before diffing if encoding is suspect.
Sort order
If you’re using whole-row matching, row order doesn’t matter — rows are hashed into a set. If you’re using key-column matching, order also doesn’t matter. So sort order is rarely an issue here, but it absolutely is for diff / git diff / spreadsheet conditional-formatting approaches.
Numeric vs string equality
"1" and "01" are different strings even though they represent the same number. The tool compares as strings (because CSV is strings). If you want numeric equality, normalize the column upstream — or accept the diff and treat it as a real data inconsistency worth surfacing.
Duplicate keys
If your “key column” actually has duplicates, key-based matching is undefined: only the last row with that key wins on each side. The tool surfaces nothing about this — sanity-check uniqueness with awk -F, '{print $1}' file.csv | sort | uniq -d or in Excel before relying on a key-column diff.
Privacy: nothing is uploaded
Both CSVs are parsed and compared entirely in your browser using PapaParse and a small in-house diff engine. Neither file ever reaches a server — open DevTools → Network and confirm. That matters when you’re comparing customer lists, internal product data, or anything you wouldn’t paste into a public diff site.
Need to do something else with these files? Merge two CSVs into one, view a CSV in a sortable table, or fix a broken CSV before comparing.
Related tools
Convert any CSV file to a real .xlsx Excel workbook in seconds. Free, no signup, files never leave your browser.
Turn any .xlsx or .xls Excel file into a clean CSV. Pick the sheet, pick the delimiter, download. No upload.
Convert any Excel workbook (.xlsx or .xls) to a printable PDF in seconds. Pick the sheet, pick orientation, download. 100% private.
Convert any CSV file to a clean PDF table in seconds. Free, no signup, files never leave your browser.
Frequently asked questions
- How do I diff two CSV files?
Drop File A (the old version) and File B (the new version) into the boxes at the top of this page. The tool parses both with PapaParse and shows three tabs — Added, Removed, Changed — with downloadable CSVs for each.
- When should I pick a key column instead of whole-row matching?
Whole-row matching shows what's strictly added or removed, but a typo in any field looks like 'one row removed + one row added'. If your rows have a stable identifier (id, email, sku, order_number), pick it from the dropdown — the tool will pair rows by that key and surface field-level changes in the Changed tab.
- Does column order matter?
No. The whole-row comparison sorts column names before hashing each row, so re-ordering columns between exports won't show up as a diff. Headers that exist in only one file are added to the union and treated as empty in the other.
- How big can the files be?
Comparison happens in your browser, so it's bounded by your machine's RAM. Two 100K-row files with ~20 columns each are fine on a modern laptop. For multi-million-row diffs, use csvdiff or pandas on the command line — see the article below.
- How is this different from Excel's diff?
Excel doesn't have a built-in diff — people fake it with VLOOKUP, XLOOKUP, or conditional formatting on a side-by-side sheet. That works for tiny files but breaks down past a few thousand rows. This tool diffs end-to-end and gives you Added/Removed/Changed CSVs ready to share.
- How is this different from git diff?
git diff is line-based — re-sorting a CSV or moving columns produces a wall of red and green that doesn't reflect real data changes. This tool diffs at the row level (and optionally by key column), so reordering rows or columns is a non-event.
- Can I see exactly which fields changed?
Yes — pick a key column and switch to the Changed tab. Cells that differ between File A and File B are highlighted in amber. The downloadable changed.csv contains both the before and after rows with a status flag.
- Is my CSV uploaded?
No. Both files are parsed and compared entirely in your browser using PapaParse. Nothing is sent to a server — verify in DevTools → Network.