Repair a Broken CSV File

Drop a broken CSV below — the tool fixes BOM bytes, mixed line endings, the wrong delimiter, mismatched column counts, unescaped quotes, and trailing whitespace, and shows you exactly what it changed.

How to fix a broken CSV

Drop the file into the repair tool at the top of this page. It runs eight checks in one pass, lists every fix it applied, and gives you a clean RFC 4180 CSV back. Everything happens in your browser using PapaParse — no upload.

  1. Load or paste the broken CSV. Drag and drop a .csv file, or paste its contents.
  2. Review the toggles — all eight repairs are on by default. Disable any you don’t want (e.g. keep semicolons instead of converting to comma).
  3. Read the “Repairs applied” report above the output. You’ll see exact counts: “Padded 3 short rows”, “Stripped UTF-8 BOM”, “Recovered from 2 quote mismatches”.
  4. Download or copy the cleaned .csv. Comma-delimited, LF-ended, BOM-free, RFC 4180-compliant.

Common CSV problems and fixes

Mismatched column count

The most common error you’ll see is some variant of:

ParserError: Error tokenizing data. C error: Expected 5 fields in line 12, saw 7

That’s pandas’ way of saying row 12 has more columns than the header. Causes:

  • An unescaped comma inside a field — e.g. Smith, John got split into two columns.
  • An unescaped quote that swallows the next several fields.
  • A row genuinely has extra junk (e.g. a trailing comma in some rows, not others).

The repair tool pads short rows with empty cells and trims over-long rows down to the header length (with a warning so you can investigate). Combined with re-quoting, this fixes the vast majority of “saw N, expected M” errors.
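
The pad-and-trim step can be sketched in Python roughly as follows. This is an illustration of the idea, not the tool's actual code: `normalize_row_lengths` is a hypothetical helper name, and the sketch assumes the first row is the header.

```python
import csv
import io


def normalize_row_lengths(text):
    """Pad short rows and trim over-long rows to the header's width.

    Hypothetical helper: returns (rows, trimmed), where trimmed lists the
    1-based row numbers that lost cells and deserve a closer look.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], []
    width = len(rows[0])  # assumption: the first row is the header
    fixed, trimmed = [], []
    for number, row in enumerate(rows, start=1):
        if len(row) < width:
            row = row + [""] * (width - len(row))  # pad with empty cells
        elif len(row) > width:
            trimmed.append(number)  # remember which rows were cut
            row = row[:width]       # drop the extra cells
        fixed.append(row)
    return fixed, trimmed


sample = "id,name,email\n1,Ada\n2,Linus,l@example.com,extra\n"
rows, trimmed = normalize_row_lengths(sample)
```

Trimming discards data, which is why the sketch returns the affected row numbers instead of cutting silently.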

Unescaped quotes inside fields

The CSV spec (RFC 4180) says: a field with a quote in it must be wrapped in quotes, and internal quotes must be doubled. So says "hi" should be written as "says ""hi""". Real-world CSVs often violate this — common breakage looks like:

id,note
1,he said "hello"
2,"unbalanced "quote inside

Symptoms: parsers swallow the rest of the file as one giant field, or report Error: unexpected character after closing quote. The tool re-parses leniently and re-emits every field with proper quoting.
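
For comparison, Python's standard csv module applies the same doubling rule when it writes. The sketch below covers only the re-emit half of the repair (it assumes the fields have already been recovered into lists) and is not the tool's own code.

```python
import csv
import io

# Assumption: these fields were already recovered by a lenient parse.
rows = [
    ["id", "note"],
    ["1", 'he said "hello"'],
    ["2", "plain text"],
]

buffer = io.StringIO()
# QUOTE_MINIMAL (the default) wraps only the fields that need it and
# doubles any quote character found inside them.
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerows(rows)
output = buffer.getvalue()

# Reading the output back should give the original fields unchanged.
round_trip = list(csv.reader(io.StringIO(output)))
```

The second row comes out as 1,"he said ""hello""", while fields without special characters stay unquoted.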

Mixed line endings (\r vs \n vs \r\n)

CSVs concatenated from different sources (or saved on different OSes) end up with a mix of:

  • \n — Unix / macOS / web
  • \r\n — Windows / RFC 4180 spec
  • \r — old Mac OS, very rare but still seen in legacy exports

Some parsers split on one but not the others, producing rows like 1,Ada\rdata@example.com joined into a single cell. The tool normalizes everything to LF (\n). RFC 4180 itself specifies CRLF, but LF-only files are accepted by most modern parsers.
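
A minimal sketch of that normalization in Python, assuming the whole file fits in memory as a string (`normalize_line_endings` is a hypothetical name, not part of the tool):

```python
def normalize_line_endings(text):
    """Convert CRLF and lone CR to LF.

    The order matters: CRLF is replaced first so that it does not turn
    into two line breaks.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")


mixed = "id,name\r\n1,Ada\r2,Linus\n"
clean = normalize_line_endings(mixed)
```

Note that this simple version also rewrites line breaks inside quoted fields, which is usually acceptable but worth knowing if multi-line cells matter to you.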

BOM bytes (UTF-8 BOM)

The Byte Order Mark (three bytes, EF BB BF, displayed as ï»¿ when the file is misread as Windows-1252) gets prepended by Notepad, Excel “Save as CSV UTF-8”, and various Windows utilities. It’s invisible in most editors but can break header lookups when the parser does not strip it:

import pandas as pd

df = pd.read_csv("data.csv")
df.columns
# When the BOM is not stripped, the first name carries it:
# Index(['\ufeffid', 'name', 'email'], dtype='object')
df["id"]
# KeyError: 'id'

Fix in pandas: pd.read_csv("data.csv", encoding="utf-8-sig"). Or just run the file through this tool and the BOM is gone.
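
Outside pandas, the BOM can also be removed directly from the raw bytes. A small sketch, with `strip_utf8_bom` as a hypothetical helper name:

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


raw = b"\xef\xbb\xbfid,name\n1,Ada\n"
clean = strip_utf8_bom(raw)
```

Working on bytes rather than decoded text keeps the check independent of whichever encoding the file is later read with.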

Wrong encoding (UTF-8 vs Latin-1, mojibake)

If you see caf├⌐ instead of café, or Â£ instead of £, the file was written in one encoding and read in another. This is byte-level corruption, not text-level: by the time the file is in your browser as a string, it’s too late to fix here. Use the dedicated CSV encoding converter on the original bytes, then run the result through this repair tool.

In Python:

# Read as Latin-1, write as UTF-8.
# newline="" keeps the file's original line endings untouched.
with open("broken.csv", encoding="latin-1", newline="") as f:
    text = f.read()
with open("fixed.csv", "w", encoding="utf-8", newline="") as f:
    f.write(text)

Wrong delimiter for locale (comma vs semicolon)

European Excel saves CSVs with semicolons (;), because the comma is the decimal separator there. US Excel saves commas. When a file crosses borders, you get a “CSV” that’s actually a SCSV. Symptoms: every row appears as a single cell when opened.

The tool auto-detects the actual delimiter by counting commas, semicolons, tabs, and pipes (outside quoted regions) and converts to comma in the output. If you’d rather keep the semicolons, see CSV comma-delimited for the explicit delimiter swap.
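
The counting approach can be sketched like this in Python. It is a simplified illustration, not the tool's implementation: `detect_delimiter` and `CANDIDATES` are hypothetical names, and the fallback to comma is an assumption.

```python
from collections import Counter

CANDIDATES = [",", ";", "\t", "|"]


def detect_delimiter(text, candidates=CANDIDATES):
    """Guess the delimiter by counting candidates outside quoted regions."""
    counts = Counter()
    in_quotes = False
    for ch in text:
        if ch == '"':
            # A doubled quote ("") toggles twice, so the state is unchanged.
            in_quotes = not in_quotes
        elif not in_quotes and ch in candidates:
            counts[ch] += 1
    if not counts:
        return ","  # assumption: fall back to comma when nothing is found
    # Ties resolve in candidate order because max() keeps the first maximum.
    return max(candidates, key=lambda c: counts[c])


sample = 'id;name;note\n1;Ada;"likes a, b, c, d"\n2;Linus;"x, y"\n'
delimiter = detect_delimiter(sample)
```

In the sample, the commas inside the quoted notes are ignored, so the semicolon wins. Python's standard library offers a comparable heuristic in csv.Sniffer.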

Trailing commas and empty rows

CSVs exported from spreadsheets often have trailing commas (because the spreadsheet has empty trailing columns) and trailing empty rows. They’re harmless but bloat the file and confuse some importers:

id,name,
1,Ada,
2,Linus,
,,
,,

The repair tool drops fully-empty rows and trims trailing whitespace per cell — the trailing empty column is preserved if the header has it (since that’s structural), but the empty rows go.
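
A rough Python equivalent of that cleanup, using the sample above (`drop_empty_rows` is a hypothetical helper, not the tool's code):

```python
import csv
import io


def drop_empty_rows(text):
    """Trim trailing whitespace per cell and drop rows with no content."""
    cleaned = []
    for row in csv.reader(io.StringIO(text)):
        row = [cell.rstrip() for cell in row]
        if any(row):  # keep the row only if at least one cell is non-empty
            cleaned.append(row)
    return cleaned


sample = "id,name,\n1,Ada ,\n2,Linus,\n,,\n,,\n"
rows = drop_empty_rows(sample)
```

The trailing empty column survives because every kept row, including the header, still has three cells.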

How to repair CSV in Python (pandas error_bad_lines)

The classic pandas error:

ParserError: Error tokenizing data. C error: Expected 5 fields in line 12, saw 7

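Pandas added on_bad_lines in 1.3. It replaces error_bad_lines / warn_bad_lines, which were deprecated in 1.3 and removed in 2.0:

import csv
import pandas as pd

# Skip malformed rows entirely:
df = pd.read_csv("broken.csv", on_bad_lines="skip")

# Skip malformed rows, but print a warning for each one:
df = pd.read_csv("broken.csv", on_bad_lines="warn")

# Same behavior on pandas older than 1.3 (these arguments no longer exist in 2.0):
# df = pd.read_csv("broken.csv", error_bad_lines=False, warn_bad_lines=True)

# Forgiving Python engine, treating quote characters as ordinary text:
df = pd.read_csv("broken.csv", engine="python", quoting=csv.QUOTE_NONE)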

That gets you past the parse error, but you’ve lost rows. The tool above fixes them instead, and you can save the result and read it cleanly:

df = pd.read_csv("fixed.csv")  # no errors

For BOM specifically:

df = pd.read_csv("data.csv", encoding="utf-8-sig")  # strips BOM

For unknown encoding:

import chardet
with open("data.csv", "rb") as f:
    enc = chardet.detect(f.read())["encoding"]
df = pd.read_csv("data.csv", encoding=enc)

How to repair CSV in Excel

Excel is forgiving about opening broken CSVs — too forgiving, sometimes. The “Open” dialog runs the Text Import Wizard if the file looks unusual, but skips it for .csv extensions on modern Excel. Manual repair path:

  1. Rename the file to .txt to force the import wizard.
  2. File → Open → pick the .txt — Excel will ask delimiter, quote char, encoding.
  3. Pick the right delimiter (comma vs semicolon vs tab) and encoding (UTF-8 vs Windows-1252).
  4. Save as CSV UTF-8 (Comma delimited) (.csv) to get a clean copy.

This works but is slow and easy to get wrong. The tool above is faster for small/medium files.

How to repair CSV with csvkit

csvkit is a Python toolkit for CSVs. Useful repair commands:

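# Report rows whose column count doesn't match the header, without writing files
# (csvkit 1.x; the --dry-run flag was removed in csvkit 2.0):
csvclean --dry-run broken.csv

# Auto-repair (csvkit 1.x writes broken_err.csv with bad rows and broken_out.csv with good ones;
# csvkit 2.0+ writes to stdout/stderr instead, so check csvclean --help for your version):
csvclean broken.csv

# Convert delimiter explicitly (-d is the input delimiter; output is comma):
csvformat -d ";" broken.csv > comma.csv

# Strip BOM:
csvformat broken.csv > fixed.csv  # csvkit ignores BOM automatically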

For one-off browser fixes the tool above is faster; for batch / scripted repairs, csvclean in CI is the right call.

Why CSVs break in the first place

CSV isn’t really a format — it’s a half-dozen near-compatible conventions that mostly agree. Common sources of breakage:

  • Excel exports add a BOM in some versions, omit it in others, and use the OS line endings (CRLF on Windows, LF on macOS in newer versions).
  • Database exports vary by client: mysql -e ... > out.csv writes tab-delimited despite the extension; psql \copy writes proper CSV only when you ask for the CSV format, and tab-delimited text otherwise; SELECT INTO OUTFILE needs explicit FIELDS TERMINATED BY.
  • Locale-specific spreadsheets disagree on the delimiter (, vs ;) and decimal mark (. vs ,).
  • Hand-edited files introduce unescaped quotes, mixed line endings (when concatenated with cat across OSes), and mismatched column counts.
  • Streaming exports that get truncated mid-row leave a half-row at the end.

RFC 4180 settled the spec in 2005, but the real world hasn’t caught up. The repair tool above is forgiving on input and strict on output: whatever you feed in, you get a spec-compliant CSV back.

Privacy: nothing is uploaded

The repair runs entirely in your browser using PapaParse for parsing and a small in-house repair pass for the cleanup steps. No file ever reaches a server — verify in DevTools → Network. That matters when the broken CSV is a production export, customer data, or an audit dump you can’t paste into a public service.

After repair, you can view the fixed CSV, convert delimiters, or fix the encoding if the bytes themselves were corrupted.

Frequently asked questions

  • What does this tool actually fix?

    Common breakage: UTF-8 BOM bytes, mixed CRLF/LF/CR line endings, the wrong delimiter (semicolon/tab/pipe instead of comma), unescaped quotes, mismatched column counts (rows shorter or longer than the header), trailing whitespace, and fully-empty rows. Each fix is a checkbox so you can disable repairs you don't want.

  • What about encoding errors — UTF-8 vs Latin-1, mojibake characters?

    True encoding repair (e.g. 'café' arriving as 'cafÃ©') needs to happen at the byte level when the file is read, before the text reaches this tool. Use the dedicated /csv-encoding-converter/ for that, then run the result through this repair tool. The browser will always read uploaded files as UTF-8 here.

  • My CSV has 'Error: Expected X fields in line N, saw Y' — can this fix it?

    Yes. That error happens when a row has more or fewer columns than the header. With 'Pad short rows / trim over-long rows' enabled, this tool pads short rows with empty cells and trims over-long ones (with a warning so you can investigate). The cause is usually an unescaped comma or unescaped quote — also fixed automatically.

  • What is the BOM and why does it break my CSV?

    The Byte Order Mark (\ufeff, three bytes 0xEF 0xBB 0xBF in UTF-8) is an invisible marker some Windows tools add at the start of UTF-8 files. It makes the first column header look like '\ufeffid' instead of 'id', breaking lookups by column name in pandas, Power BI, and database imports. This tool strips it.

  • Why does my CSV have mixed quotes?

    Usually because data was concatenated from two sources (one quoted, one unquoted), or because a field containing a comma/newline wasn't wrapped in quotes by whatever produced the file. The tool re-quotes every field that needs it (commas, quotes, newlines) and leaves the rest unquoted, producing a fully RFC 4180-compliant output.

  • My CSV won't open in Excel — will this fix it?

    Often yes. Excel expects a comma delimiter (in US locales) and, confusingly, either no BOM or specifically a BOM depending on the Excel version. It writes CRLF line endings itself but also reads LF-ended files. This tool's defaults produce a comma-delimited, LF-ended, BOM-free CSV, which Excel reads fine. If your locale uses semicolons, see /csv-comma-delimited/.

  • Does it handle very large CSVs?

    Up to ~50 MB / a few hundred thousand rows works in modern browsers. Past that, the parsing pass slows down. For million-row files, use Python (pandas) or csvkit locally; see the article above for exact commands.

  • Is my CSV uploaded?

    No. Parsing and repair run entirely in your browser using PapaParse. The file never leaves your machine — verify in DevTools → Network. Useful for internal data, exports from production databases, and anything you wouldn't paste into a third-party converter.