Fix CSV Encoding

Convert any CSV to UTF-8, Latin-1, Windows-1252, or UTF-16. Auto-detects the source, fixes mojibake (cafÃ© → café), and adds the UTF-8 BOM Excel needs.

Fix mojibake (garbled accents)

If your CSV shows things like cafÃ© instead of café, the bytes are UTF-8 but were read as Windows-1252. To recover:

  1. Set Source encoding to Windows-1252 (or Latin-1).
  2. Set Target encoding to UTF-8 with BOM.
  3. Download — the accents will be correct again.

Common patterns: Ã© → é · Ã± → ñ · Ã¼ → ü · â€™ → ’ · â€œ → “

How to convert CSV encoding

The fastest way: drop your CSV into the converter at the top of this page, leave Source encoding on Auto-detect, pick a Target encoding (most people want UTF-8 with BOM so Excel reads accents correctly), and click Download. Runs entirely in your browser using the native TextDecoder and TextEncoder APIs — no upload.

  1. Upload your CSV (or drag-and-drop). The file is read as raw bytes, not text.
  2. Auto-detect checks for a BOM (EF BB BF for UTF-8, FF FE for UTF-16 LE, FE FF for UTF-16 BE), then tries strict UTF-8 validation. If that fails it falls back to Windows-1252.
  3. Pick the target. UTF-8 with BOM is the safe default for cross-platform CSVs that need to open in Excel. Plain UTF-8 is best for the web, Linux, and databases.
  4. Download. If any characters can’t be represented in the target (e.g. €, smart quotes when targeting Latin-1), the tool warns and replaces them with ?.
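
The detection order and the "?" replacement described in the steps above can be sketched in Python. This is an illustration of the same logic, not the converter's actual source; detect_encoding and convert are hypothetical names, and the codec names are Python's rather than the browser's.

```python
import codecs

# BOM signatures from step 2, checked before any other heuristic.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),   # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16"),  # FE FF
]


def detect_encoding(raw: bytes) -> str:
    """Guess a Python codec name for raw CSV bytes (hypothetical helper)."""
    for bom, codec in BOMS:
        if raw.startswith(bom):
            return codec
    try:
        raw.decode("utf-8")  # strict by default: raises on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        return "cp1252"      # fallback, as in step 2


def can_encode(ch: str, target: str) -> bool:
    """True if the single character ch exists in the target encoding."""
    try:
        ch.encode(target)
        return True
    except UnicodeEncodeError:
        return False


def convert(raw: bytes, target: str):
    """Re-encode raw bytes; return (output bytes, count of characters replaced)."""
    text = raw.decode(detect_encoding(raw), errors="replace")
    unmappable = sum(1 for ch in text if not can_encode(ch, target))
    # errors="replace" turns each unmappable character into "?"
    return text.encode(target, errors="replace"), unmappable


out, replaced = convert("price,€5".encode("utf-8"), "latin-1")
print(out, replaced)  # b'price,?5' 1
```

Checking the BOM first matters because a UTF-16 file almost never passes strict UTF-8 validation, so without that check it would fall through to the Windows-1252 fallback.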

What’s mojibake and how to fix it

Mojibake is what you see when bytes from one encoding are decoded as another. The classic symptom: café shows up as cafÃ©, niño as niÃ±o, it’s as itâ€™s, “hello” as â€œhelloâ€.

Why it happens: the CSV is actually UTF-8, but the program opening it (often Excel on Windows) assumed Windows-1252. UTF-8 encodes é as two bytes C3 A9. Windows-1252 reads those two bytes individually as Ã (C3) and © (A9). Hence Ã© instead of é.
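
The byte-level explanation can be reproduced in a few lines of Python. This is a minimal sketch using the standard cp1252 codec as a stand-in for what a Windows program does when it guesses wrong.

```python
original = "café"

# UTF-8 stores é as the two bytes C3 A9.
utf8_bytes = original.encode("utf-8")
print(utf8_bytes)  # b'caf\xc3\xa9'

# Decoding those bytes as Windows-1252 maps each byte to its own character.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # cafÃ©

# Reversing the wrong step recovers the text, as long as nothing was lost on the way.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # café
```

The reversal only works when every garbled character maps back to a single Windows-1252 byte. Sequences that pass through one of the few bytes Windows-1252 leaves undefined (0x9D in a closing curly quote, for example) may not survive the round trip.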

Common mojibake patterns to recognize:

You see                       It should be              Reason
Ã©                            é                         UTF-8 read as Win-1252
Ã±                            ñ                         UTF-8 read as Win-1252
Ã¼                            ü                         UTF-8 read as Win-1252
â€™                           ’ (right single quote)    UTF-8 read as Win-1252
â€œ / â€                      “ / ”                     UTF-8 read as Win-1252
â‚¬                           €                         UTF-8 read as Win-1252
? or an invisible character   “                         Win-1252 read as Latin-1

The fix in this tool:

  1. Upload the broken CSV.
  2. Set Source encoding to Windows-1252 (the lying encoding).
  3. Set Target encoding to UTF-8 with BOM.
  4. Download. The bytes are re-interpreted as Win-1252, then re-encoded as UTF-8. The accents come back correctly.

UTF-8 vs Latin-1 vs Windows-1252 — which to use

UTF-8 is the default. It is variable-width (1–4 bytes per character), can represent every Unicode character, and is ASCII-compatible (every ASCII byte means the same thing in UTF-8). It’s what the web, Linux, macOS, and most modern databases and programming languages expect for text files. Pick UTF-8 unless you have a specific reason not to.

Latin-1 (ISO-8859-1) is a fixed 1-byte-per-character encoding covering Western European languages. No €, no smart quotes, no em-dashes. Used in older Unix systems, some HTTP headers, and as a “tolerant” decoder because every byte 0–255 maps to a Unicode code point.

Windows-1252 is Microsoft’s superset of Latin-1: it adds €, smart quotes, em-dashes, and similar punctuation in the 0x80–0x9F range that Latin-1 leaves as control characters. It’s what “ANSI” actually means in older Windows software. Most “Latin-1” files from Windows are really Windows-1252.
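
The difference between the two single-byte encodings is easy to see with Python's cp1252 and latin-1 codecs. This is a small sketch; the variable names are only illustrative.

```python
# Byte 0x80 is the euro sign in Windows-1252 but a C1 control character in Latin-1.
as_cp1252 = b"\x80".decode("cp1252")
as_latin1 = b"\x80".decode("latin-1")
print(repr(as_cp1252), repr(as_latin1))  # '€' '\x80'

# Going the other way, Latin-1 has no slot for the euro sign.
try:
    "€".encode("latin-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False
print(latin1_has_euro)  # False

# Bytes 0xA0-0xFF agree in both encodings, so accented letters decode identically.
same_accent = b"\xe9".decode("cp1252") == b"\xe9".decode("latin-1") == "é"
print(same_accent)  # True
```

Because the accented range is identical, mislabeling a Windows-1252 file as Latin-1 usually goes unnoticed until a euro sign or curly quote turns up.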

UTF-16 LE / BE uses 2 bytes minimum per character. Excel’s “Unicode Text” export is UTF-16 LE with BOM. Wastes space for ASCII-heavy data; only use it if a target tool specifically asks for it.
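
To put a number on the size difference, the sketch below encodes one small, made-up CSV fragment three ways and compares byte counts.

```python
sample = "id,name\n1,Zoë\n"  # 14 characters, one of them non-ASCII

sizes = {
    codec: len(sample.encode(codec))
    for codec in ("utf-8", "utf-16-le", "utf-16")
}
print(sizes)  # {'utf-8': 15, 'utf-16-le': 28, 'utf-16': 30}
```

Python's "utf-16" codec prepends a 2-byte BOM, which accounts for the gap between the last two figures.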

Rule of thumb: store and exchange in UTF-8. Convert at the boundary only if a legacy tool refuses anything else.

The BOM problem

A BOM (byte-order mark) is the 3 bytes EF BB BF at the start of a UTF-8 file. It’s not strictly required — UTF-8 has no byte order to mark — but it acts as a signal to software that the file is UTF-8 and not the local ANSI codepage.

Excel on Windows is the reason this matters. When you double-click a .csv file:

  • No BOM → Excel assumes Windows-1252. Accents and non-ASCII characters break.
  • With BOM → Excel correctly opens it as UTF-8. Accents work.

So if your CSV will be opened by non-technical users in Excel, add the BOM. The converter at the top of this page does this when you pick “UTF-8 with BOM”.

The downside: some older tools (older Python 2, some shell scripts using head) treat the BOM as data, and you’ll see a phantom ï»¿ at the start. For server-side pipelines and databases, prefer plain UTF-8 (no BOM).
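
In Python the BOM behaviour comes down to which codec name is used. The sketch below shows the phantom character appearing with "utf-8" and disappearing with "utf-8-sig"; the sample data is made up.

```python
import codecs

# "utf-8-sig" writes the three BOM bytes EF BB BF before the text.
data = "name\nZoë\n".encode("utf-8-sig")
print(data.startswith(codecs.BOM_UTF8))  # True

# Plain "utf-8" keeps the BOM as a real character (U+FEFF) at the start.
with_phantom = data.decode("utf-8")
print(repr(with_phantom[:5]))  # '\ufeffname'

# "utf-8-sig" strips it on read, and also accepts files that have no BOM.
clean = data.decode("utf-8-sig")
print(repr(clean[:4]))  # 'name'
no_bom = "name\n".encode("utf-8").decode("utf-8-sig")
```

Reading with "utf-8-sig" is a reasonable default for pipelines that may receive CSVs from Excel, since it handles both cases.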

How to convert CSV encoding in Python (pandas)

import pandas as pd

# Read a CSV that's actually Windows-1252, write it back as UTF-8 with BOM
df = pd.read_csv("input.csv", encoding="cp1252")
df.to_csv("output.csv", index=False, encoding="utf-8-sig")  # utf-8-sig adds the BOM

Common encodings in pandas: "utf-8", "utf-8-sig" (UTF-8 with BOM), "cp1252" (Windows-1252), "latin1" (ISO-8859-1), "utf-16", "utf-16-le", "utf-16-be".
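
When pandas is not available, or the file is too large to load as a DataFrame, the same conversion can be done with the standard library. This is a sketch under the assumption that the source encoding is already known; convert_file is a hypothetical helper, and the demonstration writes to a temporary directory.

```python
import os
import tempfile


def convert_file(src_path, dst_path, src_encoding="cp1252", dst_encoding="utf-8-sig"):
    """Stream a text file from one encoding to another (hypothetical helper)."""
    # newline="" on both sides passes line endings through unchanged,
    # so CRLF files stay CRLF.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
            open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        for line in src:
            dst.write(line)


# Small demonstration on a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.csv")
    dst = os.path.join(tmp, "output.csv")
    with open(src, "wb") as f:
        f.write(b"name\r\nZo\xeb\r\n")  # "Zoë" stored as Windows-1252
    convert_file(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted)  # b'\xef\xbb\xbfname\r\nZo\xc3\xab\r\n'
```

Unlike the pandas round trip, this copies the text verbatim, so quoting, delimiters, and column formatting are left exactly as they were.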

If you don’t know the encoding, try chardet for detection:

import chardet
with open("input.csv", "rb") as f:
    guess = chardet.detect(f.read(100_000))
print(guess)  # {'encoding': 'Windows-1252', 'confidence': 0.73, 'language': ''}

How to convert CSV encoding with iconv

iconv is a command-line tool available on macOS, Linux, and via Git Bash / WSL on Windows.

# Latin-1 → UTF-8
iconv -f LATIN1 -t UTF-8 in.csv > out.csv

# Windows-1252 → UTF-8
iconv -f WINDOWS-1252 -t UTF-8 in.csv > out.csv

# UTF-8 → UTF-8 with BOM (prepend the 3 BOM bytes; octal escapes work in any POSIX printf)
printf '\357\273\277' > out.csv && iconv -f UTF-8 -t UTF-8 in.csv >> out.csv

# UTF-16 LE → UTF-8 (Excel "Unicode Text" exports)
iconv -f UTF-16LE -t UTF-8 in.csv > out.csv

# Replace unmappable characters instead of erroring
iconv -f UTF-8 -t LATIN1//TRANSLIT in.csv > out.csv

The //TRANSLIT suffix tells iconv to substitute close approximations (for example é → e when the target has no é) instead of failing. //IGNORE silently drops unmappable characters.
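
Python's codecs offer rough counterparts to these suffixes through the errors argument. The standard library has no direct equivalent of //TRANSLIT, so the sketch below approximates it for accented letters with Unicode decomposition; strip_accents_to_ascii is a hypothetical helper, not a drop-in replacement for iconv's transliteration tables.

```python
import unicodedata

text = "Zoë paid 5€"

# errors="replace" substitutes "?" for characters missing from the target.
replaced = text.encode("latin-1", errors="replace")
print(replaced)  # b'Zo\xeb paid 5?'

# errors="ignore" drops them silently, like //IGNORE.
ignored = text.encode("latin-1", errors="ignore")
print(ignored)  # b'Zo\xeb paid 5'


def strip_accents_to_ascii(value: str) -> str:
    """Approximate transliteration: split accents off, then drop non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", value)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


approx = strip_accents_to_ascii("Zoë")
print(approx)  # Zoe
```

Characters without a decomposition, such as €, are simply dropped by this approach, so it suits accented Latin text rather than general Unicode.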

How to convert CSV encoding in Excel

Excel doesn’t have a “convert encoding” command — it has separate “save” and “import” paths.

Saving as UTF-8 (Excel 2016 and later):

  1. File → Save As.
  2. In the format dropdown pick CSV UTF-8 (Comma delimited) (*.csv).
  3. This writes UTF-8 with BOM.

Opening a non-Excel CSV correctly:

  1. Data → From Text/CSV (not File → Open).
  2. In the import dialog, set File Origin to “65001: Unicode (UTF-8)” or whatever matches your file.
  3. Click Load.

This is the only reliable way on older Excel versions. The double-click path is what causes most “Excel broke my CSV” complaints.

How to convert CSV encoding in Google Sheets

Sheets always reads and writes UTF-8 internally, so:

  • File → Import → Upload handles UTF-8 CSVs natively. For Latin-1/Win-1252 files, convert to UTF-8 first (use the converter above).
  • File → Download → Comma-separated values (.csv) exports plain UTF-8 (no BOM).

Common encoding errors and what they mean

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 42
The file contains the byte 0xE9 on its own, which is é in Latin-1/Windows-1252 but is invalid UTF-8 (UTF-8 would encode é as 0xC3 0xA9). The file is not UTF-8: try encoding="cp1252" in pandas, or set Source to Windows-1252 in the converter above.

UnicodeEncodeError: 'ascii' codec can't encode character 'é'
You’re trying to save Unicode text as ASCII. Switch to encoding="utf-8".

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff
The file probably starts with 0xFF 0xFE, a UTF-16 LE BOM. Try encoding="utf-16" instead.

charmap codec can't decode byte 0x81 in position N (Python on Windows)
Python opened the file with the system default codepage (often cp1252). The byte 0x81 isn’t defined in cp1252. Open the file explicitly with encoding="utf-8" or encoding="latin1" (which is byte-identity safe).

File opens fine in Notepad but not in Excel
Notepad auto-detects encoding; Excel doesn’t (without the BOM). Resave as UTF-8 with BOM.
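
The Python errors above can be reproduced on tiny byte strings, which helps confirm the diagnosis before touching a real file. This is a sketch; describe is a hypothetical helper and the inputs are made up.

```python
def describe(action):
    """Run action and return the Unicode error it raises as text, or None."""
    try:
        action()
    except UnicodeError as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


errors = [
    describe(lambda: b"caf\xe9".decode("utf-8")),        # lone Latin-1 é
    describe(lambda: "é".encode("ascii")),               # Unicode saved as ASCII
    describe(lambda: b"\xff\xfea\x00".decode("utf-8")),  # UTF-16 LE BOM
    describe(lambda: b"\x81".decode("cp1252")),          # byte undefined in cp1252
]
for message in errors:
    print(message)

# The matching fixes from the list above.
fixed = [
    b"caf\xe9".decode("cp1252"),
    "é".encode("utf-8"),
    b"\xff\xfea\x00".decode("utf-16"),
    b"\x81".decode("latin-1"),
]
print(fixed)
```

Decoding 0x81 as Latin-1 succeeds because every byte maps to a code point, but the result is a control character, so it avoids the crash without necessarily recovering meaningful text.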

Privacy: nothing is uploaded

The encoding conversion runs entirely in your browser using the native TextDecoder and TextEncoder APIs — no library, no server. Your CSV never reaches any server, including ours. Verify in DevTools → Network: you’ll see zero requests when you upload, decode, or download. Useful for files containing customer data, internal records, or anything you wouldn’t paste into a public converter.

After fixing the encoding, you might also want to view the CSV, normalize the delimiter, or fix other broken-CSV issues like inconsistent quoting or stray line breaks.

Frequently asked questions

  • How do I fix mojibake (cafÃ© → café) in a CSV?

    Mojibake happens when UTF-8 bytes are read as Windows-1252 (or vice versa). Upload your CSV above, set Source encoding to Windows-1252, set Target to UTF-8 with BOM, and download. The accents will be correct again. The garbled patterns Ã©, Ã±, Ã¼, â€™, â€œ are the dead giveaway.

  • What's a BOM and do I need one?

    A BOM (byte-order mark) is a 3-byte prefix (EF BB BF) that signals UTF-8 to software that doesn't auto-detect — most importantly, Microsoft Excel on Windows. If you double-click a UTF-8 CSV in Excel without a BOM, accents look like garbage. Add the BOM and Excel reads it correctly. On Linux/macOS, no BOM is preferred.

  • Which encoding should I use — UTF-8, Latin-1, or Windows-1252?

    Default to UTF-8. It handles every language and is the web/database/Linux standard. Use Latin-1 or Windows-1252 only when an old system specifically requires it (legacy Windows software, some banking/government exports, mainframe interchange). Windows-1252 is a superset of Latin-1 and what most 'ANSI' Windows files actually are.

  • Why does Excel mangle accents in my UTF-8 CSV?

    Excel on Windows assumes legacy ANSI (Windows-1252) when opening a CSV via double-click and there's no BOM. Two fixes: (1) save your CSV as UTF-8 with BOM — the converter does this in one click. (2) Use Excel's Data → From Text/CSV import wizard, which lets you pick UTF-8 explicitly.

  • Will I lose data when converting from UTF-8 to Latin-1?

    Possibly. UTF-8 can represent every Unicode character; Latin-1 can only represent 256. Characters outside Latin-1 (e.g. €, “smart quotes”, em-dashes, Cyrillic, Asian scripts) become ?. The converter shows a warning with the count of unmappable characters before you download.

  • How do I round-trip safely between systems?

    Always store and exchange in UTF-8. Convert only at the boundary (when feeding a legacy importer that requires Win-1252). Converting a Latin-1 file to UTF-8 is lossless; converting UTF-8 to Latin-1 can lose data. Keep the original UTF-8 as the source of truth.

  • Does UTF-16 work for CSVs?

    Yes — Excel exports 'Unicode Text' as UTF-16 LE with a BOM, and some Windows tools default to it. UTF-16 doubles the file size for ASCII-heavy data, so UTF-8 is preferred unless a tool specifically needs UTF-16.

  • Is my CSV uploaded?

    No. Decoding and re-encoding happen entirely in your browser using the native TextDecoder/TextEncoder APIs. Your file never reaches a server. Verify in DevTools → Network.