dannyman.toldme.com


FreeBSD, JIRA, Linux, Mac OS X, Technical

Strip Non-Ascii From a File

I have had bad luck trying to coax this out of Google, so here’s a Perl one-liner:

perl -pi -e 's/[\x80-\xEF]//g' file.txt

Where file.txt is a file you want to clean up.

Why this comes up is because we have a web application that was set up to hit a MySQL database, which is incorrectly configured to store text as ASCII instead of UTF-8. The application assumes that all text is Unicode and that the database is correctly configured, and every week or two someone asks me why they are getting this weird gnarly error. Typically they are pasting in some weird UTF-8 whitespace character sent to us from Ukraine.

Eventually the database will be reloaded as UTF-8 and the problem will be solved. Until then, I can tell folks to use the Perl command above. It just looks for anything with the high bit set and strips it out.

Read More

Next:
Previous:
Categories: FreeBSD, JIRA, Linux, Mac OS X, Technical