You typed up your blog post in Word, pasted it into an email, sent it to me to post, and when I did, it looked like alphabet soup. No, it’s not just an encoding problem.
MS Word uses characters from the Windows-1252 character set that have no equivalent in ASCII or ISO-8859-1. This is often a pain in the button of mine. I mean, it pushes my buttons. Special characters include:
- The… ellipsis
- ‘Smart’ “quotes”
- En – dash and em — dash
- Dagger † and double dagger ‡
- And so on (these are just the most common).
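To see why these characters cause trouble, here is a quick sketch (the helper name `fitsInLatin1` is made up for this example). It checks whether each character's Unicode code point fits in the 0x00 to 0xFF range that ISO-8859-1 covers. Every character in the list above should come back `false`.

```javascript
// Hypothetical helper: true if every character fits in ISO-8859-1 (code points 0x00-0xFF).
function fitsInLatin1(text) {
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xFF) return false;
  }
  return true;
}

// Ellipsis, smart quotes, en/em dashes, dagger and double dagger.
const wordChars = [
  "\u2026", "\u2018", "\u2019", "\u201C", "\u201D",
  "\u2013", "\u2014", "\u2020", "\u2021"
];

for (const ch of wordChars) {
  // Logs the code point (for example U+2026) and whether it fits in ISO-8859-1.
  console.log("U+" + ch.codePointAt(0).toString(16).toUpperCase(), fitsInLatin1(ch));
}
```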
The solution? Well, how about the following JavaScript function to replace those characters? (Or better yet, the form I’m creating that uses this function?)
function replaceWordChars(text) {
    var s = text;
    // smart single quotes and apostrophe
    s = s.replace(/[\u2018\u2019\u201A]/g, "'");
    // smart double quotes
    s = s.replace(/[\u201C\u201D\u201E]/g, "\"");
    // ellipsis
    s = s.replace(/\u2026/g, "...");
    // en and em dashes
    s = s.replace(/[\u2013\u2014]/g, "-");
    // modifier circumflex
    s = s.replace(/\u02C6/g, "^");
    // single left angle quote
    s = s.replace(/\u2039/g, "<");
    // single right angle quote
    s = s.replace(/\u203A/g, ">");
    // small tilde
    s = s.replace(/\u02DC/g, "~");
    // non-breaking space
    s = s.replace(/\u00A0/g, " ");
    return s;
}
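For a quick sanity check, here is a compact, table-driven sketch of the same idea with a sample call. The names `WORD_CHAR_MAP` and `stripWordChars` are my own for illustration, and the table covers only a handful of the replacements above. Note that the character classes contain no `|`: inside square brackets a pipe is a literal character, so including it would also replace real pipes in the text.

```javascript
// Illustrative, table-driven variant; the names here are hypothetical.
const WORD_CHAR_MAP = [
  [/[\u2018\u2019\u201A]/g, "'"],  // smart single quotes and apostrophe
  [/[\u201C\u201D\u201E]/g, '"'],  // smart double quotes
  [/\u2026/g, "..."],              // ellipsis
  [/[\u2013\u2014]/g, "-"],        // en and em dashes
  [/\u00A0/g, " "]                 // non-breaking space
];

function stripWordChars(text) {
  // Apply each pattern/replacement pair in turn.
  return WORD_CHAR_MAP.reduce(
    (s, [pattern, replacement]) => s.replace(pattern, replacement),
    text
  );
}

const pasted = "\u201CIt\u2019s done\u201D \u2013 almost\u2026";
console.log(stripWordChars(pasted)); // "It's done" - almost...
```

Keeping the pairs in one array means adding a new character is a one-line change, which seems handy given that the list above is only the most common offenders.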
I’ll post a link to my Word Stripping form soon!
















