This is just a post for anyone who has the same problem and finds my solution through Google and, more to the point, for my own future reference!
I’ve had this problem several times before, as you do when you mess around with datasets from many different sources. The quality varies, and data formats can cause headaches.
Anyway, I was trying to import data from Amazon into Perl. This had worked fine before, but one particular product description wouldn’t import, producing these warnings instead:
Wide character in print
Wide character in subroutine
All I wanted was to take the Unicode text and convert it, keeping as much ‘quality’ as possible, but ultimately ending up with 7-bit ASCII.
I searched the web for a quick-and-dirty answer, but to no avail. Countless people out there will tell you that it’s Unicode, blah blah. One useful person did suggest using binmode, but the catch is that I wasn’t actually writing the data to a file; I was repackaging it as XML, and binmode only sets the mode of a filehandle.
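For reference, the binmode suggestion looks something like the sketch below. It attaches a UTF-8 encoding layer to a filehandle (STDOUT here), which stops print warning about wide characters sent through that handle, but it does nothing to the string itself. That is why it doesn’t help when the text is being packaged into XML rather than printed. The sample string is a hypothetical stand-in for the Amazon data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical text containing characters above U+00FF (en dash, curly quotes).
my $text = "Caf\x{e9} \x{2013} \x{201C}best seller\x{201D}";

# Attach a UTF-8 encoding layer to STDOUT. Output through this handle is now
# encoded, so print no longer warns "Wide character in print".
binmode(STDOUT, ':encoding(UTF-8)');
print "$text\n";

# The string itself is untouched: it still holds 20 characters, including the
# wide ones, so anything else that consumes $text sees the same data as before.
printf "%d characters\n", length $text;
```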
Anyway, to cut a long story short, the best way around this (i.e. to turn Unicode into ‘just ASCII’) is to use the Text::Unidecode module and call unidecode on any problem variables that contain Unicode.
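A minimal sketch of that approach, assuming Text::Unidecode has been installed from CPAN. unidecode() is the function the module exports by default; it returns an ASCII transliteration of its argument. The sample strings and the %product hash are hypothetical stand-ins for the Amazon fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::Unidecode;   # from CPAN; exports unidecode() by default

# Hypothetical description, as it might arrive from a product feed.
my $description = "Caf\x{e9} cr\x{e8}me \x{2013} \x{201C}best seller\x{201D}";

# unidecode() returns a plain-ASCII transliteration of the string.
my $ascii = unidecode($description);
print "$ascii\n";    # prints: Cafe creme - "best seller"

# Hypothetical record: clean every field in place before building the XML.
my %product = (
    title       => "Gr\x{fc}\x{df}e aus M\x{fc}nchen",
    description => $description,
);
$_ = unidecode($_) for values %product;   # values() aliases the hash values
```

Accented letters lose their accents and typographic punctuation (curly quotes, en dashes) becomes its nearest ASCII equivalent, which is roughly the “keep as much quality as possible” trade-off described above.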
Hope that helps someone and saves a bit of time!