Q: Removing BOM Attribute in UTF-8 text (No Answer, 2 Comments)
Question  
Subject: Removing BOM Attribute in UTF-8 text
Category: Computers
Asked by: pigskinreferee-ga
List Price: $5.00
Posted: 18 Feb 2006 13:51 PST
Expires: 20 Mar 2006 13:51 PST
Question ID: 447399
I have a FreeBSD 5.4 system. There are several files that have been
created in MS Word and saved in UTF-8 format that I need to convert.
These files start with a byte order mark (BOM), the three bytes
EF BB BF.

Here are a few links regarding UTF-8:

http://www.unicode.org/faq/utf_bom.html
http://www.cl.cam.ac.uk/~mgk25/unicode.html
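
A quick way to see which files actually carry the BOM is to compare
their first three bytes with EF BB BF. The sketch below assumes the
files are read as raw bytes; has_utf8_bom is a hypothetical helper
name, and the "unless (caller)" guard only lets the same file be
loaded as a library without running the command-line loop.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# The UTF-8 encoding of U+FEFF (the byte order mark) is EF BB BF.
my $BOM = "\xEF\xBB\xBF";

# Hypothetical helper: returns true if the named file starts with a UTF-8 BOM.
sub has_utf8_bom {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $n = read($fh, my $head, 3);    # read at most the first 3 bytes
    close($fh);
    return defined($n) && $n == 3 && $head eq $BOM;
}

# Usage: has_bom.pl file1 file2 ...
unless (caller) {
    for my $path (@ARGV) {
        printf "%s: %s\n", $path, has_utf8_bom($path) ? 'BOM' : 'no BOM';
    }
}
```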

I tried writing a simple Perl script to remove the BOM. This is the script.

#!/usr/local/bin/perl
@file=<>;
$file[0] =~ s/^\xEF\xBB\xBF//;
print(@file);

It appears to work, but it does not overwrite the file that it is
supposed to be fixing. I am at a loss as to what to do.

In addition, since this script may be included in a larger program,
it really should work under 'use strict'. If I add 'use strict' now,
the script no longer compiles.

I can supply examples of the files that need to be cleaned of the BOM.
Answer  
There is no answer at this time.

Comments  
Subject: Re: Removing BOM Attribute in UTF-8 text
From: lukas_zapletal-ga on 19 Feb 2006 12:51 PST
 
If you want to use strict, you must declare the @file variable with my:

use strict;

my @file=<>;
$file[0] =~ s/^\xEF\xBB\xBF//;
print(@file);
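
Declaring @file makes the script strict-clean, but it still only
prints to standard output. Perl can also rewrite the files itself:
assigning to $^I (the variable behind the -i command-line switch)
makes print inside a while (<>) loop write back to the file being
read. A minimal sketch, assuming a .bak backup copy of each file is
acceptable; strip_bom_in_place is a hypothetical name:

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip a leading UTF-8 BOM from each named file,
# rewriting the files in place. '.bak' keeps a backup of each original.
sub strip_bom_in_place {
    my @paths = @_;
    local ($^I, @ARGV) = ('.bak', @paths);
    local $_;
    while (<>) {
        # $. is 1 on the first line of each file because ARGV is closed at eof.
        s/^\xEF\xBB\xBF// if $. == 1;
        print;    # goes to the file being edited, not to STDOUT
    } continue {
        close ARGV if eof;    # reset $. before the next file
    }
}

# Only run on command-line arguments, so <> never falls back to STDIN.
strip_bom_in_place(@ARGV) if @ARGV && !caller;
```

The "close ARGV if eof" line matters when several files are given:
without it, $. keeps counting across files and only the first file
would be checked for a BOM.
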
Subject: Re: Removing BOM Attribute in UTF-8 text
From: lukas_zapletal-ga on 19 Feb 2006 13:04 PST
 
The problem is not in the Perl language. Your script prints the
cleaned text to standard output; it never writes to the file it read.
Redirecting that output back onto the same file (for example
./script.pl file.txt > file.txt) does not work either, because the
shell opens and truncates file.txt for writing before Perl starts
reading it, so the result is an empty file.

Alternatively, you can write a script that opens the file in read-write
mode, "shifts" the file content three bytes back, and then truncates
the file so it is three bytes shorter. You have to use open, read,
seek and write functions for that, plus a memory buffer (4096 bytes is
enough).
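
The read-write approach can be sketched as follows, assuming raw byte
I/O with sysread/syswrite so that Perl's own buffering does not
interfere; strip_bom_shift is a hypothetical name. The final truncate
removes the three bytes left over at the end after the shift.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: remove a leading UTF-8 BOM by shifting the rest of
# the file 3 bytes toward the start, then truncating the 3 leftover bytes.
# Returns 1 if a BOM was removed, 0 if the file had none.
sub strip_bom_shift {
    my ($path) = @_;
    open(my $fh, '+<:raw', $path) or die "Cannot open $path: $!";
    my $n = sysread($fh, my $head, 3);
    die "read error on $path: $!" unless defined $n;
    unless ($n == 3 && $head eq "\xEF\xBB\xBF") {
        close($fh);
        return 0;                      # no BOM: leave the file untouched
    }
    my ($rpos, $wpos) = (3, 0);        # reading always stays 3 bytes ahead
    while (1) {
        sysseek($fh, $rpos, SEEK_SET) or die "seek failed: $!";
        my $got = sysread($fh, my $buf, 4096);
        die "read error on $path: $!" unless defined $got;
        last if $got == 0;             # end of file reached
        sysseek($fh, $wpos, SEEK_SET) or die "seek failed: $!";
        my $w = syswrite($fh, $buf);
        die "write error on $path: $!" unless defined $w && $w == $got;
        $rpos += $got;
        $wpos += $got;
    }
    truncate($fh, $wpos) or die "truncate failed on $path: $!";
    close($fh) or die "close failed on $path: $!";
    return 1;
}

if (!caller) { strip_bom_shift($_) for @ARGV; }
```

Because the file is modified in place, it keeps its owner and
permissions, but a run interrupted halfway through can leave it
partially shifted.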

You can still use your solution, but you need to redirect the output
to a temporary file and then rename that file back to the original name.
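
The temporary-file route can also be done from inside Perl instead of
with a shell redirect. A minimal sketch using the core modules
File::Temp and File::Basename; strip_bom_via_temp is a hypothetical
name. The temporary file is created in the same directory as the
original so that rename() stays on one filesystem, and the original
permission bits are copied because tempfile() creates files readable
only by their owner.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: copy $path to a temporary file, dropping a leading
# UTF-8 BOM, then rename the temporary file over the original.
sub strip_bom_via_temp {
    my ($path) = @_;
    open(my $in, '<:raw', $path) or die "Cannot open $path: $!";
    # Same directory as the original, so rename() does not cross filesystems.
    my ($out, $tmp) = tempfile(DIR => dirname($path));
    binmode($out);
    my $first = 1;
    while (my $line = <$in>) {
        $line =~ s/^\xEF\xBB\xBF// if $first;    # only the first line
        $first = 0;
        print {$out} $line or die "write to $tmp failed: $!";
    }
    close($in);
    close($out) or die "close of $tmp failed: $!";
    # Copy the original permission bits onto the temporary file.
    my $mode = (stat($path))[2] & 07777;
    chmod($mode, $tmp) or die "chmod $tmp failed: $!";
    rename($tmp, $path) or die "rename $tmp -> $path failed: $!";
}

if (!caller) { strip_bom_via_temp($_) for @ARGV; }
```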

Hope this will help.
