
TOPIC: UTF-8 generated files

UTF-8 generated files 31 Oct 2013 08:54 #11501

  • Tomaselli
  • Offline
  • Elite Member
  • Posts: 293
  • Thank you received: 87
  • Karma: 46
Why not generate UTF-8 files by default on a fresh jcook component?
The administrator has disabled public write access.

UTF-8 generated files 01 Nov 2013 18:39 #11514

  • admin
  • Offline
  • Administrator
  • Chef
  • Posts: 3711
  • Thank you received: 987
  • Karma: 140
I don't know how to set this up.

When I create the files, I use the Joomla JFile class. How can I write in UTF-8?
Then I make the archive...
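
For reference, a minimal sketch of one way to approach this, assuming the generated content is held in a PHP string. file_put_contents has no encoding parameter and writes the string's bytes unchanged, so the idea is to make sure the string is valid UTF-8 before writing it. The helper name toUtf8 and the ISO-8859-1 fallback are assumptions made for illustration; they are not part of Joomla or jcook.

```php
<?php
// Hypothetical helper: returns $text as UTF-8.
// Assumption: anything that is not already valid UTF-8 is ISO-8859-1.
function toUtf8(string $text): string
{
    // The /u modifier makes preg_match fail on bytes that are not valid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    return iconv('ISO-8859-1', 'UTF-8', $text);
}

$latin1 = "caf\xE9";        // "café" in ISO-8859-1 (one byte for the é)
$utf8   = toUtf8($latin1);  // "café" in UTF-8 (two bytes for the é)

// Write to a temporary file; the bytes are stored exactly as given, no BOM is added.
$path = tempnam(sys_get_temp_dir(), 'utf8');
file_put_contents($path, $utf8);
$written = file_get_contents($path);
unlink($path);
```

Since JFile::write is reported later in this thread to rely on file_put_contents, the converted string could presumably be passed to it in the same way.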
Coding is now a piece of cake

UTF-8 generated files 02 Nov 2013 19:48 #11525

  • Tomaselli
  • Offline
  • Elite Member
  • Posts: 293
  • Thank you received: 87
  • Karma: 46
JFile::write uses the PHP function file_put_contents. The PHP manual does not spell it out, but file_put_contents has no encoding option: it writes the bytes of the content as given, so the format of the resulting file depends on the content itself.
I did a few tests and some research to see whether it is possible to FORCE JFile::write / file_put_contents to use the "UTF-8 without BOM" format.
Apparently there is NO WAY (unbelievable!!!) to set that format explicitly, independently of the content being written.
The simplest workaround I found to create files in "UTF-8 without BOM" format is to add some UTF-8 characters, even inside a comment (note: one character is not enough). For example, add this string anywhere in the file, as a commented line:

؋ 包 ƒ ман лв ¥ ₡ ₱ ¢ ₪ ₹ ﷼ ៛ ₩ ₭ ₮ ₨ ₴ ₫

and magically the created file will be "UTF-8 without BOM".
It's weird, but I didn't find any other simple way to generate files in that format! Unbelievable!

I'm sure you know how important it is to have "UTF-8 without BOM" files instead of ANSI, but just to clarify for other users: it allows us to use UTF-8 characters inside the file and therefore in the generated web pages.
A common use is language files: think of Chinese, Arabic, Greek, or even Italian and French (accented vowels), or code comments in languages other than English. There are other cases where the "UTF-8 without BOM" format is needed, too.
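
A small sketch that may help when testing this: it reports whether a string of bytes starts with a UTF-8 BOM, is pure ASCII, or is valid UTF-8. A pure-ASCII file is byte-for-byte the same in ANSI and in UTF-8 without BOM, which would explain why an editor labels it ANSI until non-ASCII characters appear. The function name describeBytes is made up for this example.

```php
<?php
// Hypothetical helper for inspecting the bytes that were actually written.
function describeBytes(string $bytes): array
{
    return [
        // EF BB BF is the UTF-8 byte order mark.
        'bom'   => strncmp($bytes, "\xEF\xBB\xBF", 3) === 0,
        // ASCII means every byte is in the range 0x00-0x7F.
        'ascii' => preg_match('/[^\x00-\x7F]/', $bytes) === 0,
        // The /u modifier rejects byte sequences that are not valid UTF-8.
        'utf8'  => preg_match('//u', $bytes) === 1,
    ];
}

// An ASCII-only file body and one with an accented character (é as UTF-8).
$plain   = describeBytes("<?php\n// plain comment\n");
$accents = describeBytes("<?php\n// commentaire accentu\xC3\xA9\n");
```

Here $plain comes out as ASCII and valid UTF-8 at the same time, while $accents is valid UTF-8 but no longer ASCII; neither has a BOM.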

UTF-8 generated files 18 Nov 2013 17:05 #11669

  • admin
  • Offline
  • Administrator
  • Chef
  • Posts: 3711
  • Thank you received: 987
  • Karma: 140
So, let's propose a nice figlet header.
I would like to change the current one... lol
(Nice characters, by the way...)


Well, what about creating the first or last line with such characters, then removing them?
Do you mean there is no other way in PHP?
One character is not enough??? How many, then?
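
On the idea of adding such characters and then removing them, a quick sketch (assuming the marker is a PHP comment line) suggests the result is the same bytes as if the marker had never been added, so the file would be pure ASCII again. The marker text is only an example.

```php
<?php
$body   = "<?php\n// generated file\n";
$marker = "// \xE2\x82\xAC \xC2\xA5\n";   // "// € ¥" encoded as UTF-8

$withMarker    = $body . $marker;
$markerRemoved = str_replace($marker, '', $withMarker);

// Removing the marker restores the original ASCII-only bytes.
$sameAsBefore = ($markerRemoved === $body);

// While the marker is present, the content contains non-ASCII bytes.
$isNonAscii = preg_match('/[^\x00-\x7F]/', $withMarker) === 1;
```

If that holds for the real generator too, the characters would have to stay in the file for the workaround to have any effect.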

I really don't have time to look into it right now, but if you make a proposal, I can implement the solution easily once it is ready.

Very nice catch by the way.
Coding is now a piece of cake

UTF-8 generated files 18 Nov 2013 17:08 #11670

  • admin
  • Offline
  • Administrator
  • Chef
  • Posts: 3711
  • Thank you received: 987
  • Karma: 140
Also, it is strange because I have an empty skeleton base PHP file, and all files are created from this one (or from others: INI, XML, CSS...).
Anyway, this empty template file is UTF-8.
But after the read / write, the string format has changed.

This goes over my head; I need to study it a bit more.
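
One way to study this, sketched under the assumption that the template is read and written with plain PHP file functions: compare the bytes before and after. file_get_contents and file_put_contents do not convert encodings, so any change would have to come from the string processing in between. The temporary files below are placeholders for the real template and generated file.

```php
<?php
$template = tempnam(sys_get_temp_dir(), 'tpl');
$output   = tempnam(sys_get_temp_dir(), 'out');

// Stand-in for the skeleton file: UTF-8 content with one accented character.
file_put_contents($template, "<?php\n// skeleton template: \xC3\xA9\n");

// Plain read followed by plain write, with no processing in between.
$content = file_get_contents($template);
file_put_contents($output, $content);

// Identical hashes mean the read/write round trip changed nothing.
$unchanged = sha1_file($template) === sha1_file($output);

unlink($template);
unlink($output);
```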
Coding is now a piece of cake