banana7777 VIP
Total posts: 18
16 Apr 2015 22:35

Hi,

I don't know if I'm the only one who had problems importing UTF-8 encoded CSV files. I was not able to import files that contain Unicode characters, so I changed the function _load_csv in controllers/import.php as follows:

private function _load_csv($filename = '', $delimiter = ',')
{
    $header = NULL;
    if(($handle = fopen($filename, 'r')) !== FALSE)
    {
        while(($row = fgetcsv($handle, 2048, $delimiter)) !== FALSE)
        {
            if(!$header)
            {
                $header = $row;
            }
            else
            {
                $data = array_combine($header, $row);
                // Added line: re-encode every field from ISO-8859-1 to UTF-8.
                $data = array_map("utf8_encode", $data);
                $this->_row($data);
            }
        }
        fclose($handle);
    }
}

Just added the line $data = array_map("utf8_encode", $data);

Best, Dirk

Last Modified: 29 Jun 2016


klox7 VIP
Total posts: 914
17 Apr 2015 09:06

Great, thanks for sharing.


Sergey
Total posts: 13,748
22 Apr 2015 12:52

Updated source.


pepperstreet VIP
Total posts: 3,837
09 Jan 2016 13:07

banana7777 Just added the line $data = array_map("utf8_encode", $data);

Funnily enough, I have no luck with this extra line!?
I have Norwegian content with its typical characters.
As far as I can tell, the CSV file is already in UTF-8 format. I tried different editors, and the characters show up correctly.
But after import, the characters are "converted" and unreadable. For instance, å -> Ã¥

As soon as I remove or comment out the line, everything works fine.
Any clues or thoughts?
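
A small sketch of what is probably happening here, assuming the file really is UTF-8 already: utf8_encode treats every input byte as an ISO-8859-1 character, so the two bytes of a UTF-8 "å" are each encoded again. The helper name latin1_to_utf8 is made up for this example; it only mirrors the documented behaviour of utf8_encode so the sketch does not depend on that function.

```php
<?php
// Hypothetical helper: treats every input byte as one ISO-8859-1 character
// and re-encodes it as UTF-8, mirroring what utf8_encode is documented to do.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    $length = strlen($bytes);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($bytes[$i]);
        if ($byte < 0x80) {
            // ASCII bytes are identical in both encodings.
            $out .= $bytes[$i];
        } else {
            // 0x80..0xFF becomes a two-byte UTF-8 sequence.
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $out;
}

$latin1 = "\xE5";      // "å" stored as ISO-8859-1 (one byte)
$utf8   = "\xC3\xA5";  // "å" stored as UTF-8 (two bytes)

$once  = latin1_to_utf8($latin1); // correct conversion
$twice = latin1_to_utf8($utf8);   // input was already UTF-8: double encoded

echo bin2hex($once), "\n";  // c3a5
echo bin2hex($twice), "\n"; // c383c2a5, which displays as "Ã¥"
```

So the conversion helps a Latin-1 file and damages a UTF-8 one, which would explain why it worked for one import and not for another.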


It seems my file does not need a conversion, yet it gets converted/encoded a second time.
I am far from understanding the utf8_encode documentation, but it seems the import might need to check whether the file is already in UTF-8 or in another format...
(just thinking out loud)
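
One possible way to do such a check without the mb_* functions, offered only as a sketch: the PCRE functions are part of the PHP core, and with the "u" modifier preg_match refuses subjects that are not valid UTF-8. The function name looks_like_utf8 is an assumption for this example, not something from Cobalt.

```php
<?php
// Hypothetical helper: returns true when $text is a valid UTF-8 byte sequence.
function looks_like_utf8(string $text): bool
{
    // With the "u" modifier the subject is validated as UTF-8 before matching.
    // The empty pattern matches any valid subject (result 1); an invalid
    // subject makes preg_match return false instead.
    return preg_match('//u', $text) === 1;
}

var_dump(looks_like_utf8("\xC3\xA5")); // UTF-8 "å"
var_dump(looks_like_utf8("\xE5"));     // ISO-8859-1 "å", not valid UTF-8
```

This is a heuristic rather than real detection: plain ASCII passes, and a few Latin-1 byte combinations also happen to form valid UTF-8, so a "true" result does not prove what encoding the author of the file intended.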


Sergey
Total posts: 13,748
12 Jan 2016 09:44

pepperstreet but it seems it might need a check if the file is already in UTF-8

There is no such PHP function. There are the mb_* functions, but those belong to an additional extension that is not installed with every PHP.


pepperstreet VIP
Total posts: 3,837
12 Jan 2016 11:47

Sergey There is no such PHP function. There are the mb_* functions, but those belong to an additional extension that is not installed with every PHP.

Ah, interesting. But are you sure? Looks like a standard function? The PHP Manual lists it in Reference:

mb_detect_encoding (PHP 4 >= 4.0.6, PHP 5, PHP 7)


Sergey
Total posts: 13,748
15 Jan 2016 10:38

The PHP manual lists a lot of extension functions that are not installed by default. The MultiByte extension is available on most installations, but not on all of them. Even iconv was a problem.

Anyway, I cannot use it. My code works on the assumption that the imported text is saved in UTF-8.


pepperstreet VIP
Total posts: 3,837
17 Jan 2016 01:12

Sergey My code works on the assumption that the imported text is saved in UTF-8.

OK, but why does it "convert" the Norwegian characters?

pepperstreet As far as I can tell, the CSV file is already in UTF-8 format. I tried different editors, and the characters show up correctly. But after import, the characters are "converted" and unreadable. For instance, å -> Ã¥


Sergey
Total posts: 13,748
18 Jan 2016 10:30

pepperstreet OK, but why does it "convert" the Norwegian characters?

Because that conversion should not be there. And this conversion was suggested by only one person. Most probably it is not needed.


pepperstreet VIP
Total posts: 3,837
18 Jan 2016 14:08

Sergey Because that conversion should not be there. And this conversion was suggested by only one person. Most probably it is not needed.

But you have implemented it ;) If it works for some but not all imports, what about adding a menuItem Parameter? So, one could choose to use this "conversion" or not.
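
A rough sketch of how such an opt-in could look, assuming a stand-alone variant of the loader. The name load_csv_rows and the $convert parameter are made up for illustration, and the sketch returns the rows instead of calling $this->_row(); it is not the actual Cobalt code.

```php
<?php
// Hypothetical stand-alone loader: conversion happens only when the caller
// passes a converter, so UTF-8 files are left untouched by default.
function load_csv_rows(string $filename, string $delimiter = ',', ?callable $convert = null): array
{
    $rows = [];
    $header = null;

    $handle = fopen($filename, 'r');
    if ($handle === false) {
        return $rows;
    }

    // Enclosure and escape characters are passed explicitly.
    while (($row = fgetcsv($handle, 0, $delimiter, '"', '\\')) !== false) {
        if ($header === null) {
            $header = $row; // first line holds the column names
            continue;
        }
        if (count($row) !== count($header)) {
            continue; // skip rows whose column count does not match the header
        }
        $data = array_combine($header, $row);
        if ($convert !== null) {
            $data = array_map($convert, $data); // opt-in conversion per field
        }
        $rows[] = $data;
    }

    fclose($handle);
    return $rows;
}
```

A UTF-8 file would be read with load_csv_rows($file), while a Latin-1 file could be read with load_csv_rows($file, ',', 'utf8_encode'). Note that utf8_encode is deprecated as of PHP 8.2, so any other ISO-8859-1 to UTF-8 converter could be passed instead.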


Sergey
Total posts: 13,748
25 Jan 2016 14:28

I think I have to delete it. I am not sure. I have to run some tests.


banana7777 VIP
Total posts: 18
02 Feb 2016 21:27

It is fine for me to remove it as I'm not using the standard importer anymore. I implemented my own Excel importer.


pepperstreet VIP
Total posts: 3,837
28 Jun 2016 20:05

Sergey I think I have to delete it. I am not sure. I have to run some tests.

Update with Cobalt 8.707

Tested again with a Norwegian CSV in UTF-8 format. All special characters are "converted" and are not readable anymore :(

I had to remove the previously mentioned line!
Only without that line are the Norwegian characters preserved.


Sergey
Total posts: 13,748
29 Jun 2016 05:01

Do you mean array_map? I have already deleted it in the latest version, which I release today.


pepperstreet VIP
Total posts: 3,837
29 Jun 2016 09:01

Sergey Do you mean array_map? I have already deleted it in the latest version, which I release today.

Ah, Yes! Thank you.
