matcorrao VIP
Total posts: 83
24 Sep 2013 19:53

Hi guys,

How u doing?

This time, I have this site http://clickbeauty.webcoding.com.ar which is in pre-deployment stage due to a problem I'm having with Cobalt, so I really need your help to launch.

As an overview, the site is a directory for Beauty companies located in USA. So the whole point is finding things (using Cobal search features) with an acceptable speed (this is were I'm getting stucked).

During the whole Testing phase we used less than 10 records and everything worked just great.

However, one of the latest steps consisted of importing into Cobalt a big data set of records from a 3rd party provider (which we did with a custom import script)

As a result, we ended with a bit more than 300K records in our Cobalt records table.

By the way, it was a great experience to write such a challenging importing script after checking with Sergey about the implicancies and things to consider.

Before this Import process, pages took less than 10 seconds to load. Now they're taking 10 seconds for the page generation only, after it has been optimized (using JCH Optimize).

**Note: I have other sites with Cobalt working on this same VPS where this site is and they just work fine, so the server is totally discarded.

**

This performance issue is affecting most of Cobalt sections, but specially those related to the type 'Salon' (most of records are of this type).

Here is a link to test this: http://clickbeauty.webcoding.com.ar/hair

In addition, the reindex tool seemed to be broken after this import process: It just won't work with 'Salon' type.

I hope you understand I need to solve this ASAP in order to launch the site and also thought this could be a great chance for you to test Cobalt with big datasets.

Really looking forward to hearing from you sonn!

Hidden text

Thanks,

Matias

Last Modified: 25 Feb 2015



Sackgesicht VIP
Total posts: 1,636
24 Sep 2013 23:13

Matias,

using the new debugger since Joomla 3.1.5 helped me a lot to identify Cobalt SQL queries which were not needed or even unnecessary slow under my existing setup. Especially when using relate fields or a non-multi-type setup. A lot depends on your settings.

You can improve the performance if you don't need the "publish" or "hidden" record feature, filter settings needs to be checked and most of all the Cobalt Relate fields.

In my setting, my biggest section is 60K (overall more than 100K in records table).

After analyzing and optimizing it, my memory consumption decreased and the SQL-query time went down from ~1,600ms to ~60ms.

See here. (I even brought it down a little more)

The more records you have in an individual section, the more critical is the "fine-tuning" of your parameters.


Sergey
Total posts: 13,748
25 Sep 2013 04:29

I am greatly impressed how you did style it. Filters are awesome! I could barely recognize Cobalt at all.

Before this Import process, pages took less than 10 seconds to load

This is way too much. It should be under 1sec for 10 articles installation. Or you mean whole load time?

Anyway let's not count load through browser inspector. Load time is mostly depends on server location, Datacenter speed and so one. It is not affected by Cobalt performance.

I am trying to optimize your site. I have found that query 98 on hair page

SELECT f.id as fid, f.*,g.id as gid, g.title as group_title, g.description AS group_descr, g.icon AS group_icon,( SELECT ordering

FROM sh2it_js_res_fields_group

WHERE id = f.group_id) AS gordering

FROM sh2it_js_res_fields AS f

LEFT JOIN sh2it_js_res_fields_group AS g

ON g.id = f.group_id

WHERE f.type_id = 8

AND f.published = 1

ORDER BY gordering ASC,f.ordering ASC

LIMIT 0, 1000

Is triggered 12 times. I installed eplorer extension changed PHP but I cannot affect it. I mean there is only one place in cobalt which if fields.php model where this query is triggered. But no matter what I do I still have them logged. Although if I echo something, it only shows 2 times one for type 1 and one for type 8.

I am not sure but it looks some sort of customization. What can you say? Please look debug log queries section at very bottom there is section which shows which queries how many times.


Sackgesicht VIP
Total posts: 1,636
25 Sep 2013 16:53

Matias,

your main problem goes down to 3 queries, which accounts for almost all of the time. (8438 out of 8599ms in my test).

You can influence them through your section settings.

Switch everything, what you don't need to "Public". That should already make a difference... :D


matcorrao VIP
Total posts: 83
25 Sep 2013 17:34

Hi Sack,

Actually, it's already set in the way we wanted. Are you suggesting to switch everything back to Public?

I totally agree on these 3 queries that take most of the time.

I guess this is sth you already mentioned in your linked post:

Query #1 --> Get types

Query #2 --> Record Count

Query #3 --> Get Records

Query #2

SELECT r.*,0 as `new`

  FROM sh2it_js_res_record AS r

  WHERE r.section_id = 1 

  AND (r.access IN(1,1,5)) 

  AND r.ctime < '2013-09-25 21:09:24' 

  AND (r.extime = '0000-00-00 00:00:00' OR r.extime > '2013-09-25 21:09:24') 

  AND r.published = 1 

  AND r.hidden = 0 

  AND r.id IN(

...

Query #3

SELECT COUNT(*)

  FROM sh2it_js_res_record AS r

  WHERE r.section_id = 1 

  AND (r.access IN(1,1,5)) 

  AND r.ctime < '2013-09-25 21:09:24' 

  AND (r.extime = '0000-00-00 00:00:00' OR r.extime > '2013-09-25 21:09:24') 

  AND r.published = 1 

  AND r.hidden = 0 

  AND r.id IN(

...

Wouldn't be better to merge Query #2 and #3 and make a count on that programatically?

Thanks,

Matias


Sackgesicht VIP
Total posts: 1,636
25 Sep 2013 17:42

Mathias,

just for testing, set everything to public.

Lets see the queries then. I would even suggest to comment 2 lines out of records.php in the models folder, if you dont use "hidden" and published function. Will make another post for this.

But lets try first ...


Sackgesicht VIP
Total posts: 1,636
25 Sep 2013 17:51

ok .. try to comment the models/records.php

then we have to find out the condition for the inclusion of the list of articles ...


matcorrao VIP
Total posts: 83
25 Sep 2013 17:58

Hi Sack,

Thanks for your quick answer, that was really really useful. Now everything is set back to Pubic.

Guess what? Query time reduced down to ~218ms WWWWOOOOOOWWWW!!!!

Apparently, if I set expired or future articles for sth different than public, the time increases to ~9000ms! That's a lot!

I think I will leave it this way. Now I really need to find what's consuming so much time in the afterDispatch event.

At this point, I was thinking of changing some parts of the core files, do u think it's needed? Because with this improvement I would really like not to touch core files to keep it more 'maintainable'.

Thank you so much for this!


matcorrao VIP
Total posts: 83
25 Sep 2013 18:27

Guys,

I'm still having issues with the reindexing tool. Is there anything I can try with that to make it work?

Thanks again,

Matias


Sackgesicht VIP
Total posts: 1,636
25 Sep 2013 18:37

@mintjoomla

I would suggest to include the hidden and publish exclusion as additional section parameters. It makes only a small difference with small data but it could have an impact depending on other settings as stated already before in other topics.

@matias

I believe, that you can save another 800ms if you track down the problems there. I always wanted to look into this area, but did not find time for it yet.

[offtopic]

2 comments:

Did you notice the pagination break?

and the hoover behaviour per article. Now it sets the bottom line to 0 in template css line 230

in my opinion it looks better if the bottom line stays there to separate the individual records.

Congratulation to your overall integration ... :D


Sackgesicht VIP
Total posts: 1,636
25 Sep 2013 20:44

Matias,

do you intentionally use multiple types in your section 1?

I am just asking, since the type query returns 2 rows.

If your setup would be 1 type only, 500ms could be saved in the preparation for the "SELECT DISTINCT r.type_id" query marked as (1) in fixing the records. The second query can be fixed with the "publish, hidden" fix ...


matcorrao VIP
Total posts: 83
25 Sep 2013 20:58

Hi Sack!

Yes, that's correct, we set a multi-type section, since we're using Salons and Shops in the same listings. This was totally intentionally.

BTW, I changed the design issues according to your feedback, thanks for that!

I don't know why, but Database Query Time, which was about ~300ms, now has grown up to ~8000ms. It looks a bit weird since I haven't changed anything! I even tried modifying the records.php to skip hidden and published but that just double the time.

Any ideas?

Best,

Matias


Sackgesicht VIP
Total posts: 1,636
25 Sep 2013 21:16

the problem query still has the published and hidden condition in it.

The overall performance also depends quite on the initial section sort order. I made some comments to this in another topic.

See your query:

When you are sure about your final setting, it might even be good to change the indices for the res_record table to better fit your purpose.


matcorrao VIP
Total posts: 83
25 Sep 2013 21:17

Ok, I've found that section ordering affects the database query time. It seems that "Most Recent" is the most convenient for performance. "Most hits" isn't that bad, "Title alphabetically" is a bit worse but the worst is random (which in this case seemed to be the most convenient to promote customer interest).


matcorrao VIP
Total posts: 83
25 Sep 2013 21:18

In addition, "Most hits" and "Most recent" seemed to reduce overall performance (afterDispatch event), which was reduced from ~8000ms down to ~1500ms... this is cool :)


Sackgesicht VIP
Total posts: 1,636
25 Sep 2013 22:03

but the worst is random (which in this case seemed to be the most convenient to promote customer interest).

For most of the Cobalt cases, the standard "random" way might be sufficient. Only if you reach records in the thousands, the "random" sort needs a different approach of performance reasons.

See here, especially links and comments. The "correct" pagination will not be an issue. It will just always get the defined number of records from the list of ID's for a page display to be fast.

As of the moment, section settings to "Public", commenting out the "publish" and "hidden" query condition in records.php and setting sort order to "most recent" would give you the best SQL query time ==> ~ 200-300ms

For a better/faster "random" order, you might have to ask Sergey for an additional Sort order implementation.

It might be worth to investigate what happens before the "SELECT DISTINCT" query to see what can be improved there. (I am sure additional 4-500ms could be saved in this area)


Sackgesicht VIP
Total posts: 1,636
25 Sep 2013 22:15


Sergey
Total posts: 13,748
26 Sep 2013 01:31

Wow! That is serious talk you guys have!

@sack I'll add those parameters. For hidden I'll use parameter if user can hide records. And for published I'll create separate section parameter.

@Matias reindex need some refactor. It only forks with limited amount of records. Right now what you can do is to hack tools plugin and select with limit. Change file and run tool again. Until you reach the end.


pepperstreet VIP
Total posts: 3,837
26 Sep 2013 08:27

Following those two geeks ;-) (speaking in a positive sense and meaning)

@sack I'll add those parameters. For hidden I'll use parameter if user can hide records. And for published I'll create separate section parameter.

Cool. I was about to ask for such configuration options.


matcorrao VIP
Total posts: 83
26 Sep 2013 10:28

Hi guys,

Wait wait please! In my case, if I comment the lines to query for hidden and public (as suggested by Sack), the time almost doubles instead of being reduced.

I remember once had a strange experience where the order of the conditions in a query really affected performance. At least in my case, apparently commenting those lines out doesn't help that much.

I played several times with this before making such contradictory statement.

So, it may help in some cases and in others it can affect performance badly.

As Sack pointed out, I was also curious about some blank spaces that ocurrs during some of the SQL queries. Of course, this is due to script processing, but it would be good to know why does it takes so much time (in the worst scenario, it took ~5000ms).

The site is about to launch in a couple of days, so I will remove the debug for a while. If you guys @mintjoomla need to play with this a bit more, we can manage to test it after deployment.

Thanks a lot for all your help!

Thanks,

Matias


matcorrao VIP
Total posts: 83
26 Sep 2013 10:32

Hi guys,

I think I haven't stressed the "Thank you" message enough. I really really appreciate your help, it's great working with such a team behind, please don't mind to ask for help if you're struggling with sth as we were here :)

All the best,

Matias

Powered by Cobalt