Chip's Quips
A tiny spark of wit for a highly flammable world

Jeroaming versions

August 5th, 2006 1:39:44 pm pst by Sterling Camden

Christian Davén has also published an enhanced version of the Jerome’s Keywords plugin for WordPress (thanks for the pointer, Mike). Let’s see what he fixed and how:

  1. Guess what! He tackled the same problem I did: the “too general tag search”. Only instead of using multiple LIKE clauses as suggested by Johannes, Christian uses a REGEXP clause. It would be interesting to benchmark the difference. I’ll leave my version the same for now, and let Jerome decide which one gets into the official version. UPDATE: check out the comments below, where Johannes shares some benchmarks he performed on the two versions. MySQL likes LIKE.
  2. Christian adds the keywords to the categories in the feed. Personally, I don’t equate tags and categories, so I wouldn’t want this option myself — but it might be a nice feature to add to the official version as an option. Another option I’d like to see would be to add the keyword links to the end of the body text, as I have done in my feed (I must confess I hacked the feed source files instead of attempting to make this a plugin as I should — I’m not even sure if that can be done, actually).
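
To make the “too general tag search” in item 1 concrete, here is a minimal Python sketch. It is not the plugin's PHP code: the sample posts, the `meta` table, and the function names are all made up for illustration, and SQLite stands in for MySQL. It compares a naive substring LIKE, the multiple-LIKE approach, and a delimiter-aware regular expression of the same shape as the REGEXP clause.

```python
import re
import sqlite3

# Hypothetical sample data: one comma-separated keyword string per post id.
POSTS = {
    1: "php,wordpress",
    2: "phpbb,forums",
    3: "php",
    4: "mysql,php",
    5: "mysql,php,apache",
}


def search(where_sql, params):
    """Return sorted post ids whose keyword string satisfies where_sql."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE meta (post_id INTEGER, meta_value TEXT)")
    conn.executemany("INSERT INTO meta VALUES (?, ?)", POSTS.items())
    rows = conn.execute(
        "SELECT post_id FROM meta WHERE " + where_sql + " ORDER BY post_id",
        params,
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def naive_search(keyword):
    # Too general: matches the keyword anywhere, even inside another tag.
    return search("meta_value LIKE ?", ["%" + keyword + "%"])


def like_search(keyword):
    # Four cases: last tag, first tag, middle tag, only tag.
    # Note: % and _ inside the keyword are not escaped in this sketch.
    return search(
        "meta_value LIKE ? OR meta_value LIKE ? "
        "OR meta_value LIKE ? OR meta_value = ?",
        ["%," + keyword, keyword + ",%", "%," + keyword + ",%", keyword],
    )


def regex_search(keyword):
    # The keyword must be bounded by a comma or by the string start/end.
    pattern = re.compile("(,|^)" + re.escape(keyword) + "(,|$)")
    return sorted(pid for pid, tags in POSTS.items() if pattern.search(tags))
```

With this data, searching for `php` with the naive version also returns the post tagged `phpbb`, while the other two return only posts that carry `php` as a whole tag.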

That makes at least three people besides Jerome who have contributed to the functionality of this plugin, resulting in three different versions of the plugin itself, an add-on plugin, and a widget. What do you say, Jerome, can we get all these pieces packaged together before they get further split apart? Can we set up some sort of collaborative development project to help you to keep up?

BTW, I added the GNU GPL license file to my download of the plugin as soon as I realized that I had omitted it. Don’t want the FSF Troopers to come knocking on my door.

Posted in Too Oh!

13 Responses to “Jeroaming versions”

  1. LOL – Think we should simply post a good-ol-jeromes-keywords-gold-package.zip and include the changes. I don’t think jerome’s eager to include our enhancements since he’s programming a new version anyway.

    Concerning the performance question: Generally i tend to give the mysql-server as much work as i can find (i’m not talking about triggers ;-) – And REGEXPs are not known to be the fastest solution – only a very convenient and flexible one. But that’s just my 10 cents – We should have a look ;-)

    greets from Salzburg,

    Johannes

  2. sterling says:

    Guten Tag from Bainbridge Island, Johannes. I was wondering which type of test would be optimized the best in MySQL, but I didn’t take the time to benchmark it. I suppose it would only make a noticeable difference if you had thousands of blog posts, and/or lots of traffic. Most bloggers don’t have those problems yet, but we can always take precautions against our dreams.

  3. I don’t think it makes a big difference on a low-traffic site either. The general idea behind my thought is that i can always move a mysql engine to another server – i can’t move just some work-intensive parts of the php code.

    But as you have already pointed out: That’s precaution for dreamers ;-)

  4. sterling says:

    Oh, but Johannes, the REGEXP clause is part of the MySQL WHERE clause, so the MySQL engine must be doing the regex pattern matching. I only wonder whether it is an add-on that isn’t well optimized. I also don’t know what versions would support it. I guess you can tell I’m not a MySQL expert, and I’m feeling a little Googlazy today.

  5. [...] Jeroaming versions [...]

  6. oopsi – should have looked into the code – i was assuming that he’d used a php-regexp-modification in the plugin itself, because i considered doing that in the first place ;-)

    The REGEXP string comparison operator can be found in mysql version 3.23.4, so i think it can be used safely. A good relative comparison of the two statements could be achieved by using the benchmark function of mysql:

    http://dev.mysql.com/doc/refman/5.0/en/query-speed.html

    If you post the REGEXP clause i’ll test it against the multiple LIKE-clause. Sounds interesting.

    greets from salzburg,

    Johannes

  7. sterling says:

    Here you go, Johannes:

    AND jkeywords_meta.meta_value REGEXP '.*(,|^)" . $keyword . "(,|$).*' "

    Danke!

  8. Oh ok – that looks relatively clear. I tested the following two statements on my local computer (xampp distribution, mysql 4.1.14):

    select benchmark(1000000,
    'my,name,is,johannes,and,i,am,testing,this' like '%,i' or
    'my,name,is,johannes,and,i,am,testing,this' like 'i,%' or
    'my,name,is,johannes,and,i,am,testing,this' like '%,i,%' or
    'my,name,is,johannes,and,i,am,testing,this' = 'i');

    and

    select benchmark(1000000,
    'my,name,is,johannes,and,i,am,testing,this' REGEXP '.*(,|^)i(,|$).*');

    multiple like statement execution time: 0.75 sec
    regexp statement execution time: 11.39 sec

    didn’t expect *such* a difference.

    greets from salzburg,

    Johannes

  9. sterling says:

    OMG – I didn’t expect a 15-fold difference either! But I did kinda suspect that “when in SQL, do as the SQLs do” might be good advice. Looks like your version carries the day!

    Thanks for the research, Johannes!

  10. Hehe – Slap on the head from my associate while discussing the matter :-> There’s an easy way to reduce complexity a little bit more:

    Another (much slower) machine, all tests run again:

    select benchmark(100000,
    'my,name,is,johannes,and,i,am,testing,this' REGEXP '.*(,|^)i(,|$).*');

    10.78 seconds

    select benchmark(100000,
    'my,name,is,johannes,and,i,am,testing,this' like '%,i' or
    'my,name,is,johannes,and,i,am,testing,this' like 'i,%' or
    'my,name,is,johannes,and,i,am,testing,this' like '%,i,%' or
    'my,name,is,johannes,and,i,am,testing,this' = 'i');

    0.71 seconds

    select benchmark(100000,
    CONCAT(',','my,name,is,johannes,and,i,am,testing,this',',') like '%,Kino,%');

    0.51 seconds

    hehe.

    After discussing things like “how did jerome implement functions like gimme-all-unique-tags and gimme-top-50-tags” i’m not sure whether i should seriously alter jerome’s code to store tags in a clean normalized db-form instead of using comma-separated values in wp_postmeta. Things like caching *are* a performance topic after some hundred articles and some thousand different tags.

    greets from Salzburg,

    Johannes
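
The CONCAT simplification in the comment above works because wrapping the keyword string in commas turns the first, last, middle, and only-tag cases into a single "middle tag" case. A minimal Python sketch of that idea follows. The function names are hypothetical, and plain string operations stand in for SQL here, so unlike LIKE under MySQL's default collations the comparison is case-sensitive.

```python
def has_tag_four_cases(keywords, tag):
    """Four explicit cases, mirroring the multiple-LIKE statement."""
    return (
        keywords.endswith("," + tag)        # like '%,tag'
        or keywords.startswith(tag + ",")   # like 'tag,%'
        or ("," + tag + ",") in keywords    # like '%,tag,%'
        or keywords == tag                  # = 'tag'
    )


def has_tag_wrapped(keywords, tag):
    """Wrap the list in commas so every tag is bounded by commas on both sides."""
    return ("," + tag + ",") in ("," + keywords + ",")
```

For the sample string from the benchmarks, `has_tag_wrapped("my,name,is,johannes,and,i,am,testing,this", "i")` is true, while a partial tag such as `johan` is rejected by both functions.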

  11. Mike says:

    Sterling,

    Not sure if you saw this or not, but the beta of Jerome’s Keywords 2.0 is out. http://vapourtrails.ca/2006-08/keywords-20-beta

    Mike

  12. sterling says:

    Mike: yes, I intend to test it soon and see what, if any, changes are needed for the widget. Thanks for the heads up, though.

    Johannes: Looks like Jerome has decided to normalize the data in his version 2.0. But thanks for the research!

  13. jerome says:

    Yes, I never liked the original storage solution (comma-separated keywords in a post meta) but the post meta table is a strange beast. Tag searches and cloud generation are much faster using the new table, especially with a large body of posts and tags. Which you seem to have here. :)

    I would have included both your and Johannes’ additions, but I didn’t have the energy after finishing the 2.0 version. I’d be more than happy to include them if you can send me a 2.0-compatible version.
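
As a rough illustration of why a normalized tag table makes tag searches and cloud generation simpler, here is a minimal Python sketch using SQLite. It is an assumption-laden example, not the schema of Jerome's Keywords 2.0: the `post_tags` table, its columns, the index name, and the function names are all hypothetical.

```python
import sqlite3


def build_db(posts):
    """Split comma-separated keyword strings into one row per (post, tag)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post_tags ("
        "post_id INTEGER NOT NULL, tag TEXT NOT NULL, "
        "PRIMARY KEY (post_id, tag))"
    )
    # An index on tag lets an exact-match search avoid scanning every row.
    conn.execute("CREATE INDEX idx_post_tags_tag ON post_tags (tag)")
    for post_id, csv in posts.items():
        for tag in csv.split(","):
            tag = tag.strip()
            if tag:
                conn.execute(
                    "INSERT OR IGNORE INTO post_tags VALUES (?, ?)",
                    (post_id, tag),
                )
    return conn


def posts_with_tag(conn, tag):
    # Exact equality replaces the LIKE/REGEXP pattern matching.
    rows = conn.execute(
        "SELECT post_id FROM post_tags WHERE tag = ? ORDER BY post_id", (tag,)
    )
    return [row[0] for row in rows]


def unique_tags(conn):
    rows = conn.execute("SELECT DISTINCT tag FROM post_tags ORDER BY tag")
    return [row[0] for row in rows]


def top_tags(conn, limit):
    # Tag-cloud style query: most used tags first, ties broken by name.
    return conn.execute(
        "SELECT tag, COUNT(*) AS n FROM post_tags "
        "GROUP BY tag ORDER BY n DESC, tag ASC LIMIT ?",
        (limit,),
    ).fetchall()
```

With one row per tag, "all unique tags" and "top N tags" become single aggregate queries instead of string splitting in application code.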
