PEAR Text_Diff doesn’t split words on punctuation

The PEAR Text_Diff inline renderer has a silly word splitting algorithm: it treats only spaces and newlines (\n) as word boundaries.

This causes problems with punctuation. Suppose you are diffing the following two sentences:

The quick cat jumped over the lazy fox.
The quick cat jumped over the lazy dog.

The final rendered output will look like this (deletions marked with <del>, insertions with <ins>):

The quick cat jumped over the lazy <del>fox.</del><ins>dog.</ins>

Notice how the period is treated as part of the word? That makes messy markup. This comparison is worse:

The quick cat jumped over the lazy fox, who was totally lazy and should be shot.
The quick cat jumped over the lazy fox.

Here’s how PEAR Text_Diff does the diff (deletions marked with <del>, insertions with <ins>):

The quick cat jumped over the lazy <del>fox, who was totally lazy and should be shot.</del><ins>fox.</ins>

This final diff is difficult to read. You are not deleting and reinserting fox; you are just changing the punctuation on its right. But because the inline diff renderer considers only space and newline as word boundaries, it treats "fox," and "fox." as two different words.

It took me 1.5 hours of PHP code review to understand the system, but the fix is painfully easy. Edit PEAR/Text/Diff/Renderer/inline.php. At lines 158 and 159 (per the online source code), you’ll see " \n" at the end. That string is the set of word boundary characters, passed as a mask to the PHP strspn function. Simply add your extra word boundary characters (such as . and ,) between the quotes on both lines, and the inline renderer splits words on them as well.
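
To illustrate why widening the mask helps, here is a minimal standalone sketch of splitting with strspn and strcspn. It is not the Text_Diff source: the function name splitOnBoundaries and the extended mask " \n.,;:!?" are assumptions chosen for illustration.

```php
<?php
/**
 * Split a string into tokens, each made of any leading boundary
 * characters followed by the run of non-boundary characters after them.
 * Hypothetical helper for illustration; not part of Text_Diff.
 */
function splitOnBoundaries($string, $mask = " \n")
{
    $words = array();
    $length = strlen($string);
    $pos = 0;

    while ($pos < $length) {
        // Count the boundary characters at the current position...
        $boundary = strspn((string) substr($string, $pos), $mask);
        // ...then the non-boundary characters that follow them.
        $word = strcspn((string) substr($string, $pos + $boundary), $mask);
        $words[] = substr($string, $pos, $boundary + $word);
        $pos += $boundary + $word;
    }

    return $words;
}

// Default mask: the period stays glued to the last word.
print_r(splitOnBoundaries("lazy fox."));
// Extended mask (an assumption): the period becomes its own token.
print_r(splitOnBoundaries("lazy fox.", " \n.,;:!?"));
```

With the default mask, the first call yields "lazy" and " fox."; with the extended mask, the second yields "lazy", " fox", and ".". Because " fox" is now its own token, changing the punctuation after it no longer marks the word itself as changed.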

I’ve reported this as PHP PEAR bug 16774.

Gallery 3, Windows 2008 R2, and IIS 7

EDIT: Gallery’s maintainers decline to fully support Gallery 3 on IIS. See http://gallery.menalto.com/node/90281 for more info.

Yes, you can run Gallery 3 on Windows 2008 and IIS 7. Here’s how I did it:

  1. Clean install of Windows 2008 R2 x64. NOTE: These days, 32 bit is pretty ridiculous. The instructions below are only guaranteed to work on x64.
  2. Install the Web Server (IIS) role. I think this will also force a portion of the Application Server role to be installed, too.
  3. Install PHP 5.3. Just go through the default installation steps. I used the latest VC9 x86 Non Thread Safe version from the Windows binary download page.
  4. Install MySQL Community Edition for Windows x64. I used the default options throughout.
  5. Download phpMyAdmin. Unzip and copy files to C:\inetpub\wwwroot\phpmyadmin.
  6. Visit http://localhost/phpmyadmin, sign in using your MySQL root account, and create a new database for Gallery 3.
  7. Download Gallery 3. As of this writing, the latest version is beta 2.
  8. Extract files and place in C:\inetpub\wwwroot\gallery3.
  9. If you run Gallery right now, it will squawk about missing some PHP settings that are in its .htaccess file. IIS does not read that file, so you must apply the settings a different way:
    1. Create C:\inetpub\wwwroot\gallery3\.user.ini (more info on .user.ini) and open it with a text editor. (You might need to launch Notepad as administrator because Windows protects files in C:\inetpub\.) Yes, you do need the period before user in the filename.
    2. Add these lines:
      short_open_tag    =    1
      magic_quotes_gpc   =   0
      magic_quotes_sybase =  0
      magic_quotes_runtime = 0
      register_globals  =    0
      session.auto_start =   0
      upload_max_filesize =  20M
      post_max_size =      100M
      date.timezone = "America/Chicago"

      Note that the date.timezone setting works around an additional problem with Gallery 3’s underlying Kohana framework on PHP 5.3.
  10. Create a new directory at C:\inetpub\wwwroot\gallery3\var. Edit its permissions and give the Users and IIS_IUSRS groups Modify permissions. NOTE WELL: Generally, you should use the principle of least privilege and only give enhanced privileges to the smallest number of users possible, which means not the Users group. I’ll revise in the future if I confirm that only IIS_IUSRS–or even a specific account–is all you need.
  11. Set up URL rewriting (the IIS equivalent of mod_rewrite):
    1. Download and install the URL Rewrite Module x64.
    2. In Server Manager, click on Server Manager > Roles > Web Server (IIS) > Internet Information Services (IIS) Manager. To the right, find your gallery3’s directory under your web server under Sites. Click on that directory.
    3. Click URL Rewrite then Import Rules…
    4. Copy the mod_rewrite rules, including the IfModule directives, from the end of Gallery3’s .htaccess file and paste into the Rewrite rules field of the Import mod_rewrite rules screen. Remove the # characters at the beginning of each line; otherwise, they are just code comments.
    5. Delete the line containing RewriteBase. It is not supported, and the rules will not import until it is removed.
    6. Click Apply on the right hand side.
  12. Now run Gallery 3 setup at http://localhost/gallery3.
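
If Gallery still complains about PHP settings, it can help to confirm that the .user.ini from step 9 is actually being read. The following is a minimal sketch, assuming a hypothetical helper named reportIniSettings saved in a throwaway script inside the gallery3 directory. PHP caches .user.ini files (the user_ini.cache_ttl setting, 300 seconds by default), so a fresh edit may take a few minutes to show up.

```php
<?php
/**
 * Return the current runtime value of each named PHP setting.
 * Hypothetical helper for illustration; ini_get() returns false for
 * settings that do not exist in the running PHP version.
 */
function reportIniSettings(array $names)
{
    $report = array();
    foreach ($names as $name) {
        $report[$name] = ini_get($name);
    }
    return $report;
}

// The settings listed in the .user.ini from step 9.
$names = array(
    'short_open_tag',
    'magic_quotes_gpc',
    'magic_quotes_sybase',
    'magic_quotes_runtime',
    'register_globals',
    'session.auto_start',
    'upload_max_filesize',
    'post_max_size',
    'date.timezone',
);

// Print one "name = value" line per setting.
foreach (reportIniSettings($names) as $name => $value) {
    echo $name, ' = ', var_export($value, true), "\n";
}
```

Delete the script once the values look right, since it exposes configuration details to anyone who can reach the URL.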

Voilà, you have Gallery 3 on IIS.

This may seem like a lot of steps, but it’s actually not much different than a setup on Ubuntu. It’s easier than how it used to be with IIS 6 or PHP 5.2. Kudos to Microsoft and The PHP Group for a dramatically easier setup process.

How I got field diffs working with Drupal, PEAR Text_Diff, and Dreamhost

I have a Drupal site where I will propose major changes to a policy document. The site has nodes with current and proposed versions of document sections.

I want auto-generated diffs to make the proposed changes obvious. The diff needs to look like legislation, where deletions are struck through and additions are underlined.

Here are all the steps to make this work. They assume you already have a working Drupal install.

1. Drupal Computed Field module

The Computed Field module is a great concept: it executes PHP code to populate a new field with calculations based on other fields or any other data accessible to the PHP engine. Since the module can execute any PHP script, you can actually do anything available to the PHP system or Drupal API upon node save. It doesn’t have to save values to a field.

Computed Field for Drupal 6 has rough edges, however. It has been stuck on beta 1 for 7 months, and its MySQL longtext storage option is broken (I found a workaround).

How to configure the module:

  • Add a new Computed field to your node type. Under Store using the database settings below, set the data type to varchar with a Data Length large enough to prevent data overflow errors. (This is the workaround for the broken longtext option.)
  • Put this code in the Computed Code field:
    // Add the private PEAR install to the include path (adjust the placeholder path).
    $path = '/pathToPearsParentDirectory/pear/PEAR';
    set_include_path(get_include_path() . PATH_SEPARATOR . $path);
    require_once 'PEAR.php';
    include_once 'Text/Diff.php';
    include_once 'Text/Diff/Renderer.php';
    include_once 'Text/Diff/Renderer/inline.php';

    // Text_Diff takes an engine name and an array holding the two arrays of lines to compare.
    $diff = new Text_Diff('auto', array(array($node->field_nameOfOneFieldToCompare[0]['value']), array($node->field_nameOfOtherFieldToCompare[0]['value'])));
    $renderer = new Text_Diff_Renderer_inline();

    // Store the rendered inline diff as the computed field's value.
    $node_field[0]['value'] = $renderer->render($diff);

TechRepublic confused me: they got the Text_Diff constructor signature wrong in Compare file contents and render the output with PHP and PEAR. You don’t pass two files; you pass the engine name as a string and an array containing the two arrays of lines to compare. I credit them, however, for pointing me to the inline renderer.
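
The computed field code wraps each field’s whole value in a one-element array. For fields that span several lines, one option is to split each value into lines before handing it to Text_Diff. A minimal sketch, assuming a hypothetical helper named toLines; whether this improves the rendered diff for a given field is untested here.

```php
<?php
/**
 * Split text into an array of lines, accepting Windows (\r\n),
 * old Mac (\r), and Unix (\n) line endings.
 * Hypothetical helper for illustration; not part of Text_Diff.
 */
function toLines($text)
{
    return preg_split('/\r\n|\r|\n/', $text);
}

// The resulting arrays could replace the one-element arrays in the
// computed field code, for example:
//   new Text_Diff('auto', array(toLines($old), toLines($new)));
var_export(toLines("First line\r\nSecond line\nThird line"));
```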

2. Install your own PEAR

Dreamhost’s main PEAR install is out of date. It cannot install up-to-date PEAR components such as Text_Diff 1.1.0.

Solution: install your own PEAR.

I used http://pear.php.net/go-pear. Save that page as go-pear.php in a pear directory off your account’s home directory (if you’re not sure, get to it with cd ~ from the command line). Run it from the command line using php -q go-pear.php.

I accepted all defaults.

It will instruct you to fix your php.ini. You may not need to do anything; see the optional section below.

3. Install Text_Diff

This is as simple as running pear install pear/Text_Diff. You may need to prefix the pear executable with the path to your new install so you don’t run Dreamhost’s old install.

OPTIONAL: Override Dreamhost’s PHP configuration

Dreamhost runs PHP in CGI mode. That gives security and usability improvements, but it disallows local php.ini files and the php_value include_path "path statement goes here" directive in the .htaccess file.

To change the include path, you must either use PHP’s set_include_path or override Dreamhost’s master php.ini.

I chose set_include_path. I probably won’t have many PEAR-dependent computed fields, so this is easy to maintain.

However, if you will use PEAR a lot, you may want to override the php.ini. Use the Custom php.ini across Multiple domains section as it is the most flexible solution.

A pitfall with overriding the php.ini is that you won’t get php.ini changes made by Dreamhost. I just checked, and the last update was only 5 days ago. While I can manage my own php.ini, I use a hosting provider because I’d rather let someone else handle infrastructure and operations.

The result

Field A: The quick brown fox jumped over the lazy dog.

Field B: The red fox is awesome.

Difference (auto generated; deletions marked with <del>, insertions with <ins>): The <del>quick brown</del><ins>red</ins> fox <del>jumped over the lazy dog</del><ins>is awesome.</ins>

Bye, bye 1and1.com

[CORRECTION: I lost no prepaid domain registration time. Dreamhost’s domain transfer requires purchase of a 1 year additional registration on top of existing registration. Existing registration time is retained.]

1and1.com lost my business.

Yesterday, that web host screwed up my hosting package, causing a multi-hour email and web outage.

Being sick of 1and1’s routine incompetence, I had already plotted my escape. I changed settings so my domains would no longer auto-renew. I probably had $30-$35 of prepaid domain registration time left with the 6 domains I am keeping, so I figured I would keep them registered at 1and1 and transfer later.

Instead, 1and1 screwed up all my DNS settings and initiated a total package cancellation, causing a major service outage.

This was the last straw, so I expedited my move to Dreamhost.

I am almost running again. Let me know if you got any bounces on emails sent to me.

Even though Dreamhost has a mixed reputation, it can’t be worse than 1and1.com. Some of my web apps run noticeably faster. And their support staff responded with a coherent answer. Wow!