Last week I presented on the super cool PECL/mysqlnd_ms PHP extension at the Minnesota PHP User Group. In short, mysqlnd_ms lets PHP web applications talk to master and slave MySQL database setups transparently.


There are a couple of things mysqlnd_ms needs to support before I can use it, such as single-master (e.g. local dev) environments and detection of “dead” MySQL servers so traffic isn’t passed to them over and over again.


I use a MacBook Pro for my day-to-day operations here at CB1, INC. I’m a huge believer that a development environment should mimic the production environment, so I find myself running a couple virtual machines in VMware Fusion.

The following guide is a reference for myself as well as possibly a helpful resource for setting up your own Linux development environment. Here’s a checklist of the tasks to perform and software to install:

Operating System

Start by installing Ubuntu 10.10 Desktop (or server). I’m not going to cover installing Ubuntu since there are already several other resources out there. Once Ubuntu is installed, open a Terminal:

user@ubuntu:~# sudo passwd root
[sudo] password for user: <type your password>
Enter new UNIX password: <type new root password>
Retype new UNIX password: <type new root password again>
passwd: password updated successfully

user@ubuntu:~# sudo apt-get update
user@ubuntu:~# sudo apt-get upgrade

user@ubuntu:~# mkdir ~/src

New File Permissions

user@ubuntu:~# sudo pico /etc/profile

Change the umask value from 022 to 002. This setting controls the default permissions when a new file or directory is created; with 002, new files are also writable by the group. This is mostly useful when managing files over Samba.
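
If you want to see what the umask change does before editing /etc/profile, here is a minimal sketch that creates a scratch file under each umask and prints its permissions. The helper name new_file_mode is made up for this example, and it only touches a temporary directory.

```shell
#!/bin/sh
# new_file_mode UMASK: print the permission string a new file gets under UMASK.
# new_file_mode is a hypothetical helper name, not part of any tool.
new_file_mode() {
    (
        # Subshell so the umask change does not leak into the caller.
        umask "$1"
        tmpdir=$(mktemp -d) || exit 1
        touch "$tmpdir/probe"
        ls -l "$tmpdir/probe" | cut -c1-10
        rm -rf "$tmpdir"
    )
}

new_file_mode 022   # -rw-r--r--  (group can read only)
new_file_mode 002   # -rw-rw-r--  (group can also write)
```

The second result is why 002 is handy for a shared Samba directory: anyone in the file's group can edit it.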

Network IP Addresses

Optionally, you may want to assign a static IP address. I set up one IP address for Apache and another for nginx.

user@ubuntu:~# sudo pico /etc/network/interfaces

The following is a reference for adding two static IPs. Change the IPs to meet your needs.

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
	address 192.168.1.200
	netmask 255.255.255.0
	gateway 192.168.1.1

auto eth0:1
iface eth0:1 inet static
	address 192.168.1.201
	netmask 255.255.255.0

user@ubuntu:~# sudo /etc/init.d/networking restart

Packages

Here’s a bunch of packages that will set up compilers, version control, Java, MySQL, Apache, PHP, Memcache, Gearman, Samba, and more.

user@ubuntu:~# sudo apt-get install build-essential autotools-dev autoconf \
 autoconf2.13 openssh-server ethtool traceroute openjdk-6-jdk \
 mysql-server-5.1 bzr subversion subversion-tools ntp ntpdate \
 libpcre3-dev libevent-dev automake bison libtool scons  g++ \
 ncurses-dev libreadline-dev libz-dev libssl-dev  libcurl4-openssl-dev \
 ruby rubygems libzip-ruby1.8 libzip-ruby1.9.1 python-dev ruby-dev \
 libdbus-glib-1-dev uuid-dev libpam0g libpam0g-dev gperf samba valgrind \
 libxml2-dev libfreetype6-dev curl libcurl4-openssl-dev \
 libjpeg62-dev libpng12-dev sqlite3 libsqlite3-dev git-core \
 postgresql postgis gearman libgearman-dev php5 \
 libapache2-mod-php5 php5-dev memcached php5-memcached \
 php5-curl php5-gd php5-mysql php5-pgsql php-apc \
 php5-xdebug php5-fpm libapache2-mod-fastcgi

MySQL

During the package install above, MySQL will prompt you for the root password.

After the packages are installed, we need to allow remote MySQL connections.

user@ubuntu:~# sudo pico /etc/mysql/my.cnf

Comment out the bind-address line.

# bind-address          = 127.0.0.1

SSH

Next, you may optionally increase the connection keep-alive interval for remote SSH connections. Timeouts aren’t really an issue when SSH’ing into a local VM, but this setting really helps for remote installs.

user@ubuntu:~# echo "ClientAliveInterval 60" | sudo tee -a /etc/ssh/sshd_config
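
One thing to watch with the append above: every time you run it, another copy of the line lands in the file. Here is a minimal sketch of an "append only if missing" helper. The name append_once is my own invention, and the demo writes to a scratch file rather than the real sshd_config.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
# append_once is a hypothetical helper name used only for this sketch.
append_once() {
    file=$1
    line=$2
    # -x matches whole lines, -F treats the text literally, -q stays silent.
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstrate on a scratch file rather than the live sshd_config.
scratch=$(mktemp)
append_once "$scratch" "ClientAliveInterval 60"
append_once "$scratch" "ClientAliveInterval 60"   # second call is a no-op
cat "$scratch"
rm -f "$scratch"
```

To use it on a root-owned file you would need to run the function from a root shell, since the redirect inside it runs with the caller's permissions.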

Samba

Samba allows me to drag and drop files between my Mac and Linux VM. I personally do not enable/install Samba on production servers.

user@ubuntu:~# sudo cp /etc/samba/smb.conf /etc/samba/smb.conf.orig
user@ubuntu:~# sudo pico /etc/samba/smb.conf

You can add a share such as the following:

[ubuntu]
        force user = <your username>
        writeable = yes
        create mode = 644
        path = /home/<your username>
        directory mode = 755
        force group = <your username>

Then create yourself a Samba user:

user@ubuntu:~# sudo smbpasswd -a <your username>

Apache 2

Apache is mostly configured out of the box, but I like to enable rewrite and SSL so I can test production features.

user@ubuntu:~# sudo a2enmod rewrite
user@ubuntu:~# sudo a2enmod ssl

Since I’m going to run Apache and nginx, I’m going to bind Apache to eth0.

user@ubuntu:~# sudo pico /etc/apache2/ports.conf

NameVirtualHost 192.168.1.200:80
Listen 192.168.1.200:80

<IfModule mod_ssl.c>
    Listen 192.168.1.200:443
</IfModule>

Now we need to add eth0’s IP to the default host:

user@ubuntu:~# sudo pico /etc/apache2/sites-enabled/000-default

<VirtualHost 192.168.1.200:80>
        ServerAdmin webmaster@localhost

        DocumentRoot /var/www
        <Directory />
                Options FollowSymLinks
                AllowOverride None
        </Directory>
        <Directory /var/www/>
                Options Indexes FollowSymLinks MultiViews
                AllowOverride None
                Order allow,deny
                allow from all
        </Directory>

        ErrorLog ${APACHE_LOG_DIR}/error.log
        LogLevel warn
        CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

Restart Apache for the changes to take effect.

user@ubuntu:~# sudo apache2ctl restart

Gearman

By default, Gearman uses memory to store pending jobs in the queue, but I prefer to use MySQL for persistent storage. To do this, first create the queue database and table:

user@ubuntu:~# mysqladmin -uroot -p123123 create gearman
user@ubuntu:~# mysql -uroot -p123123 -e "CREATE TABLE gearman.gearman_queue (
  unique_key VARCHAR(64) NOT NULL,
  function_name VARCHAR(255) NULL,
  priority INT NULL,
  data LONGBLOB NULL,
  PRIMARY KEY (unique_key)
) ENGINE = InnoDB;"

Next update the init script to tell Gearman to use the database:

user@ubuntu:~# sudo mv /etc/default/gearman-job-server /etc/default/gearman-job-server.bak
user@ubuntu:~# echo "PARAMS=\"-q libdrizzle --libdrizzle-host=127.0.0.1" \
   "--libdrizzle-user=root --libdrizzle-password=123123 --libdrizzle-db=gearman" \
   "--libdrizzle-table=gearman_queue --libdrizzle-mysql\"" | sudo tee /etc/default/gearman-job-server
user@ubuntu:~# sudo /etc/init.d/gearman-job-server restart

Gearman PHP Extension

We need to download and install the Gearman PHP extension if we want to write PHP workers or post jobs to the queue.

user@ubuntu:~# cd ~/src
user@ubuntu:~/src# wget http://pecl.php.net/get/gearman-0.7.0.tgz
user@ubuntu:~/src# tar xzf gearman-0.7.0.tgz
user@ubuntu:~/src# rm gearman-0.7.0.tgz package.xml
user@ubuntu:~/src# cd gearman-0.7.0
user@ubuntu:~/src/gearman-0.7.0# phpize
user@ubuntu:~/src/gearman-0.7.0# ./configure
user@ubuntu:~/src/gearman-0.7.0# make
user@ubuntu:~/src/gearman-0.7.0# sudo make install

Next, add the config file to load the Gearman PHP extension:

user@ubuntu:~# echo "extension=gearman.so" | sudo tee -a /etc/php5/conf.d/gearman.ini

memcached PHP Extension

Since we have memcached and the memcached PHP extension installed, let’s use it for storing session data:

user@ubuntu:~/src# echo "session.save_handler = memcached
session.save_path = \"127.0.0.1:11211\"" | sudo tee -a /etc/php5/conf.d/memcached.ini

nginx

nginx is a web server that is really fast. I use nginx as my primary development web server unless I’m running a web app that only works with Apache. You can choose to install nginx from a package, but I like to live life on the bleeding edge, so I’ll be building nginx from source. To install nginx, we need to download the source, compile it, install it, and configure it.

user@ubuntu:~# cd ~/src
user@ubuntu:~/src# wget http://nginx.org/download/nginx-0.8.52.tar.gz
user@ubuntu:~/src# tar xzf nginx-0.8.52.tar.gz
user@ubuntu:~/src# rm nginx-0.8.52.tar.gz
user@ubuntu:~/src# cd nginx-0.8.52
user@ubuntu:~/src/nginx-0.8.52# sudo mkdir -p /var/lib/nginx
user@ubuntu:~/src/nginx-0.8.52# ./configure \
    --sbin-path=/usr/sbin \
    --conf-path=/etc/nginx/nginx.conf \
    --error-log-path=/var/log/nginx/error.log \
    --pid-path=/var/run/nginx.pid \
    --lock-path=/var/lock/nginx.lock \
    --http-log-path=/var/log/nginx/access.log \
    --http-client-body-temp-path=/var/lib/nginx/body \
    --http-proxy-temp-path=/var/lib/nginx/proxy \
    --http-fastcgi-temp-path=/var/lib/nginx/fastcgi \
    --http-uwsgi-temp-path=/var/lib/nginx/uwsgi \
    --http-scgi-temp-path=/var/lib/nginx/scgi \
    --with-http_stub_status_module
user@ubuntu:~/src/nginx-0.8.52# make
user@ubuntu:~/src/nginx-0.8.52# sudo make install

user@ubuntu:~# sudo pico /etc/init.d/nginx

Here’s the init script that will start nginx for us:

#! /bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/sbin/nginx
NAME=nginx
DESC=nginx
test -x $DAEMON || exit 0
case "$1" in
  start)
        echo -n "Starting $DESC: "
        start-stop-daemon --start --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON -- $DAEMON_OPTS
        echo "$NAME."
        ;;
  stop)
        echo -n "Stopping $DESC: "
        start-stop-daemon --stop --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
        echo "$NAME."
        ;;
  restart|force-reload)
        echo -n "Restarting $DESC: "
        start-stop-daemon --stop --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
        sleep 1
        start-stop-daemon --start --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON -- $DAEMON_OPTS
        echo "$NAME."
        ;;
  reload)
        echo -n "Reloading $DESC configuration: "
        start-stop-daemon --stop --signal HUP --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
        echo "$NAME."
        ;;
  *)
        echo "Usage: /etc/init.d/$NAME {start|stop|restart|reload|force-reload}" >&2
        exit 1
        ;;
esac
exit 0

Now we need to make the init script executable and enable it:

user@ubuntu:~# sudo chmod +x /etc/init.d/nginx
user@ubuntu:~# sudo update-rc.d nginx defaults

user@ubuntu:~# sudo pico /etc/nginx/nginx.conf

Here’s a starter nginx.conf with some basic settings:

user  www-data www-data;
worker_processes  2;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile                on;
    tcp_nodelay             on;
    tcp_nopush              on;
    keepalive_timeout       65;
    server_name_in_redirect off;
    server_tokens           off;

    add_header Strict-Transport-Security max-age=1800;
    add_header X-Frame-Options deny;

    gzip            on;
    gzip_buffers    16 8k;
    gzip_comp_level 9;
    gzip_types      text/plain text/xml application/x-javascript text/css;

    include /etc/nginx/sites/*;
}

user@ubuntu:~# sudo mkdir /etc/nginx/sites
user@ubuntu:~# sudo pico /etc/nginx/sites/default

Now we need to set up a default host that supports PHP (via PHP-FPM, PHP’s FastCGI Process Manager) and we want the default host to use the eth0:1 IP address:

server {
    listen       192.168.1.201:80 default;
    server_name  _;
    root   /var/www;
    index  index.php;
    location / {
        if (!-e $request_filename) {
            rewrite ^/(.*)$ /index.php?q=$1 last;
            break;
        }
    }
    location ~ \.php$ {
        fastcgi_pass   127.0.0.1:9000;
        fastcgi_index  index.php;
        fastcgi_param  SCRIPT_FILENAME  /var/www$fastcgi_script_name;
        include        fastcgi_params;
    }
    location ~* (\.(htaccess|engine|inc|info|install|module|profile|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)|code-style\.pl|Entries.*|Repository|Root|Tag|Template)$ {
        deny all;
    }
}

After the config files are good to go, start nginx:

user@ubuntu:~# sudo /etc/init.d/nginx start

Service Names

I also like to add service names so I can see what ports are in use when I run netstat. I added drizzle and Cassandra for fun despite this post not including them.

user@ubuntu:~# sudo cp /etc/services /etc/services.bak
user@ubuntu:~# su
root@ubuntu:~# echo "drizzle     4427/tcp
drizzle     4427/udp
memcached   11211/tcp
memcached   11211/udp
gearmand    4730/tcp
gearmand    4730/udp
fastcgi     9000/tcp
cassandra   9160/tcp" >> /etc/services
root@ubuntu:~# exit

Android SDK

The Android SDK is unfortunately not available as a package, so you’ll need to download it from the Android Developer site: http://developer.android.com/sdk/index.html.

user@ubuntu:~# wget http://dl.google.com/android/android-sdk_r07-linux_x86.tgz
user@ubuntu:~# tar xzf android-sdk_r07-linux_x86.tgz
user@ubuntu:~# rm android-sdk_r07-linux_x86.tgz
user@ubuntu:~# sudo mv android-sdk-linux_x86 /usr/local
user@ubuntu:~# sudo find /usr/local/android-sdk-linux_x86 -type d -exec chmod 777 {} \;

You’ll need to add the Android SDK path near the top of your ~/.bash_profile or ~/.bashrc:

export PATH=${PATH}:/usr/local/android-sdk-linux_x86/tools

To manage your Android SDK packages and virtual devices, you’ll need to run the android app:

user@ubuntu:~# android

First go to Available Packages and download version 1.6 and 2.2 Android SDK packages. You can also choose to download the documentation, samples, and Google APIs.

Downloading the package may take several minutes. You don’t have to create a virtual device right now if you are planning on installing Appcelerator’s Titanium platform. You can exit the Android app when you’re done.

Desktop Apps

If you’re running Ubuntu Desktop, there are a couple handy apps I install. The first is Google Chrome and can be directly downloaded from the Google Chrome download page.

I find KCachegrind and GHex to be useful:

user@ubuntu:~# sudo apt-get install kcachegrind ghex

Appcelerator Titanium

Titanium is an awesome platform for developing desktop applications for Linux, Mac OS X, and Windows as well as mobile apps for iPhone and Android. We use Titanium Developer to create Titanium projects. Begin by downloading the 64-bit version of Titanium:

user@ubuntu:~# wget -O titanium.tgz http://www.appcelerator.com/download-linux64

There’s also a 32-bit version available at http://www.appcelerator.com/download-linux32.

Next we unpack Titanium Developer and move it to a safe place:

user@ubuntu:~# tar xzf titanium.tgz
user@ubuntu:~# rm titanium.tgz

Next you need to run the installer by double-clicking the Titanium Developer executable. Run the executable and then click the Install button. You can try installing to /opt/titanium, but you might need root privileges.

Next, there are a few issues with outdated libraries, so we simply delete them:

user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgobject-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libglib-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgio-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgthread-2.0.*

Titanium Developer also complains if /bin/java doesn’t exist, so create a quick link:

user@ubuntu:~# sudo ln -s /usr/bin/java /bin/java

Relaunch Titanium Developer and enter your login credentials. If you don’t have a login, you can get a free account.

After signing in, you may notice there are some updates available in the upper right corner of the window. Click in the box and the updates will be downloaded and installed.

Optionally you can create a launcher icon for your GNOME panel. Don’t forget to escape spaces in the command with a backslash!

Finishing Touches

Lastly, I like to re-arrange my desktop to maximize my coding real estate.

Conclusion

That should get you up and running with a neato dev environment. If you need to run SSL, I wrote a post on Creating Self-Signed Certs on Apache 2.2 and Virtual Hosts and Wildcard SSL Certificates with Apache 2.2.

If you find any typos or additions, please feel free to sound off in the comments!


Tonight I gave a talk about Cassandra at the Minnesota PHP User Group. I would hope that everyone that came out learned a little something. I have to admit the talk was a bit unorganized, but hey, I love to talk. It also didn’t help that there was major construction going on at the meeting venue that made it hard for people to hear me.

After rambling for an hour and a half, I finally ran out of Cassandra-related stuff to talk about. Since some people were interested in some new features of PHP 5.3, I showed off my MVC framework I’ve been working on called Elevate. In Elevate’s code, I use some of PHP 5.3’s new features such as closures, namespaces, and the short ternary operator (?:). I also showed Elevate’s super cool Gearman worker daemon that I used for sending e-mails.


This Thursday, May 6th, at 6pm I’ll be giving a talk about Cassandra at the Minnesota PHP User Group.

Cassandra

I’ll be covering Cassandra’s architecture and how it scales to handle tons of data and still be fast and fault tolerant. I’ll also cover Cassandra’s API and data modeling, specifically with PHP.

The user group will meet at Nerdery Interactive Labs (9555 James Ave S, Suite 245, Bloomington, MN 55431). It will surely be packed with tons of awesome, so be sure to RSVP!


Keynotes

The keynote was kick started by Marten Mickos.  If you’ve never met Marten, he is, on a personal note, one of the greatest CEOs I’ve ever met.  The keynotes were especially interesting for me because it was the first time I’ve had the opportunity to listen to Jonathan Schwartz, the CEO of Sun Microsystems.  Jonathan seems like a great guy who gives the impression he "gets it".

The last keynote was by Werner Vogels of Amazon.  His talk covered Amazon’s growth and the new services they offer including EC2.  He announced that EC2 now supports persistent storage, which is a huge improvement, but doesn’t quite solve all of the problems.

Testing PHP/MySQL Applications with PHPUnit/DbUnit

I’ve never been big into testing, but I’m trying to change that.  Sebastian Bergmann, the author of PHPUnit Pocket Reference (free online version), talked about PHPUnit and DbUnit and why I should use them.  Installing PHPUnit is extremely simple if you have pear installed:

pear channel-discover pear.phpunit.de
pear install phpunit/PHPUnit

Once installed, just require PHPUnit:

<?php
require_once 'PHPUnit/Framework.php';

He just scratched the surface on writing unit tests. One thing he pointed out was using CruiseControl for automated testing. What’s really cool is you can fire off CruiseControl from Subversion commit hooks. If the testing fails, CruiseControl can send an email with the results and who is to blame.

Practical MySQL for Web Applications

Domas Mituzas of MySQL and Wikipedia fame gave a good talk that covered practical design of web applications. The talk covered simple stuff, so I didn’t learn a whole lot. Nevertheless, Domas sometimes says some funny things that make the talk enjoyable.

EXPLAIN Demystified

Baron Schwartz gave a talk about the EXPLAIN statement. EXPLAIN is run by prepending the word EXPLAIN to your SELECT statements. It only works on SELECT statements. When the query is run, it outputs an execution plan.

After running through the output of the EXPLAIN statement, he showed us mk-visual-explain which is one of the tools in Maatkit. It is a neato command line tool that takes the EXPLAIN output and reformats it as a tree structure. It’s a great way to visualize the execution plan. Now if only there was a GUI version…
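
As a rough illustration of the "prepend EXPLAIN" idea, here is a minimal shell sketch. The explain_query helper and the users table are made up for the example, and the commented commands assume a local MySQL server and Maatkit installed.

```shell
#!/bin/sh
# explain_query SQL: print SQL as an EXPLAIN statement for the mysql client.
# explain_query is a hypothetical helper; `users` below is a made-up table.
explain_query() {
    printf 'EXPLAIN %s;\n' "$1"
}

explain_query "SELECT id FROM users WHERE email = 'a@example.com'"

# Against a real server the statement can be piped into the client:
#   explain_query "SELECT ..." | mysql -uroot -p mydb
# or the EXPLAIN output can be rendered as a tree with Maatkit:
#   mysql -uroot -p mydb -e "EXPLAIN SELECT ..." | mk-visual-explain
```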

Upgrading to Elegant Versatile Database Architecture using PHP5 Data Objects

This talk was given by Sigurd Magnusson of SilverStripe and covered PDO. I already researched and used PDO, so it was mostly review.

After talking to some of the other people at the conference, I’ve been seriously thinking of moving away from PDO and using MySQL specific functions because they expose some *really* cool debugging and profiling information.

Exploring Amazon EC2 for Scale-out Applications

The thought of EC2 sounds really cool. The ability to create a server instance and host your stuff on it within minutes is sweet. Need more servers, no problem, add another instance. The speakers, Morgan Tocker of MySQL and Carl Mercier of Defensio, talked about their experience with EC2.

There are some serious data and management issues. Until the other day, there wasn’t any kind of persistent storage, meaning when the server went offline, you lost all your data. Now you can mount a drive that persists across restarts. But one issue for critical business transactions is how and when data is written to disk. Is the data written immediately to disk, or is it buffered in the kernel or in some RAID card’s cache?

Another issue they ran into is that when a new machine is created, there are remnants of the previous instance’s data on it. So they need to zero out the drive, which takes 5 hours on a single instance.

What I took away from the talk is that EC2 is great if your app is simple and relies on 3rd party services (e.g. Facebook, Google, etc.) that are more reliable than EC2.

Service Oriented Architecture with PHP and MySQL

Joe Stump, a PHP hacker at Digg, gave a talk about SOA. It wasn’t so much about “web services” as it was about managing tasks and processing them asynchronously.

After talking to Joe, he highly recommended Gearman to manage tasks. From the Gearman site: “Gearman is a system to farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, or to call functions between languages.”

So, if a user uploads an image, you can add the task of resizing the image to a backend processing mechanism. This allows for a responsive front-end for the user.

Joe, along with Chris Goffinet, is working on netgearman, which is a PEAR package for interfacing with Gearman.

Memcached Hackathon BOF

This was a birds of a feather session where a bunch of people informally got together to discuss all things memcached. Patrick Galbraith of Grazr showed off Memcached Functions for MySQL. This is super cool. It allows you to set and get data in memcached within your SQL code via user defined functions.

So instead of pulling data from the DB to the app, then pushing it to memcached, you can just have a trigger or stored procedure store the value directly to memcached. One caveat is that when you roll back a transaction, it won’t unset the value from memcached.

There was some discussion about the memcached MySQL storage engine. After listening to them discuss it, I have to wonder if it is really worth it. It acts like a distributed memory table, except when a server in a cluster goes down, it will affect all the other servers.


Here at the MySQL Conference and Expo, Laura Thomson gave a great talk about Scalability and Performance Best Practices.

She had some interesting points about scalability. She basically said that no matter what language you write your web application in, whether it’s compiled (C/C++) or interpreted (PHP, Java), you are subject to scalability issues. Another potential problem is optimizing before you know what exactly to optimize. This can lead to a loss of time that could have been spent on more important things.

Laura’s talk covers three types of best practices: general, scalability, and performance.

General Best Practices

First tip is to profile early and profile often. The earlier you can detect poor performance, the easier it is to fix. There are a handful of tools (APD, Xdebug, Zend) that can help with profiling. Use system profiling tools such as strace, dtrace, and ltrace to gather more information.

There are two types of effective profiling: debugging and habitual. Debugging profiling is about spotting deviations from the norm and habitual profiling is making the norm better. Profiling is an art and requires lots of practice to know where to look.

It is essential that the IT admins and the developers cooperate. This allows crises to be handled properly, especially in production environments where outages are time critical. Team members should alert the developers to any abnormal behavior changes after a new code release is pushed. Before pushing new code, schedule a launch window and set procedures for having developers fix problems and possibly falling back to a previous version. Avoid pushing releases on Fridays, otherwise key team members may be unavailable or overworked on the weekend.

It is recommended to test your application with production data. Test data may not take into account certain scenarios where bugs can be introduced. It is advised to have a staging environment which uses production data and also undergoes simulated load testing.

To track your application’s performance, record it over a period of time, then analyze the data to find potential issues. There are several means of tracking performance, including access logs, system metrics, application profiling, and query profiling.

When a problem occurs, don’t make assumptions. The problem may be caused by something other than what you think it might be.

Scalability Best Practices

When the web application begins to suffer performance issues, start to decouple and isolate components to track down the source. If you need to tweak code, spend only enough time to refactor as needed. Reduce load on servers by moving static content on to dedicated servers.

By default, PHP stores session data on the hard drive. This can cause performance issues; storing the session data in a database, or better yet in a distributed cache such as memcached, can help.

The most important thing you can do to improve performance is to cache as much data as possible. There are many levels of caching. You can cache data sets or precomputed fragments. For things like images, you can set up dedicated services for caching and serving static content. The usual suspects are recommended for caching (APC, memcached, Squid).

PHP out of the box does not cache compiled pages. That means that on every request, each page has to be parsed and executed. Extensions such as APC and Zend can cache the compiled pages for an immediate speed improvement.

MySQL’s query cache works, but isn’t necessarily implemented the best way. If you query a table, the results are cached on the MySQL server. If a row in that table is inserted, updated, or deleted, every cached result for the table is flushed. This supposedly has been fixed in MySQL 5.1, but a setting has to be set to not flush the cache.

To scale, your data can be federated across multiple MySQL servers. There can be complications with regards to data reliability and table joins can suffer from major speed hits.

A more reliable way to scale is to use replication. Replication does suffer from “slave lag” issues. The reason the lag can be high is that the master server uses multiple threads to store the data locally, while the slave server has to process the replicated items in a single thread to ensure the order of the transactions is preserved. You can display the status of MySQL’s I/O and SQL threads by executing a SHOW PROCESSLIST statement.

The more database writes, the greater the lag. Depending on your application, you may only want to use replication for failover or backups.
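
To put a number on the lag, MySQL reports a Seconds_Behind_Master field in the output of SHOW SLAVE STATUS. Here is a minimal sketch that pulls that value out; the slave_lag helper name is made up, and the sample input is illustrative rather than captured from a real server.

```shell
#!/bin/sh
# slave_lag: read "SHOW SLAVE STATUS\G" output on stdin and print the
# Seconds_Behind_Master value. The function name is hypothetical.
slave_lag() {
    awk -F': *' '/Seconds_Behind_Master/ { print $2 }'
}

# Illustrative input only; on a real slave you would pipe in:
#   mysql -uroot -p -e 'SHOW SLAVE STATUS\G' | slave_lag
printf '%s\n' \
    '        Slave_IO_Running: Yes' \
    '       Slave_SQL_Running: Yes' \
    '   Seconds_Behind_Master: 12' | slave_lag
```

Run from cron, something like this could warn you when the lag crosses whatever threshold your application can tolerate.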

Sometimes you may benefit from designing your application to avoid situations where data is hard to scale and can’t easily be cached.

Performance Best Practices

You definitely want to use a compiler cache. As described above, use APC or Zend for PHP5. If you are connecting to an external data source, perhaps a web service or data feed, minimize the number of requests you make to it. Cache the responses if possible. You may be able to load the data dynamically using Javascript and a little Ajax magic. Maybe the data isn’t a must-have, or maybe you can have a page dedicated to displaying the 3rd party data.

When tuning your application’s performance, change one thing at a time. If you change more than one thing, how do you know which change caused the improvement, and how do you know you didn’t introduce new bugs by changing the other things? Use MySQL’s EXPLAIN statement to profile your queries and enable slow query logging. Use MyTop or InnoTop to help profile your queries.

It is crucial that your database is properly indexed. A table with poorly designed indexes, or perhaps too many indexes, will hurt performance. Use the smallest data type possible and try to design your tables to be fixed width. That means use char instead of varchar, and set the length of your fields to logical lengths (e.g. use 128 chars instead of 107 chars). De-normalize when necessary. Move static data out of the database or store it in a MEMORY table. Use the appropriate storage engine for each table.

For your queries, minimize the number of queries and cache them outside the database when possible.

She claims that deeply recursive code is expensive in PHP. Make sure you are not doing unnecessary looping. If you find that you are, chances are you are doing something wrong and that there is a better idiom for performing the task.

Don’t try to work around or re-write perceived inefficiencies in PHP. Use regular expressions to do intense string manipulation. Instead of writing complex serialization code, use PHP’s extensions to do the heavy lifting. Before spending time writing some boilerplate function, check to see if there are any extensions that could save you time.

Laura gave this talk on behalf of George Schlossnagle. George’s original presentation can be found at http://omniti.com/~george/talks. This was an excellent session and proved to be beneficial.


PHP Performance and Security

Apr 24, 2007

Today, Rasmus Lerdorf, the man who kick started PHP, gave a great talk about PHP Performance and Security.

He began by talking about the new MySQL native driver. He did some benchmarks in which it appeared that the driver offered little performance improvement.

The first tool to use to track down performance issues is Callgrind, which runs on top of Valgrind. Callgrind dumps a file that can be opened with KCachegrind, which is available on the Callgrind site.

Performance can be improved by installing APC: Alternative PHP Cache. Next he recommends installing the Xdebug extension to profile your PHP application and find performance issues. Caching is key.

For security, he described a scenario where you browse to a website that lists a bunch of links to various other websites. With Javascript, the site can detect whether each link has been visited. If it has, the Javascript could check those pages to see if your session is still valid and then cause problems such as transferring money or opening your firewall. You can reduce the surface area of these kinds of attacks by passing all inputs through filters. Each form field, URL, cookie, whatever, must go through a filter to escape potential problems.

Next he covered attacks that pass extra stuff in the URL. Even escaping URL parameters with htmlspecialchars() doesn’t protect you from characters that are already escaped and are then evaluated in the browser to do harmful things.

He talked about a spoofing trick where older versions of the Flash plugin in Internet Explorer can add attributes to the request header such as the domain. When providing links to download PDF files, the URL can include Javascript code that is executed when the PDF plugin is loaded. This can be prevented by setting the mime type for .pdf to application/octet-stream which forces PDFs to be downloaded.

Cross-site request forgery is another huge problem. Adding a hidden input field with some sort of session token, or “crumb” as he calls it, gives you a value that, in combination with your session cookie, can be used to verify the request is valid.
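
Here is a minimal sketch of one common way to derive and check such a crumb: an HMAC of the session ID keyed with a server-side secret. This is my own illustration and not necessarily what Rasmus showed; the function names and the choice of HMAC-SHA256 are assumptions. It is written in shell and needs the openssl command line tool; a PHP app could do the same thing with hash_hmac().

```shell
#!/bin/sh
# make_crumb SECRET SESSION_ID: print a hex HMAC-SHA256 of the session id.
# Both the function name and the choice of HMAC-SHA256 are assumptions
# made for this sketch.
make_crumb() {
    printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{ print $NF }'
}

# check_crumb SECRET SESSION_ID SUBMITTED: succeed only if the crumb matches.
check_crumb() {
    [ "$(make_crumb "$1" "$2")" = "$3" ]
}

secret='server-side-secret'          # placeholder value
crumb=$(make_crumb "$secret" 'session-abc')

# The crumb goes into the hidden form field; on submit it is recomputed
# from the session cookie and compared.
check_crumb "$secret" 'session-abc' "$crumb" && echo "valid request"
check_crumb "$secret" 'session-xyz' "$crumb" || echo "rejected"
```

A forged request from another site carries the victim's session cookie but cannot supply the matching crumb, so the check fails.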

Rasmus has made his presentation available online at http://talks.php.net/show/mysql07.