Last week I presented on the super cool PECL/mysqlnd_ms PHP extension at the Minnesota PHP User Group. In short, mysqlnd_ms lets PHP web applications interact with master and slave MySQL database setups transparently.

Here’s my slide deck:

There are a couple of things mysqlnd_ms needs to support before I can use it, such as single master (i.e. local dev) environments and detection of “dead” MySQL servers so traffic isn’t passed to them over and over again.


I use a MacBook Pro for my day-to-day operations here at CB1, INC. I’m a huge believer that a development environment should mimic the production environment, so I find myself running a couple virtual machines in VMware Fusion.

The following guide is a reference for myself as well as possibly a helpful resource for setting up your own Linux development environment. Here’s a checklist of the tasks to perform and software to install:

Operating System

Start by installing Ubuntu 10.10 Desktop (or server). I’m not going to cover installing Ubuntu since there are already several other resources out there. Once Ubuntu is installed, open a Terminal:

user@ubuntu:~# sudo passwd root
[sudo] password for user: <type your password>
Enter new UNIX password: <type new root password>
Retype new UNIX password: <type new root password again>
passwd: password updated successfully

user@ubuntu:~# sudo apt-get update
user@ubuntu:~# sudo apt-get upgrade

user@ubuntu:~# mkdir ~/src

New File Permissions

user@ubuntu:~# sudo pico /etc/profile

Find the umask line and change 022 to 002. This setting controls the default permissions when a new file or directory is created; with 002, new files stay group-writable. This is mostly useful when managing files over Samba.
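To see what that change actually does, here’s a minimal sketch that creates one file under each umask in a temporary directory and prints the resulting modes. It isn’t part of the setup itself. The file names are made up, and `stat -c` assumes GNU coreutils, which is what Ubuntu ships.

```shell
#!/bin/sh
# Compare the mode of a new file under umask 022 and umask 002.
demo_dir=$(mktemp -d)

new_file_mode() {
    # $1 = umask to apply, $2 = file name (hypothetical, for illustration only)
    (
        umask "$1"                    # only affects this subshell
        : > "$demo_dir/$2"            # create an empty file
        stat -c '%a' "$demo_dir/$2"   # print its octal permissions
    )
}

mode_022=$(new_file_mode 022 default.txt)
mode_002=$(new_file_mode 002 shared.txt)

echo "umask 022 -> $mode_022"
echo "umask 002 -> $mode_002"

rm -rf "$demo_dir"
```

With 002 the group write bit survives (664 instead of 644), so files dropped in by another member of your group remain editable.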

Network IP Addresses

Optionally, you may want to assign a static IP address. I set up one IP address for Apache and another for nginx.

user@ubuntu:~# sudo pico /etc/network/interfaces

The following is a reference for adding two static IPs. Change the IPs to meet your needs.

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
	address 192.168.1.200
	netmask 255.255.255.0
	gateway 192.168.1.1

auto eth0:1
iface eth0:1 inet static
	address 192.168.1.201
	netmask 255.255.255.0

Then restart networking to apply the changes:

user@ubuntu:~# sudo /etc/init.d/networking restart

Packages

Here’s a bunch of packages that will set up compilers, version control, Java, MySQL, Apache, PHP, Memcache, Gearman, Samba, and more.

user@ubuntu:~# sudo apt-get install build-essential autotools-dev autoconf \
 autoconf2.13 openssh-server ethtool traceroute openjdk-6-jdk \
 mysql-server-5.1 bzr subversion subversion-tools ntp ntpdate \
 libpcre3-dev libevent-dev automake bison libtool scons  g++ \
 ncurses-dev libreadline-dev libz-dev libssl-dev  libcurl4-openssl-dev \
 ruby rubygems libzip-ruby1.8 libzip-ruby1.9.1 python-dev ruby-dev \
 libdbus-glib-1-dev uuid-dev libpam0g libpam0g-dev gperf samba valgrind \
 libxml2-dev libfreetype6-dev curl \
 libjpeg62-dev libpng12-dev sqlite3 libsqlite3-dev git-core \
 postgresql postgis gearman libgearman-dev php5 \
 libapache2-mod-php5 php5-dev memcached php5-memcached \
 php5-curl php5-gd php5-mysql php5-pgsql php-apc \
 php5-xdebug php5-fpm libapache2-mod-fastcgi

MySQL

During the package install above, MySQL will prompt you for the root password.

After the packages are installed, we need to allow remote MySQL connections.

user@ubuntu:~# sudo pico /etc/mysql/my.cnf

Comment out the bind-address line, then restart MySQL so the change takes effect.

# bind-address          = 127.0.0.1

SSH

Next, you may optionally increase the connection keep alive interval for remote ssh connections. Timeouts aren’t really an issue for SSH’ing into a local VM, but this really helps for remote installs. Note that the redirect has to run as root, so pipe through sudo tee instead of using sudo echo:

user@ubuntu:~# echo "ClientAliveInterval 60" | sudo tee -a /etc/ssh/sshd_config

Samba

Samba allows me to drag and drop files between my Mac and Linux VM. I personally do not enable/install Samba on production servers.

user@ubuntu:~# sudo cp /etc/samba/smb.conf /etc/samba/smb.conf.orig
user@ubuntu:~# sudo pico /etc/samba/smb.conf

You can add a share such as the following:

[ubuntu]
        force user = <your username>
        writeable = yes
        create mode = 644
        path = /home/<your username>
        directory mode = 755
        force group = <your username>

Then create yourself a Samba user:

user@ubuntu:~# sudo smbpasswd -a <your username>

Apache 2

Apache is mostly configured out of the box, but I like to enable rewrite and SSL so I can test production features.

user@ubuntu:~# sudo a2enmod rewrite
user@ubuntu:~# sudo a2enmod ssl

Since I’m going to run Apache and nginx, I’m going to bind Apache to eth0.

user@ubuntu:~# sudo pico /etc/apache2/ports.conf

NameVirtualHost 192.168.1.200:80
Listen 192.168.1.200:80

<IfModule mod_ssl.c>
    Listen 192.168.1.200:443
</IfModule>

Now we need to add eth0’s IP to the default host:

user@ubuntu:~# sudo pico /etc/apache2/sites-enabled/000-default

<VirtualHost 192.168.1.200:80>
        ServerAdmin webmaster@localhost

        DocumentRoot /var/www
        <Directory />
                Options FollowSymLinks
                AllowOverride None
        </Directory>
        <Directory /var/www/>
                Options Indexes FollowSymLinks MultiViews
                AllowOverride None
                Order allow,deny
                allow from all
        </Directory>

        ErrorLog ${APACHE_LOG_DIR}/error.log
        LogLevel warn
        CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>

Restart Apache for the changes to take effect.

user@ubuntu:~# sudo apache2ctl restart

Gearman

By default, Gearman uses memory to store pending jobs in the queue, but I prefer to use MySQL for persistent storage. To do this, first create the queue database and table:

user@ubuntu:~# mysqladmin -uroot -p123123 create gearman
user@ubuntu:~# mysql -uroot -p123123 -e "CREATE TABLE gearman.gearman_queue (
  unique_key VARCHAR(64) NOT NULL,
  function_name VARCHAR(255) NULL,
  priority INT NULL,
  data LONGBLOB NULL,
  PRIMARY KEY (unique_key)
) ENGINE = InnoDB;"

Next update the init script to tell Gearman to use the database:

user@ubuntu:~# sudo mv /etc/default/gearman-job-server /etc/default/gearman-job-server.bak
user@ubuntu:~# echo "PARAMS=\"-q libdrizzle --libdrizzle-host=127.0.0.1" \
   "--libdrizzle-user=root --libdrizzle-password=123123 --libdrizzle-db=gearman" \
   "--libdrizzle-table=gearman_queue --libdrizzle-mysql\"" | sudo tee /etc/default/gearman-job-server
user@ubuntu:~# sudo /etc/init.d/gearman-job-server restart

Gearman PHP Extension

We need to download and install the Gearman PHP extension if we want to write PHP workers or post jobs to the queue.

user@ubuntu:~# cd ~/src
user@ubuntu:~/src# wget http://pecl.php.net/get/gearman-0.7.0.tgz
user@ubuntu:~/src# tar xzf gearman-0.7.0.tgz
user@ubuntu:~/src# rm gearman-0.7.0.tgz package.xml
user@ubuntu:~/src# cd gearman-0.7.0
user@ubuntu:~/src/gearman-0.7.0# phpize
user@ubuntu:~/src/gearman-0.7.0# ./configure
user@ubuntu:~/src/gearman-0.7.0# make
user@ubuntu:~/src/gearman-0.7.0# sudo make install

Next, add the config file to load the Gearman PHP extension:

user@ubuntu:~# echo "extension=gearman.so" | sudo tee -a /etc/php5/conf.d/gearman.ini

memcached PHP Extension

Since we have memcached and the memcached PHP extension installed, let’s use it for storing session data:

user@ubuntu:~/src# echo "session.save_handler = memcached
session.save_path = \"127.0.0.1:11211\"" | sudo tee -a /etc/php5/conf.d/memcached.ini

nginx

nginx is a web server that is really fast. I use nginx as my primary development web server unless I’m running a web app that only works with Apache. You can choose to install nginx from a package, but I like to live life on the bleeding edge, so I’ll be building nginx from source. To install nginx, we need to download the source, compile it, install it, and configure it.

user@ubuntu:~# cd ~/src
user@ubuntu:~/src# wget http://nginx.org/download/nginx-0.8.52.tar.gz
user@ubuntu:~/src# tar xzf nginx-0.8.52.tar.gz
user@ubuntu:~/src# rm nginx-0.8.52.tar.gz
user@ubuntu:~/src# cd nginx-0.8.52
user@ubuntu:~/src/nginx-0.8.52# sudo mkdir /var/lib/nginx
user@ubuntu:~/src/nginx-0.8.52# ./configure \
    --sbin-path=/usr/sbin \
    --conf-path=/etc/nginx/nginx.conf \
    --error-log-path=/var/log/nginx/error.log \
    --pid-path=/var/run/nginx.pid \
    --lock-path=/var/lock/nginx.lock \
    --http-log-path=/var/log/nginx/access.log \
    --http-client-body-temp-path=/var/lib/nginx/body \
    --http-proxy-temp-path=/var/lib/nginx/proxy \
    --http-fastcgi-temp-path=/var/lib/nginx/fastcgi \
    --http-uwsgi-temp-path=/var/lib/nginx/uwsgi \
    --http-scgi-temp-path=/var/lib/nginx/scgi \
    --with-http_stub_status_module
user@ubuntu:~/src/nginx-0.8.52# make
user@ubuntu:~/src/nginx-0.8.52# sudo make install

user@ubuntu:~# sudo pico /etc/init.d/nginx

Here’s the init script that will start nginx for us:

#! /bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DAEMON=/usr/sbin/nginx
NAME=nginx
DESC=nginx
test -x $DAEMON || exit 0
case "$1" in
  start)
        echo -n "Starting $DESC: "
        start-stop-daemon --start --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON -- $DAEMON_OPTS
        echo "$NAME."
        ;;
  stop)
        echo -n "Stopping $DESC: "
        start-stop-daemon --stop --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
        echo "$NAME."
        ;;
  restart|force-reload)
        echo -n "Restarting $DESC: "
        start-stop-daemon --stop --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
        sleep 1
        start-stop-daemon --start --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON -- $DAEMON_OPTS
        echo "$NAME."
        ;;
  reload)
        echo -n "Reloading $DESC configuration: "
        start-stop-daemon --stop --signal HUP --quiet --pidfile /var/run/$NAME.pid --exec $DAEMON
        echo "$NAME."
        ;;
  *)
        echo "Usage: /etc/init.d/$NAME {start|stop|restart|reload|force-reload}" >&2
        exit 1
        ;;
esac
exit 0

Now we need to make the init script executable and enable it:

user@ubuntu:~# sudo chmod +x /etc/init.d/nginx
user@ubuntu:~# sudo update-rc.d nginx defaults

user@ubuntu:~# sudo pico /etc/nginx/nginx.conf

Here’s a starter nginx.conf with some basic settings:

user  www-data www-data;
worker_processes  2;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    sendfile                on;
    tcp_nodelay             on;
    tcp_nopush              on;
    keepalive_timeout       65;
    server_name_in_redirect off;
    server_tokens           off;

    add_header Strict-Transport-Security max-age=1800;
    add_header X-Frame-Options deny;

    gzip            on;
    gzip_buffers    16 8k;
    gzip_comp_level 9;
    gzip_types      text/plain text/xml application/x-javascript text/css;

    include /etc/nginx/sites/*;
}

user@ubuntu:~# sudo mkdir /etc/nginx/sites
user@ubuntu:~# sudo pico /etc/nginx/sites/default

Now we need to set up a default host that supports PHP (via PHP-FPM, PHP’s FastCGI Process Manager) and we want the default host to use the eth0:1 IP address:

server {
    listen       192.168.1.201:80 default;
    server_name  _;
    root   /var/www;
    index  index.php;
    location / {
        if (!-e $request_filename) {
            rewrite ^/(.*)$ /index.php?q=$1 last;
            break;
        }
    }
    location ~ \.php$ {
        fastcgi_pass   127.0.0.1:9000;
        fastcgi_index  index.php;
        fastcgi_param  SCRIPT_FILENAME  /var/www$fastcgi_script_name;
        include        fastcgi_params;
    }
    location ~* (\.(htaccess|engine|inc|info|install|module|profile|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)|code-style\.pl|Entries.*|Repository|Root|Tag|Template)$ {
        deny all;
    }
}

After the config files are good to go, start nginx:

user@ubuntu:~# sudo /etc/init.d/nginx start

Service Names

I also like to add service names so I can see what ports are in use when I run netstat. I added drizzle and Cassandra for fun despite this post not including them.

user@ubuntu:~# sudo cp /etc/services /etc/services.bak
user@ubuntu:~# su
root@ubuntu:~# echo "drizzle     4427/tcp
drizzle     4427/udp
memcached   11211/tcp
memcached   11211/udp
gearmand    4730/tcp
gearmand    4730/udp
fastcgi     9000/tcp
cassandra   9160/tcp" >> /etc/services
root@ubuntu:~# exit

Android SDK

The Android SDK is unfortunately not available as a package, so you’ll need to download it from the Android Developer site: http://developer.android.com/sdk/index.html.

user@ubuntu:~# wget http://dl.google.com/android/android-sdk_r07-linux_x86.tgz
user@ubuntu:~# tar xzf android-sdk_r07-linux_x86.tgz
user@ubuntu:~# rm android-sdk_r07-linux_x86.tgz
user@ubuntu:~# sudo mv android-sdk-linux_x86 /usr/local
user@ubuntu:~# sudo find /usr/local/android-sdk-linux_x86 -type d -exec chmod 777 {} \;

You’ll need to add the Android SDK path near the top of your ~/.bash_profile or ~/.bashrc:

export PATH=${PATH}:/usr/local/android-sdk-linux_x86/tools

To manage your Android SDK packages and virtual devices, you’ll need to run the android app:

user@ubuntu:~# android

First go to Available Packages and download version 1.6 and 2.2 Android SDK packages. You can also choose to download the documentation, samples, and Google APIs.

Downloading the package may take several minutes. You don’t have to create a virtual device right now if you are planning on installing Appcelerator’s Titanium platform. You can exit the Android app when you’re done.

Desktop Apps

If you’re running Ubuntu Desktop, there are a couple handy apps I install. The first is Google Chrome and can be directly downloaded from the Google Chrome download page.

I find KCachegrind and GHex to be useful:

user@ubuntu:~# sudo apt-get install kcachegrind ghex

Appcelerator Titanium

Titanium is an awesome platform for developing desktop applications for Linux, Mac OS X, and Windows as well as mobile apps for iPhone and Android. We use Titanium Developer to create Titanium projects. Begin by downloading the 64-bit version of Titanium:

user@ubuntu:~# wget -O titanium.tgz http://www.appcelerator.com/download-linux64

There’s also a 32-bit version available at http://www.appcelerator.com/download-linux32.

Next we unpack Titanium Developer and move it to a safe place:

user@ubuntu:~# tar xzf titanium.tgz
user@ubuntu:~# rm titanium.tgz

Next you need to run the installer by double-clicking the Titanium Developer executable. Run the executable and then click the Install button. You can try installing to /opt/titanium, but you might need root privileges.

Next, there are a few issues with outdated libraries, so we simply delete them:

user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgobject-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libglib-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgio-2.0.*
user@ubuntu:~# rm ~/.titanium/runtime/linux/1.0.0/libgthread-2.0.*

Titanium Developer also complains if /bin/java doesn’t exist, so create a quick link:

user@ubuntu:~# sudo ln -s /usr/bin/java /bin/java

Relaunch Titanium Developer and enter your login credentials. If you don’t have a login, you can get a free account.

After signing in, you may notice there are some updates available in the upper right corner of the window. Click in the box and the updates will be downloaded and installed.

Optionally you can create a launcher icon for your GNOME panel. Don’t forget to escape spaces in the command with a backslash!

Finishing Touches

Lastly, I like to re-arrange my desktop to maximize my coding real estate.

Conclusion

That should get you up and running with a neato dev environment. If you need to run SSL, I wrote a post on Creating Self-Signed Certs on Apache 2.2 and Virtual Hosts and Wildcard SSL Certificates with Apache 2.2.

If you find any typos or have any additions, please feel free to sound off in the comments!


Here’s my presentation I gave June 9, 2008, at the Twin Cities MySQL and PHP User Group about my highly available cluster using DRBD and Heartbeat.

I added a few slides and cleaned things up a bit. The presentation went well and we had a lot of good questions.

The MySQL and PHP User Group will be taking some time off over the summer. There will be another meetup mid-summer to come up with some ideas for future meetings.


This Saturday, May 10th, is MinneBar, Minnesota’s BarCamp. MinneBar is described as an “(un)Conference” which means it’s a free, ad-hoc gathering of technology folks where everyone is encouraged to contribute.

There are a lot of great sessions this year. I’ll be giving a presentation titled “Memcached & MySQL Sitting in a Tree.” The talk is about the new Memcached Functions for MySQL. I’ll talk a bit about the what, why, and how of this set of awesome UDFs.

I’m not sure what time I present and I think I have 50 minutes, but I don’t know for sure. I’m trying something new this time around; I’ll be publishing my presentation on SlideShare.

We are still 3 days away and there are currently 356 people signed up, which is right around how many people were signed up last year. If you are in the Minneapolis/St. Paul area, you should come to participate and learn!

To register, visit their website, click the “login” link in the top right, use the password “c4mp” to login, then edit the main page, and add yourself to the bottom. Registration starts at 8:00am, so remember to set an alarm. :)

Hope to see you there!


Apache has a neat module called mod_dbd that allows your Apache modules to connect to a database. mod_dbd interfaces with apr_dbd, an Apache Portable Runtime (APR) abstraction layer around database specific drivers.

Back when Ubuntu 7.04 (feisty) was released, a MySQL driver was not bundled with Apache due to licensing concerns. So, in order to use mod_dbd to connect to a MySQL database, you need to get the MySQL driver source code from WebThing (apr_dbd_mysql.c) and manually re-compile apr-utils.

You also need the source code for Apache 2.2.3 (which includes apr-utils 1.2.7) from the Ubuntu 7.04 repositories, then copy the apr_dbd_mysql.c file into the Apache source apr-utils/dbd directory. The Ubuntu guys made a nice INSTALL.MySQL file in the apr-utils directory with some basic instructions.

What they don’t tell you is you need to install the MySQL source. To make matters worse, once you install it, the apr-utils 1.2.7 configure script can’t find it, even if you tell it where it is.

<snip>
configure: checking for mysql in /usr/src/mysql-dfsg-5.0-5.0.38/include
checking mysql.h usability... no
checking mysql.h presence... no
checking for mysql.h... no
<snip>

This apparently was a known issue and was fixed in apr-utils 1.2.8.

Starting with apr-utils 1.2.11, the MySQL driver is bundled with it. Unfortunately, even Ubuntu 7.10 (gutsy) still ships with apr-utils 1.2.7. So, you are forced to download the source and compile.

Or, you can wait a couple of days for Ubuntu 8.04 (hardy), which has Apache 2.2.8 and apr-utils 1.2.11. In theory the MySQL driver will work out of the box.

As for me, I’ll be compiling Apache, PHP, MySQL, memcached, and <insert essential infrastructure software> from source like I should have done in the beginning.


MySQL Conference Day 4 Thoughts

Apr 17, 2008

Scaling out MySQL: Hardware Today and Tomorrow

Jeremy Cole and Eric Bergen over at Proven Scaling LLC gave a talk about the hardware side of MySQL. They covered pretty much every aspect of hardware.

For starters, Jeremy said go 64-bit hardware and operating system. For CPU, faster is better. The current versions of MySQL and InnoDB don’t take full advantage of 8 core servers, so unless you have the budget, Jeremy recommended a single quad-core or a dual dual-core setup. He recommended getting as much RAM as possible. RAM is cheap so go for 32GB, or at least 16GB.

For storage, Jeremy discussed the many options including direct attached storage (DAS), SAN, NAS, and the various hard drive interfaces. From what I gathered, they prefer configuring each DB server with RAID 10. If the RAID controller has battery backed cache, then you should do “write back”, otherwise “write through”. Write back offers faster performance since it caches the data and doesn’t make the system wait for the data to be written to disk. The battery backed cache means that you won’t lose the data pending to be written if the system loses power. There was a brief discussion of SATA vs. SAS. SAS offers faster drives (15,000 RPM), and the drives have processors to handle commands just as SCSI drives do, which improves performance. SAS has another interesting feature where a single drive can be hooked up to two separate SAS controllers in case one controller becomes unavailable.

They buy all of their gear from Dell, but HP, Sun, and IBM are good too. Dell just happens to be significantly cheaper, especially when you go through a sales rep. They mentioned some of the smaller guys including SuperMicro and Silicon Mechanics. I personally really like SuperMicro’s 6015T server because it has 2 nodes in a 1U chassis. This is actually denser than any blade server solution I’ve ever seen. Each node is capable of two quad-core processors and 32GB of RAM. The only downside is you can only have 2 hard drives and both nodes share a single non-redundant power supply. So this would make a decent slave, but you would need to architect your application so it could quickly pick another slave if/when it goes down or use MySQL Proxy.

For databases using InnoDB, they said the InnoDB buffer pool should be 2GB less than the total system memory, so 14GB on a 16GB system. Jeremy mentioned special hardware to speed things up, specifically Kickfire and Violin Memory. Kickfire is a SQL appliance that includes a special SQL chip to speed up operations significantly. Violin Memory’s 1010 memory appliance is sweet. For only $170k you can add 512GB of DRAM in 2U to your database server over a PCI-Express bus. It holds 84 x 6GB chips that can be hot swapped. You can lose 2 sticks before you’re screwed.

Jeremy concluded with high-speed interconnects including InfiniBand and Dolphin Interconnect. InfiniBand is fast and you can hook all of your machines into a switch. Dolphin’s interconnect is also fast; the machines are chained together in a loop similar to external SCSI devices, but you need to make sure they have a driver for your hardware.

I talked to Jeremy after his talk and asked him about diskless slaves which would basically have a RAM drive for the data. While it would be fast, it would take memory that would otherwise be used by MySQL and would be a pain to manage when they come online. So scratch that idea.

Helping InnoDB Scale on Servers with Many CPU Cores and Disks

One of the more popular talks was by Mark Callaghan at Google, who talked about ways they managed to get InnoDB to take advantage of systems with more than 4 cores and many disks. The primary change they made was to InnoDB’s mutex code used to control concurrent reads/writes to pages.

They replaced the existing pthreads mutex code with a more efficient platform specific compare and swap CPU instruction (CAS). They managed to get much better performance. He said they are hoping to get a patch out by the end of the year with their changes. They don’t want to release it until they know it is rock solid.

Scaling Heavy Concurrent Writes In Real Time

Dathan Pattishall, formerly with Flickr and now with RockYou.com, talked about an analytics system he helped build for Flickr. Flickr keeps track of each photo’s stats including external links. Whenever someone directly embeds a picture from a Flickr Pro user, they record that information, then make those stats available in near realtime.

The old design basically involved inserting records as they came in, but it was killing the servers, especially since those servers were also handling reads for people viewing the stats. Their solution was to create a separate Java daemon that queues up pending inserts. This means only a single thread is used on the MySQL server and it doesn’t block the web servers from serving up the information.

They are inserting the stats into 3 tables, one each for daily, weekly, and monthly stats. To keep things in order, they tried a VARCHAR of the URL as the primary key, but ran into major performance issues. So instead they decided to store a hash as bigint:

// php
// note: base_convert() may lose precision on very large values, so verify the result before relying on it
$id = base_convert(substr(md5($url), 0, 16), 16, 10);

This code generates a 32 character MD5 of the URL, then takes the first 16 characters and converts them from a string of hex digits to a base 10 number. Sixteen hex digits are exactly 64 bits, so the resulting number fits perfectly in an unsigned bigint.
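If you want to double check an ID outside of PHP, the same derivation can be sketched in the shell. This is my own illustration, not Dathan’s code: `url_to_id` is a made-up name, the URL is a placeholder, and it assumes bash plus `md5sum` from GNU coreutils.

```shell
#!/bin/bash
# Turn a URL into an unsigned 64-bit ID taken from the first half of its MD5.
url_to_id() {
    local hex
    # md5sum prints "<32 hex chars>  -"; keep only the first 16 hex characters.
    hex=$(printf '%s' "$1" | md5sum | cut -c1-16)
    # %u prints the value as an unsigned integer, so the full 64 bits survive.
    printf '%u\n' "0x$hex"
}

url_to_id "http://www.example.com/photos/123"
```

Because the value is printed as an unsigned integer, the column on the MySQL side would need to be unsigned as well.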

He also mentioned using ibbackup for backing up the databases, but it is not a free solution.

Geo Distance Search with MySQL

Ever since the Google Maps API was released, I’ve had an interest in playing around with it. Alexander Rubin of MySQL talked about ways of querying for locations within a given distance of a lat/lng. He first abstracts the distance math into a user-defined function (UDF), then just calls the UDF from within the query.

I’ve already played with geo searching before, so it was mostly review. He didn’t go into much depth on topics such as MySQL’s spatial extensions.
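For a feel of the distance math that ends up inside such a function, here’s a rough sketch using the haversine formula. The formula choice, the 6371 km Earth radius, and the `distance_km` name are my assumptions, not something taken from the talk.

```shell
#!/bin/sh
# distance_km <lat1> <lon1> <lat2> <lon2>  (decimal degrees)
# Great-circle distance on a sphere, via the haversine formula.
distance_km() {
    awk -v lat1="$1" -v lon1="$2" -v lat2="$3" -v lon2="$4" 'BEGIN {
        pi = atan2(0, -1)
        r  = 6371                       # assumed mean Earth radius in km
        dlat = (lat2 - lat1) * pi / 180
        dlon = (lon2 - lon1) * pi / 180
        a = sin(dlat / 2) ^ 2 + cos(lat1 * pi / 180) * cos(lat2 * pi / 180) * sin(dlon / 2) ^ 2
        c = 2 * atan2(sqrt(a), sqrt(1 - a))
        printf "%.1f\n", r * c
    }'
}

# One degree of longitude along the equator
distance_km 0 0 0 1
```

In a real query you would usually narrow the candidates with a cheap bounding box on indexed lat/lng columns first and only run the distance math on what’s left.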

Dinner at the Tied House

We had a great turnout of around 18 people at the Tied House in Mountain View. We had a number of people from places including MySQL, PrimeBase, and Facebook.

Thanks to the PrimeBase guys! They have a neat transaction storage engine that supports streaming blob data. Normal blobs in MySQL are held in memory during the transaction. The PBXT Storage Engine is designed to stream blob data in and out very efficiently.

I’d like to give a special thanks to Jay Pipes for getting me to come to the conference this year. I truly had a great time. Thanks!


MySQL Conference Day 3 Thoughts

Apr 16, 2008

Keynotes

The conference committee managed to get Rick Falkvinge of the Swedish Pirate Party to speak. I heard him speak at OSCON 2007. What I took away from his talk is copyright is evil. Copyright is the excuse industries (i.e. the music industry) are using as a tool to justify monitoring all of your communications. Not only do they want to monitor you, but prohibit certain kinds of communications. What it comes down to is your privacy vs. copyright. It’s scary stuff.

The second part of the keynote was a panel consisting of a representative from MySQL, Sun, flickr, FotoLog, Wikipedia, Facebook, and YouTube. They were discussing scaling at each of their sites. It was a great discussion. Informative and funny. Paul Tuckfield of YouTube had a great saying: “Replication is the answer. You just need to rephrase the question.” Farhan “Frank” Mashraqi of FotoLog made an interesting observation where Sun Sparc Niagara 1 servers make great master servers due to their high speed and Sun Sparc Niagara 2 servers make great slave servers due to their large concurrency.

Grand Tour of the information_schema

The information_schema database is a built-in database that contains metadata about data including tables, partitions, privileges, character sets, constraints, indexes, server settings, server status, and routines. This database is an alternative to MySQL’s proprietary SHOW commands.

I see real utility in being able to query the information_schema database to check server status. Another interesting use is to auto-generate schema documentation. I’m curious what kind of user metadata you can associate to objects.

Applied Partitioning and Scaling Your Database System

Phil Hildebrand gave his talk about the different ways of partitioning your data. The types are range, hash, key, and list. You can read more about partitioning types in MySQL’s documentation.

MySQL Performance Under a Microscope: The Tobias and Jay Show

This was an entertaining talk by MySQL’s Tobias Asplund and Jay Pipes. They showed the results of a few benchmarks comparing multiple ways to do something.

In the first test, they wanted to see what was the fastest way for getting the total count of records. They tried a handful of ways, but COUNT(*) when query caching was enabled was the fastest.

One of the other interesting tests they did was DATETIME vs. INT UNSIGNED for storing a date. The best method was to use an INT UNSIGNED and do the date to int conversion on the application tier. In PHP, use the strtotime() function.
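The same date to int conversion is easy to sanity check from the shell. A small sketch, assuming GNU date (for `-d` and the `@` syntax) and treating every value as UTC; the function names are made up.

```shell
#!/bin/sh
# Convert between a "YYYY-MM-DD HH:MM:SS" string and a Unix timestamp (UTC).
to_epoch()   { date -u -d "$1" +%s; }
from_epoch() { date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'; }

ts=$(to_epoch "2008-04-16 09:30:00")
echo "$ts"          # the integer you would store in the INT UNSIGNED column
from_epoch "$ts"    # and back again for display
```

An INT UNSIGNED tops out at 4294967295, which as a Unix timestamp lands in the year 2106.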

The MySQL Query Cache

Query caching can bring huge performance gains to your web application. Baron Schwartz of Percona gave a talk describing why query cache rocks.

MySQL caches query results, not execution plans. It stores the results in a big hash table where the key is the query. The key is case-sensitive and whitespace-sensitive. Only SELECT statement results are cached since it doesn’t make a whole lot of sense to cache INSERT or UPDATE results. Only deterministic queries are cached. If the query contains a non-deterministic function call, such as a function that returns the current time, then it cannot cache the results.

You can display the query cache information by executing the following:

SHOW GLOBAL STATUS LIKE 'qcache%';
SHOW GLOBAL VARIABLES LIKE 'query_cache%';

The way the query cache memory is allocated can potentially cause fragmentation. You can get a feel for how bad it is by comparing the number of free blocks to the number of total blocks. If you are running out of free blocks, you either have filled your cache or you have bad fragmentation.
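To put a number on that comparison, here’s a tiny sketch that turns the two counters into a percentage. `free_block_pct` is a made-up helper and the values passed in are placeholders; on a real server they would come from the Qcache_free_blocks and Qcache_total_blocks status variables.

```shell
#!/bin/sh
# free_block_pct <Qcache_free_blocks> <Qcache_total_blocks>
free_block_pct() {
    awk -v free="$1" -v total="$2" 'BEGIN {
        if (total + 0 == 0) { print "n/a"; exit }   # cache disabled or empty
        printf "%.1f\n", 100 * free / total
    }'
}

# Placeholder numbers; fetch real ones with something like:
#   mysql -N -e "SHOW GLOBAL STATUS LIKE 'Qcache_free_blocks'"
free_block_pct 250 1000
```

Tracking this percentage over time tells you more than a single reading, since the cache fills and fragments as the workload changes.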

Grazr: Lessons Learned Building a Web 2.0 Application Using MySQL

The talk about Grazr was given by Patrick Galbraith and Michael Kowalchik. Patrick is one of the fellows that showed off some awesome memcached stuff at the tutorial and the BOF. Grazr filters out feeds to only the information it thinks you’d be interested in. This was a pretty general discussion and they managed to get through their slides pretty quickly. Since the talk was winding down early, I headed over to Eli’s talk.

Help, My Website has been Hacked! Now What?

If you have a popular site, you are going to get people attempting to hack it. Eli White of digg talked about some of the ways your site can get hacked.

One thing he pointed out that I hadn’t thought about was that you can’t just block someone’s IP address. If there is a proxy between the user and the web server, then the IP address you get is the proxy’s, not the user’s. You need to check the x-forwarded-for HTTP header. If more than one proxy is involved, the x-forwarded-for header will contain a comma separated list of addresses.
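Here’s a minimal sketch of pulling the original client address out of that header. `client_ip` is a made-up helper and the addresses are documentation examples. It assumes the left-most entry is the client, which is the usual convention, but the header is supplied by the client and can be forged, so only trust it when the request arrived through a proxy you control.

```shell
#!/bin/sh
# client_ip <X-Forwarded-For value> <address of the direct peer>
client_ip() {
    xff=$1
    peer=$2
    if [ -z "$xff" ]; then
        # No proxy header: the peer address is all we have.
        printf '%s\n' "$peer"
        return
    fi
    # Keep everything before the first comma, then strip spaces and tabs.
    first=${xff%%,*}
    printf '%s\n' "$first" | tr -d ' \t'
}

client_ip "203.0.113.7, 10.0.0.2, 10.0.0.3" "10.0.0.3"
client_ip "" "198.51.100.20"
```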

I talked to Eli after his session and he recommended blocking the IPs on the firewall instead of in the PHP code. This means less load on the app server, but unless you have a fancy firewall that reports it, I would still be curious to know how often a particular IP is trying to attack me.

Performing MySQL Backups Using LVM Snapshots

The last session of the day was by Lenz Grimmer of MySQL. LVM snapshots can be a great way to back up your databases, especially InnoDB. The basic procedure is:

  • flush tables
  • flush tables with read lock
  • lvcreate -s
  • show master/slave status
  • unlock tables
  • mount snapshot, perform backup
  • unmount and discard the snapshot
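The steps above can be sketched roughly as follows. This is an untested outline, not Lenz’s script: the volume group, volume, snapshot, and mount point names are all hypothetical, and it assumes the MySQL data directory lives on an LVM volume. The one subtle part is that the read lock only lasts as long as the session that took it, so the sketch uses the mysql client’s `system` command to run lvcreate from inside that same session.

```shell
#!/bin/sh
# Hypothetical names; change them to match your own layout.
VG=vg0
LV=mysql
SNAP=mysql_snap
MNT=/mnt/mysql_snap

# Emit the statements for ONE mysql session, so the lock is still held
# while the snapshot is being created.
snapshot_sql() {
    cat <<EOF
FLUSH TABLES;
FLUSH TABLES WITH READ LOCK;
system lvcreate --snapshot --size 1G --name $SNAP /dev/$VG/$LV
SHOW MASTER STATUS;
UNLOCK TABLES;
EOF
}

# Print the session for review. To run it for real (as root), something like:
#   snapshot_sql | mysql -uroot -p
#   mount /dev/$VG/$SNAP $MNT
#   tar czf /backup/mysql-snapshot.tgz -C $MNT .
#   umount $MNT && lvremove -f /dev/$VG/$SNAP
snapshot_sql
```

The snapshot size (1G here) only needs to hold the blocks that change while the snapshot exists, so it depends on how busy the server is during the backup.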

InnoDB ignores the “flush tables with read lock” step, but if you have any MyISAM tables, you’ll still need to do it. Flushing the tables does impact performance, especially while the snapshot is active. As soon as you mount the LVM partition snapshot, you can back it up and then unmount and discard the snapshot.

There is a Perl script called mylvmbackup which can help with these procedures.

An alternative to LVM snapshots for backups is to replicate to a slave server, stop the replication, perform the backup on the slave, then start replication again. The downside is it requires an extra machine as the slave in which MySQL can be stopped so that InnoDB tables can be properly flushed.

MySQL Quiz Show and Sun party

The quiz show is an absolute blast. The show is moderated by the infamous Jay Pipes. Facebook was kind enough to sponsor the quiz show this year. There was plenty of beer and popcorn to go around. People won a ton of books and Sheeri Kritzer Cabral won the grand prize: an Apple iPhone. Lucky!

Everybody ended up coming out of the woodwork for the Sun after party. It was nice to finally get to meet Baron Schwartz. Everybody should go buy his book, High Performance MySQL!

I also had a chat with Brian Moon of dealnews.com. He claims PHP can be made to work with the Apache worker MPM. Hmm, looks like I have a new project!


Keynotes

The keynote was kick-started by Marten Mickos.  If you’ve never met Marten, he is, on a personal note, one of the greatest CEOs I’ve ever met.  The keynotes were especially interesting for me because it was the first time I’ve had the opportunity to listen to Jonathan Schwartz, the CEO of Sun Microsystems.  Jonathan seems like a great guy who gives the impression he “gets it”.

The last keynote was by Werner Vogels of Amazon.  His talk covered Amazon’s growth and the new services they offer including EC2.  He announced that EC2 now supports persistent storage, which is a huge improvement, but doesn’t quite solve all of the problems.

Testing PHP/MySQL Applications with PHPUnit/DbUnit

I’ve never been big into testing, but I’m trying to change that.  Sebastian Bergmann, the author of PHPUnit Pocket Reference (free online version), talked about PHPUnit and DbUnit and why I should use them.  Installing PHPUnit is extremely simple if you have pear installed:

pear channel-discover pear.phpunit.de
pear install phpunit/PHPUnit

Once installed, just require PHPUnit:

<?php
require_once 'PHPUnit/Framework.php';

He just scratched the surface on writing unit tests. One thing he pointed out was using CruiseControl for automated testing. What’s really cool is you can fire off CruiseControl from Subversion commit hooks. If the testing fails, CruiseControl can send an email with the results and who is to blame.

Practical MySQL for Web Applications

Domas Mituzas of MySQL and Wikipedia fame gave a good talk that covered practical design of web applications. The talk covered simple stuff, so I didn’t learn a whole lot. Nevertheless, Domas sometimes says some funny things that make the talk enjoyable.

EXPLAIN Demystified

Baron Schwartz gave a talk about the EXPLAIN statement. EXPLAIN is run by prepending the word EXPLAIN to your SELECT statements. It only works on SELECT statements. Instead of returning rows, MySQL outputs the execution plan for the query.
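To make that concrete, here is a small PHP sketch, assuming the mysqli extension. The explain_query() and explain() helpers are names I made up for illustration. The first one prepends EXPLAIN and refuses anything that is not a SELECT, and the second one runs it and returns the plan rows.

```php
<?php
// Hypothetical helper: turns a SELECT into its EXPLAIN form.
function explain_query(string $select): string
{
    $select = ltrim($select);

    // Per the talk, only SELECT statements can be explained.
    if (stripos($select, 'SELECT') !== 0) {
        throw new InvalidArgumentException('EXPLAIN only works on SELECT statements');
    }

    return 'EXPLAIN ' . $select;
}

// Hypothetical helper: runs the EXPLAIN and returns one array per plan row.
function explain(mysqli $db, string $select): array
{
    $result = $db->query(explain_query($select));
    if ($result === false) {
        throw new RuntimeException($db->error);
    }

    $rows = [];
    while ($row = $result->fetch_assoc()) {
        $rows[] = $row;
    }

    return $rows;
}
```

Each returned row is one line of the plan, with columns such as table, type, key and rows.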

After running through the output of the EXPLAIN statement, he showed us mk-visual-explain which is one of the tools in Maatkit. It is a neato command line tool that takes the EXPLAIN output and reformats it as a tree structure. It’s a great way to visualize the execution plan. Now if only there was a GUI version…

Upgrading to Elegant Versatile Database Architecture using PHP5 Data Objects

This talk was given by Sigurd Magnusson of SilverStripe and covered PDO. I already researched and used PDO, so it was mostly review.

After talking to some of the other people at the conference, I’ve been seriously thinking of moving away from PDO and using MySQL-specific functions because they expose some *really* cool debugging and profiling information.

Exploring Amazon EC2 for Scale-out Applications

The thought of EC2 sounds really cool. The ability to create a server instance and host your stuff on it within minutes is sweet. Need more servers? No problem, add another instance. The speakers, Morgan Tocker of MySQL and Carl Mercier of Defensio, talked about their experience with EC2.

There are some serious data and management issues. Until the other day, there wasn’t any kind of persistent storage, meaning when the server went offline, you lost all your data. Now you can mount a drive that persists across restarts. But one issue for critical business transactions is how and when data is written to disk. Is the data written immediately to disk, or is it buffered in the kernel or in some RAID card’s cache?

Another issue they ran into is that when a new machine is created, there are remnants of the previous instance’s data. So they need to zero out the drive, which takes 5 hours on a single instance.

What I took away from the talk is that EC2 is great if your app is simple and relies on 3rd party services (e.g. Facebook, Google) that are more reliable than EC2.

Service Oriented Architecture with PHP and MySQL

Joe Stump, a PHP hacker at Digg, gave a talk about SOA. It wasn’t so much about “web services” as about managing tasks and processing them asynchronously.

When I talked to Joe afterwards, he highly recommended Gearman to manage tasks. From the Gearman site: “Gearman is a system to farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, or to call functions between languages.”

So, if a user uploads an image, you can add the task of resizing the image to a backend processing mechanism. This allows for a responsive front-end for the user.

Joe, along with Chris Goffinet, is working on netgearman which is a PEAR package for interfacing with Gearman.
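To give an idea of what the image resize example might look like, here is a sketch that uses the PECL gearman extension (GearmanClient and GearmanWorker) rather than Joe's PEAR package, whose API I have not looked at. The resize_image function name, the JSON workload format and the three helper functions are all made up for illustration.

```php
<?php
// Hypothetical helper: packs the job parameters into a JSON workload.
function resize_workload(string $path, int $width, int $height): string
{
    return json_encode(['path' => $path, 'width' => $width, 'height' => $height]);
}

// Front-end side: hand the job to Gearman and return immediately.
// doBackground() does not wait for the worker; it returns a job handle.
function queue_resize(GearmanClient $client, string $path, int $width, int $height): string
{
    return $client->doBackground('resize_image', resize_workload($path, $width, $height));
}

// Back-end side: a long-running worker that picks up resize jobs.
function run_worker(GearmanWorker $worker): void
{
    $worker->addFunction('resize_image', function (GearmanJob $job) {
        $task = json_decode($job->workload(), true);

        // The real resizing would go here; this sketch only logs the request.
        error_log(sprintf(
            'resize %s to %dx%d',
            $task['path'],
            $task['width'],
            $task['height']
        ));
    });

    // Block and process jobs one at a time until work() fails.
    while ($worker->work()) {
    }
}
```

Both the client and the worker need addServer('127.0.0.1', 4730) called on them before use, pointing at wherever your Gearman job server runs. The upload handler calls queue_resize() and responds to the user right away, while run_worker() lives in a separate command line process.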

Memcached Hackathon BOF

This was a birds of a feather session where a bunch of people informally got together to discuss all things memcached. Patrick Galbraith of Grazr showed off Memcached Functions for MySQL. This is super cool. It allows you to set and get data in memcached within your SQL code via user-defined functions.

So instead of pulling data from the DB to the app, then pushing it to memcached, you can just have a trigger or stored procedure store the value directly to memcached. One caveat is when you roll back a transaction, it won’t unset the value from memcached.

There was some discussion about the memcached MySQL storage engine. After listening to them discuss it, I have to wonder if it is really worth it. It acts like a distributed memory table, except when a server in a cluster goes down, it will affect all the other servers.