The keynotes were kicked off by Marten Mickos. If you've never met Marten, he is, in my opinion, one of the greatest CEOs I've ever met. The keynotes were especially interesting for me because it was the first time I had the opportunity to listen to Jonathan Schwartz, the CEO of Sun Microsystems. Jonathan seems like a great guy who gives the impression he "gets it".
The last keynote was by Werner Vogels of Amazon. His talk covered Amazon's growth and the new services they offer including EC2. He announced that EC2 now supports persistent storage, which is a huge improvement, but doesn't quite solve all of the problems.
I've never been big into testing, but I'm trying to change that. Sebastian Bergmann, the author of PHPUnit Pocket Reference (free online version), talked about PHPUnit and DbUnit and why I should use them. Installing PHPUnit is extremely simple if you have PEAR installed:
pear channel-discover pear.phpunit.de
pear install phpunit/PHPUnit
Once installed, just require PHPUnit:
require_once 'PHPUnit/Framework.php';
He just scratched the surface on writing unit tests. One thing he pointed out was using CruiseControl for automated testing. What's really cool is you can fire off CruiseControl from Subversion commit hooks. If the testing fails, CruiseControl can send an email with the results and who is to blame.
Domas Mituzas of MySQL and Wikipedia fame gave a good talk that covered practical design of web applications. The talk covered simple stuff, so I didn't learn a whole lot. Nevertheless, Domas sometimes says some funny things that make the talk enjoyable.
Baron Schwartz gave a talk about the EXPLAIN statement. You use EXPLAIN by prepending the word EXPLAIN to a SELECT statement. In the MySQL versions current at the time, it only works on SELECT statements. Instead of the result set, MySQL returns the execution plan it would use for the query.
After running through the output of the EXPLAIN statement, he showed us mk-visual-explain which is one of the tools in Maatkit. It is a neato command line tool that takes the EXPLAIN output and reformats it as a tree structure. It's a great way to visualize the execution plan. Now if only there was a GUI version...
This talk was given by Sigurd Magnusson of SilverStripe and covered PDO. I already researched and used PDO, so it was mostly review.
After talking to some of the other people at the conference, I've been seriously thinking of moving away from PDO and using MySQL-specific functions because they expose some *really* cool debugging and profiling information.
The idea of EC2 sounds really cool. The ability to create a server instance and host your stuff on it within minutes is sweet. Need more servers? No problem, add another instance. The speakers, Morgan Tocker of MySQL and Carl Mercier of Defensio, talked about their experience with EC2.
There are some serious data and management issues. Until the other day, there wasn't any kind of persistent storage, meaning when the server went offline, you lost all your data. Now you can mount a drive that persists across restarts. But one issue for critical business transactions is how and when data is written to disk. Is the data written immediately to disk, or is it buffered in the kernel or in some RAID card's cache?
Another issue they ran into is that when a new machine is created, there are remnants of the previous instance's data on the drive. So they need to zero out the drive, which takes 5 hours on a single instance.
What I took away from the talk is that EC2 is great if your app is simple and relies on 3rd party services (e.g. Facebook, Google) that are more reliable than EC2.
Joe Stump, a PHP hacker at Digg, gave a talk about SOA. It wasn't so much about "web services" as it was about managing tasks and processing them asynchronously.
When I talked to Joe afterwards, he highly recommended Gearman for managing tasks. From the Gearman site: "Gearman is a system to farm out work to other machines, dispatching function calls to machines that are better suited to do work, to do work in parallel, to load balance lots of function calls, or to call functions between languages."
So, if a user uploads an image, you can add the task of resizing the image to a backend processing mechanism. This allows for a responsive front-end for the user.
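The pattern can be sketched without Gearman itself. The snippet below is only an in-process stand-in: it uses PHP's SplQueue as the queue, and the `enqueue_task`, `run_worker`, and `resize_image` names are made up for illustration. With Gearman, the queue would live outside the web process and a worker on another machine would pick up the job.

```php
<?php
// Hypothetical in-memory stand-in for a job queue such as Gearman.
// In production the queue lives outside the web process.
$queue = new SplQueue();

// Front end: record the work to be done and return to the user right away.
function enqueue_task(SplQueue $queue, $name, array $args)
{
    $queue->enqueue(array('name' => $name, 'args' => $args));
}

// Back end: a worker drains the queue and dispatches each task by name.
function run_worker(SplQueue $queue, array $handlers)
{
    $results = array();
    while (!$queue->isEmpty()) {
        $task = $queue->dequeue();
        $handler = $handlers[$task['name']];
        $results[] = call_user_func($handler, $task['args']);
    }
    return $results;
}

// Hypothetical handler; a real one would call GD or ImageMagick.
$handlers = array(
    'resize_image' => function (array $args) {
        return sprintf('resized %s to %dpx', $args['file'], $args['width']);
    },
);

// The upload request only enqueues; the resize happens later in the worker.
enqueue_task($queue, 'resize_image', array('file' => 'photo.jpg', 'width' => 200));
$results = run_worker($queue, $handlers);
```

The point of the split is that the upload request only pays for the enqueue call, while the slow work happens wherever the worker runs.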
Joe, along with Chris Goffinet, is working on netgearman, a PEAR package for interfacing with Gearman.
This was a birds of a feather session where a bunch of people informally got together to discuss all things memcached. Patrick Galbraith of Grazr showed off Memcached Functions for MySQL. This is super cool. It allows you to set and get data in memcached within your SQL code via user-defined functions.
So instead of pulling data from the DB to the app, then pushing it to memcached, you can just have a trigger or stored procedure store the value directly in memcached. One caveat is that when you roll back a transaction, it won't unset the value in memcached.
There was some discussion about the memcached MySQL storage engine. After listening to them discuss it, I have to wonder if it is really worth it. It acts like a distributed memory table, except when a server in a cluster goes down, it will affect all the other servers.
Here at the MySQL Conference and Expo, Laura Thomson gave a great talk about Scalability and Performance Best Practices.
She had some interesting points about scalability. She basically said that no matter what language you write your web application in, whether it's compiled (C/C++) or interpreted (PHP, Java), you are subject to scalability issues. Another potential problem is optimizing before you know what exactly to optimize. This can lead to a loss of time that could have been spent on more important things.
Laura's talk covered three types of best practices: general, scalability, and performance.
General Best Practices
First tip is to profile early and profile often. The earlier you can detect poor performance, the easier it is to fix. There are a handful of tools (APD, Xdebug, Zend) that can help with profiling. Use system profiling tools such as strace, dtrace, and ltrace to gather more information.
There are two types of effective profiling: debugging and habitual. Debugging profiling is about spotting deviations from the norm and habitual profiling is making the norm better. Profiling is an art and requires lots of practice to know where to look.
It is essential that the IT admins and the developers cooperate. This allows crises to be handled properly, especially in production environments where outages are time critical. Team members should alert the developers to any abnormal behavior after a new code release is pushed. Before pushing new code, schedule a launch window and agree on procedures for having developers fix problems and possibly falling back to a previous version. Avoid pushing releases on Fridays, otherwise key team members may be unavailable or overworked on the weekend.
It is recommended to test your application with production data. Test data may not take into account certain scenarios where bugs can be introduced. It is advised to have a staging environment which uses production data and also undergoes simulated load testing.
To track your application's performance, record it over a period of time, then analyze the data to find potential issues. There are several means of tracking performance, including access logs, system metrics, application profiling, and query profiling.
When a problem occurs, don't make assumptions. The problem may be caused by something other than what you think it might be.
Scalability Best Practices
When the web application begins to suffer performance issues, start to decouple and isolate components to track down the source. If you need to tweak code, spend only enough time to refactor as needed. Reduce load on servers by moving static content onto dedicated servers.
By default, PHP stores session data in files on the local hard drive. This can cause performance issues. Storing the session data in a database, or better yet in a distributed cache such as memcached, can help.
The most important thing you can do to improve performance is to cache as much data as possible. There are many levels of caching. You can cache data sets or precomputed fragments. For things like images, you can set up dedicated services for caching and serving static content. The usual suspects are recommended for caching (APC, memcached, Squid).
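As a rough illustration of caching a data set, here is a minimal cache-aside sketch. `ArrayCache`, `cached`, and the `top_stories` key are hypothetical names, and the array-backed class only stands in for memcached or APC so the example runs without any extensions.

```php
<?php
// Minimal cache-aside sketch. ArrayCache is a hypothetical in-process
// stand-in for memcached or APC so the example runs without extensions.
class ArrayCache
{
    private $items = array();

    public function get($key)
    {
        if (!isset($this->items[$key])) {
            return false; // miss
        }
        list($value, $expires) = $this->items[$key];
        if ($expires < time()) {
            unset($this->items[$key]); // expired
            return false;
        }
        return $value;
    }

    public function set($key, $value, $ttl)
    {
        $this->items[$key] = array($value, time() + $ttl);
    }
}

// Return the cached value if present; otherwise compute it once and store it.
function cached(ArrayCache $cache, $key, $ttl, $compute)
{
    $value = $cache->get($key);
    if ($value === false) {
        $value = call_user_func($compute);
        $cache->set($key, $value, $ttl);
    }
    return $value;
}

$cache = new ArrayCache();
$calls = 0;

// Hypothetical expensive lookup, e.g. a slow query or a rendered fragment.
$loadTopStories = function () use (&$calls) {
    $calls++;
    return array('story 1', 'story 2');
};

$first  = cached($cache, 'top_stories', 60, $loadTopStories);
$second = cached($cache, 'top_stories', 60, $loadTopStories);
```

The second call is served from the cache, so the expensive lookup runs only once. Note that this sketch treats `false` as a miss, so it cannot cache a literal `false` value.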
PHP out of the box does not cache compiled pages. That means that on every request, each page has to be parsed and compiled again before it is executed. Extensions such as APC and Zend can cache the compiled pages for an immediate speed improvement.
MySQL's query cache works, but isn't necessarily implemented the best way. If you query a table, the results are cached on the MySQL server. If a row in that table is inserted, updated, or deleted, every cached result that uses the table is thrown away. This supposedly has been fixed in MySQL 5.1, but a setting has to be changed to stop the cache from being flushed.
To scale, your data can be federated across multiple MySQL servers. There can be complications with regard to data reliability, and table joins can suffer major speed hits.
A more reliable way to scale is to use replication. Replication does suffer from "slave lag" issues. The lag can be high because the master server uses multiple threads to store the data locally, while the slave server has to process the replicated items in a single thread to ensure that the order of the transactions is preserved. You can display the status of MySQL's I/O and SQL threads by executing a SHOW PROCESSLIST statement.
The more database writes, the greater the lag. Depending on your application, you may only want to use replication for failover or backups.
Sometimes you may benefit from designing your application to avoid situations where data is hard to scale and can't easily be cached.
Performance Best Practices
You definitely want to use a compiler cache. As described above, use APC or Zend for PHP5. If you are connecting to an external data source, perhaps a web service or data feed, minimize the number of times you request it. Cache the response if possible. You may be able to load the data dynamically using JavaScript and a little Ajax magic. Maybe the data isn't a must have, or maybe you can have a page dedicated to displaying the 3rd party data.
When tuning your application's performance, change one thing at a time. If you change more than one thing, how do you know which change caused the improvement, and how do you know you didn't introduce new bugs with the other changes? Use MySQL's EXPLAIN statement to profile your queries and enable the slow query log. Use MyTop or InnoTop to help profile your queries.
It is crucial that your database is properly indexed. Performance suffers if a table has poorly designed indexes or perhaps too many indexes. Use the smallest data type possible and try to design your tables to be fixed width. That means use char instead of varchar and set the length of your fields to logical lengths (e.g. use 128 chars instead of 107 chars). De-normalize when necessary. Move static data out of the database or store it in a MEMORY table. Use the appropriate storage engine for each table.
For your queries, minimize the number of queries and cache them outside the database when possible.
She claims that deeply recursive code is expensive in PHP. Make sure you are not doing unnecessary looping. If you find that you are, chances are you are doing something wrong and that there is a better idiom for performing the task.
Don't try to work around or re-write perceived inefficiencies in PHP. Use regular expressions to do intense string manipulation. Instead of writing complex serialization code, use PHP's extensions to do the heavy lifting. Before spending time writing some boilerplate function, check whether an existing extension could save you the time.
Laura gave this talk on behalf of George Schlossnagle. George's original presentation can be found at http://omniti.com/~george/talks. This was an excellent session and proved to be beneficial.
Today, Rasmus Lerdorf, the man who kick started PHP, gave a great talk about PHP Performance and Security.
He began by talking about the new MySQL native driver. He did some benchmarks in which the driver appeared to offer little performance improvement.
The first tool to use to track down performance issues is Callgrind, which runs on top of Valgrind. Callgrind dumps a file that can be opened with KCachegrind, which is available on the Callgrind site.
Performance can be improved by installing APC: Alternative PHP Cache. Next he recommends installing the Xdebug extension to profile your PHP application and find performance issues. Caching is key.
For security, he described a scenario where you browse to a website that lists a bunch of links to various other websites. With JavaScript, the site can detect whether each link has been visited. If it has, the JavaScript can check those sites to see if your session is still valid and then cause problems such as transferring money or opening your firewall. You can reduce the surface area of these kinds of attacks by passing all inputs through filters. Every form field, URL, cookie, whatever, must go through a filter to escape potential problems.
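As a small sketch of what passing inputs through filters can look like, the snippet below uses PHP's bundled filter extension (`filter_var`) together with `htmlspecialchars`. The `filter_request` function and the field names are made up for illustration, and this is not necessarily the exact approach shown in the talk.

```php
<?php
// Sketch of filtering untrusted input with PHP's filter extension.
// The field names (user_id, email, comment) are made up for illustration.
function filter_request(array $input)
{
    $clean = array();

    // Integers: reject anything that is not a positive whole number.
    $clean['user_id'] = filter_var(
        isset($input['user_id']) ? $input['user_id'] : null,
        FILTER_VALIDATE_INT,
        array('options' => array('min_range' => 1))
    );

    // Email: returns false when the address is not well formed.
    $clean['email'] = filter_var(
        isset($input['email']) ? $input['email'] : '',
        FILTER_VALIDATE_EMAIL
    );

    // Free text: escape for safe output inside HTML.
    $clean['comment'] = htmlspecialchars(
        isset($input['comment']) ? $input['comment'] : '',
        ENT_QUOTES,
        'UTF-8'
    );

    return $clean;
}

$clean = filter_request(array(
    'user_id' => '42',
    'email'   => 'not-an-email',
    'comment' => '<script>alert("x")</script>',
));
```

Here `user_id` comes back as the integer 42, the malformed email comes back as `false`, and the comment has its angle brackets and quotes escaped. Validation and output escaping are separate jobs, so which filter applies depends on where the value ends up.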
Next he covered attacks that pass extra stuff in the URL. Even escaping URL parameters with htmlspecialchars() doesn't protect you from characters that are already escaped and that the browser then evaluates to do harmful things.
He talked about a spoofing trick where older versions of the Flash plugin in Internet Explorer can add attributes to the request header such as the domain. When providing links to download PDF files, the URL can include JavaScript code that is executed when the PDF plugin is loaded. This can be prevented by setting the MIME type for .pdf to application/octet-stream, which forces PDFs to be downloaded.
Cross-site request forgery is another huge problem. Adding a hidden input field containing some sort of session token, or "crumb" as he calls it, lets you verify, in combination with the session cookie, that the request is valid.
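One common way to build such a crumb, sketched here under my own assumptions rather than taken from the talk, is to derive it from the session ID with a keyed hash. The `make_crumb` and `check_crumb` names, the secret, and the session ID value are all placeholders.

```php
<?php
// Sketch of a CSRF "crumb": an HMAC of the session ID under a server secret.
// $secret and the session ID values here are placeholders.
function make_crumb($sessionId, $secret)
{
    return hash_hmac('sha256', $sessionId, $secret);
}

function check_crumb($crumb, $sessionId, $secret)
{
    // hash_equals (PHP 5.6+) compares in constant time.
    return is_string($crumb)
        && hash_equals(make_crumb($sessionId, $secret), $crumb);
}

$secret    = 'replace-with-a-long-random-server-side-secret';
$sessionId = 'abc123'; // would come from session_id()

// Embed the crumb in the form as a hidden field...
$crumb = make_crumb($sessionId, $secret);
$field = '<input type="hidden" name="crumb" value="' . $crumb . '">';

// ...and verify it when the form is submitted.
$okForOwner   = check_crumb($crumb, $sessionId, $secret);
$okForForgery = check_crumb($crumb, 'someone-elses-session', $secret);
```

A forged request from another site carries the victim's session cookie automatically but cannot supply the matching crumb, so the check fails.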
Rasmus has made his presentation available online at http://talks.php.net/show/mysql07.