Latest Posts

Most Popular Posts

Many content management systems and forums were originally created using the latin1 character set and the latin1_swedish_ci collation in MySQL. The problem many of these systems are facing today is the growing demand for multi-language content using special characters that cannot be accurately represented in the latin1 character set. This is where utf8 comes in.

The problem with simply converting a database from latin1 to utf8 is in the data itself. When you convert a database or table to a different character set or collation, it does not convert the content held within the tables. What happens is users end up with ‘strange’ characters in their data.

strange_a_e

latin1_convert_to_utf8

You can prevent this by taking steps to protect the data before you convert the database or tables.

An overview of the steps:

  1. Backup the database. No, really, do it.
  2. Alter the string field types to their binary equivalents in all tables. This will protect the data during conversion.
    • VARCHAR to VARBINARY
    • CHAR to BINARY
    • LONGTEXT to LONGBLOB
    • MEDIUMTEXT to MEDIUMBLOB
    • TEXT to BLOB
    • TINYTEXT to TINYBLOB
  3. Convert the database and tables from latin1 to utf8 character set and latin1_swedish_ci to utf8_general_ci collation.
  4. Convert the binary field types back to their original string field types.
  5. Backup the resulting database.
  6. Finished

Ex. Generic convert column to BLOB from TEXT

ALTER TABLE tbl_name MODIFY column_name BLOB

Ex. Generic convert table to new character set

ALTER TABLE tbl_name CHARACTER SET charset_name COLLATE collation_name

Ex. Generic convert database to new character set utf8

ALTER DATABASE database_name CHARACTER SET utf8;

(average: 5.00 out of 5)

When marking comments as “spam” in Wordpress, it doesn’t actually remove them from the database.

From my own experience, most Wordpress installations have thousands of spam comments in their databases, taking up valuable space.

In the wp_comments table the comment_approved column will show a value of “spam” on any comment marked as spam – and a value of “1″ for any approved comment.

To quickly delete ALL spam comments, execute the following query on your MySQL database:

DELETE FROM wp_comments WHERE comment_approved = “spam”;

You can expand the SQL command to delete from multiple databases and multiple tables, quickly and effectively.


(No Ratings Yet)

Wordpress Dynamic Replacement

Posted By Chris Stinson in General, MySQL, PHP on September 12, 2008

Wordpress has long had a feature that dynamically replaces standard characters with their more visually appealing (and perhaps more accurate) symbols. In some cases the formatting may not be desirable.

Last year I wrote a few articles on Exchange Server errors. The errors typically follow the form of 0xNumber, ex. 0x8000000F. What wordpress was doing was replacing the “x” in the error with a “times” or multiplication symbol. So 0x0 would show up as 0×0.

Dynamic Replace Active

Dynamic Replace Inactive

My concern was that search engines DO distinguish between “x” and the symbol for multiplication. So when people were searching for 0x8004011D my posts did not show up, but when searching for 0×8004011D or 8004011D they did. If you do a google search with the “times” symbol instead of an “x” for exchange errors, you will find many posts otherwise hidden from the world.

The last string of the regular expression below represents a case for when there is a character followed by an x and by another character without spaces. The x will be replaced with the special character for “times” or &# 215 ;

The characters in the image below are the symbols for replacement. Generally the quotes are not misleading to search engines.

The following dynamic string variable is found in the \wp-includes\formatting.php file. If you wish to remove all dynamic characters, simply delete both lines and your wordpress posts will render correctly.

$dynamic_characters = array
('/\'(\d\d(?:’|\')?s)/', '/(\s|\A|")\'/',
'/(\d+)"/', '/(\d+)\'/', '/(\S)\'([^\'\s])/', '/(\s|\A)"(?!\s)/',
'/"(\s|\S|\Z)/', '/\'([\s.]|\Z)/', '/(\d+)x(\d+)/');

 $dynamic_replacements = array
('’$1','$1‘', '$1″',
'$1′', '$1’$2', '$1“$2',
'”$1', '’$1', '$1×$2');

If you only want to remove the “x” replacement symbol ( × ) use the following lines in formatting.php in place of the originals.

$dynamic_characters = array(’/\’(\d\d(?:’|\’)?s)/’, ‘/(\s|\A|”)\’/',’/(\d+)”/’, ‘/(\d+)\’/', ‘/(\S)\’([^\'\s])/’, ‘/(\s|\A)”(?!\s)/’,'/”(\s|\S|\Z)/’, ‘/\’([\s.]|\Z)/’);

$dynamic_replacements = array(’’$1′,’$1‘’, ‘$1″’, ‘$1′’, ‘$1’$2′, ‘$1“$2′, ‘”$1′, ‘’$1′);


(No Ratings Yet)

Avg Disk Queue Length is one of the main counters in the perfmon application. Avg Disk Queue Length is an estimate of requests on the physical or logical disk that are either in service or waiting for service. The value is a product of Disk Transfers/sec (response X I/O) and Avg Disk sec/Transfer.

What does it all mean? It’s confusing for many, but there are many instances where a high Avg Disk Queue Length does not mean a bottleneck. To see whether Avg Disk Queue Length is indeed showing a true representation of your disk’s performance, you need to compare Current Disk Queue Length over an interval. Add the Current Disk Queue Length to the counters graph in perfmon.

If the Current Disk Queue Length for the previous interval matches the Current Disk Queue Length for the current interval, then indeed the Avg. Disk Queue Length can be used as a general representation of the condition of your storage system.

Say your Avg. Disk Queue Length shows a value of 4, and the Current Disk Queue Length for the current interval is 3, and the previous interval was 0. This means the number of I/O arrivals is greater than the I/O completions during the interval. This results in an incorrect value for Avg Disk Queue Length – often to the horror of System Administrators.

Suppose you have determined the value of Avg Disk Queue Length is indeed accurate and useful – how much is too much? As a general rule for hard disks, an Avg Disk Queue Length greater than 2 (per hard disk) for extended periods of time is considered undesirable. If you have a RAID system with 8 disks, you do not want an Avg Disk Queue Length greater than 16. Faster hard disks with quicker access times (and therefore I/O) will allow greater flexibility with these numbers. Avg Disk sec/read and Avg Disk sec/write should be under 10ms – over 20ms may indicate a bottleneck. If while Avg. Disk Queue Length is over 2 and % Disk Time is hovering at 60% or above, you may want to look into a possible I/O bottleneck.

Below is a perfmon graph taken on a test machine. Avg Disk Queue Length reaches 36!! on a 2 disk RAID1 configuration.

Using Process Explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx) we are able to see which applications have the highest I/O reads and writes. The following screenshot shows over 9 million I/O reads and 260 000 I/O writes in a little over 4 hours uptime for a DBServer application.

Using another program called FileMon (http://technet.microsoft.com/en-us/sysinternals/bb896642.aspx) we are able to see each program being accessed on the machine in real-time. The small screenshot shows a section of DBServer operations all within the same second. As it turns out, there were well over 300 instances during a one-second interval, correlating to the spike that sent the Avg Disk Queue Length to 36.

This particular situation was a stress test comprised of 12 users performing typical operations at the same time on a networked database server. Obviously a 2 disk RAID1 system (10K SAS) was not up to the task.


(average: 4.00 out of 5)

One of the challenges with a traditional SATA or SAS hard disk based server is data manipulation.

Small databases are able to run entirely in memory, while large databases need to be stored on disk and pulled into memory when needed. One of the challenges with database servers is having a storage subsystem that can handle the raw throughput and read/write IOPS required to manipulate gigabytes or even terabytes worth of data.

As you may have seen in my benchmarks with the Dell MD1000 DAS with 14 – 15k SAS drives, the throughput is pretty good (500MB/s in a RAID 10 array) but the read and write IOPS just aren’t there yet. Hard disk access times increase with more complex RAID arrays – which is the exact opposite of what we want in a quick and powerful storage system.

SSDs are considerably more expensive than a similarly-sized SAS drive. The SSDs have IOPS that are 8-14 times that of an SAS drive. The new line of Intel SSDs (X25-E) have read IOPS in the range of 30, 000 and write IOPS in the 3000 range. If you are working with small (4K) reads and writes in a database system, you can replace 8 SAS drives with 1 SSD and achieve similar performance.

Databases with large queries will rely more heavily on the raw throughput numbers of a hard disk to measure performance. In this area SSDs still outperform SAS drives 3-to-1. I’ve worked with many companies that require reports and data collaboration that run into the gigabytes per query. It takes several minutes for some queries to be performed, something that can be drastically reduced with an SSD subsystem.

Not all large databases have large data requirements. Some large databases have mostly ’small’ queries that move small amounts of data. Likewise, some small databases move large amounts of data with very few queries.

When sizing a disk and memory system for a database server, you must take into consideration the type of data to be stored, the main purpose of the data (for storage archive or retrieval), and the number of queries/users.

SSDs are still too expensive for most databases that have large data speed requirements. Although SSDs outperform the SAS disks 3-to-1 or slightly more in MB/s, the cost for a reliable setup is still running about 5-to-1 or more.

It is a different story when sizing a system for a web server, forum or search archive (or a similar high-use, small data transfer) database. With systems that have a large number of users pulling small data blocks, an SSD can perform read IOPS way beyond that of an SAS drive. A typical LAMP server running a vBulletin forum can handle more than 30 times the traffic of an SAS drive.

How nimble is your data? With the new crop of SSDs, ease of data manipulation will increase substantially. It is far too taxing on database servers to run various “what-if” scenarios. It simply takes too long to perform. MySQL, Oracle, Sybase, MSSQL all run well until they run out of memory and need to page the hard disks. This is where SSDs will be leveraged. Large enterprises will be able to manipulate their data faster and more effectively without their servers grinding to a halt.


(No Ratings Yet)
Page 1 of 3123

How often do you back up your important files?

View Results

Loading ... Loading ...