Symfony2 MSSQL support with pdo_dblib

We now maintain a library for connecting to a MSSQL database, since we are using it in one of our Symfony2 applications. You can find the code here:

https://github.com/LeaseWeb/LswDoctrinePdoDblib

If you are trying to use Doctrine 2 (for Symfony 2) to connect to MSSQL server from a Linux machine, you might encounter the same problems we did. To be able to connect to MSSQL from a Ubuntu Linux server you will have to install “php5-sybase” (Sybase / MS SQL Server module for php5).

$ sudo apt-get install freetds-bin php5-sybase

Add the following lines to “/etc/freetds/freetds.conf”, and make sure you set it to talk the right MSSQL protocol. The TDS protocol version should be 8.0 for MSSQL server 2005 and up, which may be confusing. Also replace “{mssql-server-ip-address}” with your MSSQL servers IP address (without the brackets).

    [mssql_freetds]
        host = {mssql-server-ip-address}
        port = 1433
        tds version = 8.0
        client charset = UTF-8
        text size = 20971520

If you only connect to one server, you can also set the global TDS protocol version to 8.0 like this:

     [global]
         # TDS protocol version
         tds version = 8.0

Now test that your freetds is working by using the command line client. You have to replace {username} and {password} with your username and password (without the brackets):

$ tsql -S mssql_freetds -U {username}
Password: {password}
locale is "en_US.UTF-8"
locale charset is "UTF-8"
using default charset "UTF-8"
1> quit

If you specify in “app/config/parameters.ini” that you want to use the “pdo_dblib” driver like this:

database_driver="pdo_dblib"

You will see the error:

[Doctrine\DBAL\DBALException]
The given 'driver' pdo_dblib is unknown, Doctrine currently supports only the following drivers:
pdo_mysql, pdo_sqlite, pdo_pgsql, pdo_oci, oci8, ibm_db2, pdo_ibm, pdo_sqlsrv

This is why you should install our LswDoctrinePdoDblib bundle, and following the instructions, you should comment out the “driver” and add a “driver_class” like this:

    # Doctrine Configuration
    doctrine:
        dbal:
            #driver:   %database_driver%
            driver_class:   Lsw\DoctrinePdoDblib\Doctrine\DBAL\Driver\PDODblib\Driver

After that you should be able to run this command without any errors:

$ php app/console doctrine:database:create

Now do not expect everything to work flawlessly, since MSSQL is not officially supported, and Doctrine is very picky in what it does and does not accept as database structure. However, this should get you started. Post your experience and questions in the comments. Good luck!

Share

Features of PostgreSQL

Database systems can cost lots of money, this is fairly known. Products like Microsoft SQL Server and Oracle Standard Edition are also billed per CPU (even per core) and may also require client licenses. The costs and issues of licensing may drive people to free (not only as in beer) software. When free database systems are discussed, most people immediately think about MySQL (also owned by Oracle). But there is another, maybe even better, player in the open source market that is less known. Its name is “PostgreSQL”.

An enterprise class database, PostgreSQL boasts sophisticated features such as Multi-Version Concurrency Control (MVCC), point in time recovery, tablespaces, asynchronous replication, nested transactions (savepoints), online/hot backups, a sophisticated query planner/optimizer, and write ahead logging for fault tolerance. It supports international character sets, multibyte character encodings, Unicode, and it is locale-aware for sorting, case-sensitivity, and formatting. It is highly scalable both in the sheer quantity of data it can manage and in the number of concurrent users it can accommodate. There are active PostgreSQL systems in production environments that manage in excess of 4 terabytes of data. — source: http://www.postgresql.org/about/

This database system is free and very powerful. It supports almost all of the features that the paid (and free) counterparts have. Some of the interesting features are: GIS support, hot standby for high availablity and index-only scans (great for big data). Still not convinced? Check out the impressive Feature Matrix of PostgreSQL yourself. It has excellent support on Linux systems (also OSX and Windows) and integrates well with PHP and the frameworks like Symfony2.

Download & try PostgreSQL (for OSX or Windows) or follow installation instructions (for Ubuntu Linux)

Note: If you want a tool like PHPMyAdmin for PostgreSQL you might consider Adminer.

Share

Scalable RDBMS

My name is Mukesh, I worked with fairly large (or medium large) scale websites as my previous assignments – and now in LeaseWeb’s cloud team, as an innovation engineer. When I say large scale I’m talking about a website serving 300 million webpages per day (all rendered within a second), a website storing about half a billion photos & videos, a website with an active user base of ~10 million, a web application with 3000 servers …and so on!

We all know it takes a lot to keep sites like these running especially if the company has decided to run it on commodity hardware. Coming from this background, I’d like to dedicate my first blog post to the subject of scalable databases.

A friend of mine,  marketing manager by profession, inspired by technology, asked me why are we using MySQL in knowing that it does not scale (or there is some special harry potter# magic?). He wanted to ask, from what reasons we have chosen MySQL?  And are there any plans to move to another database?

Well the answer for later one is easy “No, we’re not planning to move to another database”. The former question  however, can’t be answered in a single line.

#Talking of Harry Potter, what do you think about ‘The Deathly Hallows part -II’?

Think about Facebook –  a well recognised social networking website. Facebook handles more than 25 billion page views per day; even they use MySQL.

The bottleneck is not MySQL (or any common database). Generally speaking, every database product in the market has the following characteristics to some extent:

  1. PERSISTENCE:  Storage and (random) retrieval of data<
  2. CONCURRENCY:  The ability to support multiple users simultaneously (lock granularity is often an issue here)
  3. DISTRIBUTION:  Maintenance of relationships across multiple databases (support of locality of reference, data replication)
  4. INTEGRITY:  Methods to ensure data is not lost or corrupted (features including automatic two-phase commit, use of dual log files, roll-forward recovery)
  5. SCALABILITY:  Predictable performance as the number of users or the size of the database increase

This post deals about scalability, which we hear quite often when we talk about large systems/big data.

Data volume can be managed if you shard it. If you break the data on different servers at the application level, the scalability of MySQL is not such a big problem. Of course, you cannot make a JOIN with the data from different servers, but choosing a non-relational database doesn’t help either. There is no evidence that even Facebook uses (back in early 2008 its very own) Cassandra as primary storage, and it seems that the only things that’s needed there is a search for incoming messages.

In reality, distributed databases such as Cassandra, MongoDB and CouchDB or any new database (if that matters) lacks on scalability & stability unless there are some real users (I keep seeing post from users running into issues, or annoyance ) For example, the guys at Twitter were trying to move on with MySQL and Cassandra for about a year (great to see that have a bunch of feature working).
I’m not saying they aren’t good, they are getting better with time.  My point is any new database needs more time & cover a few large profiles to be mature (to be considered over MySQL). Of course, if someone tells about how he used any of these databases as primary storage for 1 billion cases in one year, then I’ll change my opinion.

I believe it’s a bad idea to risk your main base on new technology. It would be a disaster to lose or damage the database, and you may not be able to restore everything. Besides, if you’re not a developer of one of these newfangled databases and one of those few who actually use them in combat mode, you can only pray that the developer will fix bugs and issues with scalability as they become available.

In fact, you can go very far on a single MySQL without even caring about a partitioning data at the application level. While it’s easy to scale a server up on a bunch of kernels and tons of RAM, do not forget about replication. In addition, if the server is in front of the memcached layer (which simply scales), the only thing that your database cares is writes. For storing large objects, you can use S3 or any other distributed hash table.  Until you are sure that you need to scale the base as it grows, do not shoulder the burden of making the database an order of magnitude more scalable than you need it.

Most problems arise when you try to split the data over a large number of servers, but you can use an intermediate layer between the base, which is responsible for partitioning. Like for example FriendFeed does.
I believe that the relational model is the correct way of structuring data in most applications – content that users create. Schemes can contain data in a particular form as new versions of the service; they also serve as documentation and help avoid a heap of errors. SQL allows you to process more data as needed instead of getting tons of raw information, which then still need to be reprocessed in the application. I think once the whole hype around the NoSQL is over, someone will finally develop a relational database with free semantics.

Conclusion!

  1. Use MySQL or other classic databases for important, persistent data.
  2. Use caching delivery mechanisms – or maybe even NoSQL – for fast delivery
  3. Wait until the dust settles, and the next generation, free-semantics relational database rises up.
Share