edit: links to home.etherpunk.com will likely not work properly since I'm moving servers around. I'll fix this when I get a chance.
Yesterday a bug hit me: I wanted to see what the fastest way to read and write data to a database was. The methods for doing the writing were the fun part. I chose LINQ, SqlCommand, and SubSonic (2.1 Pakala). There are a couple of different ways to use each, and I wondered which way was the right way. I found some interesting results.
Firstly, I'd like to provide the link to the code that I used so you can verify the results yourself or critique my methods. It can be found at: http://home.etherpunk.com/svn/DatabaseSpeedTest/DatabaseSpeedTest/WriteT...
Secondly, the database structure is *very* simple. The only table I'm using is a TypeLookup table, which is essentially a primary key (a uniqueidentifier), a name field, and a createdon field. This is a table I'm using for another application (that application is exceptionally simple, which is why I chose it).
Here are the results (in seconds):
Running write tests. Creating 10000 entities.
LINQ using Attach: 3.523
LINQ using InsertOnSubmit: 38.25
SqlCommand Dynamic Sql (constant connection string): 15.989
SqlCommand Dynamic Sql (static connection string): 15.123
SqlCommand Individual (constant connection string): 8.86
SqlCommand Individual (static connection string): 9.526
SqlCommand Individual More Efficient (constant connection string): 8.862
SqlCommand Individual More Efficient (static connection string): 9.527
SqlCommand with StringBuilder (constant connection string): 6.056
SqlCommand with StringBuilder (static connection string): 6.144
SubSonic with individual saves: 8.945
SubSonic using Collections: 9.654
An interesting thing is how horrible InsertOnSubmit() is for LINQ. I'm guessing I'm using it wrong, because holy crap... it doesn't scale well at all. Using the Attach() method, on the other hand, seems to be significantly faster than everything else. I tried using 100,000 entities, but many of these approaches just didn't scale well and it took longer than 10 minutes. I'll probably redo that later tonight and update this blog with those numbers in the morning.
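For reference, here's roughly what the InsertOnSubmit() style looks like. This is a minimal sketch, assuming a generated DataContext called TypeLookupDataContext and a TypeLookups table property -- the real test code is in the repository linked above.

```csharp
// Sketch of the LINQ to SQL InsertOnSubmit() style (DataContext/entity names assumed).
using (var db = new TypeLookupDataContext())
{
    for (int i = 0; i < 10000; i++)
    {
        // Queue up one pending insert per entity...
        db.TypeLookups.InsertOnSubmit(new TypeLookup
        {
            Name = "Type " + i,
            CreatedOn = DateTime.Now
        });
    }
    // ...then push them all in one SubmitChanges() call.
    db.SubmitChanges();
}
```

Worth noting: Attach() is normally intended for entities that already exist in the database (updates across DataContexts), so the fact that it comes out so much faster here might be a hint it isn't doing the same work as InsertOnSubmit().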
I should also note that I only learned how to use LINQ yesterday... so it's very likely I'm not using it properly, but it's a pain to find good, simple examples. Luckily Visual Studio 2008 came with some, which got me going... but some of those were either lacking in areas I needed or more complex than I cared for -- fortunately, Google to the rescue.
I find it very interesting that opening and closing a connection has next to zero overhead when creating entities. Granted, this is likely due to connection pooling, but still -- I would have thought even the code overhead would have been more than that. I also find it interesting how much faster a constant value is than a static value -- the read access must really be something to take into consideration.
Something I learned about LINQ is that it seems to bring the constraints from the database into the code itself, while SubSonic seems to just set up classes and stay fairly dumb about it (which isn't necessarily a bad thing and can give you more control in certain situations). Because of this, I wasn't able to take advantage of the LINQ collections: it forced me to assign a unique ID when adding an entity or else it would throw an exception ("Cannot add an entity with a key that is already in use.") -- one I haven't figured out how to get around just yet, because I don't want to assign those keys in code; I want them assigned by the database when the rows are saved.
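If I ever get that working, I suspect it will involve telling the LINQ to SQL mapping that the key column is database-generated, something like the attribute below. This is just a sketch with a made-up column name, and I haven't verified it actually plays nicely with a uniqueidentifier default:

```csharp
// Sketch: mark the primary key as database-generated so the table's default
// (e.g. NEWID()) assigns it on insert and LINQ syncs the value back.
// Column/property name here is hypothetical, not from my real schema.
[Column(IsPrimaryKey = true, IsDbGenerated = true, AutoSync = AutoSync.OnInsert)]
public Guid TypeLookupID { get; set; }
```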
I'd really like to compare how LINQ plays with revision control systems against SubSonic... regenerating the SubSonic DAL sucks because you normally delete the folder, recreate the folder, and regenerate the DAL -- and doing all that under a version control system can be very slow. LINQ doesn't seem to need tons of files, though for bigger databases it seems to take longer generating the XML. This was using LINQ on .NET 3.5 SP1.
SubSonic also doesn't seem to handle constraints very well -- or at least, when I placed constraints enforcing a db structure, such as linking PersonIDs or forcing UNIQUE names (e.g. room names), it seemed to not finish the code generation. It never threw an error, but the generated DAL would never compile. Because of that, I had to choose a simple table structure to make sure the playing field was level. Given a database that needs to maintain ACID compliance and is of moderate size (more than a couple of users and more than a couple of tables), I would not choose SubSonic (something I would not have said two weeks ago), though I plan on looking at it again later. This was using SubSonic 2.1 Pakala, which was released fairly recently -- so I'm hoping it's just a few bugs to work out. However, if I were using SQLite, I would likely choose SubSonic.
I've recently come to the conclusion that using constraints is the only sane way of keeping a database consistent, since everything else requires a *ton* of overhead checking (and you will still never be 100% safe without doing row-level or table-level locking), or you take a chance on your database becoming inconsistent or slowly getting more corrupt over time. I'm still trying to figure out how to cleanly handle the exceptions thrown when these constraints are violated... without having try/catches around every database write. Lately I've been pushing for everything to be saved through a stored procedure and wrapped in a transaction. This was accomplished only because I found out how to use the XML datatype in SQL Server and use it to pass in mini-datatables. Doing this seems to have made things *really* fast, and most likely faster than LINQ, SqlCommand, or SubSonic will ever be, because it's all in the database... the only catch is you end up with a metric ton of parameters for the save stored procedure -- which is where a strongly typed data access layer comes in handy.
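The calling side looks roughly like this. The stored procedure name and the XML shape are made up for illustration; the point is that one round trip carries a mini-datatable as XML and the proc shreds it and inserts inside its own transaction:

```csharp
// Sketch: pushing a batch of rows to SQL Server as a single XML parameter.
// "usp_SaveTypeLookups" and the XML layout are hypothetical, not my real schema.
using System;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Xml.Linq;

static void SaveBatch(string connectionString)
{
    // Build a mini-datatable as XML: <rows><row name="..."/>...</rows>
    var xml = new XElement("rows",
        Enumerable.Range(0, 10000)
                  .Select(i => new XElement("row", new XAttribute("name", "Type " + i))));

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("usp_SaveTypeLookups", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@Rows", SqlDbType.Xml).Value = xml.ToString();
        conn.Open();
        // The proc shreds @Rows with .nodes() and does all inserts in one transaction.
        cmd.ExecuteNonQuery();
    }
}
```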
The next areas I plan on working on are the read tests and then a full database population read/write test -- with constraints and without constraints. I'm very curious what I'll find.
Like I said before, I really dislike try/catches. They slow your code down and they are generally used when you don't know how to appropriately handle data. For example, using try/catch to convert a string to an int versus using Int32.TryParse(). Yes, I've really seen this in the wild... and not from an amateur either. I'll openly admit I'm an OK programmer. I have much to learn. I just learned how to implement transactions in code. I have also learned how to make .NET crash so hard it throws an exception *outside* of your application's domain. This means you can't catch it. At all. But I'll get to that in a minute. I'd like to start by explaining when it's a good idea to use a try/catch.
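For the string-to-int case, the comparison I mean is something like this (a contrived sketch):

```csharp
// The anti-pattern: exceptions used as flow control for ordinary bad input.
static int ParseWithCatch(string input)
{
    try { return Convert.ToInt32(input); }
    catch (FormatException) { return 0; }   // slow and noisy whenever input isn't a number
}

// The sane version: no exception needed for the common failure case.
static int ParseWithTryParse(string input)
{
    int value;
    return Int32.TryParse(input, out value) ? value : 0;
}
```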
Using try/catches is good when you can't check for everything ahead of time. For example, network issues. Sure -- you can ping a server. That doesn't mean the database is up. Sure, you can do many other things, such as testing the database (SELECT 0 comes to mind), but you can't account for the network dropping in the middle of everything. So far the only method I've found that is reliable is using a try/catch, because the medium is so volatile that even after doing your check it can go down. So the answer is to use a transaction (if you are doing multiple inserts) along with a try/catch looking for a SqlException. But this won't stop everything... I'll explain how I pissed off .NET with pictures and network failures.
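With plain SqlCommand, that pattern looks roughly like this. It's a sketch: the column names are my guess at the TypeLookup table described above, and the connection string is assumed.

```csharp
// Sketch: multiple inserts in one transaction, with a try/catch watching for
// SqlException so a mid-batch network drop (or constraint violation) rolls
// everything back instead of leaving a half-written batch.
using System;
using System.Data.SqlClient;

static void InsertBatch(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tran = conn.BeginTransaction())
        {
            try
            {
                for (int i = 0; i < 100; i++)
                {
                    using (var cmd = new SqlCommand(
                        "INSERT INTO TypeLookup (TypeLookupID, Name, CreatedOn) " +
                        "VALUES (NEWID(), @Name, GETDATE())", conn, tran))
                    {
                        cmd.Parameters.AddWithValue("@Name", "Type " + i);
                        cmd.ExecuteNonQuery();
                    }
                }
                tran.Commit();   // nothing is durable until this point
            }
            catch (SqlException)
            {
                tran.Rollback(); // undo the partial batch, then let the caller decide
                throw;
            }
        }
    }
}
```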
Apparently .NET handles System.Drawing.Image in an interesting way. It is essentially just a pointer/reference to the location of the file. What this means is that if you reference an image across the network and the network goes down, System.Drawing.Image throws an exception. In fact, it throws a pretty gnarly exception that you can't catch. Here is how you can reproduce it. Create a blank project. Drop in a timer and set the interval to 15 seconds, or an amount a little longer than it will take you to unplug your network cable. Modify the form load to start the timer and load the image, then have the form hide itself. On the timer tick, have the form show itself (this causes the image to be repainted, which forces it to go back to the network to pull the image). Start the app and then unplug your network cable. It should crash nastily. One you can't stop. Another thing I learned is that it puts a lock on the file, so you can't delete or modify it. So far the only idea I've come up with for an in-memory cache is to load the file into a byte array from a stream and reference that byte array instead. What I don't know is whether that is still a pointer to the file. I would assume the stream pushes the bits into the byte array and thus removes the dependence on the network.
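The in-memory cache idea, roughly (a sketch -- I haven't verified this fixes the network crash, but File.ReadAllBytes does copy the bits locally and release the file):

```csharp
// Sketch: copy the image bytes off the network into memory, then build the Image
// from a MemoryStream so nothing keeps pointing at (or locking) the original file.
using System.Drawing;
using System.IO;

static Image LoadImageIntoMemory(string path)
{
    byte[] bytes = File.ReadAllBytes(path);   // one read, then the network file is released
    var ms = new MemoryStream(bytes);         // Image.FromStream needs this stream to stay
    return Image.FromStream(ms);              // alive for the lifetime of the Image
}
```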
Using SubSonic, the code to implement transactions and handle SqlException is pretty straightforward: you just surround your .Save() calls (personally, I like to place everything in the try/catch block just to make the code look cleaner) with a System.Transactions.TransactionScope. A TransactionScope will put a lock on your table when doing an insert, and for good reason: you can't rely on the data being there until you commit. This stops other people from seeing the data (or anything else in the table, for that matter), so they don't reference rows that then get rolled back, leaving you with an inconsistent database. If you want a way around that, use System.Transactions.TransactionOptions. TransactionOptions has only two properties -- the timeout and the isolation level. Microsoft has an article explaining the enumerations, their values, and which ones you would want for which circumstances. I should probably link it, but I'm too lazy.
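Sketched out, that pattern looks something like this (the commented-out .Save() calls stand in for whatever SubSonic objects you're persisting):

```csharp
// Sketch: wrapping a batch of saves in a TransactionScope so they commit or roll
// back together, with TransactionOptions adjusting isolation level and timeout.
using System;
using System.Data.SqlClient;
using System.Transactions;

static void SaveAll()
{
    var options = new TransactionOptions
    {
        IsolationLevel = IsolationLevel.ReadCommitted,  // pick per the MSDN guidance
        Timeout = TimeSpan.FromSeconds(30)
    };

    try
    {
        using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
        {
            // entity1.Save();
            // entity2.Save();
            // ... your SubSonic .Save() calls here ...
            scope.Complete();   // nothing commits unless we reach this line
        }
    }
    catch (SqlException)
    {
        // the scope was never completed, so everything above rolled back
        throw;
    }
}
```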
I'm in the process of getting an app going under .NET 3.0 using Visual Studio 2008 to see the differences in speed between LINQ, SubSonic, and a plain SqlDataAdapter (there's a rough sketch of that approach at the bottom of this post). Yes, I'm well aware that using a DataReader / DataAdapter is going to be faster -- the question is: how much faster? Is it worth it when the other code is that much cleaner? What about code that is only going to be used once, four months from now, and (hopefully) never touched again? My problem is I want something as fast as possible without making the code ugly and without spending a lot of time tweaking -- so I'm taking the shotgun approach. At the moment it seems like they are all fairly close unless you are dealing with LOTS (millions) of records -- which I am not.

SubSonic has its fair share of bugs, such as puking if a table doesn't have a primary key. I can understand it not knowing how to write back or do complex queries in that case, but I just want to populate a collection with a 'SELECT *' type statement -- it doesn't need to care about primary keys, since I'm just going to loop through each item in the collection and pull data as I see fit. Also, SubSonic seems to always do a 'SELECT *' type query if I use the simple code such as 'Some_DB.Some_TableCollection TC = new Some_DB.Some_TableCollection().Load();'. If I want to be specific, then I have to use a Query class. Even worse, it's /horrible/ at doing ORs. They really prefer you to use views or stored procedures. Neither of those is bad, but I'm not writing code that anyone but me should be using -- so I don't care about exploits or safety measures. Even if I did, this engine doesn't have any direct input mechanism, so good luck with that.

LINQ seems overly complicated. Fucking A is it a bitch to set up. Nothing seems automatic. The code doesn't /feel/ cleaner (so far, from my admittedly little testing). Google doesn't seem to know how to use it either; I keep finding contradicting information. So this means I have to do a "best case scenario" speed test to get both LINQ and SubSonic working, which isn't a bad idea. One of the programming styles I learned was "assume everything will work perfectly, *then* make exceptions". The alternative, which is more common, is to plan for the worst and then, if everything goes well, do X. The problem with that is you are always incurring the overhead of the worst case, instead of doing the best case first and, if it fails (which it rarely should), going to some exception handling. This puts me back at square one. Here is my game plan:
I'm also probably going to try some of this code on other DB engines such as MySQL and SQLite.
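For completeness, the plain SqlDataAdapter baseline I keep referring to is roughly this. It's a sketch using the lookup-table example from above; no ORM, no generated DAL:

```csharp
// Sketch: fill a DataTable with SELECT * and loop over the rows directly.
using System;
using System.Data;
using System.Data.SqlClient;

static void ReadAll(string connectionString)
{
    var table = new DataTable();
    using (var conn = new SqlConnection(connectionString))
    using (var adapter = new SqlDataAdapter("SELECT * FROM TypeLookup", conn))
    {
        adapter.Fill(table);   // Fill opens and closes the connection itself if needed
    }

    foreach (DataRow row in table.Rows)
    {
        Console.WriteLine("{0}\t{1}", row["Name"], row["CreatedOn"]);
    }
}
```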