Category Archives: database

SQLite with .NET: excellent but some oddities

I have been porting a C# application which uses an MDB database (the old Access/JET format) to one that uses SQLite. The process has been relatively smooth, but I encountered a few oddities.

One is puzzling and is described by another user here. If you have a column that normally stores string values, but insert a string that happens to be numeric such as “12345”, then you get an invalid cast exception from the GetString method of the SQLite DataReader. The odd thing is that the GetFieldType method correctly returns String. You can overcome this by using GetValue and casting the result to a string, or calling GetString() on the result as in dr.GetValue().ToString().

Another strange one is date comparisons. In my case the application only stores dates, not times; but SQLite using the .NET provider stores the values as DateTime strings. The SQLite query engine returns false if you test whether “yyyy-mm-dd 00:00:00” is equal to “yyy-mm-dd”. The solution is to use the date function: date(datefield) = date(datevalue) works as you would expect. Alternatively you can test for a value between two dates, such as more than yesterday and less than tomorrow.

Performance is excellent, with SQLite . Unit tests of various parts of the application that make use of the database showed speed-ups of between 2 and 3 times faster then JET on average; one was 8 times faster. Note though that you must use transactions with SQLite (or disable synchronous operation) for bulk updates, otherwise database writes are very slow. The reason is that SQLite wraps every INSERT or UPDATE in a transaction by default. So you get the effect described here:

Actually, SQLite will easily do 50,000 or more INSERT statements per second on an average desktop computer. But it will only do a few dozen transactions per second. Transaction speed is limited by the rotational speed of your disk drive. A transaction normally requires two complete rotations of the disk platter, which on a 7200RPM disk drive limits you to about 60 transactions per second.

Without a transaction, a unit test that does a bulk insert, for example, took 3 minutes, versus 6 seconds for JET. Refactoring into several transactions reduced the SQLite time to 3 seconds, while JET went down to 5 seconds.

SQLite adds support for .NET Core 2.0 and .NET Standard 2.0

image

The open source SQLite database engine goes from strength to strength, largely by not changing that much: it remains small, fast, reliable, cross-platform, and completely free. The engine is written in C but there are many wrappers for different languages, a recent addition being .NET Core 2.0 and .NET Standard 2.0:

1.0.109.0: Add preliminary support for .NET Core 2.0 and the .NET Standard 2.0. Pursuant to [5c89cecd1b].

.NET developers using SQLite are fortunate in that System.Data.SQLite, the .NET provider, is supported by the SQLite team and has its own sub-site on sqlite.org. “The SQLite team is committed to supporting System.Data.SQLite long-term,” states the home page.

The addition of .NET Core 2.0 support is valuable, in part because .NET Core is where Microsoft’s energy is now focused, and will make it easier to write cross-platform code. There is a snag though: there is no official cross-platform GUI for .NET Core, which would be useful for SQLite given that it is a local database engine. However, Microsoft’s Xamarin framework, which is cross-platform, does support .NET Standard 2.0 so this should work though I have not tried it.

The truth is that almost any framework can be made to work with SQLite. I did some work myself on a wrapper for Delphi (Object Pascal) which still has some users today.

Back in 2007 I interviewed SQLite’s creator, Dr Richard Hipp, for Guardian Technology. Worth a read if you are wondering why SQLite, unlike most open source projects, has no licence: it is simply public domain:

“I looked at all of the licences,” Hipp says, “and I thought, why not just put it in the public domain? Why have these restrictions on it? I never expected to make one penny. I just wanted to make it available to other people to solve their problem.”

Inside Azure Cosmos DB: Microsoft’s preferred database manager for its own high-scale applications

At Microsoft’s Build event in May this year I interviewed Dharma Shukla, Technical Fellow for the Azure Data group, about Cosmos DB. I enjoyed the interview but have not made use of the material until now, so even though Build was some time back I wanted to share some of his remarks.

Cosmos DB is Microsoft’s cloud-hosted NoSQL database. It began life as DocumentDB, and was re-launched as Cosmos DB at Build 2017. There are several things I did not appreciate at the time. One was how much use Microsoft itself makes of Cosmos DB, including for Azure Active Directory, the identity provider behind Office 365. Another was how low Cosmos DB sits in the overall Azure cloud system. It is a foundational piece, as Shukla explains below.

image

There were several Cosmos DB announcements at Build. What’s new?

“Multi-master is one of the capabilities that we announced yesterday. It allows developers to scale writes all around the world. Until yesterday Cosmos DB allowed you to scale writes in a single region but reads all around the world. Now we allow developers to scale reads and writes homogeneously all round the world. This is a huge deal for apps like IoT, connected cars, sensors, wearables. The amount of writes are far more than the amount of reads.

“The second thing is that now you get single-digit millisecond write latencies at the 99 percentile not just in one region.

“And the third piece is that what falls out of this high availability. The window of failover, the time it takes to failover from one region when a disaster happens, to the other, has shrunk significantly.

“It’s the only system I know of that has married the high consistency models that we have exposed with multi-master capability as well. It had to reach a certain level of maturity, testing it with first-party Microsoft applications at scale and then with a select set of external customers. That’s why it took us a long time.

“We also announced the ability to have your Cosmos Db database in your own VNet (virtual network). It’s a huge deal for enterprises where they want to make sure that no data leaks out of that VNet. To do it for a global distributed database is specially hard because you have to close all the transitive networking dependencies.”

image
Technical Fellow Dharma Shukla

Does Cosmos DB work on Azure Stack?

“We are in the process of going to Azure Stack. Azure Stack is one of the top customer asks. A lot of customers want a hybrid Cosmos DB on Azure Stack as well as in Azure and then have Active – Active. One of the design considerations for multi master is for edge devices. Right now Azure has about 50 regions. Azure’s going to expand to let’s say 200 regions. So a customer’s single Cosmos DB table spanning all these regions is one level of scalability. But the architecture is such that if you directly attach lots of Azure Stack devices, or you have sensors and edge devices, they can also pretend to be replicas. They can also pretend to be an Azure region. So you can attach billions of endpoints to your table. Some of those endpoints could be Azure regions, some of them could be instances of Azure Stack, or IoT hub, or edge devices. This kind of scalability is core to the system.”

Have customers asked for any additional APIs into Cosmos DB?

“There is a list of APIs, HBase, richer SQL, there are a number of such API requests. The good news is that the system has been built in a way that adding new APIs is relatively easy addition. So depending on the demand we continue to add APIs.”

Can you tell me anything about how you’ve implemented Cosmos DB? I know you use Service Fabric. Do you use other Azure services?

“We have dedicated clusters of compute machines. Cosmos DB is a Ring 0 service. So it’s there any time Azure opens a new region, Cosmos DB clusters have provision by default. Just like compute, storage, Cosmos DB is also one of the Ring 0 services which is the bottommost. Azure Active Directory for example depends on Cosmos DB. So Cosmos DB cannot take a dependency on Active Directory.

“The dependency that we have is our own clusters and machines, on which we put Service Fabric. For deployment of Cosmos DB code itself, we use Service Fabric. For some of the load balancing aspects we use Service Fabric. The partition management, global distribution, replication, is our own. So Cosmos DB is layered on top of Service Fabric, it is a Service Fabric application. But then it takes over. Once the Cosmos DB bits are laid out on the machine then its replication and partition management and distribution pieces take over. So that is the layering.

“Other than that there is no dependency on Azure. And that is why one of the salient aspects of this is that you can take the system and host it easily in places like Azure Stack. The dependencies are very small.

“We don’t use Azure Storage because of that dependency. So we store the data locally and then replicate it. And all of that data is also encrypted at rest.”

So when you say it is not currently in Azure Stack, it’s there underneath, but you haven’t surfaced it?

“It is in a defunct mode. We have to do a lot of work to light it up. When we light up it on such on-prem or private cloud devices, we want to enable this active to active pathway. So you are replicating your data and that is getting synchronized with the cloud and Azure Stack is one of the sockets.”

Microsoft itself is using Cosmos DB. How far back does this go? Azure AD is quite old now. Was it always on Cosmos DB / DocumentDB?

“Over the years Office 365, Xbox, Skype, Bing, and more and more of Azure services, have started moving. Now it has almost become ubiquitous. Because it’s at the bottom of the stack, taking a dependency on it is very easy.

“Azure Active Directory consists of a set of microservices. So they progressively have moved to Cosmos DB. Same situation with Dynamics, and our slew of such applications. Skype is by and large on Cosmos DB now. There are still some fragments of the past.  Xbox and the Microsoft Store and others are running on it.”

Do you think your customers are good at making the right choices over which database technology to use? I do pick up some uncertainty about this.

“We are working on making sure that we provide that clarity. Postgres and MySQL and MariaDB and SQL Server, Azure SQL and elastic pools, managed instances, there is a whole slew of relational offerings. Then we have Cosmos DB and then lots of analytical offerings as well.

“If you are a relational app, and if you are using a relational database, and you are migrating from on-prem to Azure, then we recommend the relational family. It comes with this fundamental scale caveat which is that up to 4TB. Most of those customers are settled because they have designed the app around those sorts of scalability limitations.

“A subset of those customers, and a whole bunch of brand new customers, are willing to re-write the app. They know that that they want to come to cloud for scale. So then we pitch Cosmos DB.

“Then there are customers who want to do massive scale offline analytical processing. So there is, Databricks, Spark, HD Insight, and that set of services.

“We realise there are grey lines between these offerings. We’re tightening up the guidance, it’s valid feedback.”

Any numbers to flesh out the idea that this is a fast-growing service for Microsoft?

“I can tell you that the number of new clusters we provision every week is far more than the total number of clusters we had in the first month. The growth is staggering.”

Embarcadero acquires AnyDAC data access libraries for Delphi, C++ Builder

Embarcadero has acquired the AnyDAC data access libraries from DA-SOFT, including its main author Dmitry Arefiev. These libraries support Delphi and C++ Builder and support connections to a wide range of database servers, including SQL Server, DB2, Oracle, PostgreSQL, SQLite, Interbase, Firebird, Microsoft Access, and any ODBC connection.

AnyDAC is well liked by Delphi devlopers for its performance, features and support. Its architecture includes a local data storage layer similar to the dataset in Microsoft’s ADO.NET.

While it is good to see this set of libraries added to the mainstream product, developers are asking the obvious questions. What will happen to the cost and to the support for AnyDAC?

I am both scared and relieved at the same time. We took the AnyDAC route a few years ago, getting out of the BDE, and we have not regretted it for a second. My fright comes from the fact that DA-SOFT could not be beat in terms of customer support…many of us have come to DA-SOFT with a problem, only to have it fixed the next day. It is only realistic to think that with Embarcadero, this will no longer be the case.

says Dan Hacker on product manager Marco Cantu’s blog.

Microsoft results: old business model still humming, future a concern

Microsoft has published its latest financials. Here is my at-a-glance summary:

Quarter ending March 31st 2012 vs quarter ending March 31st 2011, $millions

Segment Revenue Change Profit Change
Client (Windows + Live) 4624 +177 2952 +160
Server and Tools 4572 +386 1738 +285
Online 707 +40 -479 +297
Business (Office) 5814 +485 3770 +457
Entertainment and devices 1616 -319 -229 -439

What is notable? Well, Windows 7 is still driving Enterprise sales, but more striking is the success of Microsoft’s server business. The company reports “double-digit” growth for SQL Server and more than 20% growth in System Center. This seems to be evidence that the company’s private cloud strategy is working; and from what I have seen of the forthcoming Server 8, I expect it to continue to work.

Losing $229m in entertainment and devices seems careless though the beleaguered Windows Phone must be in there too. Windows Phone is not mentioned in the press release.

Overall these are impressive figures for a company widely perceived as being overtaken by Apple, Google and Amazon in the things that matter for the future: mobile, internet and cloud.

At the same time, those “things that matter” are exactly the areas of weakness, which must be a concern.

CodeRage free online conference for Delphi and RAD Studio starts next week

Embarcadero’s CodeRage virtual conference starts next week from October 17 2011, and is worth a look if you have any interest in Delphi or the new RAD Studio XE2.

There are sessions on 64-bit Delphi, the new cross-platform FireMonkey framwork, the new LiveBindings data binding system, Prism (Delphi for .NET), and extras including a session on Regular Expressions in Delphi and elsewhere, Dependency Injection and Delphi Spring, unit testing with Delphi, and using 3D graphics in business applications.

Of course you could wait for the replays to be available, but if this is like previous events there is a chance to ask questions to people who might actually know the answers, so there is an advantage to the live event – though the event is schedules for Pacific Time so the afternoon ones involve a late night if you are in the UK.

Full circle for Microsoft database APIs as OLEDB for SQL Server is deprecated

Microsoft’s Eric Nelson has posted about how the OLEDB driver for SQL Server is being deprecated and will not be supported beyond “Denali”, the forthcoming version.

OLEDB was created to be the successor to ODBC – expanding the supported data sources/models to include things other than relational databases. Notably OLEDB was tightly tied to a Windows only technology (COM) whilst ODBC was not (Although we did try and take COM cross platform via partners)

ODBC never did get replaced. What actually happened is that ODBC remained the dominant of the two technologies for many scenarios – and became increasingly used on none Windows platforms and has become the de-facto industry standard for native relational data access.

ODBC was as I recall Microsoft’s first attempt at creating a universal database API.

The death of OLEDB will be slow, according to Nelson. The OLEDB driver for Denali will be supported for seven years following Denali’s release. He also says that OLEDB itself, as opposed to the SQL Server OLEDB driver, is not necessarily being deprecated; though frankly if Microsoft ceases supporting it with its own database I cannot see much future for it.

Note that ADO.NET, which to some extent replaced OLEDB, is not being deprecated. However ADO.NET is only usable from .NET applications. When you consider that Microsoft may be to some extent tilting away from .NET and towards native code, the deprecation of OLEDB becomes even more significant.

ODBC is not particularly easy to use in its raw form. However, you can wrap ODBC with, yes, an OLEDB provider or an ADO.NET provider; or you can wrap the whole lot in an object-relational framework such as Entity Framework.

One more chapter in the long, strange and tortuous history of Microsoft’s data APIs.

Using SQLite 3 with Unicode in Delphi

I’ve updated my SQLite3 Delphi wrapper for Unicode in Delphi 2009 and higher. Previous versions of the wrapper ducked the issue by using Ansi strings throughout.

image

I actually used Embarcadero Delphi XE for the development, but I would expect it to work in Delphi 2009 and higher, since it was in that version that Delphi first properly supported Unicode. Converting older Delphi projects is meant to be seamless, except in cases where you are using pointers or doing interop with native DLLs; of course this wrapper does both.

The SQLite 3 API expects either UTF8 or Unicode strings. To be more precise, some functions have Unicode versions indicated with a “16” suffix, and some do not, in which case they expect UTF8 if they accept string values. Although UTF8 strings support Unicode characters, most characters generally occupy a single byte just as in Ansi strings, so one of the things I discovered was that I could not simply rely on PChar, Delphi’s null-terminated string type, which from Delphi 2009 is a Unicode type with double-byte characters. Instead, for cases where the SQLite 3 API expects a UTF8 string, I have used PAnsiChar as before.

It is all somewhat confusing, and there are a few cases where Delphi does not do quite what you would expect. I recommend Marco Cantu’s paper Delphi and Unicode [pdf], one of the best resources I found. This article by Nick Hodges on Unicodifying your code is handy too.

Finally, in the example I keep an object in memory and it is easy to end up with code paths that do not free it. I love this feature of Delphi (since at least 2007) which informs you of your mistake when the application closes:

image

I have uploaded the code and you can find it linked here.

Update: My assumption that Delphi 2010 would work the same way as Delphi XE was incorrect. I have some code that reads a blob field containing a UTF8 string and returns it as a string. I read the stream into a byte array and then cast it to a UTF8String:

str := UTF8String(bytes);

where str is a UTF8string variable. This worked in Delphi XE, but in Delphi 2010 I got garbage. I have modified the code to use a TStringStream and now it works in both.

Why Oracle is immoveable in the Enterprise

At Oracle OpenWorld yesterday I spoke to an attendee from a global enterprise. His company is a big IBM customer and would like to standardise on DB2. To some extent it does, but there is still around 30% Oracle and significant usage of Microsoft SQL Server. Why three database platforms when they would prefer to settle on one? Applications, which in many cases are only certified for a specific database manager.

I was at MySQL Sunday earlier in the day, and asked whether he had any interest in Oracle’s open source database product. As you would expect, he said it was enough trouble maintaining three different systems; the last thing he wanted was a fourth.

Oracle: a good home for MySQL?

I’m not able to attend the whole of Oracle OpenWorld / JavaOne, but I have sneaked in to MySQL Sunday, which is a half-day pre-conference event. One of the questions that interests me: is MySQL in safe hands at Oracle, or will it be allowed to wither in order to safeguard Oracle’s closed-source database business?

It is an obvious question, but not necessarily a sensible one. There is some evidence for a change in direction. Prior to the takeover, the MySQL team was working on a database engine called Falcon, intended to lift the product into the realm of enterprise database management. Oracle put Falcon on the shelf; Oracle veteran Edward Screven (who also gave the keynote here) said that the real rationale for Falcon was that InnoDB would be somehow jiggered by Oracle, and that now both MySQL and InnoDB were at Oracle, it made no sense.

Context: InnoDB is the grown-up database engine for MySQL, with support for transactions, and already belonged to Oracle from an earlier acquisition.

There may be something in it; but it is also true that Oracle has fine-tuned the positioning of MySQL. Screven today emphasised that MySQL is Oracle’s small and nimble database manager; it is “quite performant and quite functional”, he said; the word “quite” betraying a measure of corporate internal conflict. Screven described how Oracle has improved the performance of MySQL on Windows and is cheerful about the possibility of it taking share from Microsoft’s SQL Server.

It is important to look at the actions as well as the words. Today Oracle announced the release candidate of MySQL 5.5, which uses InnoDB by default, and has performance and scalability improvements that are dramatic in certain scenarios, as well as new and improved tools. InnoDB is forging ahead, with the team working especially on taking better advantage of multi-core systems; we also heard about full text search coming to the engine.

The scalability of MySQL is demonstrated by some of its best-known deployments, including Facebook and Wikipedia. Facebook’s Mark Callaghan spoke today about making MySQL work well, and gave some statistics concerning peak usage: 450 million rows read per second, 3.5 million rows changed per second, query response time 4ms.

If pressed, Screven talks about complexity and reliability with critical data as factors that point to an Oracle rather than a MySQL solution, rather than lack of scalability.

In practice it matters little. No enterprise currently using an Oracle database is going to move to MySQL; aside from doubts over its capability, it is far too difficult and risky to switch your database manager to an alternative, since each one has its own language and its own optimisations. Further, Oracle’s application platform is built on its own database and that will not change. Customers are thoroughly locked in.

What this means is that Oracle can afford to support MySQL’s continuing development without risk of cannibalising its own business. In fact, MySQL presents an opportunity to get into new markets. Oracle is not the ideal steward for this important open source project, but it is working out OK so far.