Permalink

Thoughts on a Nice database API

09 JUL 2004

Over at Lambda, there's a discussion going on about database abstraction layers. This is something that I think about from time to time, and lately I've been wondering what sort of database API we should have for Nice.

It seems to me there are (at least) two major design decisions to consider. First, how will one express queries, and second, what facilities does one want to provide for converting query results into objects and back. Both of these are very tricky indeed.

Query Language

Obviously, embedded SQL is a choice that's available. It has the advantage of simplicity of implementation, and people who know SQL will of course be able to use it immediately. I'd really like to be able to statically check the SQL against a database, so that means we have to make changes to the parser like SQLJ.

There are problems with SQL, though. For one thing, no two database vendors seem to deal with it the same way. So already we need some kind of abstraction layer, some sort of "generic" SQL that can be translated into vendor-specific SQL. Lots of systems for Java do this kind of thing already (though they are often hard to use, dynamically type checked, or both), and SchemeQL and HaskellDB may provide some inspiration.

There are also simple but important issues like naming to consider: I'd like to be able to write code that works with names like "Employee_Manager", even though the database table's actual name might be "emp_mgr", or even "emp_mgr_join_table_23". So some sort of mapping facility is necessary, too. We haven't even gotten to the impedance mismatch headache yet, and already our work is cut out for us.

Ideally, I'd like to have a facility that's more expressive than SQL. If the query/update that I've expressed turned out to be too complicated for a single SQL statements, then it would get translated into multiple SQL queries, or a stored procedure, or whatever is most appropriate (and efficient) for the target DBMS.

Object Mapping

First question is, do we even need one? When one has an object mapping, one tends to want to deal with objects exclusively, to forget there is a relational database out there at all. This can lead to problems such as out-of-date cached copies of objects, especially if the database has triggers which modify the data after the program has committed its transaction. Some Java tools try to address this problem by re-querying the objects that have just been committed, but this is a performance problem, because in many cases the objects won't actually be used again.

Many times classes are designed to hold all the data that are relevant to a given concept. So an Employee will have a GivenName, a FamilyName, a TaxID, a Salary, one or more Managers, a HireDate, etc. However, in some cases we wish to deal with large numbers of Employees, and we don't actually need to use all these fields. Perhaps we only want to give pay raises based on their HireDate anniversary, so we don't actually need the names, tax ID, and so on. With an object mapping, you tend to get them anyway, which is wasteful of resources. Some systems let you specify that you only want certain attributes retrieved, and the rest are left as null (or zero or false, if they're primitives), but that means you have to remember that the Employee object is not really an Employee object, and you had better not let references to it escape into parts of the program that expect fully loaded Employees, or you'll get lots of errors that don't make any sense. This problem is even more unacceptable in Nice, because the type system prevents undeclared null values, and so getting a NullPointerException is rare event indeed, usually stemming from imported Java code. So partial objects whose attributes are set to null if they're not loaded are really a nasty way of lying to the compiler. There has to be a better way.

An object mapping also encourages you (well, maybe not you, but what about your most junior team member?) to implement "give a 5% raise to each employee whose hire anniversary is today" as (roughly):


ObjectResultSet emps = connection.executeQuery("select * from emp");

while(emps.next()) {

	Employee emp = emps.get();

	if (isHireAnniversary(emp))

	   emp.setSalary(emp.getSalary() + emp.getSalary() * 0.05);

}

commit();

which is going to cause every employee in the database to be loaded into memory, which will either require a huge amount of memory, or else keep the garbage collector quite busy, depending on whether the implementation holds extra references to the queried objects or not. Even more importantly, it will require a huge amount of I/O. A better solution would be to implement isHireAnniversary at the database level, and offload that calculation onto the DBMS. A goal that would be really useful to achieve would be to have the code for running it in the application and the code for running it in the database look the same, so you could just say "run this transaction here" or "run this transaction on the database server." Hey, I didn't say it would necessarily be easy!

Another argument against an object mapping in Nice is that Nice already supports tuples! In Nice, the example above could be rewritten without objects like this:


Iterator<(int, Date, BigDecimal)> emps = 

	connection.executeQuery("select id, hiredate, salary from emp");

for((id, hire, salary):emps) {

	if (isHireAnniversary(hire))

	   connection.update("update emp set salary = " +

	   	salary + salary * 0.05 + " where id = " + id ); 

}

commit();

Of course, this trades better performance for more complicated code - you can't just call "setSalary", you have to write an update statement. It also doesn't solve the I/O problem, but it does reduce it - we're not bringing back any fields that we won't even use anymore.

One last gripe about object mappings: If you load an Employee, and then call "emp.getManager().getDepartment().getDepartmentManager().getManager()", this will actually result in no less than four sequential trips to the database to load the next object in the chain. Argh!

More ruminations next time, perhaps... Anyone have any good ideas they'd like to share?