Tuesday, August 23, 2011

Using DBUnit through Ant with a large MS SQL Database

The following describes my experiences with using DBUnit through Ant with a large MS SQL database, and all of the adventures I had. The intention is to use DBUnit though Ant to export a database as an XML file, to clear the database, and to re-populate it using that XML file.

Why?

Automated acceptance testing. You start with an XML backup of what you want your base system to contain, refresh your database with that backup, run your suite of automated acceptance tests, and then restore the XML backup again. That way your tests can do things specific to the application like create users, invoices, etc, and otherwise mess around with every conceivable operation. When you are done you can restore everything to the way it was before.

Running it though Ant lets you easily integrate this refresh into other automation tools such as Jenkins (Hudson), TeamCity, and so on.

Did you say large?

Yes I did, but in this context I am using it to refer to a database with a large number of tables, not a large number of records. A large number of tables is a catch-all for several circumstances which you cause you trouble, such as with tables which have keywords for names, foreign key dependencies between tables, tables with timestamp columns, and some more items I will get address.

But I want a lot of records!

There are a couple of problems with using a lot of records with DBUnit, the first of which is that if you are exporting and importing you will have to deal with very long save and load times. As a general rule you probably are aiming for an export XML file size of under 3 MB, which is around 2,000 records depending on what it is your tables. The second problem has to do with memory, but I have a workaround for that as well.

DBUnit with Ant

So you want to use Ant? You are then going to have to write your own task by extending the DBUnit Ant Task. This is because there are options that need to be set [at least to my knowledge] that cannot be set in the DBUnit Ant Task that is provided to you.

To start out you will need to extend three things:

  1. The DbUnitTask, which is the <dbunit> tag. This allows you to define your own Export and Operation tags.
  2. The Export, which is the <export> tag. This is for changing how the export works.
  3. The Operation, which is the <operation> tag. This is for changing how the insert works.

MSDBUnitTask.java

public class MSDBUnitTask extends DbUnitTask {

    @SuppressWarnings("unchecked")
    public void addMSExport(MSExport export) {
        this.getSteps().add(export);
    }
    @SuppressWarnings("unchecked")
    public void addMSOperation(MSOperation operation) {
        this.getSteps().add(operation);
    }
}

MSExport.java

public class MSExport extends Export   {

    private String tableNames;
    @Override
    public void execute(IDatabaseConnection arg0) throws DatabaseUnitException {
        // code goes here
    }

    public String getTableNames() {
        return tableNames;
    }

    public void setTableNames(String tableNames) {
        this.tableNames = tableNames;
    }

}

MSOperation.java

public class MSOperation extends Operation   {

        @Override
    public void execute(IDatabaseConnection arg0) throws DatabaseUnitException {
        // code goes here
    }

}

Including your custom task in Ant

This assumes that the Ant classes from above are compiled into a JAR and present in the ${lib.dir} along with the dbunit.jar

<path id="classpath">

    <fileset dir="${lib.dir}">
        <include name="*.jar" />
    </fileset>   
</path>

<taskdef name="msdbunit" classname="com.foo.ant.dbunit.MSDBUnitTask" classpathref="classpath" />

 

Adventures in Exporting

The Java code here goes in the execute method of MSExport.

This is where the majority of problems start. As it turns out you can export with ease, but can end up with XML that had invalid characters and some other interesting things that cannot be imported.

Let’s start with the result we want (and need): the XML to export the database:

<target name="db-export">

    <msdbunit
        driver="${db.driver}" 
        url="${db.url}" 
        userid="${db.username}" 
        password="${db.password}">

        <dbconfig>
            <property name="escapePattern" value="[?]" /> 
            <property name="datatypeFactory" value="org.dbunit.ext.mssql.MsSqlDataTypeFactory" /> 
        </dbconfig>
        <msexport format="xml" tableNames="${insert.table.names}" dest="full_export.xml" />
    </msdbunit>

</target>

The msdbunit and msexport tags were explained earlier, and the base Java code for creating them was given.

Issue #1: Export Table Order

You must specify the order of tables to do the export in insertion order. This is done by passing in a comma separated list of tables name from ${insert.table.names} into the <msexport> tag as the tableNames attribute.

The reason for this order is that you must take into account FK dependencies when both inserting and deleting. For example if Foo.A is a FK to Bar.A, then you can’t start out by inserting records into Foo; you have to insert into Bar first. From the deletion perspective it is the opposite: You can’t delete from Bar since Foo can reference it and cause the delete to fail.

The MSExport task already has tableNames defined, so in order to get the result as a list of tables all that needs to be done is this:

String[] tableNamesArray = tableNames.split(",");

ITableFilter filter = new SequenceTableFilter(tableNamesArray);

IDataSet dataset = new FilteredDataSet(filter, arg0.createDataSet());

The second part of this is do use this array to define the table sequence. You must do this manually because even though it can be done automatically, with 100+ tables it will take minutes as opposed to a fraction of a second.

Issue #2: Columns that are of type TIMESTAMP or default DATETIME

There is an issue if you export a column of the TIMESTAMP type or of the defaulted DATETIME, because when you try and import it you will get an error regarding not being able to insert a value into the column.

The exact error you get is the following:

Cannot insert an explicit value into a timestamp column

The solution is not to export these columns, which requires the use of a column filter. This was the primary reason for abandoning the default Ant task, as I couldn’t figure out a good way to do a large column filtering.

List<ITable> updatedTables = new ArrayList<ITable>();

for (String tableName : tableNamesArray) {
                ITable table = dataset.getTable(tableName);
                List<Column> excludedColumns = new ArrayList<Column>();
                Column[] columns = table.getTableMetaData().getColumns();
                //For each column...
                for (Column column : columns) {
                    //TIMESTAMP columns have to be ignored in MSSQL because they cannot have an explicit insert
                    //Unfortunately DATETIME in MSSQL also shows here as a TIMESTAMP so the only way to
                    //determine defaulted DATETIME or TIMESTAMP is by looking for defaults and the SQL Type name
                    if ( (column.getDefaultValue() != null && column.getDataType() == DataType.TIMESTAMP) ||
                            column.getSqlTypeName().equals("timestamp")) {
                        excludedColumns.add(column);
                    }
                }
                //convert the list of excluded columns to an array
                Column[] excluded =  excludedColumns.toArray(new Column[excludedColumns.size()]);
                //create a new ITable that excludes the filtered columns
                ITable filteredTable = DefaultColumnFilter.excludedColumnsTable(table,excluded);
                //keep track of the modified table
                updatedTables.add(filteredTable);               
            }
            //Take all of the modified table and create a new dataset
            ITable[] updateTableArray =  updatedTables.toArray(new ITable[updatedTables.size()]);
            CompositeDataSet compositeDataSet = new CompositeDataSet(updateTableArray);

This code iterates over all of the tables, looks for datetime and timestamp columns, excludes them from that table, and adds them to a new list of tables with the appropriate columns excluded. It then constructs a new composite dataset containing all of the new tables and columns.

Issue #3: Exporting your customizations

When you extend the Export task, you have to then define how to export.

//Get the format and file destination
String format = this.getFormat();
File dest = this.getDest();
//Output based on the format
if (format.equalsIgnoreCase(FORMAT_CSV)) {
    CsvDataSetWriter.write(compositeDataSet, dest);
else if (format.equalsIgnoreCase(FORMAT_FLAT)) {
    FlatXmlDataSet.write(compositeDataSet, new FileOutputStream(dest));
} else if (format.equalsIgnoreCase(FORMAT_XML)) {
    XmlDataSet.write(compositeDataSet, new FileOutputStream(dest));
} else if (format.equalsIgnoreCase(FORMAT_DTD)) {
    FlatDtdDataSet.write(dataset, new FileOutputStream(dest));
} else {
    throw new DatabaseUnitException(format+" is not a recognized format.");
}

Issue #4: Exporting Tables that have keywords for names

For example User is a keyword, but you can have a table of this name in MS SQL and access it using [User]. In order to deal with this you want to put all table names within [].

This can be configured in the Ant XML using the escapePattern property of dbconfig:

<target name="db-export">

    <msdbunit
        driver="${db.driver}" 
        url="${db.url}" 
        userid="${db.username}" 
        password="${db.password}">

        <dbconfig>
            <property name="escapePattern" value="[?]" /> 
            <property name="datatypeFactory" value="org.dbunit.ext.mssql.MsSqlDataTypeFactory" /> 
        </dbconfig>
        <msexport format="xml" tableNames="${insert.table.names}" dest="full_export.xml" />
    </msdbunit>

</target>

Issue #5: Data Type Factory

In order to be able to insert records into a MS SQL database using DBUnit, you have to specify the data factory to use as a part of the export. If you don’t any time you try and import a result containing a primary key you will get the following error:

Cannot insert explicit value for identity column in table

This issue can also be resolved in the default Ant using the dataTypeFactory property in the dbconfig:

<target name="db-export">

    <msdbunit
        driver="${db.driver}" 
        url="${db.url}" 
        userid="${db.username}" 
        password="${db.password}">

        <dbconfig>
            <property name="escapePattern" value="[?]" /> 
            <property name="datatypeFactory" value="org.dbunit.ext.mssql.MsSqlDataTypeFactory" /> 
        </dbconfig>
        <msexport format="xml" tableNames="${insert.table.names}" dest="full_export.xml" />
    </msdbunit>

</target>

Issue #6: Which table failed to export?

This issue is that when you are exporting your database and something goes wrong, it is not specified which table was the problem. You just get an error message such as “Cannot insert explicit value for identity column in table” and are left to figure out which table may have caused it.

Since we have our own Ant task that iterates over tables, this is pretty easy to do: put a System.out.println with the table name at the start of the loop. That was if something breaks, you know which table caused it.

Deleting from the Database

If you exported correctly this isn’t a problem, and doesn’t require any custom work beyond the escapePattern and data type factory. It should be noted that before inserting into your database you want to clear it using your export file first. The CLEAN_INSERT operation is supposed to do this on insert for you, but in my experience I have never been able to get it work on a large database. It ends up throwing constraint violations, either because of the database, a bug, or my misuse of the operation.

<target name="db-delete">

    <msdbunit
        driver="${db.driver}" 
        url="${db.url}" 
        userid="${db.username}" 
        password="${db.password}">

        <dbconfig>
            <property name="escapePattern" value="[?]" /> 
            <property name="datatypeFactory" value="org.dbunit.ext.mssql.MsSqlDataTypeFactory" /> 
        </dbconfig>
        <operation type=”DELETE_ALL” format="xml" src="full_export.xml" />
    </msdbunit>

</target>

Populating the Database

The Java code here goes in the execute method of MSOperation.

The same as with deleting from the database, if you correctly exported the dataset you should not have any problems. There is one feature though that is nice to have, which is the ability in the case of an error to know which table cause the error.

        //Only perform an operation for every table if doing a clean insert for MS SQL
        if (this.getType().equalsIgnoreCase("MSSQL_CLEAN_INSERT")) {
            try {
                FileReader reader = new FileReader(this.getSrc());
                String format = this.getFormat();
                IDataSet dataset = null;
                if (format.equalsIgnoreCase(FORMAT_XML)) {
                    dataset = new XmlDataSet(reader);
                } else {
                    throw new DatabaseUnitException("xml is the only supported format.");
                }
                String[] tableNames = dataset.getTableNames();
                //For each table, create a new dataset and execute the operation because:
                for (int i = 0; i < tableNames.length; i++) {
                    String tableName = tableNames[i];
                    System.out.println("Populating table "+tableName);
                    ITable table = dataset.getTable(tableName);
                    ITable[] iTables = new ITable[1];
                    iTables[0] = table;
                    CompositeDataSet composite = new CompositeDataSet(iTables);
                    this.getDbOperation().execute(connection, composite);
                }
            } catch (Exception e) {
                e.printStackTrace();
                throw new DatabaseUnitException(e);
            }
        } else {
            super.execute(connection);
        }

    }

Beyond that all you need is the following:

  • MSSQL_CLEAN_INSERT
  • The MS SQL Data Type Factory
  • The escapePattern for tables with keywords for names

<target name="db-populate">

    <msdbunit
        driver="${db.driver}" 
        url="${db.url}" 
        userid="${db.username}" 
        password="${db.password}">

        <dbconfig>
            <property name="escapePattern" value="[?]" /> 
            <property name="datatypeFactory" value="org.dbunit.ext.mssql.MsSqlDataTypeFactory" /> 
        </dbconfig>
        <msoperation type=”MSSQL_CLEAN_INSERT” format="xml" src="full_export.xml" />
    </msdbunit>

</target>

Where did I get my information?

The internet combined with trial and error. Here are the sites I used for information:

No comments:

Contributors