Monday, July 27, 2009

Hibernate import.sql

I knew that hibernate had a mechanism to load an sql script on startup, but I never took the time to track it down. Today I finally decided it was worth the effort. The mechanism is fairly simple, but not well documented. I found the following on the hibernate web site:

Ant tasks for schema creation and documentation
...
Also note that since Hibernate 3.1 you can include a file called "import.sql" in the runtime classpath of Hibernate. At the time of schema export it will execute the SQL statements contained in that file after the schema has been exported.
...

Emmanuel Bernard (Hibernate Search) also wrote a brief blurb about import.sql on the hibernate blog:

import.sql: easily import data in your unit tests

Hibernate has a neat little feature that is heavily under-documented and unknown. You can execute an SQL script during the SessionFactory creation right after the database schema generation to import data in a fresh database. You just need to add a file named import.sql in your classpath root and set either create or create-dropas your hibernate.hbm2ddl.auto property.

I use it for Hibernate Search in Action now that I have started the query chapter. It initializes my database with a fresh set of data for my unit tests. JBoss Seam also uses it a lot in the various examples. import.sql is a very simple feature but is quite useful at time. Remember that the SQL might be dependent on your database (ah portability!).


#import.sql file
delete from PRODUCTS
insert into PRODUCTS (PROD_ID, ASIN, TITLE, PRICE, IMAGE_URL, DESCRIPTION) values ('1', '630522577X', 'My Fair Lady', 19.98, '630522577X.jpg', 'My Fair blah blah...');

insert into PRODUCTS (PROD_ID, ASIN, TITLE, PRICE, IMAGE_URL, DESCRIPTION) values ('2', 'B00003CXCD', 'Roman Holiday ', 12.98, 'B00003CXCD.jpg', 'We could argue that blah blah');


For more information about this feature, check Eyal's blog, he wrote a nice little entry about it. Remember if you want to add additional database objects (indexes, tables and so on), you can also use the auxiliary database objects feature.


import.sql works as expected. If you are using maven, simply create the import.sql in your src/test/resources or src/main/resources and the sql will be applied after the hibernate ddl statements have executed.

Update (2009/09/08)

While it is possible to change the file name of the imported script from "import.sql" to a user specified name, it is not possible to import multiple scripts. It looks like it would be fairly straight forward to support multiple scripts. Below is an extract from org/hibernate/tool/hbm2ddl/SchemaExport.java where the import is performed.




76 private String importFile = "/import.sql";

...

237 try {
238 InputStream stream = ConfigHelper.getResourceAsStream( importFile );
239 importFileReader = new InputStreamReader( stream );
240 }

...

261 if ( !justDrop ) {
262 create( script, export, outputFileWriter, statement );
263 if ( export && importFileReader != null ) {
264 importScript( importFileReader, statement );
265 }
266 }



Another approach to supporting multiple files is to use features of the database. For instance, h2 database provides a RUNSCRIPT command for executing script files. h2 only supports referencing scripts by file name, and not as resources on the classpath, but this could be changed using a custom function. Other databases should provide similar capabilities.

Saturday, April 11, 2009

Name That Class

I was thinking about class naming strategies recently and decided to make a list of commonly used suffixes for Java classes. While this post is more about idioms than patterns, compiling this list sent me back to 1995 (Design Patterns).

Access, Action, Adapter, Advisor, Aware
Base, Bean, Binding, Buffer, Builder
Cache, Callback, Changer, Checker, Child, Chooser, Collector, Constants, Context, Connector, Constants, Consumer, Container, Control, Controller, Converter
Data, Decoder, Default, Delegate, Descriptor
Editor, Element, Encoder, Entry, EntryPoint, Entity, Event, Expression, Extractor
Factory, Filter, Finder, Format, Function
Generator
Handler, Helper, Hints,Holder
Impl, Info, Input, Interceptor, Item
Key, Kind
List, Listener
Manager, Map, Marshaller, Mapper, Matcher, Member, Message, Mode, Model, Module
Node, Notification
Object, Operation, Output
Parameters, Permission, Policy, Populator, Processor, Producer, Provider, Properties, Proxy
Reader, Record, Reference, Renderer, Request, Response, Resolver, Resource, Result, Retriever
Selector, Source, Service, Spec, Standard, Strategy, Stream, Stub, Support
Template, Translator, Type
Unmarshaller, Utils
Value, Verifier, Validator, View
Wrapper, Writer


Please add your own favorites in the comments.

Saturday, February 16, 2008

InterWiki and Currie Syntax 1.0

The WikiPedia page on InterWiki has a link to the W3C draft specification Currie Syntax 1.0. This is not as powerful as some wiki engines provide, but it's an interesting approach.

Basically it defines a qname-like formatted reference inside square brackets that has its prefix replaced with the current namespace URI of the prefix. No prefix resolution mechanism is defined for non-XML documents.

As an example, consider the following fragment from an OGC web map context document:
<LayerList>
<Layer queryable="1" hidden="1" >
<Server service="WMS" version="1.1.1" title="World Map">
<OnlineResource type="simple"
xlink:href="http://www2.demis.nl/wms/wms.asp?wms=WorldMap"/>
</Server>
<SRS>EPSG:4326</SRS>
<Name>Ocean features</Name>
<Title>Ocean features</Title>
</Layer>
</LayerList>

Instead of using the full URL for the Demis WMS server, you could do something like the following:
<LayerList xmln:demis="http://www2.demis.nl/wms/wms.asp?wms=">
<Layer queryable="1" hidden="1" >
<Server service="WMS" version="1.1.1" title="World Map">
<OnlineResource type="simple"
xlink:href="[demis:WorldMap]"/>
</Server>
<SRS>EPSG:4326</SRS>
<Name>Ocean features</Name>
<Title>Ocean features</Title>
</Layer>
</LayerList>


So now, if you have 20 layers from Demis and you need to change the url, you can do it in one place.

One limitation to this approach is that you must still bind the URL to the prefix inside the document. There are many cases when you would like to have the prefix mapping stored in a database that can be edited by the user. If the XML is being generated dynamically, then this isn't a problem. However, if the XML is being generated dynamically then why not simply replace the URL when you generate the XML.

Sunday, October 7, 2007

TiddlyWiki, Google Apps, Google Analytics

Spacelag is now completely hosted by Google Apps and Blogger. This is the most recent in a long line of experiments for me. I have tried Confluence, Wordpress, TiddlyWiki, Drupal and a slew of others all running on a hosted virtualized server. These all worked fine, but Google is free, faster and I do not have to be my own sysadmin!

Blogger works fine as a blog, but I really wanted a simple wiki for spacelag. The only options Google Apps provides for website hosting is Google Page Creator. If you have used it, you will know it isn't especially flexible and there are no options for server-side scripting. That's when I returned to TiddlyWiki.

I have used TiddlyWiki off and on over the past couple of years, mostly for personal note keeping. I have also used TiddlyWiki at work for deploying documentation and tutorials with web applications. I have even used TiddlyWiki with server-side extentions as a wiki for spacelag. This time around it's just a stock TiddlyWiki served as a file on www.spacelag.com/.

That's definitely simple and I love not having to maintain mysql and the rest of the software stack. But how will I know if people are actually visiting the site? For this, Google provides Google Analytics. Signing up was easy and the code snipped required to enable the blog was simple enough, but TiddlyWiki required a little more thinking. I could have just hacked the code to include the required scripts, but this is not the TiddlyWiki way.

After a little bit of googling, I came across Tracking TiddlyWiki with Google Analytics by Phil Hawksworth. He provided almost exactly what I needed. You can follow the link for details, but I will provide a brief recap.

The first step is easy, simply add the markup provide by Google Analytics to the MarkupPostHead tiddler. TiddlyWiki will inject this into the loaded page just after the HTML head element. This will now register a page hit whenever someone loads the wiki.

The second step is to enable tracking individual tiddlers by overriding the displayTiddler function to call urchinTracker("/" + titles) before displaying the tiddler.

The only change I made to Phil's code is to call urchenTracker only when readOnly == true. This way I do not make calls to google when I view and edit offline. I chose readonly because I set my blog to be readonly when it's online. You could also check window.location to make sure it contains a specific URL or some other trick.

Below are my versions of the required tiddlers.

MarkupPostHead:
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript">
</script>


CustomTracker (tag with "systemConfig")
{{{
// Specify your account number here!
_uacct = "UA-1234567-1";

// CustomTracker as a namespace for tracking related functions
var CustomTracker = {
// store a reference to the original displayTiddler function
displayTiddler: story.displayTiddler
};

CustomTracker.track = function() {
if (readOnly) {
urchinTracker.apply(this, arguments);
}
};

CustomTracker.trackAndDisplayTiddler = function(srcElement, titles) {
// log with the tracker
CustomTracker.track('/' + titles);
// call the original displayTiddler function
CustomTracker.displayTiddler.apply(this,arguments);
};

// replace the default displayTiddler function with a tracking version
story.displayTiddler = CustomTracker.trackAndDisplayTiddler;

// Call once for the initial page load
CustomTracker.track();
}}}

Tuesday, September 18, 2007

Zipcodes and more

Zip codes are extremely convenient for finding places on a map. Just go to maps.yahoo.com and type in "pizza 90120" and you will find approximately 27,466 hits for places whose name contains the word pizza or have categories that contain the word pizza. That's definitely better than the 7,662,184 matches you find with just "pizza". You find that many online store locators will also take a zip code and find all of the stores within a given distance. Definitely useful.


If you're interesting in seeing how zipcodes are laid out in the use, Ben Fry has put together Zipdecode, a fun little zipcode mapping tool. Just start typing and interactively map zipcodes that match the partial zipcode as you type it.




All of the above are good examples of how to perform fast searching using an identifier instead of geographic coordinates. This works because the postal service has already assigned zipcodes to all addresses.


What if you start with a coordinate (lat,long) and want to compute the zipcode? Easy, just download the boundary files for zipcodes and find the one that contains your point, right? Well, not exactly. The following statement from the Census Bureau states the problem concisely:


The Census Bureau does not have maps or digital files showing the boundaries of U.S. Postal Service ZIP Codes. The fact that ZIP Codes aren't required to be polygons makes them difficult to map. They are networks of streets served by mail carriers or just individual post offices and are a tool for mail delivery. They also change periodically as required to meet Post Office operational needs. Various companies have created maps by interpolating boundaries between occurrences of ZIP Codes on the ground. However, this does not guarantee that the U.S. Postal Service delivery routes will follow this interpretation.
Census Zip Code Information


To help overcome these limitations, Census developed Zip Code Tabulation Areas.

ZCTAs are generalized area representations of U.S. Postal Service (USPS) ZIP Code service areas. Simply put, each one is built by aggregating the Census 2000 blocks, whose addresses use a given ZIP Code, into a ZCTA which gets that ZIP Code assigned as its ZCTA code. They represent the majority USPS five-digit ZIP Code found in a given area. For those areas where it is difficult to determine the prevailing five-digit ZIP Code, the higher-level three-digit ZIP Code is used for the ZCTA code.
If you're interested in seeing how ZCTAs are laid out, take a look at an interactive ZCTA map. It's a nice mashup of a ZCTA map with Google Maps.





If you look closely, you will notice that holes exist in rivers and likely other places. This is a problem if you're trying to develop an indexing scheme, unless you don't need 100% coverage. However, if there is no census data there, it's likely that you will find anything of interest.

Tuesday, September 11, 2007

What is a gazetteer?

For my first post here, I have decided to jump right into the concept of gazetteers. You can expect to see more posts on gazetteers in the near future as I plan to build one.

What is a gazetteer? According to wordnet it's "a geographical dictionary". Wikipedia calls it a "geographical directory" and has lots more to say about the history and the publicly available gazetteers.

If you are interesting in obtaining access to a gazetteer, a number of online gazetteer services are available that provide semi-restful APIs and most provide the ability to download the entire database. The U.S. Board on Geographic Names was established to maintain uniform geographic name usage throughout the Federal Government. On their site you will find links to download several different gazetteers, including:
GeoNames is popular online gazetteer that also provides downloads and web services.

Most web sites which provide a place name lookup have implemented their own gazetteer using either a relational database or some indexing engine. Standards for gazetteer services and open source implementations do exist.

In a future post I will discuss both the abstract model defined by ISO and the gazetteer profile of WFS defined by the Open Geospatial Consortium.