Tuesday 12 September 2006

InfoPath 2007, SQL05 and ZQuery

As part of the latest project we're looking at an uber Sharepoint 2007 solution. One area we are looking at are storing completed forms data directly to SQL05, and what querying we can do directly.
So, to try this out I created a table with 2 columns as below.

CREATE TABLE [dbo].[tblInfoPathForms]([FormGuid]
[uniqueidentifier] NOT NULL CONSTRAINT [DF_tblInfoPathForms_FormGuid] DEFAULT
(newid()),[FormData] [xml] NOT NULL,CONSTRAINT [PK_tblInfoPathForms] PRIMARY KEY
CLUSTERED ( [FormGuid] ASC )WITH (IGNORE_DUP_KEY = OFF) ON
[PRIMARY]) ON [PRIMARY]

FormGuid is obvious, the second column however uses the new datatype in SQL05 - the xml datatype

How is this different to storing the xml in a text column? Glad you asked. The XML datatype has the following advantages. It's stored as a binary stream, so access is quicker. It also supports the XQuery language directly through SQL statements, allowing you to find node data and manipulate it directly.

So - Having created a test infopath form, published it and completed it. I have my XML. Infopath XML is not too bad really. Inserting the XML is just like any other insert statement.

XQuery can use either XPath structure or the more complex FLWDR syntax - which acts more like SQL. The query below uses $s as an alias for the Xpath root /statusReport and returns the text inside the reportDate node.

SELECT FormData.query('for $s in
/statusReportreturn $s/reportDate/text()')

FROM tblInfoPathForms
WHERE
FormGuid='FE66CE45-891E-453D-8225-E5504D26608B'


Easy! Well....not quite - if you run this query you will get an empty cell back. Microsoft disregards the XQuery specification here and does not return an error message, instead returning an empty result. So what is the problem? Namespaces. Looking at the original XML
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2005-09-22T20:42:56"
xmlns:xd="http://schemas.microsoft.com/office/infopath/2003"
xml:lang="en-us">
we can see that the namespace "my" is used throughout the document. By adding a default namespace for the document in the query prolog we get the correct results

SELECT FormData.query('declare default element namespace
"http://schemas.microsoft.com/office/infopath/2003/myXSD/2005-09-22T20:42:56";
for
$s in /statusReport
return $s/reportDate/text()')
FROM
tblInfoPathForms
WHERE FormGuid='FE66CE45-891E-453D-8225-E5504D26608B'
Now that we can get at the expected result it's just a case of playing about with the query to get other results. XQuery is quite powerful and allows count and sum operations, along with methods to "shred" the XML back into result sets.