Tuesday 12 September 2006

InfoPath 2007, SQL05 and ZQuery

As part of the latest project we're looking at an uber Sharepoint 2007 solution. One area we are looking at are storing completed forms data directly to SQL05, and what querying we can do directly.
So, to try this out I created a table with 2 columns as below.

CREATE TABLE [dbo].[tblInfoPathForms]([FormGuid]
[uniqueidentifier] NOT NULL CONSTRAINT [DF_tblInfoPathForms_FormGuid] DEFAULT
(newid()),[FormData] [xml] NOT NULL,CONSTRAINT [PK_tblInfoPathForms] PRIMARY KEY
CLUSTERED ( [FormGuid] ASC )WITH (IGNORE_DUP_KEY = OFF) ON
[PRIMARY]) ON [PRIMARY]

FormGuid is obvious, the second column however uses the new datatype in SQL05 - the xml datatype

How is this different to storing the xml in a text column? Glad you asked. The XML datatype has the following advantages. It's stored as a binary stream, so access is quicker. It also supports the XQuery language directly through SQL statements, allowing you to find node data and manipulate it directly.

So - Having created a test infopath form, published it and completed it. I have my XML. Infopath XML is not too bad really. Inserting the XML is just like any other insert statement.

XQuery can use either XPath structure or the more complex FLWDR syntax - which acts more like SQL. The query below uses $s as an alias for the Xpath root /statusReport and returns the text inside the reportDate node.

SELECT FormData.query('for $s in
/statusReportreturn $s/reportDate/text()')

FROM tblInfoPathForms
WHERE
FormGuid='FE66CE45-891E-453D-8225-E5504D26608B'


Easy! Well....not quite - if you run this query you will get an empty cell back. Microsoft disregards the XQuery specification here and does not return an error message, instead returning an empty result. So what is the problem? Namespaces. Looking at the original XML
xmlns:xhtml="http://www.w3.org/1999/xhtml"
xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2005-09-22T20:42:56"
xmlns:xd="http://schemas.microsoft.com/office/infopath/2003"
xml:lang="en-us">
we can see that the namespace "my" is used throughout the document. By adding a default namespace for the document in the query prolog we get the correct results

SELECT FormData.query('declare default element namespace
"http://schemas.microsoft.com/office/infopath/2003/myXSD/2005-09-22T20:42:56";
for
$s in /statusReport
return $s/reportDate/text()')
FROM
tblInfoPathForms
WHERE FormGuid='FE66CE45-891E-453D-8225-E5504D26608B'
Now that we can get at the expected result it's just a case of playing about with the query to get other results. XQuery is quite powerful and allows count and sum operations, along with methods to "shred" the XML back into result sets.

Tuesday 11 July 2006

Regex - The Website Accessibility Saviour

While trying to make Microsoft Content Management Server (MCMS) based websites I've had to employ several funky tricks, including Http Modules, overriding page rendering, inherited pages/master pages and the work horse of the entire system - the regular expression
Most of making CMS accessible has been catching the resulting html before it gets to the user, finding x and replacing it with y. This works for a lot of things (eg, removing the Border=0 attribute that microsoft insist on adding to any image). The situation with links was quite different. I needed to find and remove target="_blank" where found, but also add in onclick javascript to launch the page in a new window. Why should I want to do this? Target is not a valid attribute in either the HTML or XHTML strict specification and we needed to open some websites in another window as they had some dodgy bad practice that stopped you using the back button in your browser.
So...the goal was to get from:

http://www.mysite.com" target="_blank">
to

http://www.mysite.com" onclick='window.open("http://www.mysite.com");return false;'>
The key is System.Text.RegularExpressions.Regex, and more specifically the Replace method.

Regex.Replace(string sOrig, string sRegex, MatchEvaluator oMatchEvaluator, RegexOptions Options)

example below:


m_sXHTML = Regex.Replace(m_sXHTML,
"<[aA]\\s([^>]*href=\"([^>^\"]+)\"[^>]*target=[\"]?_blank[\"]?[^>]*)>",
new MatchEvaluator(ConvertTargetToJavascript), RegexOptions.IgnoreCase);
The two keys are the regular expression and the Match evaluator delegate. I'll tackle the expression first.

"<[aA]\\s([^>]*href=\"([^>^\"]+)\"[^>]*target=[\"]?_blank[\"]?[^>]*)>"

Looks fun eh? There is a well defined syntax for regular expressions, which I don't intend to go into detail on here, but I will cover how this one works, breaking it down into chunks.

"<[aA]\\s([^>]*href=\"([^>^\"]+)\"[^>]*target=[\"]?_blank[\"]?[^>]*)>
match the character "a" or "A"

"<[aA]\\s([^>]*href=\"([^>^\"]+)\"[^>]*target=[\"]?_blank[\"]?[^>]*)>"
followed by a white space character ("\s" is the regex syntax, as this is in code we need to escape the escape character!)

"<[aA]\\s([^>]*href=\"([^>^\"]+)\"[^>]*target=[\"]?_blank[\"]?[^>]*)>"
Anything but a ">" -for zero or more occurances (*).

"<[aA]\\s([^>]*href=\"([^>^\"]+)\"[^>]*target=[\"]?_blank[\"]?[^>]*)>"
anything but ">" or double quote for 1 or more occurances (+)

"<[aA]\\s([^>]*href=\"([^>^\"]+)\"[^>]*target=[\"]?_blank[\"]?[^>]*)>"
(0 or 1) double quote (?)

This gives us a match for any tag that contains the target="_blank" or target=_blank attribute after the href attribute. If we do a similar regex for when it occurs before the href then we have covered all bases.

What I have not mentioned so far are the brackets throughout the regex. These imply groups of information that can be used by our match evaluator.

"<[aA]\\s([^>]*href=\"([^>^\"]+)\"[^>]*target=[\"]?_blank[\"]?[^>]*)>"
The value of the href attribute

"<[aA]\\s([^>]*href=\"([^>^\"]+)\"[^>]*target=[\"]?_blank[\"]?[^>]*)>"
all the attribute information for this
tag.

All we need now is to provide the delegate method, in the example called "ConvertTargetToJavascript"

private string ConvertTargetToJavascript(Match m)
{
string sTest = m.Groups[0].ToString();
string sResult = "

";
return sResult ;
}


Group 0 is always the whole match. The groups that follow it depend on the round brackets used in the regex. In the example Group 1 will be all attribute information, and Group 2 will be the value of the href. The string result returned by the match evaluator will replace the original match.
All to easy.....once you get your head around regex syntax.

Friday 30 June 2006

Deployment Techniques - The Indiana Jones Method

We all have our favourite ways of deploying things, some better than others. One technique often used in more informal environments is the "Indiana Jones" technique. Remember the scene at the beginning of Raiders of the Lost Ark where Indie weighs up the bag of sand carefully before swapping it with the idol? I'm not suggesting that anyone would ever dream of renaming the "active" .NET dll on a website and then replacing it with a new one before sitting back to see what happens....

Always remember to weigh that bag of sand more carefully than Indie, and remember to run quickly ;)

CSS - You can stylesheet just about anything

I've come across several recent instances of tools that allow you to override/add your own styles, this blogger included. Other service providers often provide websites that you can style to look like the service provided is coming from your own corporate site.

Often these companies will tell you that you can just do the basics, and that, for example, you will have to send them logos etc, which they will have to upload to their servers before you can truely make use of the branding. Also I've asked the question recently:

"Could you remove this bit of text for me please - it's not applicable?"

to which the answer was

"We'd love to, several customers have asked for it, but we can't - changing the code would change it across all sites - it's impossible"

Nothing is impossible if you are allowed your own CSS modifications....it's great.

For instance - the portrait at the top of my blog.....it doesn't really exist. Look at the HTML and you won't find it. It's all courtesy of the wonderful piece of CSS

background-image: url(/images/myimage.jpg)

Also - you can almost certainly hide any part of the HTML page you want too, using:

display: none

A bit of padding, margins, background images and "display:none" can help style nearly anything without ever going near the HTML, or requiring the involvement of a slow moving service provider. It goes without saying that you should check that you are sticking to the terms of your licensing if you start removing the service provider logos etc.