Category Archives: SQLServer

How to Edit an XML Column on the fly from Table in SQLServer

I work with an OLAP Engine called Mondrian from Pentaho – which has a Schema which is defined in XML.

For our implementation this is stored in a Table in a SQLServer DB and in production I cannot stop the Application server which is JBOSS and which connects to SQLServer to read the XML from the Table to set up initial Mondrian Connection and modify the XML by hand and then restart it. This is deployed for multiple customers in a multi-tenant mode.

So I had to devise a way to change the Mondrian Schema on the fly for a very serious customer issue.

I describe below a T-SQL script to do it.

Look at the XML below. I have made is short for clarity purpose.

Our XML  Schema is about 500MB long for the MONDRIAN Engine for our Database.

<?xml version=”1.0″ encoding=”UTF-8″?>

<Schema name=”OMJ_APS_200812_3″>

.

.

<Cube name=”Prescriber Product_Package Period”>

<Table name=”RX_PRESC_PKG”/>

<DimensionUsage name=”ST_H” source=”ST_H” foreignKey=”AlignmentId”/>

.

<Measure name=”Dollars” caption=”$” column=”Dollars” aggregator=”sum” formatString=”#,###”/>

<Measure name=”SMP” caption=”SMP” column=”SMP” aggregator=”sum” formatString=”#,###”/>

<Measure name=”Dollars NUMBER_PRESCRIBER” caption=”$ NUMBER_PRESCRIBER” column=”PrescriberID” aggregator=”Count” formatString=”#,###”/>

</Cube>

</Schema>

Look at the last Measure

<Measure name=”Dollars NUMBER_PRESCRIBER” caption=”$ NUMBER_PRESCRIBER” column=”DOL_NUM_PRESCRIBER” aggregator=”Count” formatString=”#,###”/>

Look at the attribute in it

column=”PrescriberID”

Suppose I want to change it to

column=”DOL_NUM_PRESCRIBER”

The Script which does the above is given below – Part 1

Also assume I need to change the

aggregator=”Count”

to

aggregator=”distinct count”

The script which does the above is – Part 2

==============================================================================================================

DECLARE @strCommand varchar(max)

DECLARE @measurePersistentName varchar(1024)

DECLARE @measureGroupName varchar(1024)

set @measurePersistentName = ‘PrescriberID’

— backup the table before modifying the XML contents

select * into MONDRIAN_OLAP_SCHEMA_BACKUP from MONDRIAN_OLAP_SCHEMA

BEGIN

— Part 1   Modify the Column Name

set @strCommand = ‘

DECLARE @myDoc xml

DECLARE @finalStr varchar(max)

select @myDoc=SchemaXml from MONDRIAN_OLAP_SCHEMA

set @finalStr = ”<?xml version=”1.0″ encoding=”UTF-8″?>”

SET @myDoc.modify(”’ +

‘replace value of (/Schema/Cube[@name=”Prescriber Product_Package Period”]/Measure[@name=”‘ + @measurePersistentName + ‘ NUMBER_PRESCRIBER”]/@column)[1]

with (“DOL_NUM_PRESCRIBER”) ”’ + ‘)

set @finalStr = @finalStr + convert(varchar(max), @myDoc)

update MONDRIAN_OLAP_SCHEMA set SchemaXml = @finalStr

exec (@strCommand)

— Part 2 — Modify the Aggregator Function Name

set @strCommand = ‘

DECLARE @myDoc xml

DECLARE @finalStr varchar(max)

select @myDoc=SchemaXml from MONDRIAN_OLAP_SCHEMA

set @finalStr = ”<?xml version=”1.0″ encoding=”UTF-8″?>”

SET @myDoc.modify(”’ +

‘replace value of (/Schema/Cube[@name=”Prescriber Product_Package Period”]/Measure[@name=”‘ +  @measurePersistentName +  ‘ NUMBER_PRESCRIBER”]/@aggregator)[1]

with (“distinct count”) ”’ + ‘)

set @finalStr = @finalStr + convert(varchar(max), @myDoc)

update MONDRIAN_OLAP_SCHEMA set SchemaXml = @finalStr

exec (@strCommand)

END

==============================================================================================================

Advertisements

How to parse simple XML strings in SQLServer using T-SQL

Assume you have a Simple XML String which has the following structure

‘<Filters>

<Filter GroupName=”NSPC Region1″ GroupMemberName=”90-WESTERN1″/>

<Filter GroupName=”NSPC Region1″ GroupMemberName=”91-WESTERN1″/>

<Filter GroupName=”NSPC Region1″ GroupMemberName=”92-WESTERN1″/>

<Filter GroupName=”NSPC Region2″ GroupMemberName=”90-WESTERN2″/>

<Filter GroupName=”NSPC Region2″ GroupMemberName=”91-WESTERN2″/>

<Filter GroupName=”NSPC Region2″ GroupMemberName=”92-WESTERN2″/>

</Filters>’

And you would like to parse out the Outer and Inner Attributes – here is some simple code to do that in T-SQL

Declare @SegmentGroupFilters xml

DECLARE @FilterSGDisplayName varchar(255)

DECLARE @FilterSEGDisplayname varchar(255)

set @SegmentGroupFilters =    ‘<Filters>

<Filter GroupName=”NSPC Region1″ GroupMemberName=”90-WESTERN1″/>

<Filter GroupName=”NSPC Region1″ GroupMemberName=”91-WESTERN1″/>

<Filter GroupName=”NSPC Region1″ GroupMemberName=”92-WESTERN1″/>

<Filter GroupName=”NSPC Region2″ GroupMemberName=”90-WESTERN2″/>

<Filter GroupName=”NSPC Region2″ GroupMemberName=”91-WESTERN2″/>

<Filter GroupName=”NSPC Region2″ GroupMemberName=”92-WESTERN2″/>

</Filters>’

if ( @SegmentGroupFilters is not null  )

BEGIN

print ‘@SegmentGroupFilters is not null’

DECLARE sg_cursor2 CURSOR FOR

SELECT

T1.rows.value(‘@GroupName’,’varchar(100)’) as FilterSGDisplayName,

T1.rows.value(‘@GroupMemberName’,’varchar(100)’) as FilterSegmentDisplayName

FROM @SegmentGroupFilters.nodes(‘/Filters/Filter’) T1(rows)

OPEN sg_cursor2

— Read first row from config table

FETCH NEXT

FROM sg_cursor2

INTO @FilterSGDisplayName, @FilterSEGDisplayname

WHILE @@FETCH_STATUS = 0

BEGIN

print ‘GroupName ‘ + @FilterSGDisplayName

print ‘GroupMemberName ‘ + @FilterSEGDisplayname

FETCH NEXT

FROM sg_cursor2

INTO @FilterSGDisplayName, @FilterSEGDisplayname

END

CLOSE sg_cursor2

DEALLOCATE sg_cursor2

END

Results

@SegmentGroupFilters is not null

GroupName NSPC Region1

GroupMemberName 90-WESTERN1

GroupName NSPC Region1

GroupMemberName 91-WESTERN1

GroupName NSPC Region1

GroupMemberName 92-WESTERN1

GroupName NSPC Region2

GroupMemberName 90-WESTERN2

GroupName NSPC Region2

GroupMemberName 91-WESTERN2

GroupName NSPC Region2

GroupMemberName 92-WESTERN2

Different things I tried to solve – SQL Error: I/O Error: Connection reset

Problem definition :

We have been seeing these Exception on the production machines – when a SQL query comes from the AppServer ( JBoss ) and does the query using one of the connections in the Connection pool.

SQL Error: I/O Error: Connection reset

java.sql.SQLException: I/O Error: Connection reset

at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1053)

at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:465)

at net.sourceforge.jtds.jdbc.JtdsStatement.executeQuery(JtdsStatement.java:1304)

at TestSQLQuery.process(TestSQLQuery.java:46)

at TestSQLQuery.main(TestSQLQuery.java:27)

Caused by: java.net.SocketException: Connection reset

at java.net.SocketInputStream.read(SocketInputStream.java:168)

at java.io.DataInputStream.readFully(DataInputStream.java:178)

at java.io.DataInputStream.readFully(DataInputStream.java:152)

at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(SharedSocket.java:846)

at net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(SharedSocket.java:727)

at net.sourceforge.jtds.jdbc.ResponseStream.getPacket(ResponseStream.java:466)

at net.sourceforge.jtds.jdbc.ResponseStream.read(ResponseStream.java:103)

at net.sourceforge.jtds.jdbc.ResponseStream.peek(ResponseStream.java:88)

at net.sourceforge.jtds.jdbc.TdsCore.wait(TdsCore.java:3932)

at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1046)

Setup under which this is happening

  • JBoss AppServer 4.2.2 running RedHat Linux 5.5
  • SQLServer JDBC Driver 1.2.2 – jtds-1.2.2.jar
  • Java 1.6
  • SQLServer 2005(9.00.4035.00      SP3         Standard Edition (64-bit))   running on – Windows Server 2008 R2

These sort of problems always excite me – and I itch to get jumping on these sort of problems. This helps me to get away from some of the routine work I do and also gives me an opportunity to learn more and improve my troubleshooting skills.

So with the permission of my manager – I jumped on this problem. Here is the story in more details.

I created a standalone Java Program to make sure that the problem is can be recreated from anywhere without much setup.

I added JDBC logging to my Standalone Java program by adding the following lines to my code – to get more details of the error – the JDBC Log file shows the following exception immediately after the Query packet information.

Class driverClass = Class.forName(driverClassName);

java.sql.Driver driver = (java.sql.Driver) driverClass.newInstance();

FileOutputStream fs = new FileOutputStream(“DriverManagerConnectionLogging.log”);

DriverManager.setLogWriter(new PrintWriter(“DriverManagerConnectionLogging.log”));

Exception shown in the JDBC Log File

TdsSocket: Max buffer memory used = 0KB

java.sql.SQLException: I/O Error: Connection reset

at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1053)

at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:465)

at net.sourceforge.jtds.jdbc.JtdsStatement.executeQuery(JtdsStatement.java:1304)

at TestSQLQuery.process(TestSQLQuery.java:46)

at TestSQLQuery.main(TestSQLQuery.java:27)

SQLException: SQLState(08S01)

The weird part of this was – the error never happened when the Standalone Java program was running from a windows box still connecting to the same SQLServer machine (remote) to the same DB. However EVERY time I ran this from the Linux box – the error happened without fail.

We were at a loss to explain why this was happening.

My technique to solve such kind of problems is elimination and isolation. I try to eliminate possibilities that could cause these errors and also try to isolate the areas where these errors could happen.

On further investigation I found that the SQLServer machine had 2 NIC Cards, but the Linux Machine had only 1.

Lets go back 6 months when we had a similar problem :

About 6 months back I had solved a similar case of Connection Resets from the Socket on Linux boxes – after I had discovered that the Linux box had 2 NIC cards and 1 of the NIC Cards was showing the following – see image below – which shows the amount of packets being dropped from one of the interfaces on the Linux box.

One the 2nd NIC Card was removed from the Linux box and rebooted – we did not see the same error again until now when the same error started happening again.

Here are the steps we tried to resolve the problem this time – The outcome of the steps is given at the end of this post.

  1. Upgraded to the latest JDBC Driver for SQLServer for Java jtds 1.2.5 – no change happened with that – same exception kept on happening
  2. Did a ping from both the Linux box to the SQLServer Box and vice versa and both returned the ICMP packets in < 1 ms – so this was not the problem.
  3. I did the following command to make sure – that Linux was not running out of FileHandles

“netstat -a -l –tcp|wc -l”
“lsof|grep TCP|wc -l”
Both of them returned numbers which seemed to be fine.

  1. So my first guess was to remove the 2nd NIC Card on the SQLServer machine and reboot – cold start the machine and try my standalone program.  That did not solve the problem
  2. Tried a new JDBC Driver for SQLServer – from jtds – jtds-1.2.5.jar – that did not solve the problem too.
  3. Made sure that the Java code closes the connection is closed Query is done and the statements are closed. The code was already doing this – so this was not the cause.
  4. Looked at SQLServer log from Management Studio for errors / issues – found none related to Socket Exception
  5. Looked at Windows Event Viewer for error conditions – found none related to Sockets/Network
  6. Looked at SQLServer Surface Configurations settings – and the port numbers were fine and it was set to accept TCP/IP Only – so this seemed fine as well.
  7. Changed the connection string from the Java program to use the IP Address of the SQLServer box and not the MachineName – this had no effect too.
  8. Recently we had upgraded our Linux Boxes from RHEL 5.3 to RHEL 5.4. I got hold of an older Linux box and ran my standalone java program to the see if the problem happened there too – and to my digust – it still happened there.

So now I was trying to be imaginative and trying to think up ways to solve it. All the above 10 points resulted in negative and I had come up with those points in less than 10 min.

Now the real challenge began – I realized this was not an easy problem to solve.

  1. I turned off the firewall on both the SQLServer and Linux boxes – tried my same java program – nothing happened – same exception
  2. Turned off all – non essential services running on SQLServer and tried again – same exception
  3. Moved the DB from 1 SQLServer machine to another – same thing happened
  4. Tried my Standalone Java program from a Different Linux box – no change – same exception
  5. Another engineer on my team wrote a Python script with the same query – to make sure – it is nothing related to the JDBC Driver – and the same error happened when running the Python script as well

I now reverted back to the original Linux box and original SQLServer machine where this was happening

  1. I ran the same standalone Java program from the Linux and ran SQLServer Profiler on the SQLServer box to see what it was doing. I saw it was doing fine – ( see picture below ). I could see the SQL query came in – it executed ( too about 45 sec ) and then the batch completed.

  1. I started running the protocol Analyzer on SQLServer machine and I see this ( see picture below ) – I see SQLServer sending a Reset request

 

FINALLY – an idea dawned – and I decided let me do a cold start of the Linux box.

I did that and once the machine was up and running – I ran the same standalone Java program and it ran perfectly fine. I could not believe my eyes. I ran it 10 times in a row and it worked each time.

I was rejoicing.

BUT here is the bad part – Next day I came to work and tried the same query again and it failed.

Uh! – Something happened between the reboot and the next day – which has resulted in some stack corruption in the Network Card / Network itself that has caused it to fail again.

None of the above tests have helped me to isolate the error to either of the machines.

My next thought was to hook up ethereal / wireshark on the Linux and SQLServer boxes and monitor the traffic and see if that helps.

However before that – I finally decided 1 last thing I needed to verify and that was the OS of the Machine on which SQLServer was installed and running.

– The OS of the machine for SQLServer 2005 was – Windows 2008 Server R2 : This was pleasantly surprising to me and I was happy I discovered another line of thought which I could chase.

So I looked at all other production machines on which SQLServer was installed and found to my joy that all of them were – Windows 2003 Server.

Someone had become a cowboy and installed Windows 2008 R2 and without any testing deployed it on to production.

I immediately got hold of another machine with was not Windows 2008 Server R2 machine, Installed the same version of SQLServer 2005 we had on production, copied the tables from the production SQLServer DB to this machine and ran the same query against this machine and lo it all worked.

Yes, indeed after a lot of thought, excitement and some disappointments, I was successful in getting rid of this weird SQLException from the production DBs.

How we reduced ETL Build Time for our Data Warehouse in SQLServer by Using BCP and Java Based String Concatenation rather than SQLServer’s String Concat mechanism

We had an ETL Build step – that used to take 75-80% of the build time. This step heavily used SQLServer’s String concatenation method and often this used to take about 30-40 hours. It has been widely documented that SQLServer’s String Concatenation is extremely slow and we had also observed that during the time SQLServer does String Concatenation the CPU usage on the machine goes up and stays there for the entire duration.

The article below describes how – I speeded up the above step by getting rid of String concatenation entirely and using an alternate technique to speed up the ETL build times. After the steps below were implemented – the above ETL Step took about 1.5-2.0 hours. What an improvement.  Thanks to the crude raw speed of BCP ( bulk copy protocol ) from SQLServer and Java.

Let me lay the background of what we were trying to do before I describe the steps.

Assume we had a fact table that looked like below – Everything in Dimension1, Dimension2, Dimension3 is shown as a Foreign KeyID to the Dimensional tables.

Dimension1 Dimension2 Dimension3 Product-Market Hierarchy Time Measure1 Measure2
345 435 823 A Jan-2009 145 675
345 435 823 B Jan-2009 329 719
345 435 823 C Jan-2009 146 782
345 435 823 A Mar-2009 435 671
350 432 829 B Mar-2009 1232 5540
350 432 829 A Apr-2009 435 671
350 432 829 C May-2009 567 987
.
355 435 823 A Jan-2009 145 675
355 435 823 B Jan-2009 329 719
355 435 823 A Feb-2009 146 782
365 442 823 A Mar-2009 435 671
365 442 829 C Mar-2009 1232 5540
370 442 829 C Apr-2009 435 671
370 442 829 B Feb-2009 567 987
370 442 829 A Apr-2009 45 25

Table : 1

We needed to create an Aggregate Table as shown below for doing some reporting in our BI Analytic Tool.

Also in our domain – Dimension 1 had a cardinality of about 2 million.

Dim1 Dim2 Dim3 Measure1-Str Measure2-Str
345 435 823 A,Jan-2009,145; B,Jan-2009,329; C,Jan-2009,146; A,Mar-2009,435; A,Jan-2009,675; B,Jan-2009,719; C,Jan-2009,782; A,Mar-2009,671;
350 432 829 B,Mar-2009,1232; A,Apr-2009,435; C,May-2009,567; B,Mar-2009,5540; A,Apr-2009,671; C,May-2009,987;
355 435 823 A,Jan-2009,145; B,Jan-2009,329; A,Feb-2009,146; A,Jan-2009,675; B,Jan-2009,719; A,Feb-2009,782;
365 442 823 A,Mar-2009,435; C,Mar-2009,1232; A,Mar-2009,671; C,Mar-2009,5540;
370 442 829 C,Apr-2009,435; B,Feb-2009,567; A,Apr-2009,45; C,Apr-2009,671; B,Feb-2009,987; A,Apr-2009,25;

Table : 2

What I am going to discuss is – how I dramatically improved the building times of the Agg Table as shown in the above Table.

The above Aggregate table took about 10 hours to be build on a 235 million row fact table – when we used a SQLServer String concatenation approach of using cursors to loop through the rows. I had verified that the slowdown was due to the String Concatenation only and nothing else.

The solution I discuss below reduced the built time to about 1.5 hours.

Please remember that the Aggregate table build time was mainly because SQLServer was very very very very slow doing String Concatenation.

3 Step Process to get 10 time speed improvements in ETL Time to generate the Aggregate Table

1st Step – BCP – Bulk Copy Protocol  in SQLServer

I used the BCP program to get the data out from the SQLServer Tables to a flat file separated by Tabs.

I was at times amazed at the raw speed of BCP in SQLServer to get the data out from a very very large table sometimes almost the size of 250 Million row tables in less than 10-15 min to a flat file and each column value separated by a Tab.

For example I used the command line as shown below :

bcp “SELECT i.Dim1,i.Dim2, i.Dim3, i.Product-Market-Hierarchy, i.Time,  isNull(SUM(i.Measure1), 0) ,  isNull(SUM(i.Measure2), 0)  FROM FactTable i

GROUP BY i.Dim1,i.Dim2, i.Dim3, i.Product-Market-Hier, i.Time

ORDER BY  i.Dim1, i.Dim2, i.Dim3, i.Product-Market-Hier, i.Time”

QUERYOUT    tmp_FactTable_FROM_BCP_OUT.txt -c -Usa -PPassword

2nd Step – Use Java to Do the String Concatenation

I then wrote a java program that reads the above generated Tab separated text file from BCP and uses FileIO to read each line – parse each column separated with Tabs and then generates an output which is a Text file – with each column separated by Tabs and with the columns as shown in Table 2.

It was the Java Program that was doing the String Concatenation.

3rd Step – Take the output from the Java Based step – and BCP in the results back

After that it was easy. Just BCP in the results obtained from 2nd step into the  DB Table 2.

bcp TableName_asTable2

IN  FLatFileGeneratedByJava.txt -c -Usa -Ppassword