DataStream

LANGUAGES: C#

ASP.NET VERSIONS: 1.0 | 1.1

 

Merge Disparate Source Data

Learn to pull data from multiple sources into a single DataSet.

 

By Brian Noyes

 

The DataSet is a powerful object. You can add multiple tables, set relations between the tables, and add constraints, giving you a powerful in-memory container for data. It can keep track of modifications to rows, and can use the original and the modified rows to perform updates using optimistic concurrency.

 

How you get data from multiple sources into one DataSet depends a lot on where the data is coming from. If you are executing the queries yourself against one or more databases to construct the DataSet, then there are lots of examples of how to execute a query against a database using the ADO.NET Command object and a DataAdapter to fill the DataSet with one or multiple tables.

 

But what do you do if you are aggregating data from multiple sources, and that data comes in the form of pre-constructed DataSets or XML that you want to turn into a DataSet? Perhaps you are getting a DataSet as the return value from a Web service that you do not control, or are getting separate XML files representing tables of data from other data sources. You want to get that data into a single DataSet for data binding, updates, or manipulation purposes.

 

Adding Data and Schema with Merge

The answer to bringing together data from multiple sources is actually very straightforward, but seems to elude many people. The DataSet class implements a method named Merge that allows you to pass in a source DataSet, DataTable, or DataRow array to a target DataSet and have the data from the source merged into the target DataSet. There are a number of overloads of the Merge method, allowing you to approach things in different ways depending on your needs. You can simply add multiple tables to a DataSet by merging them in from multiple sources, or you can add data from multiple sources into a single table. Both of these approaches include many options in terms of the way merge is performed with respect to the schema of the target DataSet and whether changes are preserved when data is imported into an existing table in the target DataSet.

 

To demonstrate some of the options and the basic approach, the sample code accompanying this article includes a simple application that pulls in data from three XML files (see end of article for download details). Two of the XML files contain data from the Northwind Orders and Order Details tables. The third contains some modified and added records for the Orders table that the code merges into the existing Orders table data to effect changes to the existing data. Imagine that these three XML files could have come from completely different data sources over the wire. However, the data has implicit relationships that you want to recapture and use in your application.

 

Your first thought might be that you can just read multiple XML files into the DataSet in a similar way to loading multiple queries into a DataSet using a DataAdapter. The ReadXml method of the DataSet makes it easy to read in XML into a DataSet, but it only supports repopulating the DataSet from scratch, not adding information to the DataSet incrementally. Unfortunately, with the current version of .NET, there is no XmlAdapter. I will be covering some of the new capabilities just announced for .NET 2.0 in future columns, but there is in fact an XmlAdapter in .NET 2.0. If you need to do this kind of loading of multiple XML files into a DataSet a lot today, consider writing your own class to encapsulate the process until the .NET 2.0 functionality is released.

 

So the solution in this case is to use the Merge method on the DataSet class. Let's start with the case where you have data coming from two sources that represent different tables, but that there is an implicit parent-child relationship between the tables. To get the data into a single DataSet with that relationship, the first step is to read the data into two different DataSets first (assuming XML as the source data format - you may also have been passed a fully constructed DataSet from a Web service method or class library call):

 

DataSet dsOrders = new DataSet();

DataSet dsOrderDetails = new DataSet();

dsOrders.ReadXml("Orders.xml");

dsOrders.Tables[0].TableName = "Orders";

dsOrderDetails.ReadXml("OrderDetails.xml");

dsOrderDetails.Tables[0].TableName = "OrderDetails";

 

Note that the code is renaming the tables after reading them in. This is because the XML files for the sample were purposely saved without schema information to more closely represent data from disparate data sources, where there may not be any agreed upon common schema. To bring the two tables together into one DataSet, you then call the Merge method on the target DataSet into which you want to aggregate the data from the source DataSet. Then you can create the relationship between the tables:

 

// Perform the merge

dsOrders.Merge(dsOrderDetails.Tables["OrderDetails"]);

// Now that the child table is there, create the

// relationship and constraints

dsOrders.Relations.Add("FK_Orders_OrderDetails",

  dsOrders.Tables["Orders"].Columns["OrderID"],

  dsOrders.Tables["OrderDetails"].Columns["OrderID"],true);

 

In this case, I am using the version of the Merge method that simply takes a DataTable reference. It will also transfer the schema information from that table into the DataSet and add it as an additional table (see Figure 1). The result is that a new DataTable is added to the target DataSet (dsOrders) with the schema of the Order Details table and its contained data.

 


Figure 1. Calling Merge on one DataSet and passing in the table from another results in a copy of the table from the source being created in the target.

 

Change Data with Merge

Now let's take a look at another scenario where you might use the Merge method. Say you have a DataSet that contains your current data for a table. That current data may or may not have modifications already made to it since it was retrieved from its data source. You receive a set of data from some other source that contains new or modified data that you need to use to perform updates to the existing set of data. The data may be in the form of an XML file, a DataSet with a DataTable that matches the schema of the target DataTable, or perhaps an array of DataRows resulting from a method call such as Select or GetChanges on an existing DataSet.

 

In this situation, when you call Merge on the target DataSet, passing in source data whose schema matches an existing table in the target, the data from the source will be merged into the target. What "merge" means in this case depends on a number of factors.

 

The simplest to understand is the default case, where you just pass in the source data to a target, and both source and target have matching table schema with primary key information. In this case, if a row in the source data matches the primary key of a row in the target data, the source data will replace the current values of the fields of the corresponding row in the target data. If no matching primary key is found for a row in the source data, it will be added to the target data as a new row (see Figure 2). If there are any schema mismatches or if a constraint is violated, an exception will be raised at the end of the updating process with embedded information about what went wrong, and error information will be added to the offending rows.

 


Figure 2. Rows from the source table replace the matching rows in the target table. Rows from the source that do not match target rows are simply added to the target as new rows.

 

If you use one of the overloads of the Merge method that take a Boolean preserveChanges parameter, and you pass true for that parameter instead of the default value of false, the behavior in the scenario is quite different. When you tell the Merge method to preserve changes, if a row in the source data has the same primary key as a row in the target data, the target data row current values will remain unchanged, but the original values for the row in the target will take on the original values of the source row.

 

I'm not sure I see a lot of value in using this approach, because if the original values change, you will get concurrency exceptions if you try to use the target DataSet to perform updates to its data source. But that is the designed behavior of the Merge method when told to preserve changes. Figure 3 shows the code for the method that reads in the source and target data and merges them. Note that it is important to have primary key information available in both of the tables. Without this, Merge will treat all rows coming from the source as new rows.

 

private void MergeRows(object sender, System.EventArgs e)

{

   // Create the source and target DataSets

   DataSet dsOrders = new DataSet();

   DataSet dsModOrders = new DataSet();

   // Load the target data and name the table

   dsOrders.ReadXml("Orders.xml");

   dsOrders.Tables[0].TableName = "Orders";

   // Setting primary key is important because that is the

   // only way Merge can identify matching rows

   dsOrders.Tables[0].PrimaryKey = new DataColumn[]

      {dsOrders.Tables[0].Columns["OrderID"]};

   // Make all the rows in the target the original values

   dsOrders.AcceptChanges();

 

   // Load the source data, set table name and PK

   dsModOrders.ReadXml("ModifiedOrders.xml");

   dsModOrders.Tables[0].TableName = "Orders";

   dsModOrders.Tables[0].PrimaryKey = new DataColumn[]

      {dsModOrders.Tables[0].Columns["OrderID"]};

   // When loaded, the rows are all treated as new/mod rows

   // If AcceptChanges is called, it sets them all to orig

   // so that the merge behavior with preserveChanges

   // can be observed.

   if (chkSourceOrig.Checked)

   {

      dsModOrders.AcceptChanges();

   }

   // Do the merge

   dsOrders.Merge(dsModOrders.Tables["Orders"],

      chkChanges.Checked,MissingSchemaAction.AddWithKey);

   // Data bind to see the results, varying based on the

   // selected RowState

   dataGrid1.DataSource = new DataView(

      dsOrders.Tables["Orders"],null,null,

       (DataViewRowState)cmbRowStateFilter.SelectedItem);

}

Figure 3. Calling Merge with the same table in both source and target allows you to perform updates from other DataSets. The behavior of merge depends on a combination of the RowState of rows in the source and target and whether the preserveChanges Boolean argument is passed to Merge.

 

The sample program for the article will let you play with these options by changing the preserveChanges parameter on the fly and displaying the resulting changes with a selectable RowState filter. You will want to look at the source and target data to see which values are coming from where to understand what you are seeing at run time, but this should give you a better idea of what combinations are possible using the Merge method. To read about all the various combinations of behavior of the Merge method, see the topic "Merging DataSet Contents" in the MSDN library.

 

The sample code accompanying this article is available for download.

 

Brian Noyes is a consultant, trainer, speaker, and writer with IDesign, Inc. (http://www.idesign.net), a .NET focused architecture and design consulting firm. Brian specializes in designing and building data-driven distributed applications. He has over 12 years' experience in programming, engineering, and project management, and is a contributing editor and writer for C#PRO, asp.netPRO, and other publications. Contact him at mailto:brian.noyes@idesign.net.