Bee Pee

Importing Data from a Flat File to SQL Server using SqlBulkCopy

Problem: Import a file with over 600,000 records, roughly 250 MB in size. The import needs to be quick.
Solution: Recently I was tasked with a carrier integration project. The carrier configuration file contained 160 different columns of data. Traditional line-by-line file reading with inserts through Entity Framework or individual SQL INSERT statements was not going to work.
The first idea that came to my mind was creating an SSIS package; that would have been a perfect solution, as SSIS is designed for exactly this. However, I had a restriction in place whereby I had to use SQL Server Express only, which does not ship with SSIS.
This meant I needed something that could do bulk inserts. In comes SqlBulkCopy: since I was importing into SQL Server, SqlBulkCopy was perfect for this case. Believe it or not, this class has existed in the framework since .NET 2.0. It has remained a less well-known class for a good reason: we don't write import routines every day, compared to other more routine operations.
The high-level idea was to read data from the file, insert it into a DataTable, and then use SqlBulkCopy to insert it into the database. Writing the code to read the file was another interesting part; there is another post coming on that too. Apart from a DataTable, we could use an IDataReader as well.
I chose a DataTable because I could use the DataTable name and column names later for the column mappings and the destination table name while inserting the data.
Namespace: System.Data.SqlClient
Step 1: Create DataTable

var networkTable = new DataTable("dbo.Networks");
networkTable.Columns.Add("SequenceNumber", typeof(string));
networkTable.Columns.Add("CarrierId", typeof(string));
networkTable.Columns.Add("ModId", typeof(string));
networkTable.Columns.Add("Barcodes", typeof(int));

Step 2: Populate  DataTable

var newRow = networkTable.NewRow();
newRow["SequenceNumber"] = line.Substring(5, 3).Trim();
newRow["CarrierId"] = line.Substring(8, 2).Trim();
newRow["ModId"] = line.Substring(10, 3).Trim();
newRow["Barcodes"] = Convert.ToInt32(line.Substring(31, 1).Trim());
networkTable.Rows.Add(newRow);

The code sample is kept brief on purpose.
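Since the snippets above omit the surrounding read loop, here is a minimal sketch of how the file could be streamed into the DataTable line by line. The offsets match the snippet in Step 2, but the `NetworkFileLoader` class, its method names, and the file path are illustrative names of my own, not from the original project:

```csharp
using System;
using System.Data;
using System.IO;

static class NetworkFileLoader
{
    // Parses one fixed-width line into a DataRow using the offsets from Step 2.
    public static DataRow ParseNetworkLine(string line, DataTable table)
    {
        var newRow = table.NewRow();
        newRow["SequenceNumber"] = line.Substring(5, 3).Trim();
        newRow["CarrierId"] = line.Substring(8, 2).Trim();
        newRow["ModId"] = line.Substring(10, 3).Trim();
        newRow["Barcodes"] = Convert.ToInt32(line.Substring(31, 1).Trim());
        return newRow;
    }

    // File.ReadLines streams the file, so the whole ~250 MB is never
    // held in memory as raw text at once.
    public static void Load(string path, DataTable table)
    {
        foreach (var line in File.ReadLines(path))
        {
            table.Rows.Add(ParseNetworkLine(line, table));
        }
    }
}
```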
Step 3: Insert Records

private void insertRecords(DataTable dataTable)
{
    // sqlConnection is an already-open SqlConnection
    using (var bulkCopy = new SqlBulkCopy(sqlConnection,
        SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.TableLock, null))
    {
        bulkCopy.DestinationTableName = dataTable.TableName;
        bulkCopy.BatchSize = 5000;
        foreach (DataColumn column in dataTable.Columns)
        {
            bulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
        }
        try
        {
            bulkCopy.WriteToServer(dataTable);
        }
        catch (Exception)
        {
            throw; // rethrow; "throw ex" would reset the stack trace
        }
    }
}

As the function above shows, using a DataTable lets me generalise the function so I can reuse it from anywhere in the project.
The SqlBulkCopyOptions.KeepIdentity and SqlBulkCopyOptions.TableLock options are in place so that the identity values from the source are preserved; the TableLock option additionally locks the table during the inserts.
The batch size of 5000 means that 5,000 records are sent to the server in a single round trip.
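To tie the steps together, here is a minimal hedged sketch of the calling code. The connection string, database name, and comment placeholders are my own and not values from the original project:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class Program
{
    static void Main()
    {
        // Placeholder connection string for a local SQL Express instance.
        var connectionString =
            @"Server=.\SQLEXPRESS;Database=Carriers;Integrated Security=true;";

        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();

            var networkTable = new DataTable("dbo.Networks");
            // ... add columns and populate rows as in Steps 1 and 2 ...

            using (var bulkCopy = new SqlBulkCopy(connection,
                SqlBulkCopyOptions.KeepIdentity | SqlBulkCopyOptions.TableLock, null))
            {
                bulkCopy.DestinationTableName = networkTable.TableName;
                bulkCopy.BatchSize = 5000; // 5,000 rows per server round trip
                bulkCopy.WriteToServer(networkTable);
            }
        }
    }
}
```

Running this end to end of course requires a reachable SQL Server instance with a matching dbo.Networks table.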
During the operation I had a couple of error messages come up, as below:
Error 1: Failed to convert parameter value from a String to a Int32
After encountering this error I set the SqlBulkCopyOptions.KeepIdentity option so that the identity columns are left alone during the inserts.

Error 2: The given value of type String from the data source cannot be converted to type nvarchar of the specified target column.

Error 3: The given value of type String from the data source cannot be converted to type int of the specified target column.
After encountering errors 2 and 3, I set up the column-mapping code so that the right source columns are inserted into the right destination columns. Initially I had assumed that keeping the source and destination column names the same would be enough; without explicit mappings, however, SqlBulkCopy maps columns by ordinal position, not by name.

Error 4: The given ColumnMapping does not match up with any column in the source or destination.
Error 4 was down to a mistake on my part, but the thing to remember is that the column mappings are case sensitive. I had the column names matching, casing included; the mistake I had made was leaving trailing spaces in the DataTable's column names.
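A cheap way to guard against that gotcha is to normalise the DataTable's column names before building the mappings. This trimming helper is my own addition, not part of the original code:

```csharp
using System.Data;

static class ColumnNameCleanup
{
    // Removes stray leading/trailing whitespace from every column name,
    // since SqlBulkCopy column mappings must match names exactly
    // ("Barcodes " will not match "Barcodes").
    public static void TrimColumnNames(DataTable table)
    {
        foreach (DataColumn column in table.Columns)
        {
            column.ColumnName = column.ColumnName.Trim();
        }
    }
}
```

Calling this once on the table before the mapping loop in Step 3 would have avoided error 4 entirely.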

Conclusion: The import process took 55 seconds for a sample of 550,000 lines (reading + inserting into the database).