Read the data from flat/delimited file which has huge number of columns
Currently i am working on few integration projects and interesting ones too. Especially the last one as i had to get my tool out of my arsenal. The project involved importing around 600,000 rows into the database. So far so go you would say that’s easy use stream reader to read the data and be done with it. As they say devil is in the detail as it happened the row contained data for 107 columns in database.
As you can see 600,000 rows with data for 107 columns you can imagine the number of variables you would have to write to read the data and put it in the class. Which can then be inserted in the database. The approach i wanted was and less manual typing. This is because the more more typing i do with as many as 107 variables more chances of me making mistake.
What would be the sample code which would allow us to read the data from the file.Assuming that we are reading data into data table.
dataRow[“DataColumnName”] = line.Substring(58, 2).Trim();
So without any further ado let me tell you what tools you will need and how to do it. You will need Visual Studio, of course no getting away from that. And the humble ‘Microsoft Excel’. I used excel you could use any tool which provides similar functionality. Yes you read it right Excel.
Define your destination tables with exactly the same data types and length .
Next we need to declare all the variables, the idea here is to keep the column name in data table same as the column in db. Because we are going to use SQL Bulk copy to insert data as described in this post. Also, if we need to use the column names somewhere else in the code we can do so easily once we have declared them as constants.Again we will need to declare the field length and field start as constants. Also add the data type of the column this will finally look like .
Now with these both the above steps done we are ready for producing our code.The code is which we want to produce looks like below
and how did this happen , simple formula
Now i have used a column for declaring a constant i.e. private const int & private const string, a ‘;’ and a ‘,’. You may very well put it in a formula, i am just being lazy thats all. The advantage it gives me is if i wanted to declare one variable and public and other as protected i can still do it. Not that i would have needed to in this case. Also, it lets me keep my formula clean.
Now we just combine different columns to get the following results
Create data table code
Code to read different parts of the line into the data table.
Now the point to note is i have used the start point as pure number , it should have been again a constant declaration. Why? simply because say you make an error in calculation of the start of one of the column , you would have to regenerate the whole of read line record. Instead if it were a constant all you do is just change the constants declaration. Its a matter of opinion. I just like to keep it clean that’s all.
The whole process of variable declaration and reading the individual column is automated, so i do minimal coding and typing. Thus reducing the human error part. Also, its replicable for me i can use and reuse it any time there are more columns which i need to add or amend.