While attempting to do some data merging across several Excel spreadsheets, I found some interesting behavior in how OLEDB determines data types of columns with an Excel source.
For me, the issue manifested itself with a column that looked like the following (shortened for brevity's sake):
10132
10133
10134D
10134
10135
By default, Excel looks at the first 8 rows of each column to determine the type. If most of the items are numeric, it will create the column as an int or double type. The problem is, if the column also contains text data, like the one above (10134D), the column can't be converted to a numeric type and OLEDB will just drop the value with no error or explanation.
With the above data, since 4 out of 5 are numeric, OLEDB turns the column into a numeric type, and the 3rd row's column of that data would be NULL.
I tried several ways of fixing the issue but found that the only solution that worked was to set the IMEX property in the connection string to 1 in the Extended Properties section like so:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=blah.xls;Extended Properties="Excel 8.0;IMEX=1;"
An additional extended property you might find useful is "HDR=1", which tells the driver that the first row of the spreadsheet contains text data as a header and not column data.
Apparently this tells the OLEDB driver for Excel that the data is "intermixed" and that it should convert any intermixed data to the type specified in the registry at:
HKLM\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\ImportMixedTypes
By default, this is "Text", which is exactly what I wanted. You'll also notice a key in there named "TypeGuessRows". This is the number of rows that Excel will use to determine the data-type of the column, the default being 8 as described above. You may need to tweak this value to a number greater than the row at which your data changes in the column you want as a string type.
Anyway, it's probably useless to most, but it took me a bit of time and research to figure out so I thought I'd pass it along.
About Brian Peek
Brian is a
Microsoft C# MVP who has been actively
developing in .NET since its early betas in 2000, and who has been developing
solutions using Microsoft technologies and platforms for even longer. Along
with .NET, Brian is particularly skilled in the languages of C, C++ and
assembly language for a variety of CPUs. He is also well-versed in a wide
variety of technologies including web development, document imaging, GIS,
graphics, game development, and hardware interfacing. Brian has a strong background in developing applications for the health-care industry, as well as developing solutions for portable devices, such as tablet PCs and PDAs. Additionally, Brian has co-authored the
book "
Debugging ASP.NET" published by New Riders, and is currently
co-authoring a book titled "
10 Coding4Fun Projects with .NET for Programmers, Hobbyists, and Game Developers" to be published by
O'Reilly
in late 2008. Brian also writes for MSDN's
Coding4Fun website, contributing
articles on a monthly basis.