BrianPeek.com

A Compendium of Random Uselessness
in Search

Excel and OLEDB in .NET

While attempting to do some data merging across several Excel spreadsheets, I found some interesting behavior in how OLEDB determines data types of columns with an Excel source.

For me, the issue manifested itself with a column that looked like the following (shortened for brevity's sake):

10132
10133
10134D
10134
10135


By default, Excel looks at the first 8 rows of each column to determine the type.  If most of the items are numeric, it will create the column as an int or double type.  The problem is, if the column also contains text data, like the one above (10134D), the column can't be converted to a numeric type and OLEDB will just drop the value with no error or explanation.

With the above data, since 4 out of 5 are numeric, OLEDB turns the column into a numeric type, and the 3rd row's column of that data would be NULL.

I tried several ways of fixing the issue but found that the only solution that worked was to set the IMEX property in the connection string to 1 in the Extended Properties section like so:

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=blah.xls;Extended Properties="Excel 8.0;IMEX=1;"

An additional extended property you might find useful is "HDR=1", which tells the driver that the first row of the spreadsheet contains text data as a header and not column data.

Apparently this tells the OLEDB driver for Excel that the data is "intermixed" and that it should convert any intermixed data to the type specified in the registry at:

HKLM\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\ImportMixedTypes

By default, this is "Text", which is exactly what I wanted.  You'll also notice a key in there named "TypeGuessRows".  This is the number of rows that Excel will use to determine the data-type of the column, the default being 8 as described above.  You may need to tweak this value to a number greater than the row at which your data changes in the column you want as a string type.

Anyway, it's probably useless to most, but it took me a bit of time and research to figure out so I thought I'd pass it along.

Published Apr 18 2006, 05:01 PM by Brian Peek
Filed under:

Comments

 

Tom said:

Thanks for that tip - it answered a question that's been perplexing me!  
May 20, 2006 10:24 PM
 

Rich Frank said:

Excel looks at the first 8 rows...

Thanks for the information.  I continue to bt dumbfounded how quirky and undocumented all this stuff is.
August 18, 2006 11:38 AM
 

Erik said:

Thanks, very useful. See also the following link for a more in-depth discussion:

http://support.microsoft.com/kb/194124/EN-US/
"PRB: Excel Values Returned as NULL Using DAO OpenRecordset"
October 12, 2006 1:57 PM
 

Mogens Nielsen said:

Thanks - this one really helped me out!
October 24, 2006 9:02 AM
 

Julian said:

There's more to IMEX.  I had a spreadsheet with 10 rows of null preamble then numeric values.  These would not display until I set IMEX=1.  I then wanted to write to another XL template using an INSERT statement and I kept getting the "Operation must use an updateable query" error.   Turns out this had nothing to do with permissions.  Instead, for that query I had to make sure I removed the IMEX extended property and then the query worked.
December 27, 2006 9:53 AM
 

Ahmed ElAraby said:

Just add an empty row after the header to specify the type of the row, ex: numeric, datetime...etc.

then insert your rows. your inserted rows will be after that empty row. you can then reference these rows in other sheets.

November 6, 2007 6:10 AM
 

Cluebat-man to the rescue said:

This afternoon I was working on a Windows Forms app for the finance guys, to help them allocate costs

February 29, 2008 4:13 PM

Leave a Comment

(required)  
(optional)
(required)  
Add

About Brian Peek

Brian is a Microsoft C# MVP who has been actively developing in .NET since its early betas in 2000, and who has been developing solutions using Microsoft technologies and platforms for even longer. Along with .NET, Brian is particularly skilled in the languages of C, C++ and assembly language for a variety of CPUs. He is also well-versed in a wide variety of technologies including web development, document imaging, GIS, graphics, game development, and hardware interfacing. Brian has a strong background in developing applications for the health-care industry, as well as developing solutions for portable devices, such as tablet PCs and PDAs. Additionally, Brian has co-authored the book "Debugging ASP.NET" published by New Riders, and is currently co-authoring a book titled "10 Coding4Fun Projects with .NET for Programmers, Hobbyists, and Game Developers" to be published by O'Reilly in late 2008. Brian also writes for MSDN's Coding4Fun website, contributing articles on a monthly basis.
Copyright (C) 2008 Brian Peek
Powered by Community Server (Commercial Edition), by Telligent Systems