How to run a SQL query on an Excel table?
I'm trying to create a sub-table from another table of all the last name fields sorted A-Z which have a phone number field that isn't null. I could do this pretty easily with SQL, but I have no clue how to go about running a SQL query within Excel. I'm tempted to import the data into PostgreSQL and just query it there, but that seems a little excessive.
For what I'm trying to do, the SQL query SELECT lastname, firstname, phonenumber WHERE phonenumber IS NOT NULL ORDER BY lastname would do the trick. It seems too simple for it to be something that Excel can't do natively. How can I run a SQL query like this from within Excel?
There are many fine ways to get this done, which others have already suggested. Following along the "get Excel data via SQL" track, here are some pointers.
- Excel has the "Data Connection Wizard", which allows you to import or link from another data source, or even from within the very same Excel file.
- Two providers of interest ship as part of Microsoft Office (and the OS): the old "Microsoft.Jet.OLEDB" and the latest "Microsoft.ACE.OLEDB". Look for them when setting up a connection (such as with the Data Connection Wizard).
- Once connected to an Excel workbook, a worksheet or range is the equivalent of a table or view. The table name of a worksheet is the name of the worksheet with a dollar sign ("$") appended to it, surrounded by square brackets ("[" and "]"); the table name of a range is simply the name of the range. To specify an unnamed range of cells as your recordsource, append standard Excel row/column notation to the end of the sheet name inside the square brackets.
- The native SQL will be (more or less) the SQL of Microsoft Access. (In the past, it was called JET SQL; however, Access SQL has evolved, and I believe JET is deprecated old tech.)
- Example, reading a worksheet:
  SELECT * FROM [Sheet1$]
- Example, reading a range:
  SELECT * FROM MyRange
- Example, reading an unnamed range of cells:
  SELECT * FROM [Sheet1$A1:B10]
There are many books and websites available to help you work through the particulars.
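To tie the naming rules back to the question, here is a minimal Python sketch that builds the table reference and then the asker's query with a FROM clause added. The helper names table_ref and contacts_query are made up for illustration, the column names come from the question, and the sheet name Sheet1 is borrowed from the examples above. It only builds the SQL text; running it still needs a connection through one of the providers.

```python
def table_ref(sheet=None, cell_range=None, named_range=None):
    """Return the FROM-clause name for a worksheet, an unnamed range, or a named range."""
    if named_range is not None:
        # A named range is referenced by its bare name.
        return named_range
    if sheet is None:
        raise ValueError("give either a sheet or a named range")
    # Worksheet: sheet name plus "$", inside square brackets.
    # Unnamed range: row/column notation appended after the "$".
    return "[{}${}]".format(sheet, cell_range or "")


def contacts_query(table):
    """The question's query, with the FROM clause it was missing."""
    return (
        "SELECT lastname, firstname, phonenumber "
        "FROM {} "
        "WHERE phonenumber IS NOT NULL "
        "ORDER BY lastname".format(table)
    )


print(contacts_query(table_ref(sheet="Sheet1")))
# SELECT lastname, firstname, phonenumber FROM [Sheet1$] WHERE phonenumber IS NOT NULL ORDER BY lastname
```

The same helper gives [Sheet1$A1:B10] for table_ref(sheet="Sheet1", cell_range="A1:B10") and MyRange for table_ref(named_range="MyRange").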
Further notes
By default, it is assumed that the first row of your Excel data source contains column headings that can be used as field names. If this is not the case, you must turn this setting off, or your first row of data "disappears" to be used as field names. This is done by adding the optional HDR= setting to the Extended Properties of the connection string. The default, which does not need to be specified, is HDR=Yes. If you do not have column headings, you need to specify HDR=No; the provider names your fields F1, F2, etc.
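As a rough illustration of what HDR=No means for a query, the Python sketch below generates the provider's F1, F2, ... field names and uses them in a filter-and-sort query. The helper name headerless_fields is hypothetical, and the column order (last name, first name, phone number) is an assumption made only for this example.

```python
def headerless_fields(count):
    """Field names the provider assigns when HDR=No: F1, F2, ..."""
    return ["F{}".format(i) for i in range(1, count + 1)]


# Assumed column order for illustration: last name, first name, phone number.
lastname, firstname, phonenumber = headerless_fields(3)

query = (
    "SELECT {0}, {1}, {2} FROM [Sheet1$] "
    "WHERE {2} IS NOT NULL ORDER BY {0}".format(lastname, firstname, phonenumber)
)
print(query)
# SELECT F1, F2, F3 FROM [Sheet1$] WHERE F3 IS NOT NULL ORDER BY F1
```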
A caution about specifying worksheets: The provider assumes that your table of data begins with the upper-most, left-most, non-blank cell on the specified worksheet. In other words, your table of data can begin in Row 3, Column C without a problem. However, you cannot, for example, type a worksheet title above and to the left of the data in cell A1.
A caution about specifying ranges: When you specify a worksheet as your recordsource, the provider adds new records below existing records in the worksheet as space allows. When you specify a range (named or unnamed), Jet also adds new records below the existing records in the range as space allows. However, if you requery on the original range, the resulting recordset does not include the newly added records outside the range.
Data types (worth trying) for CREATE TABLE: Short, Long, Single, Double, Currency, DateTime, Bit, Byte, GUID, BigBinary, LongBinary, VarBinary, LongText, VarChar, Decimal.
Connecting to "old tech" Excel (files with the xls extension): Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\MyFolder\MyWorkbook.xls;Extended Properties=Excel 8.0;
Use the Excel 5.0 source database type for Microsoft Excel 5.0 and 7.0 (95) workbooks, and use the Excel 8.0 source database type for Microsoft Excel 8.0 (97), 9.0 (2000) and 10.0 (2002) workbooks.
Connecting to "latest" Excel (files with the xlsx file extension): Provider=Microsoft.ACE.OLEDB.12.0;Data Source=Excel2007file.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;"
Treating data as text: the IMEX=1 setting tells the driver to read columns with intermixed data types (numbers, dates, strings, etc.) as text. Provider=Microsoft.ACE.OLEDB.12.0;Data Source=Excel2007file.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1";
(More details at http://www.connectionstrings.com/excel)
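The connection strings above follow a simple pattern, which can be sketched in Python as below. This is only a string builder under stated assumptions: the function name excel_connection_string and its parameters are made up for illustration, it picks the provider purely from the file extension, and it always quotes the Extended Properties value.

```python
import os


def excel_connection_string(path, header=True, mixed_as_text=False):
    """Build an OLE DB connection string for an .xls or .xlsx workbook."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".xls":
        provider, excel_type = "Microsoft.Jet.OLEDB.4.0", "Excel 8.0"
    elif ext == ".xlsx":
        provider, excel_type = "Microsoft.ACE.OLEDB.12.0", "Excel 12.0 Xml"
    else:
        raise ValueError("unsupported extension: {!r}".format(ext))

    # HDR says whether the first row holds column headings.
    props = [excel_type, "HDR=YES" if header else "HDR=NO"]
    if mixed_as_text:
        # IMEX=1 asks the driver to read intermixed columns as text.
        props.append("IMEX=1")

    return 'Provider={};Data Source={};Extended Properties="{}";'.format(
        provider, path, ";".join(props)
    )


print(excel_connection_string("Excel2007file.xlsx", mixed_as_text=True))
# Provider=Microsoft.ACE.OLEDB.12.0;Data Source=Excel2007file.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1";
```

The resulting string is what you would hand to whichever OLE DB client you use (ADO from VBA, OleDbConnection from .NET, and so on).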
More information at http://msdn.microsoft.com/en-US/library/ms141683(v=sql.90).aspx, and at http://support.microsoft.com/kb/316934
Connecting to Excel via ADODB via VBA detailed at http://support.microsoft.com/kb/257819
Microsoft JET 4 details at http://support.microsoft.com/kb/275561
tl;dr: Excel does all of this natively. Use filters and/or tables
(http://office.microsoft.com/en-gb/excel-help/filter-data-in-an-excel-table-HA102840028.aspx)
You can open Excel programmatically through an OLE DB connection and execute SQL on the tables within the worksheet.
But you can do everything you are asking for with no formulas, just filters.
- click anywhere within the data you are looking at
- go to Data on the ribbon bar
- select "Filter"; it's about in the middle and looks like a funnel
- you will now have arrows on the right-hand side of each cell in the first row of your table
- click the arrow on phone number and de-select blanks (last option)
- click the arrow on last name and select A-Z ordering (top option)
Have a play around. Some things to note:
- you can select the filtered rows and paste them somewhere else
- in the status bar on the left you will see how many rows meet your filter criteria out of the total number of rows (e.g. 308 of 313 records found)
- you can filter by color in Excel 2010 onwards
- sometimes I create calculated columns that give statuses or cleaned versions of data; you can then filter or sort by these too (e.g. like the formulae in the other answers)
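If it helps to see the filter steps as code, this small Python sketch does the same two things: drop rows with a blank phone number, then sort A-Z by last name. The sample rows are invented purely for illustration, the function name filter_contacts is hypothetical, and the last line imitates the status bar count mentioned above.

```python
def filter_contacts(rows):
    """Keep rows with a non-blank phone number, sorted A-Z by last name."""
    kept = [r for r in rows if r.get("phonenumber")]
    # Sort case-insensitively, like a plain A-Z sort.
    return sorted(kept, key=lambda r: r["lastname"].lower())


# Made-up sample data, not taken from any real sheet.
rows = [
    {"lastname": "Smith", "firstname": "Ann", "phonenumber": "555-0101"},
    {"lastname": "Adams", "firstname": "Bob", "phonenumber": ""},
    {"lastname": "Brown", "firstname": "Cy", "phonenumber": "555-0102"},
]

result = filter_contacts(rows)
print([r["lastname"] for r in result])
# ['Brown', 'Smith']
print("{} of {} records found".format(len(result), len(rows)))
# 2 of 3 records found
```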
Do it with filters unless you are going to do it a lot, or you want to automate importing data somewhere. But for completeness:
A C# option:
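// Needs: using System; using System.Collections.Generic; using System.Data; using System.Data.OleDb;
// filename is the path to the workbook.
OleDbConnection ExcelFile = new OleDbConnection( String.Format( "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0;HDR=YES\"", filename));
ExcelFile.Open();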
A handy place to start is to take a look at the schema, as there may be more there than you think:
// Ask the provider which tables it exposes.
DataTable dt = ExcelFile.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
List<String> excelSheets = new List<String>();
foreach (DataRow row in dt.Rows) {
    string name = row["TABLE_NAME"].ToString();
    // Keep only names ending in '$' (whole worksheets).
    if (name.EndsWith("$")) {
        excelSheets.Add(name);
    }
}
Then, when you want to query a sheet:
OleDbDataAdapter da = new OleDbDataAdapter("select * from [" + sheet + "]", ExcelFile);
dt = new DataTable();
da.Fill(dt);
NOTE - Use tables in Excel!
Excel has "tables" functionality that makes data behave more like a table. This gives you some great benefits, but is not going to let you do every type of query.
http://office.microsoft.com/en-gb/excel-help/overview-of-excel-tables-HA010048546.aspx
For tabular data in Excel this is my default. The first thing I do is click into the data, then select "Format as Table" from the Home section on the ribbon. This gives you filtering and sorting by default, and allows you to access the table and fields by name (e.g. table[fieldname]). It also allows aggregate functions on columns, e.g. max and average.