source of historical stock data [closed]
I'm trying to make a stock market simulator (perhaps eventually growing into a predicting AI), but I'm having trouble finding data to use. I'm looking for a (hopefully free) source of historical stock market data.
Ideally, it would be a very fine-grained (second or minute interval) data set with price and volume of every symbol on NASDAQ and NYSE (and perhaps others if I get adventurous). Does anyone know of a source for such info?
I found this question which indicates Yahoo offers historical data in CSV format, but I've been unable to find out how to get it in a cursory examination of the site linked.
I also don't like the idea of downloading the data piecemeal in CSV files... I imagine Yahoo would get upset and shut me off after the first few thousand requests.
I also discovered another question that made me think I'd hit the jackpot, but unfortunately that OpenTick site seems to have closed its doors... too bad, since I think they were exactly what I wanted.
I'd also be able to use data that's just open/close price and volume of every symbol every day, but I'd prefer all the data if I can get it. Any other suggestions?
Solution 1:
Let me add my 2¢, it's my job to get good and clean data for a hedge-fund, I've seen quite a lot of data feeds and historical data providers. This is mainly about US stock data.
To start with, if you have some money don't bother with downloading data from Yahoo, get the end of day data straight from CSI data, this is where Yahoo gets their EOD data as well AFAIK. They have an API where you can extract the data to whatever format you want. I think the yearly subscription for data is a few $100 bucks.
The main problem with downloading data from a free service is that you only get stocks that still exist, this is called Survivorship Bias and can give you wrong results if you look at many stocks, because you'll only include the ones that made it so far and not the ones that were de-listed.
For playing around with some intraday data I'd look into IQFeed, they provide several APIs to extract historical data, although they are mainly an outfit for real-time feeds. But here there are quite a few options, some brokers even provide historical data downloads via their APIs, so just pick your poison.
BUT usually all of this data is not very clean, once you really start back testing you'll see that certain stocks are missing or appear as two different symbols, or stock splits are not properly accounted for, etc. And then you realize that historical dividend data is need as well and so you start running in circles, patching data together from 100 different data sources and so on. So to start with a "discount" data feed will do, but as soon as you run more comprehensive backtests you might run into problems depending on what you do. If you just look at, let's say, the S&P 500 stocks this will not be so much a problem though and a "cheap" intraday feed will do.
What you will not find is free intraday data. I mean you might find some examples, I'm sure there's somewhere 5 years of MSFT tick data floating around but that will not get you very far.
Then, if you need the real stuff (level II order book, all ticks as they have happened at all exchanges) one "affordable", yet excellent option is Nanex. They'll actually ship you a drive with terabytes of data. If I remember right its about $3k-4K per year of data. But trust me, once you understand how hard it is to get good intraday data, you won't think this is very much money at all.
Not to discourage you but to get good data is hard, so hard in fact that many hedge-funds and banks spend hundreds of thousands of dollars a month to get data they can trust. Again, you can start somewhere and then go from there but it's good to see it a bit in context.
Edit: The answer above is from my own experience. This write-up from Caltech about available data feeds will give more insights, and especially recommends QuantQuote.
Solution 2:
THIS ANSWER IS NO LONGER ACCURATE AS THE YAHOO FEED HAS CEASED TO EXIST
Using Yahoo's CSV approach above you can also get historical data! You can reverse engineer the following example:
http://ichart.finance.yahoo.com/table.csv?s=YHOO&d=0&e=28&f=2010&g=d&a=3&b=12&c=1996&ignore=.csv
Essentially:
sn = TICKER
a = fromMonth-1
b = fromDay (two digits)
c = fromYear
d = toMonth-1
e = toDay (two digits)
f = toYear
g = d for day, m for month, y for yearly
The complete list of parameters:
a Ask
a2 Average Daily Volume
a5 Ask Size
b Bid
b2 Ask (Real-time)
b3 Bid (Real-time)
b4 Book Value
b6 Bid Size
c Change & Percent Change
c1 Change
c3 Commission
c6 Change (Real-time)
c8 After Hours Change (Real-time)
d Dividend/Share
d1 Last Trade Date
d2 Trade Date
e Earnings/Share
e1 Error Indication (returned for symbol changed / invalid)
e7 EPS Estimate Current Year
e8 EPS Estimate Next Year
e9 EPS Estimate Next Quarter
f6 Float Shares
g Day's Low
h Day's High
j 52-week Low
k 52-week High
g1 Holdings Gain Percent
g3 Annualized Gain
g4 Holdings Gain
g5 Holdings Gain Percent (Real-time)
g6 Holdings Gain (Real-time)
i More Info
i5 Order Book (Real-time)
j1 Market Capitalization
j3 Market Cap (Real-time)
j4 EBITDA
j5 Change From 52-week Low
j6 Percent Change From 52-week Low
k1 Last Trade (Real-time) With Time
k2 Change Percent (Real-time)
k3 Last Trade Size
k4 Change From 52-week High
k5 Percent Change From 52-week High
l Last Trade (With Time)
l1 Last Trade (Price Only)
l2 High Limit
l3 Low Limit
m Day's Range
m2 Day's Range (Real-time)
m3 50-day Moving Average
m4 200-day Moving Average
m5 Change From 200-day Moving Average
m6 Percent Change From 200-day Moving Average
m7 Change From 50-day Moving Average
m8 Percent Change From 50-day Moving Average
n Name
n4 Notes
o Open
p Previous Close
p1 Price Paid
p2 Change in Percent
p5 Price/Sales
p6 Price/Book
q Ex-Dividend Date
r P/E Ratio
r1 Dividend Pay Date
r2 P/E Ratio (Real-time)
r5 PEG Ratio
r6 Price/EPS Estimate Current Year
r7 Price/EPS Estimate Next Year
s Symbol
s1 Shares Owned
s7 Short Ratio
t1 Last Trade Time
t6 Trade Links
t7 Ticker Trend
t8 1 yr Target Price
v Volume
v1 Holdings Value
v7 Holdings Value (Real-time)
w 52-week Range
w1 Day's Value Change
w4 Day's Value Change (Real-time)
x Stock Exchange
y Dividend Yield
Solution 3:
I know you wanted "free", but I'd seriously consider getting the data from csidata.com for about $300/year, if I were you.
It's what yahoo uses to supply their data.
It comes with a decent API, and the data is (as far as I can tell) very clean.
You get 10 years of history when you subscribe, and then nightly updates afterward.
They also take care of all sorts of nasty things like splits and dividends for you. If you haven't yet discovered the joy that is data-cleaning, you won't realize how much you need this, until the first time your ATS (Automated Trading System) thinks some stock is really really cheap, only because it split 2:1 and you didn't notice.
Solution 4:
Intro:
From yahoo you can get EOD (end of day) historical prices, or real-time prices. The EOD prices are amazingly simple to download. See my blog for explanations on how to get the data and for C# code examples.
I'm in the process of writing a real-time data feed "engine" that downloads and stores the real-time prices in a database. The engine will initially be able to download historical prices from Yahoo and Interactive Brokers and it will be able to store the data in a database of your choice: MS SQL, MySQL, SQLite, etc. It's open source, but I'll post more information on my blog when I get closer to releasing it (within a couple of days).
Another option is eclipse trader... it allows you to record the historical data with granularity as low as 1 minute and stores the prices locally in a text file. It basically downloads the real-time data from Yahoo with a 15 minute delay. Since I wanted a more robust solution and I'm working on a big school project for which we need data, I decided to write my own data feed engine (which I mentioned above).
Sample Code:
Here is sample C# code that demonstrates how to download real-time data:
public void Start()
{
string url = "http://finance.yahoo.com/d/quotes.csv?s=MSFT+GOOG&f=snl1d1t1ohgdr";
//Get page showing the table with the chosen indices
HttpWebRequest request = null;
IDatabase database =
DatabaseFactory.CreateDatabase(
DatabaseFactory.DatabaseType.SQLite);
//csv content
try
{
while (true)
{
using (Stream file = File.Create("quotes.csv"))
{
request = (HttpWebRequest)WebRequest.CreateDefault(new Uri(url));
request.Timeout = 30000;
using (var response = (HttpWebResponse)request.GetResponse())
using (Stream input = response.GetResponseStream())
{
CopyStream(input, file);
}
}
Console.WriteLine("------------------------------------------------");
database.InsertData(Directory.GetCurrentDirectory() + "/quotes.csv");
File.Delete("quotes.csv");
Thread.Sleep(10000); // 10 seconds
}
}
catch (Exception exc)
{
Console.WriteLine(exc.ToString());
Console.ReadKey();
}
}
Database:
On the database side I use an OleDb
connection to the CSV file to populate a DataSet
and then I update my actual database via the DataSet
, it basically makes it possible to match all of the columns from the CSV file returned from Yahoo directly to your database (if your database does not support batch inserts of CSV data, like SQLite). Otherwise, inserting the data is a one-liner... just batch insert the CSV into your database.
You can read more about the formatting of the url here: http://www.gummy-stuff.org/Yahoo-data.htm
Solution 5:
A data set of every symbol on the NASDAQ and NYSE on a second or minute interval is going to be massive.
Let's say there are a total of 4000 companies listed on both exchanges (this is probably on the very low side since there are over 3200 companies listed on the NASDAQ). For data at a second interval, assuming there are 6.5 trading hours in a day, that would give you 23400 data points per day per company, or about 93,600,000 data points in total for that one day. Assuming 200 trading days in a year, thats about 18,720,000,000 data points for just one year.
Maybe you want to start with a smaller set first?