What is the best practice for formatting logs?
I'm writing a piece of honeypot software that will log its interactions extensively. I plan to log to plaintext .log files.
I have two questions, from someone who isn't too familiar with how servers log.
First, how should I break up my log files? I'm assuming that after running this for a month I don't want one big .log file. Do I split by day, month, or year? Is there some standard for it?
Second, the format of each line: do I pick one standard delimiter, whatever it is (*, -, +, anything)? Is there a standard anywhere? (My googling hasn't brought up much.)
I like this format for log files:
$ python simple_logging_module.py
2005-03-19 15:10:26,618 - simple_example - DEBUG - debug message
2005-03-19 15:10:26,620 - simple_example - INFO - info message
2005-03-19 15:10:26,695 - simple_example - WARNING - warn message
2005-03-19 15:10:26,697 - simple_example - ERROR - error message
2005-03-19 15:10:26,773 - simple_example - CRITICAL - critical message
This is from Python's logging module. I usually have a file per day, one folder for each month, one folder for each year. Otherwise you'll get huge log files that you can't open properly in an editor.
logs/
    2009/
        January/
            01012009.log
            02012009.log
            ...
        February/
            ...
    2008/
        ...
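A minimal sketch of how the line format and the per-day layout above might be produced with Python's standard logging module. The helper names `daily_log_path` and `make_logger` and the `logs` root directory are assumptions made for illustration, not part of the logging module.

```python
import logging
from datetime import date
from pathlib import Path

# Same layout as the sample output above: time - logger name - level - message.
LINE_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def daily_log_path(root, day):
    """Return root/YYYY/MonthName/DDMMYYYY.log (hypothetical helper)."""
    # %B is the full month name; it follows the process locale.
    return (
        Path(root)
        / day.strftime("%Y")
        / day.strftime("%B")
        / day.strftime("%d%m%Y.log")
    )


def make_logger(name, root="logs", day=None):
    """Create a logger that appends to the file for the given day."""
    path = daily_log_path(root, day or date.today())
    path.parent.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter(LINE_FORMAT))
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = make_logger("simple_example")
    log.info("info message")
```

Note that this picks the file once, when the logger is created. A process that runs across midnight would have to reopen its handler, or use one of the rotating handlers that ship with the logging module.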
There is no standard for this kind of logging. Rolling and file layout all depend on what you need. In general I have faced three main scenarios:
- All in one file. This does not seem to be an option for you.
- Fixed-size rolling. You define a size, and a new log file is created once the current file grows beyond it. Most log4anything packages support this out of the box.
- Fully custom rolling. I've seen layouts like this:
  - Every day gets its own directory named in the format YYYYMMDD. If you don't stage your logs, consider a directory layout like YYYY\MM\YYYYMMDD, as shown in other answers.
  - Inside this directory, fixed-size rolling should be used.
  - Every file is named logfile_yyyymmdd_ccc.log, where ccc is an increasing number. Adding the time to the file name is also a good idea (e.g. to easily judge how many logs per minute you are generating).
  - To save space, every log is compressed with zip automatically.
  - The last 3 days are always kept uncompressed so you have quick access with UNIX text tools.
This custom one looked like this:
logs/
    20090101/
        logfile_20090101_001.zip
        logfile_20090101_002.zip
        ...
    20090102/
        logfile_20090102_001.zip
        logfile_20090102_002.zip
    logfile_20090101_001.log
    logfile_20090101_002.log
    logfile_20090102_001.log
    logfile_20090102_002.log
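Fixed-size rolling with automatic compression can be sketched with Python's `logging.handlers.RotatingFileHandler` and its `namer` and `rotator` hooks. This is an illustration under assumptions, not the setup described above: it uses gzip rather than zip, compresses each file as soon as it is rolled instead of keeping the last 3 days uncompressed, and the function names and default sizes are made up for the example.

```python
import gzip
import logging
import logging.handlers
import os
import shutil


def gzip_namer(default_name):
    # Rotated files get a .gz suffix, e.g. honeypot.log.1.gz
    return default_name + ".gz"


def gzip_rotator(source, dest):
    # Compress the file that was just closed, then drop the plain copy.
    with open(source, "rb") as src, gzip.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(source)


def make_size_rolling_handler(path, max_bytes=1_000_000, backups=20):
    """Roll the log at roughly max_bytes and keep `backups` compressed files."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    handler.namer = gzip_namer
    handler.rotator = gzip_rotator
    return handler


if __name__ == "__main__":
    logger = logging.getLogger("honeypot")
    logger.setLevel(logging.INFO)
    logger.addHandler(make_size_rolling_handler("honeypot.log"))
    logger.info("connection attempt")
```

The compressed files can still be read with tools such as zcat or zgrep, which keeps them usable from the command line.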
There are also some good practices for logging:
- Always keep date in your log file name
- Always add some name to your log file name. It will help you in the future to distinguish log files from different instances of your system.
- Always log time and date (preferably up to milliseconds resolution) for every log event.
- Always store your date as YYYYMMDD. Everywhere. In filename, inside of logfile. It greatly helps with sorting. Some separators are allowed (eg. 2009-11-29).
- In general avoid storing logs in a database. It is another point of failure in your logging schema.
- If you have a multithreaded system, always log the thread id.
- If you have a multi-process system, always log the process id.
- If you have many computers, always log the computer id.
- Make sure you can process logs later. Just try importing one log file into a database or Excel. If it takes longer than 30 seconds, your logging is wrong. This includes:
  - Choosing a good internal format for logging. I prefer space-delimited since it works nicely with Unix text tools and with Excel.
  - Choosing a good format for date/time so you can easily import into an SQL database or Excel for further processing.
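Several of the points above (date and time with milliseconds, thread id, process id, computer id, space-delimited fields) can be combined in one formatter. The sketch below uses Python's logging module; the field order, the `HostFilter` class and the `make_handler` function are assumptions chosen for the example.

```python
import logging
import socket


class HostFilter(logging.Filter):
    """Attach the machine's host name to every record as `host`."""

    def __init__(self):
        super().__init__()
        self.host = socket.gethostname()

    def filter(self, record):
        record.host = self.host
        return True


# Space-delimited fields:
# date time.millis host pid thread level logger message
FIELD_FORMAT = (
    "%(asctime)s.%(msecs)03d %(host)s %(process)d %(thread)d "
    "%(levelname)s %(name)s %(message)s"
)


def make_handler(stream=None):
    """Build a stream handler that emits the fields listed above."""
    handler = logging.StreamHandler(stream)
    handler.addFilter(HostFilter())
    handler.setFormatter(
        logging.Formatter(FIELD_FORMAT, datefmt="%Y-%m-%d %H:%M:%S")
    )
    return handler


if __name__ == "__main__":
    logger = logging.getLogger("honeypot")
    logger.setLevel(logging.INFO)
    logger.addHandler(make_handler())
    logger.info("login attempt user=root")
```

Because the message is the last field, a parser can split on the first seven spaces and treat the remainder as free text.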
To break up your log files, you could use an external application like logrotate and let it take care of the dirty work.
As for the format of each line, there's no standard, so you should use what works best for you. If you're going to automatically parse the log file later, then you might want to keep that in mind as you format the log output.
I recommend you use a well-known logging library. Most logging libraries support rollover for you. Log4Net (.NET) and Log4J (Java) are particularly good logging libraries, with a lot of options that you may find useful. Use whatever rollover interval works best for you. For a honeypot application, I think you will find hourly or daily turnover to work best. You could also use a fixed size limit, like 256 MB, to ensure that your logging doesn't exhaust the available free disk space. Log4Net/Log4J supports this as well.
Log4J @ Apache.Org
Log4Net @ Apache.Org
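For comparison, hourly rollover can be sketched in Python with the standard library's `TimedRotatingFileHandler`. This is an analogue, not Log4Net/Log4J itself, and the function name and retention value are assumptions for the example.

```python
import logging
import logging.handlers


def make_hourly_handler(path, keep_files=24 * 30):
    """Roll the log every hour and keep at most `keep_files` old files."""
    # when="H" rolls over every `interval` hours; rolled files get a
    # timestamp suffix such as honeypot.log.2009-11-29_14
    return logging.handlers.TimedRotatingFileHandler(
        path, when="H", interval=1, backupCount=keep_files, encoding="utf-8"
    )


if __name__ == "__main__":
    logger = logging.getLogger("honeypot")
    logger.setLevel(logging.INFO)
    logger.addHandler(make_hourly_handler("honeypot.log"))
    logger.info("connection attempt")
```

Passing `when="midnight"` instead gives daily files.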
The format of your logfiles should be set up according to your needs. It is highly desirable to use a delimiter that is unlikely to show up in your log input. For your application, this may not be possible, since a honeypot logs whatever its visitors send. Under typical circumstances, some parties use spaces (NCSA logs), some use commas (to make CSV files), and some use tabs (to make tab-delimited files). Each of these has its own benefits and drawbacks.
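When the logged input can contain the delimiter itself, one option is to let a CSV writer quote the affected fields. The sketch below uses Python's csv module with a tab delimiter; `format_record` and `parse_record` are hypothetical helper names, and the sample field values are invented.

```python
import csv
import io


def format_record(fields, delimiter="\t"):
    """Join fields into one record, quoting any field that contains the
    delimiter, a quote character, or a line break."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()


def parse_record(record, delimiter="\t"):
    """Inverse of format_record for a single record."""
    reader = csv.reader(io.StringIO(record, newline=""), delimiter=delimiter)
    return next(reader)


if __name__ == "__main__":
    fields = ["2009-11-29 12:00:00", "192.0.2.7", 'GET /a\tb "x"']
    record = format_record(fields)
    print(record, end="")
    print(parse_record(record) == fields)
```

A quoted field that contains a line break spans more than one physical line, which line-oriented tools such as grep will not treat as a single record. Escaping or replacing line breaks before writing avoids that.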