How line ending conversions work with git core.autocrlf between different operating systems
I've read a lot of different questions and answers on Stack Overflow as well as git documentation on how the core.autocrlf setting works.
This is my understanding from what I've read:
Unix and Mac OSX (pre-OSX uses CR) clients use LF line endings.
Windows clients use CRLF line endings.
When core.autocrlf is set to true on the client, the git repository always stores files in LF line ending format and line endings in files on the client are converted back and forth on check out / commit for clients (i.e. Windows) that use non-LF line endings, no matter what format the line endings files are on the client (this disagrees with Tim Clem's definition - see update below).
Here is a matrix that tries to document the same for the 'input' and 'false' settings of core.autocrlf with question marks where I'm not sure of line ending conversion behavior.
My questions are:
- What should the question marks be?
- Is this matrix correct for the "non-question marks"?
I'll update the question marks from the answers as consensus appears to be formed.
core.autocrlf value true input false ---------------------------------------------------------- commit | convert ? ? new | to LF (convert to LF?) (no conversion?) commit | convert to ? no existing | LF (convert to LF?) conversion checkout | convert to ? no existing | CRLF (no conversion?) conversion
I'm not really looking for opinions on the pros and cons of the various settings. I'm just looking for data which makes it clear how to expect git to operate with each of the three settings.
--
Update 04/17/2012: After reading the article by Tim Clem linked by JJD in the comments, I have modified some of the values in the "unknown" values in the table above, as well as changing "checkout existing | true to convert to CRLF instead of convert to client". Here are the definitions he gives, which are more clear than anything I've seen elsewhere:
core.autocrlf = false
This is the default, but most people are encouraged to change this immediately. The result of using false is that Git doesn’t ever mess with line endings on your file. You can check in files with LF or CRLF or CR or some random mix of those three and Git does not care. This can make diffs harder to read and merges more difficult. Most people working in a Unix/Linux world use this value because they don’t have CRLF problems and they don’t need Git to be doing extra work whenever files are written to the object database or written out into the working directory.
core.autocrlf = true
This means that Git will process all text files and make sure that CRLF is replaced with LF when writing that file to the object database and turn all LF back into CRLF when writing out into the working directory. This is the recommended setting on Windows because it ensures that your repository can be used on other platforms while retaining CRLF in your working directory.
core.autocrlf = input
This means that Git will process all text files and make sure that CRLF is replaced with LF when writing that file to the object database. It will not, however, do the reverse. When you read files back out of the object database and write them into the working directory they will still have LFs to denote the end of line. This setting is generally used on Unix/Linux/OS X to prevent CRLFs from getting written into the repository. The idea being that if you pasted code from a web browser and accidentally got CRLFs into one of your files, Git would make sure they were replaced with LFs when you wrote to the object database.
Tim's article is excellent, the only thing I can think of that is missing is that he assumes the repository is in LF format, which is not necessarily true, especially for Windows only projects.
Comparing Tim's article to the highest voted answer to date by jmlane shows perfect agreement on the true and input settings and disagreement on the false setting.
The best explanation of how core.autocrlf
works is found on the gitattributes man page, in the text
attribute section.
This is how core.autocrlf
appears to work currently (or at least since v1.7.2 from what I am aware):
core.autocrlf = true
- Text files checked-out from the repository that have only
LF
characters are normalized toCRLF
in your working tree; files that containCRLF
in the repository will not be touched - Text files that have only
LF
characters in the repository, are normalized fromCRLF
toLF
when committed back to the repository. Files that containCRLF
in the repository will be committed untouched.
core.autocrlf = input
- Text files checked-out from the repository will keep original EOL characters in your working tree.
- Text files in your working tree with
CRLF
characters are normalized toLF
when committed back to the repository.
core.autocrlf = false
-
core.eol
dictates EOL characters in the text files of your working tree. -
core.eol = native
by default, which means working tree EOLs will depend on where git is running:CRLF
on a Windows machine, orLF
in *nix. - Repository
gitattributes
settings determines EOL character normalization for commits to the repository (default is normalization toLF
characters).
I've only just recently researched this issue and I also find the situation to be very convoluted. The core.eol
setting definitely helped clarify how EOL characters are handled by git.
The issue of EOLs in mixed-platform projects has been making my life miserable for a long time. The problems usually arise when there are already files with different and mixed EOLs already in the repo. This means that:
- The repo may have different files with different EOLs
- Some files in the repo may have mixed EOL, e.g. a combination of
CRLF
andLF
in the same file.
How this happens is not the issue here, but it does happen.
I ran some conversion tests on Windows for the various modes and their combinations.
Here is what I got, in a slightly modified table:
| Resulting conversion when | Resulting conversion when | committing files with various | checking out FROM repo - | EOLs INTO repo and | with mixed files in it and | core.autocrlf value: | core.autocrlf value: -------------------------------------------------------------------------------- File | true | input | false | true | input | false -------------------------------------------------------------------------------- Windows-CRLF | CRLF -> LF | CRLF -> LF | as-is | as-is | as-is | as-is Unix -LF | as-is | as-is | as-is | LF -> CRLF | as-is | as-is Mac -CR | as-is | as-is | as-is | as-is | as-is | as-is Mixed-CRLF+LF | as-is | as-is | as-is | as-is | as-is | as-is Mixed-CRLF+LF+CR | as-is | as-is | as-is | as-is | as-is | as-is
As you can see, there are 2 cases when conversion happens on commit (3 left columns). In the rest of the cases the files are committed as-is.
Upon checkout (3 right columns), there is only 1 case where conversion happens when:
-
core.autocrlf
istrue
and - the file in the repo has the
LF
EOL.
Most surprising for me, and I suspect, the cause of many EOL problems is that there is no configuration in which mixed EOL like CRLF
+LF
get normalized.
Note also that "old" Mac EOLs of CR
only also never get converted.
This means that if a badly written EOL conversion script tries to convert a mixed ending file with CRLF
s+LF
s, by just converting LF
s to CRLF
s, then it will leave the file in a mixed mode with "lonely" CR
s wherever a CRLF
was converted to CRCRLF
.
Git will then not convert anything, even in true
mode, and EOL havoc continues. This actually happened to me and messed up my files really badly, since some editors and compilers (e.g. VS2010) don't like Mac EOLs.
I guess the only way to really handle these problems is to occasionally normalize the whole repo by checking out all the files in input
or false
mode, running a proper normalization and re-committing the changed files (if any). On Windows, presumably resume working with core.autocrlf true
.
core.autocrlf
value does not depend on OS type but on Windows default value is true
and for Linux - input
. I explored 3 possible values for commit and checkout cases and this is the resulting table:
╔═══════════════╦══════════════╦══════════════╦══════════════╗
║ core.autocrlf ║ false ║ input ║ true ║
╠═══════════════╬══════════════╬══════════════╬══════════════╣
║ ║ LF => LF ║ LF => LF ║ LF => LF ║
║ git commit ║ CR => CR ║ CR => CR ║ CR => CR ║
║ ║ CRLF => CRLF ║ CRLF => LF ║ CRLF => LF ║
╠═══════════════╬══════════════╬══════════════╬══════════════╣
║ ║ LF => LF ║ LF => LF ║ LF => CRLF ║
║ git checkout ║ CR => CR ║ CR => CR ║ CR => CR ║
║ ║ CRLF => CRLF ║ CRLF => CRLF ║ CRLF => CRLF ║
╚═══════════════╩══════════════╩══════════════╩══════════════╝
Things are about to change on the "eol conversion" front, with the upcoming Git 1.7.2:
A new config setting core.eol
is being added/evolved:
This is a replacement for the 'Add "
core.eol
" config variable' commit that's currently inpu
(the last one in my series).
Instead of implying that "core.autocrlf=true
" is a replacement for "* text=auto
", it makes explicit the fact thatautocrlf
is only for users who want to work with CRLFs in their working directory on a repository that doesn't have text file normalization.
When it is enabled, "core.eol" is ignored.Introduce a new configuration variable, "
core.eol
", that allows the user to set which line endings to use for end-of-line-normalized files in the working directory.
It defaults to "native
", which means CRLF on Windows and LF everywhere else. Note that "core.autocrlf
" overridescore.eol
.
This means that:[core] autocrlf = true
puts CRLFs in the working directory even if
core.eol
is set to "lf
".core.eol:
Sets the line ending type to use in the working directory for files that have the
text
property set.
Alternatives are 'lf', 'crlf' and 'native', which uses the platform's native line ending.
The default value isnative
.
Other evolutions are being considered:
For 1.8, I would consider making
core.autocrlf
just turn on normalization and leave the working directory line ending decision to core.eol, but that will break people's setups.
git 2.8 (March 2016) improves the way core.autocrlf
influences the eol:
See commit 817a0c7 (23 Feb 2016), commit 6e336a5, commit df747b8, commit df747b8 (10 Feb 2016), commit df747b8, commit df747b8 (10 Feb 2016), and commit 4b4024f, commit bb211b4, commit 92cce13, commit 320d39c, commit 4b4024f, commit bb211b4, commit 92cce13, commit 320d39c (05 Feb 2016) by Torsten Bögershausen (tboegi
).
(Merged by Junio C Hamano -- gitster
-- in commit c6b94eb, 26 Feb 2016)
convert.c
: refactorcrlf_action
Refactor the determination and usage of
crlf_action
.
Today, when no "crlf
" attribute are set on a file,crlf_action
is set toCRLF_GUESS
. UseCRLF_UNDEFINED
instead, and search for "text
" or "eol
" as before.Replace the old
CRLF_GUESS
usage:
CRLF_GUESS && core.autocrlf=true -> CRLF_AUTO_CRLF
CRLF_GUESS && core.autocrlf=false -> CRLF_BINARY
CRLF_GUESS && core.autocrlf=input -> CRLF_AUTO_INPUT
Make more clear, what is what, by defining:
- CRLF_UNDEFINED : No attributes set. Temparally used, until core.autocrlf
and core.eol is evaluated and one of CRLF_BINARY,
CRLF_AUTO_INPUT or CRLF_AUTO_CRLF is selected
- CRLF_BINARY : No processing of line endings.
- CRLF_TEXT : attribute "text" is set, line endings are processed.
- CRLF_TEXT_INPUT: attribute "input" or "eol=lf" is set. This implies text.
- CRLF_TEXT_CRLF : attribute "eol=crlf" is set. This implies text.
- CRLF_AUTO : attribute "auto" is set.
- CRLF_AUTO_INPUT: core.autocrlf=input (no attributes)
- CRLF_AUTO_CRLF : core.autocrlf=true (no attributes)
As torek adds in the comments:
all these translations (any EOL conversion from
eol=
orautocrlf
settings, and "clean
" filters) are run when files move from work-tree to index, i.e., duringgit add
rather than atgit commit
time.
(Note thatgit commit -a
or--only
or--include
do add files to the index at that time, though.)
For more on that, see "What is difference between autocrlf and eol".