How can I add line breaks in an XML file from the Unix command line?
I have a large XML file. From the Unix command line, I'd like to add a newline after every >.
I have tried using sed for this, with no luck:
sed -i '' -e's/>/>\n/' file.xml
This just inserts the letter n, not a newline. I've also tried \r and \r\n.
How can I do this?
(FYI: I'm using zsh on OS X.)
Solution 1:
Script
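Use indentxml file.xml to view the indented result, or indentxml file.xml > new.xml to save it to a new file, where indentxml is the following Perl script (save it in a directory on your PATH and make it executable with chmod +x indentxml):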
#!/usr/bin/perl
#
# Purpose: Read an XML file and indent it for ease of reading
# Author: RedGrittyBrick 2011.
# Licence: Creative Commons Attribution-ShareAlike 3.0 Unported License
#
use strict;
use warnings;
my $filename = $ARGV[0];
die "Usage: $0 filename\n" unless $filename;
open my $fh, '<', $filename
    or die "Can't read '$filename' because $!\n";
my $xml = '';
while (<$fh>) { $xml .= $_; }
close $fh;
$xml =~ s|>[\n\s]+<|><|gs;   # remove superfluous whitespace
$xml =~ s|><|>\n<|gs;        # split line at consecutive tags
my $indent = 0;
for my $line (split /\n/, $xml) {
    if ($line =~ m|^</|) { $indent--; }
    print ' ' x $indent, $line, "\n";
    if ($line =~ m|^<[^/?!]|) { $indent++; }              # indent after <foo (not after <?...?> or <!--...-->)
    if ($line =~ m|^<[^/][^>]*>[^<]*</|) { $indent--; }  # but not <foo>..</foo>
    if ($line =~ m|^<[^/][^>]*/>|) { $indent--; }        # and not <foo/>
}
Parser
Of course, the canonical answer is to use a proper XML parser. For example, with Perl's XML::LibXML module, toString(1) serializes the document with indentation:
# cat line.xml
<a><b>Bee</b><c>Sea</c><d><e>Eeeh!</e></d></a>
# perl -MXML::LibXML -e 'print XML::LibXML->new->parse_file("line.xml")->toString(1)'
<?xml version="1.0"?>
<a>
  <b>Bee</b>
  <c>Sea</c>
  <d>
    <e>Eeeh!</e>
  </d>
</a>
Utility
But perhaps the easiest option is xmllint, the command-line tool that comes with libxml2:
# xmllint --format line.xml
<?xml version="1.0"?>
<a>
  <b>Bee</b>
  <c>Sea</c>
  <d>
    <e>Eeeh!</e>
  </d>
</a>
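xmllint writes the formatted document to standard output rather than changing the file. A minimal sketch for rewriting a file in place, assuming xmllint is installed; it recreates the line.xml sample from above so that it runs on its own:

```shell
# Recreate the sample file used above.
printf '<a><b>Bee</b><c>Sea</c><d><e>Eeeh!</e></d></a>\n' > line.xml
# Write to a temporary file; the && means the original is only
# replaced if xmllint succeeded (e.g. the file parsed as XML).
xmllint --format line.xml > line.xml.tmp && mv line.xml.tmp line.xml
cat line.xml
```

To change the indentation string, set the XMLLINT_INDENT environment variable, e.g. XMLLINT_INDENT='    ' xmllint --format line.xml for four spaces.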
Solution 2:
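OS X ships BSD sed, which has no escape sequence for a newline in the replacement text (GNU sed accepts \n there, but BSD sed just inserts the letter n), so you need to use a literal newline character. For example, for this input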
$ cat /tmp/example
<this is one tag><this is another tag><here again>
You would have to use
$ sed -e 's_>_&\
_g' /tmp/example
which produces
<this is one tag>
<this is another tag>
<here again>
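Note that the literal newline has to be escaped with a backslash, as shown above. In the replacement, & stands for the matched text (here >), and _ is used as the delimiter of the s command in place of the usual /.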
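In zsh (and bash) you can avoid splitting the command across two lines by using $'...' quoting, in which \\ becomes a backslash and \n a real newline, so sed receives the same escaped newline shown above. A sketch, assuming zsh or bash; because sed -i takes different arguments on OS X (sed -i '') and with GNU sed (sed -i), it edits the file via a temporary file instead:

```shell
# Recreate the sample input from above.
printf '<this is one tag><this is another tag><here again>\n' > /tmp/example
# $'...' is ANSI-C quoting: the script sed actually sees is
# "s/>/&\" followed by a real newline and then "/g".
sed -e $'s/>/&\\\n/g' /tmp/example
# Edit the file itself: write to a temporary file, then replace the
# original only if sed succeeded.
sed -e $'s/>/&\\\n/g' /tmp/example > /tmp/example.new && mv /tmp/example.new /tmp/example
```

Because the final > on the line is also followed by a newline, the result ends with an empty line.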