Best word wrap algorithm? [closed]
Here is a word-wrap algorithm I've written in C#. It should be fairly easy to translate into other languages (except perhaps for IndexOfAny
).
static char[] splitChars = new char[] { ' ', '-', '\t' };
private static string WordWrap(string str, int width)
{
string[] words = Explode(str, splitChars);
int curLineLength = 0;
StringBuilder strBuilder = new StringBuilder();
for(int i = 0; i < words.Length; i += 1)
{
string word = words[i];
// If adding the new word to the current line would be too long,
// then put it on a new line (and split it up if it's too long).
if (curLineLength + word.Length > width)
{
// Only move down to a new line if we have text on the current line.
// Avoids situation where wrapped whitespace causes emptylines in text.
if (curLineLength > 0)
{
strBuilder.Append(Environment.NewLine);
curLineLength = 0;
}
// If the current word is too long to fit on a line even on it's own then
// split the word up.
while (word.Length > width)
{
strBuilder.Append(word.Substring(0, width - 1) + "-");
word = word.Substring(width - 1);
strBuilder.Append(Environment.NewLine);
}
// Remove leading whitespace from the word so the new line starts flush to the left.
word = word.TrimStart();
}
strBuilder.Append(word);
curLineLength += word.Length;
}
return strBuilder.ToString();
}
private static string[] Explode(string str, char[] splitChars)
{
List<string> parts = new List<string>();
int startIndex = 0;
while (true)
{
int index = str.IndexOfAny(splitChars, startIndex);
if (index == -1)
{
parts.Add(str.Substring(startIndex));
return parts.ToArray();
}
string word = str.Substring(startIndex, index - startIndex);
char nextChar = str.Substring(index, 1)[0];
// Dashes and the likes should stick to the word occuring before it. Whitespace doesn't have to.
if (char.IsWhiteSpace(nextChar))
{
parts.Add(word);
parts.Add(nextChar.ToString());
}
else
{
parts.Add(word + nextChar);
}
startIndex = index + 1;
}
}
It's fairly primitive - it splits on spaces, tabs and dashes. It does make sure that dashes stick to the word before it (so you don't end up with stack\n-overflow) though it doesn't favour moving small hyphenated words to a newline rather than splitting them. It does split up words if they are too long for a line.
It's also fairly culturally specific, as I don't know much about the word-wrapping rules of other cultures.
Donald E. Knuth did a lot of work on the line breaking algorithm in his TeX typesetting system. This is arguably one of the best algorithms for line breaking - "best" in terms of visual appearance of result.
His algorithm avoids the problems of greedy line filling where you can end up with a very dense line followed by a very loose line.
An efficient algorithm can be implemented using dynamic programming.
A paper on TeX's line breaking.