How are text editors generally implemented?

One technique that's common (especially in older editors) is called a split buffer. Basically, you "break" the text into everything before the cursor and everything after the cursor. Everything before goes at the beginning of the buffer. Everything after goes at the end of the buffer.

When the user types in text, it goes into the empty space in between without moving any data. When the user moves the cursor, you move the appropriate amount of text from one side of the "break" to the other. Typically there's a lot of moving around a single area, so you're usually only moving small amounts of text at a time. The biggest exception is if you have a "go to line xxx" kind of capability.

Charles Crowley has written a much more complete discussion of the topic. You might also want to look at The Craft of Text Editing, which covers split buffers (and other possibilities) in much greater depth.


A while back, I wrote my own text editor in Tcl (actually, I stole the code from somewhere and extended it beyond recognition, ah the wonders of open source).

As you mentioned, doing string operations on very, very large strings can be expensive. So the editor splits the text into smaller strings at each newline ("\n" or "\r" or "\r\n"). So all I'm left with is editing small strings at line level and doing list operations when moving between lines.

The other advantage of this is that it is a simple and natural concept to work with. My mind already considers each line of text to be separate reinforced by years of programming where newlines are stylistically or syntactically significant.

It also helps that the use case for my text editor is as a programmers editor. For example, I implemented syntax hilighting but not word/line wrap. So in my case there is a 1:1 map between newlines in text and the lines drawn on screen.

In case you want to have a look, here's the source code for my editor: http://wiki.tcl.tk/16056

It's not a toy BTW. I use it daily as my standard console text editor unless the file is too large to fit in RAM (Seriously, what text file is? Even novels, which are typically 4 to 5 MB, fit in RAM. I've only seen log files grow to hundreds of MB).


Depending on the amount of text that needs to be in the editor at one time, a one string for the entire buffer approach would probably be fine. I think Notepad does this-- ever notice how much slower it gets to insert text in a large file?

Having one string per line in a hash table seems like a good compromise. It would make navigation to a particular line and delete/paste efficient without much complexity.

If you want to implement an undo function, you'll want a representation that allows you to go back to previous versions without storing 30 copies of the entire file for 30 changes, although again that would probably be fine if the file was sufficiently small.