How to avoid tables being split between pages in a Document

Solution 1:

Document tree

If you look closely at the structure of a Google Doc, you'll notice that it is a tree data structure rather than a document in the broader sense (though one might argue that chapter/section/paragraph is also a tree-like structure).

I suspect this is why the API lacks page-related methods, although they are likely to be added in the future.

Since the document is a tree, the problem of determining where a page split occurs can be reduced to finding the point at which the sum of child heights overflows the page height.

Problem subdivision

To correctly find the spot at which the split will occur (and keep track of such elements), we need to solve several subproblems:

  1. Get the height, width and margins of a page
  2. Traverse elements, keeping track of total height. At each step:
    1. Calculate full height of the element.
    2. Add the height to total, check if overflow happened.
    3. If total overflows page height, the last outermost (closest to root) element is guaranteed to be split. Add the element to list, cache the overflow and reset the total (new page).
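
The steps above can be sketched on plain numbers before involving the Docs API. This is only an illustration: findSplitIndices is a hypothetical helper that takes an already computed list of top-level element heights, which the real solution has to derive from the document tree.

```javascript
/**
 * Hypothetical helper: returns indices of the heights that cross a page boundary.
 * @param {number[]} heights full heights of top-level elements, in document order
 * @param {number} pageHeight usable page height
 * @returns {number[]}
 */
function findSplitIndices(heights, pageHeight) {
  const splits = [];
  let total = 0;

  heights.forEach((height, index) => {
    total += height;

    if (total > pageHeight) {
      // the element at this index straddles the page boundary
      splits.push(index);

      // carry the overflowing part over to the next page(s)
      while (total > pageHeight) {
        total -= pageHeight;
      }
    }
  });

  return splits;
}

findSplitIndices([300, 300, 300, 100], 700); // → [2]
```

The third element pushes the running total to 900, which is 200 over the page height, so it is recorded and the next page starts with a total of 200.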

Observations

  1. When a PageBreak is encountered, the total counter can be reset, as the next element will be at the top of a new page (offset by overflow). Note that since PageBreak is not standalone (it is wrapped in a Paragraph or ListItem), it can be encountered at any time.
  2. Only the tallest TableCell in a TableRow counts towards the total height.
  3. Some elements inherit from ContainerElement, meaning their height equals the sum of their children's heights plus top and bottom margins.

Helper functions

First, there are a couple of helper functions we can define (see JSDoc comments for details):

/**
 * @summary checks if element is a container
 * @param {GoogleAppsScript.Document.Element} elem
 * @param {GoogleAppsScript.Document.ElementType} type
 * @returns {boolean}
 */
const isContainer = (elem, type) => {
  const Types = DocumentApp.ElementType;

  const containerTypes = [
    Types.BODY_SECTION,
    Types.EQUATION,
    Types.EQUATION_FUNCTION,
    Types.FOOTER_SECTION,
    Types.HEADER_SECTION,
    Types.LIST_ITEM,
    Types.PARAGRAPH,
    Types.TABLE,
    Types.TABLE_CELL,
    Types.TABLE_ROW,
    Types.TABLE_OF_CONTENTS
  ];

  return containerTypes.includes(type || elem.getType());
};

/**
 * @summary gets aspect ratio of a font
 * @param {string} fontFamily 
 * @returns {number}
 * @default .618
 */
const getAspectRatio = (fontFamily) => {
  const aspects = {
    Arial: .52,
    Calibri: .47,
    Courier: .43,
    Garamond: .38,
    Georgia: .48,
    Helvetica: .52,
    Times: .45,
    Verdana: .58
  };

  return aspects[fontFamily] || .618;
};

/**
 * @summary checks if Element is direct child of Body
 * @param {GoogleAppsScript.Document.Element} elem 
 * @returns {boolean}
 */
const isTopLevel = (elem) => {
  const { ElementType } = DocumentApp;
  // guard against elements without a parent (e.g. the root)
  const parent = elem.getParent();
  return !!parent && parent.getType() === ElementType.BODY_SECTION;
};

/**
 * @summary copies non-object array values as is
 * @param {any[]} arr 
 * @returns {any[]}
 */
const shallowCopy = (arr) => {
  return arr.map(el => el);
};

State tracking

Since we have to track overflow, the last element processed, etc., I opted to add a Tracker object that takes care of state management. Several features of the tracker require an explanation:

processResults method:

  1. Ensures element bounds (page size) are restored after heights of nested elements have been calculated (the setDimensions, setMargins, resetDimensions and resetMargins methods, together with the private inits, allow us to manipulate bounds).
  2. Modifies processed heights for specific element types:
    1. Height of Body is set to 0 (otherwise it would duplicate child heights).
    2. TableRow's height is set to that of its tallest TableCell.
    3. Heights of other types are summed with child heights.

handleOverflow method:

  1. Prevents nested elements (Text, TableRow) from being added to the list of splits (this guard can be safely removed).
  2. Resets the total height to the latest overflow offset (the height of the part of the element that spilled onto the next page).

totalHeight setter:

On each recalculation, checks for height overflow and invokes the overflow handler if needed.

/**
 * @typedef {object} Tracker
 * @property {Map.<GoogleAppsScript.Document.ElementType, function>} callbacks map of height processors
 * @property {?GoogleAppsScript.Document.Element} currElement current element processed
 * @property {number[]} dimensions exposes dimensions of a page
 * @property {function(): void} handleOverflow handles page height overflow
 * @property {function(): boolean} isOverflow checks if height overflowed page height
 * @property {number[]} margins exposes margins of a page
 * @property {number} overflow getter for overflow status
 * @property {function(...number): number} processResults process callback results
 * @property {function(): Tracker} resetDimensions restores old dimensions
 * @property {function(): Tracker} resetMargins restores old margins
 * @property {function(): void} resetOverflow resets most recent overflow
 * @property {function(): void} resetTotalHeight resets accumulated height
 * @property {function(...number): void} setDimensions reinits containing dimensions
 * @property {function(...number): void} setMargins reinits containing margins
 * @property {function(string, ...any): void} setStore abstract property store setter
 * @property {number} significantWidth exposes significant page width
 * @property {number} significantHeight exposes significant page height
 * @property {GoogleAppsScript.Document.Element[]} splits list of elements split over page
 * @property {number} totalHeight total height
 * 
 * @summary factory for element trackers
 * @param {Tracker#callbacks} callbacks
 * @param {Bounds} bounds
 * @param {Tracker#splits} [splits]
 * @returns {Tracker}
 */
function makeTracker(callbacks, bounds, splits = []) {

  const inits = {
    dimensions: shallowCopy(bounds.dimensions),
    margins: shallowCopy(bounds.margins)
  };

  const privates = {
    bounds,
    current: null,
    currentType: null,
    currOverflow: 0,
    needsReset: 0,
    totalHeight: 0
  };

  const { ElementType } = DocumentApp;

  const ResultProcessors = new Map()
    .set(ElementType.BODY_SECTION, () => 0)
    .set(ElementType.TABLE_ROW, (results) => {
      // a row is as tall as its tallest cell
      return results.reduce((max, height) => Math.max(max, height), 0);
    })
    .set("default", (results) => {
      // other elements stack their own and child heights vertically
      return results.reduce((sum, height) => sum + height, 0);
    });

  return ({
    callbacks,
    splits,

    get currElement() {
      return privates.current;
    },

    set currElement(element) {
      privates.current = element;
      privates.currentType = element.getType();
    },

    get dimensions() {
      const { bounds } = privates;
      return bounds.dimensions;
    },

    get margins() {
      const { bounds } = privates;
      return bounds.margins;
    },

    get overflow() {
      const { bounds, totalHeight } = privates;
      return totalHeight - bounds.significantHeight;
    },

    get significantHeight() {
      const { bounds } = privates;
      return bounds.significantHeight;
    },

    get significantWidth() {
      const { bounds } = privates;
      return bounds.significantWidth;
    },

    get totalHeight() {
      return privates.totalHeight;
    },

    /**
     * @summary total height setter
     * @description intercepts & recalcs overflow 
     * @param {number} height
     */
    set totalHeight(height) {

      privates.totalHeight = height;

      if (this.isOverflow()) {
        privates.currOverflow = this.overflow;
        this.handleOverflow();
      }
    },

    isOverflow() {
      return this.overflow > 0;
    },

    handleOverflow() {
      const { currElement, splits } = this;

      const type = privates.currentType;

      const ignore = [
        ElementType.TEXT,
        ElementType.TABLE_ROW
      ];

      if (!ignore.includes(type)) {
        splits.push(currElement);
      }

      this.resetTotalHeight();
    },

    processResults(...results) {
      this.resetMargins().resetDimensions();

      const { currentType } = privates;

      const processed = (
        ResultProcessors.get(currentType) ||
        ResultProcessors.get("default")
      )(results);

      return processed;
    },

    resetDimensions() {
      const { bounds } = privates;
      const { dimensions } = bounds;

      dimensions.length = 0;
      dimensions.push(...inits.dimensions);
      return this;
    },

    resetMargins() {
      const { bounds } = privates;
      const { margins } = bounds;

      margins.length = 0;
      margins.push(...inits.margins);
      return this;
    },

    resetOverflow() {
      privates.currOverflow = 0;
    },

    resetTotalHeight() {
      const { currOverflow } = privates;
      this.totalHeight = currOverflow;
      this.resetOverflow();
    },

    setDimensions(...newDimensions) {
      return this.setStore("dimensions", ...newDimensions);
    },

    setMargins(...newMargins) {
      return this.setStore("margins", ...newMargins);
    },

    setStore(property, ...values) {

      const { bounds } = privates;

      const initStore = inits[property];

      const temp = values.map((val, idx) => {
        return val === null ? initStore[idx] : val;
      });

      const store = bounds[property];
      store.length = 0;
      store.push(...temp);
    }

  });
};
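
The setter interception used by the tracker may be easier to follow in isolation. Below is a stripped-down sketch of the same idea, not the tracker itself: makeCounter, limit and overflows are hypothetical names, and the element bookkeeping is left out entirely.

```javascript
/**
 * Hypothetical counter: every assignment to `total` is checked against a limit.
 * @param {number} limit maximum total before an overflow is recorded
 */
function makeCounter(limit) {
  let total = 0;
  const overflows = [];

  return {
    overflows,

    get total() {
      return total;
    },

    set total(value) {
      total = value;

      // record each overflow and keep only the part that spills over
      while (total > limit) {
        overflows.push(total - limit);
        total -= limit;
      }
    }
  };
}

const counter = makeCounter(100);
counter.total += 60; // 60, no overflow
counter.total += 70; // 130 overflows by 30, so the total restarts at 30
```

Because the check lives in the setter, callers can keep using a plain `+=` and never have to remember to test for overflow themselves.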

I. Get page bounds

The first subproblem is trivially solved (the sample might look complex, but it is handy for passing state around). Of note here are the significantWidth and significantHeight getters, which return the width and height that can be occupied by elements (i.e. without margins).

If you are wondering why 54 is added to the top and bottom margins: it is a "magic number" equal to 1.5 times the default vertical page margin (36 points) that ensures correct page overflow. I spent hours trying to figure out why extra space of approximately this size is added to the top and bottom page margins despite HeaderSection and FooterSection being null by default, but could not find the reason.

/**
 * @typedef {object} Bounds
 * @property {number} bottom bottom page margin
 * @property {number[]} dimensions page constraints
 * @property {number} left left page margin
 * @property {number[]} margins page margins
 * @property {number} right right page margin
 * @property {number} top top page margin
 * @property {number} xMargins horizontal page margins
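 * @property {number} yMargins vertical page margins
 * @property {number} height page height
 * @property {number} width page width
 * @property {number} significantHeight page height without vertical margins
 * @property {number} significantWidth page width without horizontal margins
 * 
 * @summary gets dimensions of pages in body
 * @param {GoogleAppsScript.Document.Body} body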
 * @returns {Bounds}
 */
function getDimensions(body) {

  const margins = [
    body.getMarginTop() + 54,
    body.getMarginRight(),
    body.getMarginBottom() + 54,
    body.getMarginLeft()
  ];

  const dimensions = [
    body.getPageHeight(),
    body.getPageWidth()
  ];

  return ({
    margins,
    dimensions,
    get top() {
      return this.margins[0];
    },
    get right() {
      return this.margins[1];
    },
    get bottom() {
      return this.margins[2];
    },
    get left() {
      return this.margins[3];
    },
    get xMargins() {
      return this.left + this.right;
    },
    get yMargins() {
      return this.top + this.bottom;
    },
    get height() {
      return this.dimensions[0];
    },
    get width() {
      return this.dimensions[1];
    },
    get significantWidth() {
      return this.width - this.xMargins;
    },
    get significantHeight() {
      return this.height - this.yMargins;
    }
  });
}
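
To make the arithmetic behind significantWidth and significantHeight concrete, here is a small sketch that runs outside Apps Script. The mockBody object and the usablePageArea helper are hypothetical, and the page size (612 x 792 points, i.e. Letter) and the 72-point margins are assumed values, not ones read from a real document.

```javascript
// Stand-in for a Body that exposes only the getters used by getDimensions.
// All numbers are assumptions chosen for the example.
const mockBody = {
  getPageHeight: () => 792,
  getPageWidth: () => 612,
  getMarginTop: () => 72,
  getMarginRight: () => 72,
  getMarginBottom: () => 72,
  getMarginLeft: () => 72
};

// The "magic number" added to the top and bottom margins in the answer.
const EXTRA_VERTICAL_SPACE = 54;

/**
 * Hypothetical helper: area that elements can occupy on a page.
 * @param {object} body object exposing the Body page and margin getters
 * @param {number} [extra] extra space added to each vertical margin
 * @returns {{ height: number, width: number }}
 */
function usablePageArea(body, extra = EXTRA_VERTICAL_SPACE) {
  const top = body.getMarginTop() + extra;
  const bottom = body.getMarginBottom() + extra;

  return {
    height: body.getPageHeight() - top - bottom,
    width: body.getPageWidth() - body.getMarginLeft() - body.getMarginRight()
  };
}

usablePageArea(mockBody); // → { height: 540, width: 468 }
```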

II. Traverse the elements

We need to recursively walk through all children, starting from the root (Body), until a leaf (an element without children) is reached, getting their outer heights and the heights of their children, if any, all the while keeping track of PageBreaks and accumulating height. On overflow, the Element that is an immediate child of Body is the one guaranteed to be split.

Note that a PageBreak resets the total height counter:

/**
 * @summary executes a callback for element and its children
 * @param {GoogleAppsScript.Document.Element} root
 * @param {Tracker} tracker
 * @param {boolean} [inCell]
 * @returns {number}
 */
function walkElements(root, tracker, inCell = false) {
  const { ElementType } = DocumentApp;

  const type = root.getType();

  if (type === ElementType.PAGE_BREAK) {
    tracker.resetTotalHeight();
    return 0;
  }

  const { callbacks } = tracker;
  const callback = callbacks.get(type);
  // element types without a registered height callback contribute 0
  const elemResult = callback ? callback(root, tracker) : 0;

  const isCell = type === ElementType.TABLE_CELL;
  const cellBound = inCell || isCell;

  const childResults = [];
  if (isCell || isContainer(root, type)) {
    const numChildren = root.getNumChildren();

    for (let i = 0; i < numChildren; i++) {
      const child = root.getChild(i);

      const result = walkElements(child, tracker, cellBound);

      childResults.push(result);
    }
  }

  tracker.currElement = root;

  const processed = tracker.processResults(elemResult, ...childResults);

  isTopLevel(root) && (tracker.totalHeight += processed);

  return processed;
}
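
The way heights are combined during the walk can be tried out on a plain object tree. The sketch below is a simplification with assumed names (fullHeight, and nodes shaped as { type, height, children }); it ignores page breaks and bounds and only shows that rows take the tallest cell while other elements add up their children.

```javascript
/**
 * Hypothetical helper: full height of a node in a plain object tree.
 * @param {{ type: string, height?: number, children?: object[] }} node
 * @returns {number}
 */
function fullHeight(node) {
  const own = node.height || 0;
  const childHeights = (node.children || []).map(fullHeight);

  if (node.type === "row") {
    // cells sit side by side, so only the tallest one counts
    return own + Math.max(0, ...childHeights);
  }

  // everything else stacks its children vertically
  return own + childHeights.reduce((sum, height) => sum + height, 0);
}

// Assumed sample: a table with 3 points of borders and two rows.
const table = {
  type: "table",
  height: 3,
  children: [
    {
      type: "row",
      children: [
        { type: "cell", height: 10, children: [{ type: "text", height: 14 }] },
        { type: "cell", height: 10, children: [{ type: "text", height: 28 }] }
      ]
    },
    {
      type: "row",
      children: [
        { type: "cell", height: 10, children: [{ type: "text", height: 14 }] }
      ]
    }
  ]
};

fullHeight(table); // → 65 (3 + 38 for the first row + 24 for the second)
```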

III. Calculate element heights

In general, the full height of an element is its top and bottom margins (or padding, or borders) plus its base height. Additionally, as some elements are containers, their base height equals the sum of the full heights of their children. For that reason, we can subdivide the third subproblem into getting:

  1. Heights of primitive types (without children)
  2. Heights of container types

Primitive types

Text height

UPD: getLineSpacing() can return null, so you have to guard against it (default: 1.15).

Text elements consist of characters, so to calculate the base height one has to:

  1. Get the parent's indentation
  2. Get the char height and width (for simplicity, assume the width depends on the font aspect ratio)
  3. Subtract the indentation from the useful page width (= line width)
  4. For each char, add its width to the running line width until it overflows, then increment the number of lines [1]
  5. The text height equals the number of lines multiplied by the char height, with the line spacing modifier applied

[1] Here, traversal of chars is unnecessary, but if you wanted greater precision, you could map char width modifiers, introduce kerning, etc.

/**
 * @summary calculates Text element height
 * @param {GoogleAppsScript.Document.Text} elem
 * @param {Tracker} tracker
 * @returns {number}
 */
function getTextHeight(elem, tracker) {

  const { significantWidth } = tracker;

  const fontFamily = elem.getFontFamily();
  const charHeight = elem.getFontSize() || 11;
  const charWidth = charHeight * getAspectRatio(fontFamily);

  /** @type {GoogleAppsScript.Document.ListItem|GoogleAppsScript.Document.Paragraph} */
  const parent = elem.getParent();

  const lineSpacing = parent.getLineSpacing() || 1.15;
  const startIndent = parent.getIndentStart();
  const endIndent = parent.getIndentEnd();

  const lineWidth = significantWidth - (startIndent + endIndent);

  const text = elem.getText();

  let adjustedWidth = 0, numLines = 1;
  for (const char of text) {

    adjustedWidth += charWidth;

    const diff = adjustedWidth - lineWidth;

    if (diff > 0) {
      adjustedWidth = diff;
      numLines++;
    }
  }

  return numLines * charHeight * lineSpacing;
}
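
Since every char is given the same width, the loop can also be replaced by a closed-form estimate. The following is a sketch of that shortcut under the same simplifying assumption; estimateTextHeight and its parameters are hypothetical and take plain numbers instead of a Text element and a tracker.

```javascript
/**
 * Hypothetical helper: estimates text height assuming uniform char width.
 * @param {string} text
 * @param {number} fontSize char height in points
 * @param {number} aspectRatio char width divided by char height
 * @param {number} lineWidth usable line width in points
 * @param {number} [lineSpacing] line spacing multiplier
 * @returns {number}
 */
function estimateTextHeight(text, fontSize, aspectRatio, lineWidth, lineSpacing = 1.15) {
  const charWidth = fontSize * aspectRatio;

  // total width of the text if it were laid out on a single line
  const totalWidth = text.length * charWidth;

  // an empty text still occupies one line
  const numLines = Math.max(1, Math.ceil(totalWidth / lineWidth));

  return numLines * fontSize * lineSpacing;
}

// 100 chars at 11pt with a 0.5 aspect ratio are 550 points wide,
// which needs 3 lines of 200 points each.
const estimated = estimateTextHeight("x".repeat(100), 11, 0.5, 200);
```

The result should match the char-by-char version up to floating point rounding, so the loop is mainly worth keeping if per-char widths are introduced later.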

Container types

Fortunately, our walker handles child elements recursively, so we only need to process the specifics of each container type (the processResults method of the tracker will then combine child heights).

Paragraph

Paragraph has two property sets that add to its full height: margins (of which we only need top and bottom, accessible via getAttributes()), and spacing:

/**
 * @summary calcs par height
 * @param {GoogleAppsScript.Document.Paragraph} par
 * @returns {number}
 */
function getParagraphHeight(par) {
  const attrEnum = DocumentApp.Attribute;

  const attributes = par.getAttributes();

  const before = par.getSpacingBefore();
  const after = par.getSpacingAfter();

  const spacing = before + after;

  const marginTop = attributes[attrEnum.MARGIN_TOP] || 0;
  const marginBottom = attributes[attrEnum.MARGIN_BOTTOM] || 0;

  let placeholderHeight = 0;
  if (par.getNumChildren() === 0) {
    const text = par.asText();
    placeholderHeight = (text.getFontSize() || 11) * (par.getLineSpacing() || 1.15);
  }

  return marginTop + marginBottom + spacing + placeholderHeight;
}

Notice the placeholderHeight part: it is necessary because when you append a Table, an empty Paragraph (without Text) is inserted, with a height equal to 1 line of default text.

Table cell

TableCell element is a container that acts as a body for its children. Thus, to calculate the height of, for example, a Text inside the cell, both the dimensions and margins (padding in this context is the same as margin) of the bounds are temporarily set to those of the cell (height can be left as is):

/**
 * @summary calcs TableCell height
 * @param {GoogleAppsScript.Document.TableCell} elem
 * @param {Tracker} tracker
 * @returns {number}
 */
function getTableCellHeight(elem, tracker) {

  const top = elem.getPaddingTop();
  const bottom = elem.getPaddingBottom();
  const left = elem.getPaddingLeft();
  const right = elem.getPaddingRight();

  const width = elem.getWidth();

  tracker.setDimensions(null, width);
  tracker.setMargins(top, right, bottom, left);

  return top + bottom;
}
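
Passing null to setDimensions means "keep the initial value for this slot". That convention can be shown on its own with a small sketch; overrideValues is a hypothetical stand-in for what the tracker's setStore does, and the numbers are assumed.

```javascript
/**
 * Hypothetical helper: replaces initial values, keeping those overridden with null.
 * @param {number[]} initial initial values, e.g. [pageHeight, pageWidth]
 * @param {...?number} values new values; null keeps the initial one
 * @returns {number[]}
 */
function overrideValues(initial, ...values) {
  return values.map((value, index) => (value === null ? initial[index] : value));
}

// Keep the page height, but narrow the width to that of a cell.
overrideValues([792, 612], null, 120); // → [792, 120]
```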

Table row

TableRow does not have any specific properties to count toward full height (and our tracker handles TableCell heights):

/**
 * @summary calcs TableRow height
 * @param {GoogleAppsScript.Document.TableRow} row
 * @returns {number}
 */
function getTableRowHeight(row) {
  return 0;
}

Table

Table merely contains rows and simply adds horizontal border widths to the total (only the top [or bottom] row has 2 borders of its own; every other row shares one with its neighbour, so only number of rows + 1 borders count):

/**
 * @summary calcs Table height
 * @param {GoogleAppsScript.Document.Table} elem
 * @returns {number}
 */
function getTableHeight(elem) {
  const border = elem.getBorderWidth();
  const rows = elem.getNumRows();
  return border * (rows + 1);
}

IV. Determine overflow

The fourth subproblem just connects the previous parts:

/**
 * @summary finds elements split by pages
 * @param {GoogleAppsScript.Document.Document} doc
 * @returns {GoogleAppsScript.Document.Element[]}
 */
function findSplitElements(doc) {

  const body = doc.getBody();

  const bounds = getDimensions(body);

  const TypeEnum = DocumentApp.ElementType;

  const heightMap = new Map()
    .set(TypeEnum.BODY_SECTION, () => 0)
    .set(TypeEnum.PARAGRAPH, getParagraphHeight)
    .set(TypeEnum.TABLE, getTableHeight)
    .set(TypeEnum.TABLE_ROW, getTableRowHeight)
    .set(TypeEnum.TABLE_CELL, getTableCellHeight)
    .set(TypeEnum.TEXT, getTextHeight);

  const tracker = makeTracker(heightMap, bounds);

  walkElements(body, tracker);

  return tracker.splits;
};

Driver function

To test that the whole solution even works, I used this driver program:

function doNTimes(n, callback, ...args) {
  for (let i = 0; i < n; i++) {
    callback(...args);
  }
}

function prepareDoc() {

  const doc = getTestDoc(); //gets Document somehow

  const body = doc.getBody();

  doNTimes(30, () => body.appendParagraph("Redrum Redrum Redrum Redrum".repeat(8)));

  const cells = [
    [1, 2, 0, "A", "test"],
    [3, 4, 0, "B", "test"],
    [5, 6, 0, "C", "test"],
    [7, 8, 0, "D", "test"],
    [9, 10, 0, "E", "test"],
    [11, 12, 0, "F", "test"]
  ];

  body.appendTable(cells);

  doNTimes(8, (c) => body.appendTable(c), cells);

  body.appendPageBreak();

  doNTimes(5, (c) => body.appendTable(c), cells);

  const splits = findSplitElements(doc);

  for (const split of splits) {
    split.setAttributes({
      [DocumentApp.Attribute.BACKGROUND_COLOR]: "#fd9014"
    });
  }

  return doc.getUrl();
}

The driver function will mark each split element with a background colour (you would probably want to append a PageBreak before each of them):

[Image: element split sample]
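
One possible way to act on the result is to push each split element onto a new page. The sketch below is an assumption-laden illustration rather than a tested part of the solution: breakBeforeSplits, findSplits and maxPasses are hypothetical names, and it relies only on getBody(), getChildIndex(child) and insertPageBreak(childIndex). Since every inserted break shifts the layout of everything after it, the splits are recalculated after each insert instead of being handled in one go.

```javascript
/**
 * Hypothetical helper: inserts a page break before the first split element,
 * then recalculates, until nothing is split or no further progress is possible.
 * @param {object} doc document exposing getBody()
 * @param {function(object): object[]} findSplits e.g. findSplitElements
 * @param {number} [maxPasses] safety limit on the number of recalculations
 * @returns {number} number of page breaks inserted
 */
function breakBeforeSplits(doc, findSplits, maxPasses = 50) {
  const body = doc.getBody();

  let inserted = 0;
  let lastIndex = -2;

  for (let pass = 0; pass < maxPasses; pass++) {
    const [first] = findSplits(doc);
    if (!first) {
      break; // nothing is split anymore
    }

    const index = body.getChildIndex(first);
    if (index <= 0) {
      break; // already the first child, or not a direct child of the body
    }

    if (index === lastIndex + 1) {
      break; // same element is still split: it is taller than a page
    }

    body.insertPageBreak(index);
    lastIndex = index;
    inserted++;
  }

  return inserted;
}
```

In the driver above this would be called as breakBeforeSplits(doc, findSplitElements) in place of the colouring loop.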

Notes

  1. The answer is likely to overlook something (e.g. if one full row of a Table fits on the previous page, it somehow won't count as an overflow) and can be improved upon (I will also expand it with additional classes such as ListItem later), so if anyone knows of a better solution for any part of the problem, let's discuss (or fire away and contribute directly).
  2. Watch for UPD sections for refinements during testing.

References

  1. ContainerElement class docs
  2. ElementType enum spec
  3. Paragraph class docs
  4. TableCell class docs
  5. Structure of a document