What Is the Difference Between POS Tagging and Shallow Parsing?

I'm currently taking a Natural Language Processing course at my University and still confused with some basic concept. I get the definition of POS Tagging from the Foundations of Statistical Natural Language Processing book:

Tagging is the task of labeling (or tagging) each word in a sentence with its appropriate part of speech. We decide whether each word is a noun, verb, adjective, or whatever.

But I can't find a definition of Shallow Parsing in the book since it also describe shallow parsing as one of the utilities of POS Tagging. So I began to search the web and found no direct explanation of shallow parsing, but in Wikipedia:

Shallow parsing (also chunking, "light parsing") is an analysis of a sentence which identifies the constituents (noun groups, verbs, verb groups, etc.), but does not specify their internal structure, nor their role in the main sentence.

I frankly don't see the difference, but it may be because of my English or just me not understanding simple basic concept. Can anyone please explain the difference between shallow parsing and POS Tagging? Is shallow parsing often also called Shallow Semantic Parsing?

Thanks before.


POS tagging would give a POS tag to each and every word in the input sentence.

Parsing the sentence (using the stanford pcfg for example) would convert the sentence into a tree whose leaves will hold POS tags (which correspond to words in the sentence), but the rest of the tree would tell you how exactly these these words are joining together to make the overall sentence. For example an adjective and a noun might combine to be a 'Noun Phrase', which might combine with another adjective to form another Noun Phrase (e.g. quick brown fox) (the exact way the pieces combine depends on the parser in question).
You can see how parser output looks like at http://nlp.stanford.edu:8080/parser/index.jsp

A shallow parser or 'chunker' comes somewhere in between these two. A plain POS tagger is really fast but does not give you enough information and a full blown parser is slow and gives you too much. A POS tagger can be thought of as a parser which only returns the bottom-most tier of the parse tree to you. A chunker might be thought of as a parser that returns some other tier of the parse tree to you instead. Sometimes you just need to know that a bunch of words together form a Noun Phrase but don't care about the sub-structure of the tree within those words (i.e. which words are adjectives, determiners, nouns, etc and how do they combine). In such cases you can use a chunker to get exactly the information you need instead of wasting time generating the full parse tree for the sentence.


POS tagging is a process deciding what is the type of every token from a text, e.g. NOUN, VERB, DETERMINER, etc. Token can be word or punctuation.
Meanwhile shallow parsing or chunking is a process dividing a text into syntactically related group.

Pos Tagging output

My/PRP$ dog/NN likes/VBZ his/PRP$ food/NN ./.

Chunking output

[NP My Dog] [VP likes] [NP his food]


The Constraint Grammar framework is illustrative. In its simplest, crudest form, it takes as input POS-tagged text, and adds what you could call Part of Clause tags. For an adjective, for example, it could add @NN> to indicate that it is part of an NP whose head word is to the right.


In POS_tagger, we tag words using a "tagset" like {noun, verb, adj, adv, prob...} while shallow parser try to define sub-components such as Name Entity and phrases in the sentence like "I'm currently (taking a Natural (Language Processing course) at (my University)) and (still confused with some basic concept.)"


D. Jurafsky and J. H. Martin say in their book, that shallow parse (partial parse) is a parse that doesn't extract all the possible information from the sentence, but just extract valuable in the specific case information.

Chunking is just a one of the approaches to shallow parsing. As it was mentioned, it extracts only information about basic non-recursive phrases (e.g. verb phrases or noun phrases).

Other approaches, for example, produce flatted parse trees. These trees may contain information about part-of-speech tags, but defer decisions that may require semantic or contextual factors, such as PP attachments, coordination ambiguities, and nominal compound analyses.

So, shallow parse is the parse that produce a partial parse tree. Chunking is an example of such parsing.