Extracting information from a scanned GS1-type barcode

There are two processes involved in obtaining the information represented by a GS1-type barcode that stores data in GS1 Application Identifier Standard Format.

  1. Extraction of the data fields (referred to as Application Identifiers) contained within the GS1-structured data obtained by scanning the symbol. This typically includes the item's unique identifier, a 14-digit GTIN (GTIN-14), and may include supplementary information such as an expiry date, batch/lot number, etc. This process can be performed by a standalone application.
  2. Lookup of the extracted GTIN in a database, either local to your application or via some public API, to provide a textual representation of the country of origin, manufacturer and possibly the item description. To perform this process comprehensively an application requires access to external resources.

Background: GS1 Application Identifier Standard Format Composition

GS1-formatted data consists of a concatenated list of Application Identifiers (AIs) and their values, usually beginning with AI (01), which represents the GTIN.

For example, the data "(01) 95012345678903 (10) 000123 (17) 150801" represents the following information:

GTIN:             95012345678903
BATCH/LOT:        000123
USE BY OR EXPIRY: 1st August 2015
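Date AIs such as (17) hold a six-digit YYMMDD value, which is how 150801 becomes 1st August 2015. A minimal JavaScript sketch of that conversion follows. The function name `decodeGs1Date` is hypothetical, and it assumes every year falls in 2000-2099 rather than implementing the century rule given in the GS1 General Specifications.

```javascript
// Hypothetical helper: convert a GS1 YYMMDD date value, e.g. from AI (17),
// into an ISO 8601 "YYYY-MM-DD" string.
// Assumption: the century is always 20xx; the GS1 General Specifications
// define a fuller rule for choosing the century, not implemented here.
function decodeGs1Date(yymmdd) {
  if (!/^\d{6}$/.test(yymmdd)) {
    throw new Error("Expected six digits (YYMMDD), got: " + yymmdd);
  }
  const year = 2000 + Number(yymmdd.slice(0, 2));
  const month = yymmdd.slice(2, 4);
  const day = yymmdd.slice(4, 6);
  return `${year}-${month}-${day}`;
}

console.log(decodeGs1Date("150801")); // 2015-08-01
```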

Section 3, "GS1 Application Identifier Definitions", of the GS1 General Specifications gives the meaning of each Application Identifier and, importantly, states whether each AI's value is variable-length or fixed-length; for fixed-length values it also gives the mandatory length.

GS1 barcodes use a special non-data character (FNC1) both to indicate that the data conforms to GS1 Application Identifier standard format and to mark the end of a variable-length value, separating it from the next AI. For example, the above data could be encoded in a Code 128 symbol as {FNC1}019501234567890310000123{FNC1}17150801 to produce the following GS1-128 symbol:

GS1-128: (01)95012345678903(10)000123(17)150801

When this symbol is read by a barcode scanner it is decoded as follows[†]:

019501234567890310000123{GS}17150801

Note that the initial FNC1 non-data character has been discarded and the FNC1 used in the variable-length AI separator role has been represented by a GS character (ASCII value 29).
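The FNC1 rules above can be illustrated with a short JavaScript sketch: one function builds the encoder input in the {FNC1} notation used here, and another mimics what the scanner transmits. The function names and the `VARIABLE_LENGTH_AIS` set (limited to the variable-length AIs mentioned in this answer) are illustrative assumptions, not part of any barcode library.

```javascript
// Illustrative subset: the variable-length AIs mentioned in this answer.
const VARIABLE_LENGTH_AIS = new Set(["10", "240", "37"]);
const GS = String.fromCharCode(29); // what a scanner emits for a separator FNC1

// Build the encoder input in "{FNC1}" notation from [AI, value] pairs: a
// leading FNC1 marks GS1 data, and an FNC1 follows every variable-length
// value except in the last field, which is terminated by the end of the data.
function toGs1128Input(pairs) {
  let out = "{FNC1}";
  pairs.forEach(([ai, value], i) => {
    out += ai + value;
    if (VARIABLE_LENGTH_AIS.has(ai) && i < pairs.length - 1) out += "{FNC1}";
  });
  return out;
}

// Mimic the scanner: drop the leading FNC1 and turn the others into GS.
function simulateScan(input) {
  return input.replace(/^\{FNC1\}/, "").split("{FNC1}").join(GS);
}

const input = toGs1128Input([["01", "95012345678903"], ["10", "000123"], ["17", "150801"]]);
console.log(input);                      // {FNC1}019501234567890310000123{FNC1}17150801
console.log(simulateScan(input).length); // 33
```

Note that the variable-length (10) field needs a separator only because another field follows it.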

Extraction (and optionally validation)

Extraction of the GTIN and any supplementary information can be performed directly by your application.

Extracting the original Application Identifier data from the decoded symbol data requires your application to contain a data structure, which we shall refer to as AI-TABLE, that maps AI patterns to the length of their values, as given in the section of the GS1 General Specifications referred to above:

AI     | N (value length)
-------------------------
(00)   | 18
(01)   | 14
(10)   | variable
(17)   | 6
(240)  | variable
(310n) | 6
(37)   | variable
...
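In JavaScript, the AI-TABLE could be held as an array of patterns, with a small matcher that treats the n in (310n) as any digit. The names `AI_TABLE` and `matchAi` are hypothetical, and only the AIs listed above are included; for these entries no AI is a prefix of another, so the first match is the right one.

```javascript
// Hypothetical JavaScript form of the AI-TABLE above: an AI pattern and the
// value length, with null meaning variable length. A trailing "n" in a
// pattern matches any single digit, as in (310n).
const AI_TABLE = [
  { pattern: "00", length: 18 },
  { pattern: "01", length: 14 },
  { pattern: "10", length: null },
  { pattern: "17", length: 6 },
  { pattern: "240", length: null },
  { pattern: "310n", length: 6 },
  { pattern: "37", length: null },
];

// Return { ai, length } for the first entry whose pattern matches the start
// of data, or null if none does.
function matchAi(data) {
  for (const { pattern, length } of AI_TABLE) {
    const prefix = data.slice(0, pattern.length);
    if (prefix.length < pattern.length) continue;
    const matches = [...pattern].every((p, i) =>
      p === "n" ? prefix[i] >= "0" && prefix[i] <= "9" : p === prefix[i]);
    if (matches) return { ai: prefix, length };
  }
  return null;
}

console.log(matchAi("3103000250")); // { ai: '3103', length: 6 }
```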

With this available you can proceed with AI-value extraction from the scanned barcode data as follows:

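while more data:
    AI,N = Entry from AI-TABLE matching a prefix of the data, otherwise FAIL.
    remove AI from the start of the data

    if N is fixed-length:
        VALUE = next N characters
    else N is variable-length:
        VALUE = characters up to the next GS character or the end of data

    remove VALUE, and a GS character immediately following it, from the data
    emit: (AI) VALUE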
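A JavaScript sketch of this loop follows. `extractAis` and the `AI_LENGTHS` map are hypothetical names; the map holds only the AIs from the table above, with (310n) expanded for every digit n. Because none of these AIs is a prefix of another, trying 2-, 3- and then 4-digit prefixes finds the right entry.

```javascript
// GS (ASCII 29) is what the scanner sends in place of a separator FNC1.
const GS = "\u001D";

// Hypothetical AI-TABLE as a map from AI to fixed value length, or null for
// variable length. Only the AIs from the table above are included, with
// (310n) expanded for every digit n.
const AI_LENGTHS = new Map([
  ["00", 18], ["01", 14], ["10", null], ["17", 6], ["240", null], ["37", null],
]);
for (let n = 0; n <= 9; n++) AI_LENGTHS.set("310" + n, 6);

// Extract [AI, VALUE] pairs from decoded scanner data, following the
// pseudocode above; throws (the FAIL case) on an unknown AI or short value.
function extractAis(data) {
  const result = [];
  while (data.length > 0) {
    // AIs are 2 to 4 digits long; take the first length found in the table.
    let ai = null;
    for (let len = 2; len <= 4 && ai === null; len++) {
      if (AI_LENGTHS.has(data.slice(0, len))) ai = data.slice(0, len);
    }
    if (ai === null) throw new Error("No AI-TABLE entry matches: " + data);
    data = data.slice(ai.length);

    const n = AI_LENGTHS.get(ai);
    let value;
    if (n !== null) {
      if (data.length < n) throw new Error(`Value for (${ai}) is too short`);
      value = data.slice(0, n);
    } else {
      const end = data.indexOf(GS);
      value = end === -1 ? data : data.slice(0, end);
    }

    // Remove VALUE and a GS immediately following it, if any.
    data = data.slice(value.length);
    if (data.startsWith(GS)) data = data.slice(1);
    result.push([ai, value]);
  }
  return result;
}

const pairs = extractAis("019501234567890310000123" + GS + "17150801");
// pairs is [["01", "95012345678903"], ["10", "000123"], ["17", "150801"]]
```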

In practice, you may choose to include more of the data from the General Specifications in your AI-TABLE so that your application can perform enhanced validation of each VALUE's type and length. However, the above is sufficient to extract the data given, such as AI (17), which represents the expiry date you are looking for.
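One way to hold that extra data is a format pattern per AI, checked after extraction. The sketch below is a hypothetical JavaScript encoding of the format column for the AIs in the table above (N = numeric, X = alphanumeric, "..m" = up to m characters). It approximates GS1's alphanumeric character set as printable ASCII, which is looser than the real set, so treat it as a starting point rather than a conformance check.

```javascript
// Hypothetical format table for the AIs listed above:
// (00) N18, (01) N14, (10) X..20, (17) N6, (240) X..30, (310n) N6, (37) N..8.
// Assumption: "X" is approximated as printable ASCII without spaces, which is
// looser than the character set the GS1 General Specifications allow.
const AI_FORMATS = new Map([
  ["00", /^\d{18}$/],
  ["01", /^\d{14}$/],
  ["10", /^[\x21-\x7E]{1,20}$/],
  ["17", /^\d{6}$/],
  ["240", /^[\x21-\x7E]{1,30}$/],
  ["310n", /^\d{6}$/],
  ["37", /^\d{1,8}$/],
]);

// True if value has the type and length required for ai; false for AIs
// missing from the table.
function isValidValue(ai, value) {
  const key = /^310\d$/.test(ai) ? "310n" : ai;
  const format = AI_FORMATS.get(key);
  return format !== undefined && format.test(value);
}

console.log(isValidValue("17", "150801")); // true
console.log(isValidValue("17", "1508"));   // false
```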

Lookup

Obtaining the remaining data you are interested in, such as the item's name and manufacturer details (which are not encoded in the barcode), requires looking up the extracted GTIN using external resources such as a local product database or one of the public UPC database APIs that are available.

The GTIN itself contains a GS1 Prefix, which identifies the national GS1 Member Organisation with which the manufacturer is registered (often treated as the country of origin, though it is not quite that), followed by a manufacturer identifier. Together these form the variable-length GS1 Company Prefix, which is assigned by GS1. The remaining digits, apart from the final check digit, form the product code, which the manufacturer assigns freely. In a GTIN-14 these parts are preceded by a single indicator digit.
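The final digit of every GTIN is a check digit computed with the GS1 mod-10 algorithm, so a cheap sanity check before any lookup is to recompute it. A minimal JavaScript sketch (the function name is hypothetical):

```javascript
// Hypothetical helper: recompute a GTIN's check digit with the GS1 mod-10
// algorithm. Working leftwards from the digit before the check digit, digits
// are weighted 3, 1, 3, 1, ...; the check digit tops the sum up to a
// multiple of 10.
function hasValidCheckDigit(gtin) {
  if (!/^\d{8}$|^\d{12,14}$/.test(gtin)) return false; // GTIN-8/12/13/14
  const digits = [...gtin].map(Number);
  const check = digits.pop();
  let sum = 0;
  digits.reverse().forEach((d, i) => {
    sum += d * (i % 2 === 0 ? 3 : 1);
  });
  return (10 - (sum % 10)) % 10 === check;
}

console.log(hasValidCheckDigit("95012345678903")); // true
console.log(hasValidCheckDigit("95012345678904")); // false
```

For the example GTIN, the weighted sum of the first 13 digits is 117, so the check digit is 3.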

Given a GTIN, some UPC databases will provide only details relating to the GS1 Company Prefix, such as a textual representation of the GS1 Member Organisation and the manufacturer. Others attempt to maintain a record of individual GTIN assignments to common items; however, this data will always be somewhat incomplete and out of date, as there is no mandatory real-time registry of GTIN assignments.
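For the local-database option, here is a sketch of a lookup keyed on the 14-digit form of the GTIN. GTIN-8, GTIN-12 and GTIN-13 values are padded with leading zeros to 14 digits, which is how they appear in AI (01), so one key format covers items scanned from EAN/UPC symbols as well as GS1 symbols. The `localProducts` map, its single entry and the function names are hypothetical placeholders for whatever product store your application uses.

```javascript
// Normalise a GTIN-8/12/13/14 to the 14-digit form carried in AI (01).
function toGtin14(gtin) {
  if (!/^\d{8}$|^\d{12,14}$/.test(gtin)) {
    throw new Error("Not a GTIN: " + gtin);
  }
  return gtin.padStart(14, "0");
}

// Hypothetical local product database; the entry is a placeholder.
const localProducts = new Map([
  ["95012345678903", { description: "Example item (placeholder)" }],
]);

// Return the stored record, or null if the GTIN is unknown locally (the
// caller might then fall back to an external API).
function lookupProduct(gtin) {
  return localProducts.get(toGtin14(gtin)) ?? null;
}

console.log(toGtin14("4006381333931"));                    // 04006381333931
console.log(lookupProduct("95012345678903").description); // Example item (placeholder)
```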

The answers to this question provide some examples of free product information platforms.

[†] In fact, you might see ]C1019501234567890310000123{GS}17150801, in which case the leading symbology identifier for GS1-128, ]C1, can be discarded.
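Handling the footnote's case in code means stripping the symbology identifier before extraction. A JavaScript sketch follows; besides ]C1 for GS1-128, it also accepts ]e0 (GS1 DataBar), ]d2 (GS1 DataMatrix) and ]Q3 (GS1 QR Code), an addition beyond this answer that you should check against your scanner's configuration. The function name is hypothetical.

```javascript
// Symbology identifiers announcing GS1 Application Identifier data:
// ]C1 GS1-128, ]e0 GS1 DataBar, ]d2 GS1 DataMatrix, ]Q3 GS1 QR Code.
const GS1_SYMBOLOGY_IDS = ["]C1", "]e0", "]d2", "]Q3"];

// Remove a leading GS1 symbology identifier, if present; other data is
// returned unchanged.
function stripSymbologyId(data) {
  return GS1_SYMBOLOGY_IDS.includes(data.slice(0, 3)) ? data.slice(3) : data;
}

const scanned = "]C1019501234567890310000123\u001D17150801";
console.log(stripSymbologyId(scanned).startsWith("01")); // true
```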


This is a solution written in JavaScript that has been proven at one specific customer; generalizing it requires more work:

// define AIs, parameter names and, optionally, transformation functions

// YYMMDD -> 'YYYY-MM-DD' (assumes the year is 20YY)
function gs1Date(match) {
  return '20' + match[1].slice(0, 2) + '-' + match[1].slice(2, 4) + '-' + match[1].slice(4, 6);
}

// (310n)/(320n): n is the number of decimal places in the six-digit value
function gs1Decimal(match) {
  return parseInt(match[2], 10) / Math.pow(10, parseInt(match[1], 10));
}

var SapApplicationIdentifiers = [
    { ai: '00',  regex: /^00(\d{18})/,           parameter: 'SSCC' },
    { ai: '01',  regex: /^01(\d{14})/,           parameter: 'EAN' },
    { ai: '02',  regex: /^02(\d{14})/,           parameter: 'EAN' },
    { ai: '10',  regex: /^10([^\u001D]{1,20})/,  parameter: 'LOTE' },
    { ai: '13',  regex: /^13(\d{6})/ },
    { ai: '15',  regex: /^15(\d{6})/,            parameter: 'F_CONS', transform: gs1Date },
    { ai: '17',  regex: /^17(\d{6})/,            parameter: 'F_CONS', transform: gs1Date },
    { ai: '19',  regex: /^19(\d{6})/,            parameter: 'F_CONS', transform: gs1Date },
    { ai: '21',  regex: /^21([^\u001D]{1,20})/ },                  // serial number
    { ai: '30',  regex: /^30(\d{1,8})/ },
    { ai: '310', regex: /^310(\d)(\d{6})/,       parameter: 'NTGEW', transform: gs1Decimal },
    { ai: '320', regex: /^320(\d)(\d{6})/,       parameter: 'NTGEW', transform: gs1Decimal },
    { ai: '330', regex: /^330(\d)(\d{6})/ },
    { ai: '37',  regex: /^37(\d{1,8})/,          parameter: 'CANT' }
];

// walks through the code, removing recognized fields one at a time;
// calls onError with the unparsed remainder if no AI matches
function parseAiByAi(code, mercancia, onError) {
  while (code) {
    var AI = null, match = null;
    for (var i = 0; i < SapApplicationIdentifiers.length && !match; i++) {
      AI = SapApplicationIdentifiers[i];
      match = AI.regex.exec(code);
    }
    if (!match) {
      if (typeof onError === 'function') onError(code);
      return;
    }
    if (AI.parameter) {
      mercancia[AI.parameter] = typeof AI.transform === 'function' ? AI.transform(match) : match[1];
      if (AI.parameter === 'NTGEW') {
        mercancia.NTGEW_IA = AI.ai;
      }
    }
    // drop the matched AI and value, plus a following GS (or NUL) separator
    code = code.slice(match[0].length).replace(/^[\0\u001D]/, '');
  }
}

// code: decoded scanner data; mercancia: object that receives the parsed fields
parseAiByAi(code, mercancia, onError);
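The '310' and '320' transforms in the code above rely on the fourth digit of the AI giving the number of decimal places in the six-digit value, so (3103) 000250 means 0.250. In isolation, that rule looks like this (a sketch with a hypothetical function name):

```javascript
// Apply the implied decimal point of AIs such as (310n): n, the fourth
// digit of the AI, is the number of digits after the decimal point.
function decodeDecimalAi(ai, value) {
  const decimals = Number(ai.charAt(3));
  return Number(value) / Math.pow(10, decimals);
}

console.log(decodeDecimalAi("3103", "000250")); // 0.25
console.log(decodeDecimalAi("3100", "000250")); // 250
```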