How to recreate the hash digest of a multihash in IPFS

Assuming I'm adding data to IPFS like this:

$ echo Hello World | ipfs add

This will give me QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u, a CID which is a Base58-encoded multihash.

Converting it to Base16 tells me that the digest of what IPFS added is a SHA2-256 hash:

12 - 20 - 74410577111096cd817a3faed78630f2245636beded412d3b212a2e09ba593ca
<hash-type> - <hash-length> - <hash-digest>
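
The Base16 conversion can be reproduced without any IPFS tooling. The following is a minimal sketch using only Node.js built-ins; the helper names base58Decode and splitMultihash are made up for this illustration, and it assumes a CIDv0 string whose hash type and length each fit in a single byte (true for SHA2-256).

```javascript
// Bitcoin-style Base58 alphabet, which IPFS uses for "Qm..." strings.
const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

// Decode a Base58 string into a Buffer using BigInt arithmetic.
// (Hypothetical helper, not part of any IPFS library.)
function base58Decode(str) {
    let n = 0n;
    for (const ch of str) {
        const idx = ALPHABET.indexOf(ch);
        if (idx === -1) throw new Error(`invalid Base58 character: ${ch}`);
        n = n * 58n + BigInt(idx);
    }
    let hex = n.toString(16);
    if (hex.length % 2) hex = '0' + hex;
    const body = n === 0n ? Buffer.alloc(0) : Buffer.from(hex, 'hex');
    // Each leading '1' in the input stands for one leading zero byte.
    let zeros = 0;
    while (zeros < str.length && str[zeros] === '1') zeros++;
    return Buffer.concat([Buffer.alloc(zeros), body]);
}

// Split a multihash into <hash-type>, <hash-length>, <hash-digest>.
// Assumes one-byte type and length fields.
function splitMultihash(buf) {
    return {
        hashType: buf[0],
        hashLength: buf[1],
        digest: buf.subarray(2).toString('hex'),
    };
}

const parts = splitMultihash(base58Decode('QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u'));
console.log(parts.hashType.toString(16), '-', parts.hashLength.toString(16), '-', parts.digest);
```

Here 12 and 20 are hexadecimal: 0x12 is the multihash code for SHA2-256 and 0x20 is the digest length of 32 bytes.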

I know that IPFS doesn't just hash the data, but actually serializes it as Unixfs protobuf first and then puts that in a dag.

I'd like to understand how to arrive at 74410577111096cd817a3faed78630f2245636beded412d3b212a2e09ba593ca, but I'm not sure how to get hold of the created dag node that holds the Unixfs protobuf with the data.

For example I can write the serialized raw data to disk and inspect it with a protobuf decoder:

$ ipfs block get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u > /tmp/block.raw
$ protoc --decode_raw < /tmp/block.raw

This will give me the serialized data in a readable format:

1 {
  1: 2
  2: "Hello World\n"
  3: 12
}

However, piping that through SHA-256 still gives me a different hash, which makes sense because IPFS puts the protobuf in a dag and multihashes that one.

$ protoc --decode_raw < /tmp/block.raw | shasum -a 256

So I decided to figure out how to get hold of that dag node and hash it myself to get the hash I'm looking for.

I was hoping that ipfs dag get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u would give me a multihash that could then be decoded, but it turns out it returns some other data that I don't know how to inspect:

$ ipfs dag get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u
{"data":"CAISDEhlbGxvIFdvcmxkChgM","links":[]}

Any ideas on how to decode data from here?

UPDATE

data is the Base64 representation of the serialized Unixfs protobuf, which wraps the original data: https://github.com/ipfs/go-ipfs/issues/4115
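
To see what the Base64 string contains, it can be decoded and walked field by field. This is a rough sketch with a hand-written, deliberately tiny protobuf reader (readVarint and decodeFields are hypothetical helpers, and only wire types 0 and 2 are handled); the field meanings 1 = Type, 2 = Data, 3 = filesize are taken from the Unixfs definition quoted in Solution 1.

```javascript
// Read a protobuf varint starting at `pos`; returns [value, nextPos].
// Uses 32-bit number arithmetic, so it is only safe for small values.
function readVarint(buf, pos) {
    let value = 0;
    let shift = 0;
    while (true) {
        const byte = buf[pos++];
        value |= (byte & 0x7f) << shift;
        if ((byte & 0x80) === 0) return [value, pos];
        shift += 7;
    }
}

// Decode the top-level fields of a message into a Map of
// field number -> value (number for varints, Buffer for bytes).
function decodeFields(buf) {
    const fields = new Map();
    let pos = 0;
    while (pos < buf.length) {
        let key;
        [key, pos] = readVarint(buf, pos);
        const fieldNumber = key >> 3;
        const wireType = key & 0x07;
        if (wireType === 0) {
            let value;
            [value, pos] = readVarint(buf, pos);
            fields.set(fieldNumber, value);
        } else if (wireType === 2) {
            let len;
            [len, pos] = readVarint(buf, pos);
            fields.set(fieldNumber, buf.subarray(pos, pos + len));
            pos += len;
        } else {
            throw new Error(`unsupported wire type ${wireType}`);
        }
    }
    return fields;
}

const unixfs = decodeFields(Buffer.from('CAISDEhlbGxvIFdvcmxkChgM', 'base64'));
console.log('Type:', unixfs.get(1));                              // 2 means File
console.log('Data:', JSON.stringify(unixfs.get(2).toString()));   // the file contents
console.log('filesize:', unixfs.get(3));
```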


Solution 1:

The hash you're looking for is the hash of the output of ipfs block get QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u. IPFS hashes the encoded value.

Instead of running:

protoc --decode_raw < /tmp/block.raw | shasum -a 256

Just run:

shasum -a 256 < /tmp/block.raw
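
The same check can be sketched in Node.js, going one step further and turning the digest back into a Base58 string. The helper names base58Encode and blockToCidV0 are invented for this example, and the hex literal is an assumed stand-in for the contents of /tmp/block.raw, written out by hand from the protoc output in the question; with a real node you would read the file instead.

```javascript
const crypto = require('crypto');

// Bitcoin-style Base58 alphabet, which IPFS uses for "Qm..." strings.
const ALPHABET = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz';

// Encode a Buffer as Base58 using BigInt arithmetic.
// (Hypothetical helper, not part of any IPFS library.)
function base58Encode(buf) {
    let n = buf.length ? BigInt('0x' + buf.toString('hex')) : 0n;
    let out = '';
    while (n > 0n) {
        out = ALPHABET[Number(n % 58n)] + out;
        n /= 58n;
    }
    // Each leading zero byte is written as a leading '1'.
    for (const byte of buf) {
        if (byte !== 0) break;
        out = '1' + out;
    }
    return out;
}

// Hash the raw block bytes and wrap the digest as a SHA2-256 multihash:
// 0x12 (hash type), 0x20 (digest length), then the 32-byte digest.
function blockToCidV0(block) {
    const digest = crypto.createHash('sha256').update(block).digest();
    return base58Encode(Buffer.concat([Buffer.from([0x12, 0x20]), digest]));
}

// In practice: const block = require('fs').readFileSync('/tmp/block.raw');
// Assumed bytes of the block for "Hello World\n" (see lead-in).
const block = Buffer.from('0a120802120c48656c6c6f20576f726c640a180c', 'hex');
const cid = blockToCidV0(block);
console.log(cid);
```

If the assumed bytes match what ipfs block get returns, the printed string should be the CID from the question.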

> but it turns out it returns some other data hash that I don't know how to inspect

That's because we currently use a protobuf inside of a protobuf. The outer protobuf has the structure {Data: DATA, Links: [{Name: ..., Size: ..., Hash: ...}]}.

In:

1 {
  1: 2
  2: "Hello World\n"
  3: 12
}

The 1 { ... } part is the Data field of the outer protobuf. However, protoc --decode_raw *recursively* decodes this object, so it decodes the Data field to:

  • Field 1 (DataType): 2 (File)
  • Field 2 (Data): "Hello World\n"
  • Field 3 (Filesize): 12 (bytes)

For context, the relevant protobuf definitions are:

Outer:

// An IPFS MerkleDAG Link
message PBLink {

  // multihash of the target object
  optional bytes Hash = 1;

  // utf string name. should be unique per object
  optional string Name = 2;

  // cumulative size of target object
  optional uint64 Tsize = 3;
}

// An IPFS MerkleDAG Node
message PBNode {

  // refs to other objects
  repeated PBLink Links = 2;

  // opaque user data
  optional bytes Data = 1;
}

Inner:

message Data {
    enum DataType {
        Raw = 0;
        Directory = 1;
        File = 2;
        Metadata = 3;
        Symlink = 4;
        HAMTShard = 5;
    }

    required DataType Type = 1;
    optional bytes Data = 2;
    optional uint64 filesize = 3;
    repeated uint64 blocksizes = 4;

    optional uint64 hashType = 5;
    optional uint64 fanout = 6;
}

message Metadata {
    optional string MimeType = 1;
}
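
Putting the two definitions together, the block bytes for a small file can be rebuilt by hand and then hashed. The following is only a sketch under stated assumptions: the helpers varint, bytesField, varintField, unixfsFile and dagNode are made up for illustration, and it assumes non-empty input that fits in a single block, so the node has no links.

```javascript
const crypto = require('crypto');

// Encode a small non-negative integer as a protobuf varint.
// Plain-number arithmetic: fine for sizes below 2^31.
function varint(n) {
    const bytes = [];
    while (n > 0x7f) {
        bytes.push((n & 0x7f) | 0x80);
        n >>>= 7;
    }
    bytes.push(n);
    return Buffer.from(bytes);
}

// Length-delimited field (wire type 2): tag, length, payload.
function bytesField(fieldNumber, payload) {
    return Buffer.concat([varint((fieldNumber << 3) | 2), varint(payload.length), payload]);
}

// Varint field (wire type 0): tag, value.
function varintField(fieldNumber, value) {
    return Buffer.concat([varint(fieldNumber << 3), varint(value)]);
}

// Inner Unixfs message: Type = 2 (File), Data = file bytes, filesize = length.
function unixfsFile(data) {
    return Buffer.concat([
        varintField(1, 2),
        bytesField(2, data),
        varintField(3, data.length),
    ]);
}

// Outer PBNode without links: the serialized inner message is field 1 (Data).
function dagNode(data) {
    return bytesField(1, unixfsFile(data));
}

const block = dagNode(Buffer.from('Hello World\n'));
console.log(block.toString('hex'));
console.log(crypto.createHash('sha256').update(block).digest('hex'));
```

The second line printed is the SHA-256 of the whole encoded block, which is the value IPFS places in the multihash.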

Solution 2:

I'm not sure what that encoding is but you can unmarshal the dag data field like this in js-ipfs:

const IPFS = require('ipfs')
const Unixfs = require('ipfs-unixfs')

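const ipfs = new IPFS()

ipfs.dag.get('QmWATWQ7fVPP2EFGu71UkfnqhYXDYH566qy47CnJDgvs8u', (err, d) => {
  if (err) throw err
  // d.value.data is the serialized Unixfs protobuf; unmarshal it to reach the file bytes
  console.log(Unixfs.unmarshal(d.value.data).data.toString())
  // prints Hello World
})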

Solution 3:

Following Steven's answer, rebuilding the protobuf and hashing it works. Here is the full code of my approach.

ipfs.proto

syntax = "proto3";

// Outer dag node; Data carries the serialized inner Data message
message PBNode {
    bytes Data = 1;
}

message PBLink {
    bytes Hash = 1;
    string Name = 2;
    uint64 Tsize = 3;
}

// Inner Unixfs message
message Data {
    enum DataType {
        Raw = 0;
        Directory = 1;
        File = 2;
        Metadata = 3;
        Symlink = 4;
        HAMTShard = 5;
    }
    DataType Type = 1;
    bytes Data = 2;
    uint64 filesize = 3;
}

cid.js

const mh = require('multihashes');
const axios = require('axios');
const crypto = require('crypto');
const protobuf = require('protobufjs');

const root = protobuf.loadSync('./ipfs.proto');
const PBNode = root.lookupType('PBNode');
const UnixfsData = root.lookupType('Data');

class CID {
    /**
     * convert IPFS multihash to sha2-256 hash string
     * @param {string} multihash
     * @param {boolean} prefix
     * @returns {string} sha2-256 hash string, starting with 0x if prefix is true
     */
    static toHash(multihash, prefix = false) {
        // Parentheses are required: `+` binds tighter than `?:`
        return (prefix ? '0x' : '')
            + mh.decode(mh.fromB58String(multihash)).digest.toString('hex');
    }

    /**
     * convert sha2-256 hash string to IPFS multihash
     * @param {string} str
     * @returns {string} IPFS multihash starting with Qm
     */
    static fromHash(str) {
        str = str.startsWith('0x') ? str.slice(2) : str;
        return mh.toB58String(mh.encode(Buffer.from(str, 'hex'), 'sha2-256'));
    }

    /**
     * hash the buffer and get the SHA256 result compatible with IPFS multihash
     * (only valid for files small enough to be stored in a single block)
     * @param {Buffer} buf
     * @returns {string}
     */
    static hash(buf) {
        // Serialize the inner Unixfs message first (Type 2 = File) ...
        const inner = UnixfsData.encode({
            Type: 2,
            Data: buf,
            filesize: buf.length
        }).finish();
        // ... then wrap those bytes in the Data field of the outer node
        const r = PBNode.encode({ Data: inner }).finish();
        return crypto.createHash('sha256').update(r).digest('hex');
    }
}

async function ipfsGet(cid) {
    const x = await axios.get(`http://your.address.xxx/ipfs/${cid}`, {
        responseType: 'arraybuffer'
    });
    return Buffer.from(x.data);
}

const r = "QmfQj4DUWEudeFdWKVzPaTbYimdYzsp14DZX1VLV1BbtdN";
const hashFromCID = CID.toHash(r);
console.log(hashFromCID);
ipfsGet(r).then(buf => {
    const hashCalculated = CID.hash(buf);
    console.log(hashCalculated);
    console.log(hashCalculated === hashFromCID);
    console.log(CID.fromHash(hashCalculated) === r)
});

module.exports = CID;