QR reading problem when using a Base64 UTF-8 Arabic string

I have a tagged Arabic string that I want to encode with Base64. Everything works perfectly when the string contains only English letters, but with Arabic letters the QR reader doesn't display the correct text.

Here is my code:

function TForm1.GetMyString(TagNo: Integer; TagValue: string): string;
var
  Bytes, StrByte: TBytes;
  i: Integer;
begin
  SetLength(StrByte, Length(TagValue)+2);
  StrByte[0] := Byte(TagNo);
  StrByte[1] := Byte(Length(TagValue));
  for i := 2 to Length(StrByte)-1 do
    StrByte[i] := Byte(TagValue[i-1]);
  Result := TEncoding.UTF8.GetString(StrByte);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  s: String;
  Bytes: TBytes;
begin
  s := GetMyString(1, Edit1.Text) + GetMyString(2, Edit2.Text) +
              GetMyString(3, Edit3.Text) + GetMyString(4, Edit4.Text) +
              GetMyString(5, Edit5.Text);
  bytes := TEncoding.UTF8.GetBytes(s);
  QREdit.Text := TNetEncoding.Base64.EncodeBytesToString(Bytes);
end;

Decoding the Base64 string gives the same wrong result that the QR reader shows, e.g. (E$33) 'D9E1'F) instead of (مؤسسة العمران).

I am using ZXingQR to read the generated string.


Solution 1:

GetMyString() truncates each UTF-16 character of the string to an 8-bit byte, puts other non-textual bytes (the tag and the length) into the same array, and then treats the whole array as if it were UTF-8 (which it is not) to produce a new UTF-16 string.

Button1Click() then takes those mangled UTF-16 strings, concatenates them, and converts the result to UTF-8 for Base64 encoding.

This approach only works with ASCII strings shorter than 128 characters and tags below 128 in value, since ASCII (bytes in the range 0..127) is a subset of UTF-8. It will NOT work with characters or byte values outside that range.
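
To see why the reader shows E$33), here is a minimal console sketch of the truncation. The program name and the Sample constant are made up for illustration; the first word of the question's text is written with character escapes so the source file encoding does not matter. Keeping only the low 8 bits of each UTF-16 code unit, as Byte(TagValue[i-1]) does, turns U+0645 into $45 ('E'), U+0624 into $24 ('$'), and so on, whereas real UTF-8 needs two bytes for each of these letters.

program TruncationDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // U+0645 U+0624 U+0633 U+0633 U+0629, the first word of the sample text
  Sample = #$0645#$0624#$0633#$0633#$0629;

var
  S, Truncated: string;
  Ch: Char;
begin
  S := Sample;
  Truncated := '';
  for Ch in S do
    // Keep only the low 8 bits of the UTF-16 code unit, like Byte(Ch) does
    Truncated := Truncated + Chr(Ord(Ch) and $FF);
  Writeln(Truncated);                           // prints E$33)
  Writeln(Length(TEncoding.UTF8.GetBytes(S)));  // prints 10 (two bytes per letter)
end.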

It seems that you want to Base64-encode a series of tagged UTF-8 strings. If so, then try something more like this instead:

procedure TForm1.GetMyString(TagNo: UInt8; const TagValue: string; Output: TStream);
var
  Bytes: TBytes;
begin
  Bytes := TEncoding.UTF8.GetBytes(TagValue);
  Assert(Length(Bytes) < 256);
  Output.WriteData(TagNo);
  Output.WriteData(UInt8(Length(Bytes)));
  Output.WriteData(Bytes, Length(Bytes));
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  Stream: TMemoryStream;
begin
  Stream := TMemoryStream.Create;
  try
    GetMyString(1, Edit1.Text, Stream);
    GetMyString(2, Edit2.Text, Stream);
    GetMyString(3, Edit3.Text, Stream);
    GetMyString(4, Edit4.Text, Stream);
    GetMyString(5, Edit5.Text, Stream);
    QREdit.Text := TNetEncoding.Base64.EncodeBytesToString(Stream.Memory, Stream.Size);
  finally
    Stream.Free;
  end;
end;

Alternatively:

function TForm1.GetMyString(TagNo: UInt8; const TagValue: string): TBytes;
var
  Len: Integer;
begin
  Len := TEncoding.UTF8.GetByteCount(TagValue);
  Assert(Len < 256);
  SetLength(Result, 2+Len);
  Result[0] := Byte(TagNo);
  Result[1] := Byte(Len);
  TEncoding.UTF8.GetBytes(TagValue, 1, Length(TagValue), Result, 2);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  Bytes: TBytes;
begin
  Bytes := Concat(
    GetMyString(1, Edit1.Text),
    GetMyString(2, Edit2.Text),
    GetMyString(3, Edit3.Text),
    GetMyString(4, Edit4.Text),
    GetMyString(5, Edit5.Text)
  );
  QREdit.Text := TNetEncoding.Base64.EncodeBytesToString(Bytes);
end;
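
To check the output of either version without a QR reader, you can decode the Base64 text and walk the tag/length/value records yourself. This is only a sketch: DumpTags is a hypothetical helper, not part of any library, and it assumes the same layout as above (one tag byte, one length byte, then that many UTF-8 bytes). The sample input uses ASCII so that the console output does not depend on the console's code page; for Arabic values, show the decoded string in a UI control instead.

program DecodeTagsDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.NetEncoding;

// Hypothetical helper: prints every tag number with its decoded value
procedure DumpTags(const Base64Text: string);
var
  Data: TBytes;
  Offset, Len: Integer;
  Value: string;
begin
  Data := TNetEncoding.Base64.DecodeStringToBytes(Base64Text);
  Offset := 0;
  while Offset + 2 <= Length(Data) do
  begin
    Len := Data[Offset + 1];  // length of the value in bytes, not characters
    if Offset + 2 + Len > Length(Data) then
      raise Exception.Create('Truncated record');
    if Len > 0 then
      Value := TEncoding.UTF8.GetString(Data, Offset + 2, Len)
    else
      Value := '';
    Writeln(Data[Offset], ': ', Value);
    Inc(Offset, 2 + Len);
  end;
end;

begin
  // Bytes $01 $02 $4F $4B: tag 1, length 2, value 'OK'
  DumpTags('AQJPSw==');  // prints 1: OK
end.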