QR reading problem when using a Base64 UTF-8 Arabic string
I have a tagged Arabic string that I want to encode with Base64. Everything works perfectly when the string contains English letters, but when it contains Arabic letters, the QR reader doesn't display the correct letters.
Here is my code:
function TForm1.GetMyString(TagNo: Integer; TagValue: string): string;
var
  Bytes, StrByte: TBytes;
  i: Integer;
begin
  SetLength(StrByte, Length(TagValue)+2);
  StrByte[0] := Byte(TagNo);
  StrByte[1] := Byte(Length(TagValue));
  for i := 2 to Length(StrByte)-1 do
    StrByte[i] := Byte(TagValue[i-1]);
  Result := TEncoding.UTF8.GetString(StrByte);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  s: String;
  Bytes: TBytes;
begin
  s := GetMyString(1, Edit1.Text) + GetMyString(2, Edit2.Text) +
    GetMyString(3, Edit3.Text) + GetMyString(4, Edit4.Text) +
    GetMyString(5, Edit5.Text);
  bytes := TEncoding.UTF8.GetBytes(s);
  QREdit.Text := TNetEncoding.Base64.EncodeBytesToString(Bytes);
end;
Decoding the Base64 string by hand gives the same incorrect text that the QR reader shows,
e.g. (E$33) 'D9E1'F)
instead of (مؤسسة العمران)
I am using ZXingQR to read the generated string.
Solution 1:
GetMyString() truncates each UTF-16 character of the string to an 8-bit byte, puts other non-textual bytes (the tag and the length) into the same array, and then treats the whole array as if it were UTF-8 (which it is not) to produce a new UTF-16 string.
Button1Click() then takes those mangled UTF-16 strings, concatenates them, and converts the result to UTF-8 for Base64 encoding.
This approach only works with ASCII strings shorter than 128 characters and with tag values below 128, because bytes in the range 0..127 mean the same thing in ASCII and in UTF-8. It will NOT work with characters or bytes outside of this range.
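The truncation described above can be reproduced in isolation. The following is a minimal console sketch, not part of the original code: the helper name TruncateToLowBytes is hypothetical, and the sample text from the question is written as UTF-16 code units so that the result does not depend on how the source file is saved.

```pascal
program TruncationDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // The Arabic sample from the question, spelled out as UTF-16 code units.
  Sample = #$0645#$0624#$0633#$0633#$0629' '#$0627#$0644#$0639#$0645#$0631#$0627#$0646;

// Hypothetical helper: keeps only the low 8 bits of each UTF-16 code unit,
// which is the effect of Byte(TagValue[i-1]) in the question's code.
function TruncateToLowBytes(const S: string): string;
var
  Ch: Char;
begin
  Result := '';
  for Ch in S do
    Result := Result + Char(Ord(Ch) and $FF);
end;

var
  Text: string;
begin
  Text := Sample;
  // Prints the same garbage the QR reader shows: E$33) 'D9E1'F
  Writeln(TruncateToLowBytes(Text));
  // 13 UTF-16 code units, but 25 bytes once encoded as UTF-8
  Writeln(Length(Text), ' UTF-16 code units');
  Writeln(Length(TEncoding.UTF8.GetBytes(Text)), ' UTF-8 bytes');
end.
```

Each Arabic letter sits in the U+06xx block, so dropping the high byte leaves an unrelated ASCII character (for example U+0645 becomes $45, 'E'). The byte counts also show why the length field has to count UTF-8 bytes rather than characters.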
It seems that you want to Base64-encode a series of tagged UTF-8 strings. If so, try something more like this instead:
procedure TForm1.GetMyString(TagNo: UInt8; const TagValue: string; Output: TStream);
var
  Bytes: TBytes;
begin
  // Convert the text to UTF-8 first, so the length counts bytes, not characters
  Bytes := TEncoding.UTF8.GetBytes(TagValue);
  // The length has to fit in a single byte
  Assert(Length(Bytes) < 256);
  Output.WriteData(TagNo);
  Output.WriteData(UInt8(Length(Bytes)));
  Output.WriteData(Bytes, Length(Bytes));
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  Stream: TMemoryStream;
begin
  Stream := TMemoryStream.Create;
  try
    GetMyString(1, Edit1.Text, Stream);
    GetMyString(2, Edit2.Text, Stream);
    GetMyString(3, Edit3.Text, Stream);
    GetMyString(4, Edit4.Text, Stream);
    GetMyString(5, Edit5.Text, Stream);
    // Encode the raw bytes directly; no round trip through a string
    QREdit.Text := TNetEncoding.Base64.EncodeBytesToString(Stream.Memory, Stream.Size);
  finally
    Stream.Free;
  end;
end;
Alternatively:
function TForm1.GetMyString(TagNo: UInt8; const TagValue: string): TBytes;
var
  Len: Integer;
begin
  // Number of bytes the text occupies once encoded as UTF-8
  Len := TEncoding.UTF8.GetByteCount(TagValue);
  Assert(Len < 256);
  SetLength(Result, 2+Len);
  Result[0] := Byte(TagNo);
  Result[1] := Byte(Len);
  // Write the UTF-8 bytes straight into Result, after the two header bytes
  TEncoding.UTF8.GetBytes(TagValue, 1, Length(TagValue), Result, 2);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  Bytes: TBytes;
begin
  Bytes := Concat(
    GetMyString(1, Edit1.Text),
    GetMyString(2, Edit2.Text),
    GetMyString(3, Edit3.Text),
    GetMyString(4, Edit4.Text),
    GetMyString(5, Edit5.Text)
  );
  QREdit.Text := TNetEncoding.Base64.EncodeBytesToString(Bytes);
end;
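To check the output of either version without a QR reader, the Base64 text can be decoded and walked record by record. The following is a rough console sketch under a few assumptions: BuildRecord and ListRecords are hypothetical names, BuildRecord repeats the tag/length/UTF-8 layout used above but raises an exception instead of calling Assert (assertions can be compiled out), and the sample values are made up for the test.

```pascal
program TaggedRoundTrip;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.NetEncoding;

// Builds one record: tag byte, length byte, then the UTF-8 bytes of the value.
function BuildRecord(TagNo: UInt8; const TagValue: string): TBytes;
var
  Utf8: TBytes;
begin
  Utf8 := TEncoding.UTF8.GetBytes(TagValue);
  if Length(Utf8) > 255 then
    raise EArgumentException.Create('Value does not fit in a one-byte length');
  SetLength(Result, 2 + Length(Utf8));
  Result[0] := TagNo;
  Result[1] := Byte(Length(Utf8));
  if Length(Utf8) > 0 then
    Move(Utf8[0], Result[2], Length(Utf8));
end;

// Hypothetical helper: decodes the Base64 text and adds one "tag=value"
// line per record to Output.
procedure ListRecords(const Base64Text: string; Output: TStrings);
var
  Data: TBytes;
  Idx, Len: Integer;
  Tag: Byte;
  Value: string;
begin
  Data := TNetEncoding.Base64.DecodeStringToBytes(Base64Text);
  Idx := 0;
  while Idx + 2 <= Length(Data) do
  begin
    Tag := Data[Idx];
    Len := Data[Idx + 1];
    Inc(Idx, 2);
    if Idx + Len > Length(Data) then
      raise EArgumentException.Create('Record runs past the end of the data');
    if Len > 0 then
      Value := TEncoding.UTF8.GetString(Data, Idx, Len)
    else
      Value := '';
    Output.Add(Format('%d=%s', [Tag, Value]));
    Inc(Idx, Len);
  end;
end;

const
  // The Arabic sample from the question, as UTF-16 code units.
  Sample = #$0645#$0624#$0633#$0633#$0629' '#$0627#$0644#$0639#$0645#$0631#$0627#$0646;
var
  Bytes: TBytes;
  Encoded: string;
  Lines: TStringList;
begin
  Bytes := Concat(BuildRecord(1, Sample), BuildRecord(2, 'abc'));
  Encoded := TNetEncoding.Base64.EncodeBytesToString(Bytes);
  Lines := TStringList.Create;
  try
    ListRecords(Encoded, Lines);
    Writeln(Lines.Count, ' records decoded');
    // Compare in code rather than printing Arabic, since console output
    // of non-ASCII text depends on the console's code page.
    Writeln('First value matches: ', Lines[0] = '1=' + Sample);
  finally
    Lines.Free;
  end;
end.
```

Because Arabic letters take two bytes each in UTF-8, a one-byte length field allows roughly 127 Arabic characters per value, which is why the length check is made on the encoded bytes.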