Creating multimeshes with huge instance count during runtime
Solution 1:
This is something I wish I knew more about, but I'll share what I know.
First of all, consider having more chunks visible instead of chunks with more voxels.
As you know, generating a chunk takes longer the more voxels it has. However, a chunk that is only one voxel defeats the purpose. Thus, there must be an optimal number of voxels per chunk, and you are probably going over it. You may consider leaving the chunk size configurable. A cube of 256 voxels per side is 16777216 voxels. Try other sizes: 64 voxels per side worked better for me when I attempted something like this.
I have to advise against having an instance per cube. If you want to use meshes to represent your voxels, I suggest meshing: each chunk would be a single mesh.
Voxel meshing is often intertwined with marching cubes, and I feel I have to mention it even if you have heard it a million times. I doubt you want the kind of meshes marching cubes generates, as they don't look like cubes. However, marching cubes is a way to mesh voxels, and you can apply the same approach, except that you generate a mesh that looks like cubes. The idea is simple: you iterate over the voxels and construct a mesh from the voxel data, without any internal surfaces.
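To put numbers on the chunk size, here is a quick back-of-the-envelope sketch in Python. The function names are made up for illustration. It only counts voxels and says nothing about actual generation time, which you would have to measure.

```python
def voxels_per_chunk(side: int) -> int:
    """Number of voxels in a cubic chunk with `side` voxels per edge."""
    return side ** 3


def chunks_to_cover(big_side: int, small_side: int) -> int:
    """How many small cubic chunks fill the volume of one big cubic chunk.

    Assumes big_side is a multiple of small_side.
    """
    per_axis = big_side // small_side
    return per_axis ** 3


if __name__ == "__main__":
    for side in (16, 32, 64, 128, 256):
        print(side, voxels_per_chunk(side))
    # One 256-sided chunk covers the same volume as this many 64-sided chunks:
    print(chunks_to_cover(256, 64))
```

The total voxel count for the same volume does not change, but smaller chunks let you generate, update, and show the world in smaller pieces.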
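As a rough illustration of meshing without internal surfaces, here is a minimal Python sketch (not Godot code). It assumes a chunk is stored as a set of occupied integer coordinates, and it emits one quad (two triangles) only for faces whose neighbor is empty. The names mesh_chunk and face_corners are hypothetical, and triangle winding order is not handled, which a real renderer with back-face culling would need.

```python
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]

# The six axis-aligned neighbor offsets; each one is also a face normal.
FACE_NORMALS: List[Voxel] = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def face_corners(voxel: Voxel, normal: Voxel) -> List[Voxel]:
    """Four corners of the unit-cube face of `voxel` that faces `normal`."""
    axis = next(i for i in range(3) if normal[i] != 0)
    u, v = [i for i in range(3) if i != axis]  # the two axes spanning the face
    base = list(voxel)
    if normal[axis] > 0:
        base[axis] += 1  # the positive face sits on the far side of the cube
    corners = []
    for du, dv in ((0, 0), (1, 0), (1, 1), (0, 1)):
        corner = list(base)
        corner[u] += du
        corner[v] += dv
        corners.append((corner[0], corner[1], corner[2]))
    return corners


def mesh_chunk(solid: Set[Voxel]) -> Tuple[List[Voxel], List[int]]:
    """Build vertex and index lists for the exposed faces only."""
    vertices: List[Voxel] = []
    indices: List[int] = []
    for voxel in sorted(solid):
        for normal in FACE_NORMALS:
            neighbor = (voxel[0] + normal[0],
                        voxel[1] + normal[1],
                        voxel[2] + normal[2])
            if neighbor in solid:
                continue  # face is shared with another voxel: internal, skip it
            start = len(vertices)
            vertices.extend(face_corners(voxel, normal))
            # Two triangles per quad (winding order not adjusted per face).
            indices.extend([start, start + 1, start + 2,
                            start, start + 2, start + 3])
    return vertices, indices
```

A lone voxel yields 6 quads, while two voxels side by side yield 10 instead of 12, because the shared pair of faces is never emitted. Vertices are not shared between quads here, and merging coplanar quads would reduce the count further.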
By the way, an alternative is ray casting. You can upload the voxel data as a 3D texture to the GPU and have a shader implement the "Fast Voxel Traversal Algorithm" or similar. It can be implemented as a material, and then your chunks are simple cubes. I have done this with some success.
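To make the traversal concrete, here is a rough CPU-side sketch in Python of the stepping logic behind that algorithm (Amanatides and Woo). It is not shader code; in practice the same loop would live in the fragment shader and sample the 3D texture at each step. The function name traverse and the max_steps parameter are made up for illustration, and voxels are assumed to be unit cubes aligned to integer coordinates.

```python
import math
from typing import Iterator, Sequence, Tuple


def traverse(origin: Sequence[float],
             direction: Sequence[float],
             max_steps: int) -> Iterator[Tuple[int, int, int]]:
    """Yield the voxels a ray passes through, in order, starting at its origin."""
    pos = [math.floor(c) for c in origin]
    step = [0, 0, 0]
    t_max = [math.inf] * 3    # ray parameter at which the next boundary is crossed
    t_delta = [math.inf] * 3  # ray parameter needed to cross one whole voxel
    for i in range(3):
        d = direction[i]
        if d > 0:
            step[i] = 1
            t_max[i] = (pos[i] + 1 - origin[i]) / d
            t_delta[i] = 1 / d
        elif d < 0:
            step[i] = -1
            t_max[i] = (pos[i] - origin[i]) / d
            t_delta[i] = -1 / d
    yield (pos[0], pos[1], pos[2])
    for _ in range(max_steps):
        # Step along whichever axis reaches its next voxel boundary first.
        axis = t_max.index(min(t_max))
        if t_max[axis] == math.inf:
            return  # direction is the zero vector, nothing to traverse
        pos[axis] += step[axis]
        t_max[axis] += t_delta[axis]
        yield (pos[0], pos[1], pos[2])
```

For example, list(traverse((0.5, 0.5, 0.5), (1.0, 0.5, 0.0), 4)) visits (0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0), (3, 1, 0). A real ray caster would stop as soon as the visited voxel is occupied or the ray leaves the chunk.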
Addendum: an optimization is to store, on every free voxel, the distance to the nearest occupied voxel in the same chunk. Then, when traversing, you know you can skip that many voxels ahead regardless of the direction of traversal, which makes ray casting faster.
You can optimize further by storing the distance to the nearest occupied voxel in the same chunk for each of the six frustums that correspond to the faces of the voxel cube. Then, for traversal, you check which frustum the ray falls in, and that tells you which value to use. However, this begins to slow down the generation process, so you may have to readjust the chunk size.
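Here is a small Python sketch of that skipping idea, under assumptions of my own: the distance is measured in face-adjacent steps (Manhattan distance), computed with a breadth-first search from all occupied voxels, and the ray is given as the list of voxels it visits one step at a time. The names distance_field and first_hit are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Tuple

Voxel = Tuple[int, int, int]

NEIGHBORS: List[Voxel] = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def distance_field(size: int, occupied: Iterable[Voxel]) -> Dict[Voxel, int]:
    """Steps from each voxel of a size^3 chunk to its nearest occupied voxel."""
    dist: Dict[Voxel, int] = {v: 0 for v in occupied}
    queue = deque(dist)
    while queue:
        x, y, z = queue.popleft()
        for dx, dy, dz in NEIGHBORS:
            n = (x + dx, y + dy, z + dz)
            if all(0 <= c < size for c in n) and n not in dist:
                dist[n] = dist[(x, y, z)] + 1
                queue.append(n)
    return dist


def first_hit(path: List[Voxel],
              dist: Dict[Voxel, int]) -> Tuple[Optional[Voxel], int]:
    """Walk `path` (consecutive face-adjacent voxels inside the chunk).

    Returns the first occupied voxel (or None) and how many lookups were made.
    """
    i = 0
    lookups = 0
    while i < len(path):
        lookups += 1
        # A missing entry means the chunk has no occupied voxel: skip to the end.
        d = dist.get(path[i], len(path))
        if d == 0:
            return path[i], lookups
        # Every voxel fewer than d steps away is free, so jump d steps at once.
        i += d
    return None, lookups
```

With a single occupied voxel at (6, 0, 0) in an 8-sided chunk, a ray walking along x from (0, 0, 0) finds it in 2 lookups instead of 7. The skip is safe because each traversal step moves to a face-adjacent voxel, so after k steps the ray is at most k steps away from where it started.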
I know you can find more information on the sister site gamedev.stackexchange.com. Try searching for "voxel meshing". Also, marching cubes is a tag there!
Another thing you can read about there is handling level of detail for voxels. You probably don't want all that detail for chunks that are far away, so you could generate a less detailed, but faster to compute, representation for them.
Now, since the question is about Godot, I will also mention that Godot has an Asset Library. Not an asset store: it is not a storefront, and everything there is free. Free as in beer, and free as in speech. You can find voxel solutions on the Godot Asset Library, and even if you are not going to use them, you can look at how they work. Because, did I mention? They are all free software.
I haven't gone deep into them, but the 3D Voxel Demo claims to use another thread to create the meshes. Furthermore, they recommend godot_voxel, which is written in C++. In fact, it is a Godot module for voxels. Why not start there?
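One simple way to get a less detailed representation, sketched here in Python as an assumption of mine rather than an established recipe, is to merge each block of 2x2x2 voxels into one coarse voxel that counts as occupied if any of the originals was. The function name downsample and the factor parameter are hypothetical.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def downsample(solid: Set[Voxel], factor: int = 2) -> Set[Voxel]:
    """Coarse occupancy: a coarse voxel is solid if any fine voxel inside it is."""
    return {(x // factor, y // factor, z // factor) for (x, y, z) in solid}
```

A chunk downsampled by a factor of 2 has at most one eighth of the voxels to mesh, at the cost of a blockier shape. Other rules, such as requiring a majority of the eight voxels to be occupied, would give a different look.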
Addendum: did you try GridMaps?