How to deserialize Json string in Java/Scala with inherit class holding different attributes

I am trying to deserialize following string in scala

{ "nodeid": 30, "depth": 6, "split": 65, "split_condition": -9.53674316e-07, "yes": 43, "no": 44, "missing": 43 , "children": [
  { "nodeid": 43, "leaf": 0.000833201397 },
  { "nodeid": 44, "leaf": -0.00145038881 }
]}

it is a tree structure and we can consider there are non-leaf and leaf nodes, and these two both extends an abstract node type. They hold different attributes. (This string is dumped from XGBoost and I want to create my own data structure by deserializing it)

I tried to use Jackson, but it takes only one class type. If I define a class holding all these attributes, it can surely work, but I could not dumped back into the same format afterwards.

So except for custom override deserialize, is there any other better choices?

If could not get the question, could refer to my following example

I want the string to be deserialized into a Node class with following definition

class Node {
  var nodeid: Int = 0
}
class NotLeaf extends Node {
  var depth: Int = 0
  var split: Int = 0
  ...
  var children: List[Node] = null
}
class Leaf extends Node {
  var leaf: Float = 0
}

For code I have tried, I could define the class

class Node {
  var nodeid: Int = 0
  var depth: Int = 0
  var split: Int = 0
  var split_condition: Float = 0
  var yes: Int = 0
  var no: Int = 0
  var missing: Int = 0
  var children: List[XGBoostFormat] = null
  var leaf: Float = 0
}

And the deserialize result would have all the attributes, and if I serialize back it would be

{
  "nodeid" : 30,
  "depth" : 6,
  "split" : 65,
  "split_condition" : -9.536743E-7,
  "yes" : 43,
  "no" : 44,
  "missing" : 43,
  "children" : [ {
    "nodeid" : "43",
    "depth" : 0,
    "split" : 0,
    "split_condition" : 0.0,
    "yes" : 0,
    "no" : 0,
    "missing" : 0,
    "children" : null,
    "leaf" : 8.332014E-4
  }, {
    "nodeid" : "44",
    "depth" : 0,
    "split" : 0,
    "split_condition" : 0.0,
    "yes" : 0,
    "no" : 0,
    "missing" : 0,
    "children" : null,
    "leaf" : -0.0014503888
  } ],
  "leaf" : 0.0
}

In Scala it is better to use Scala JSON libraries based on compile time reflection rather than runtime reflection. They can autogenerate codecs - as long as you stick to the way Scala models data with sealed traits (or sealed abstract classes) and case classes. You also shouldn't use vars and nulls.

Libraries which would parse your data are (among others): Circe, Play JSON, Jsoniter Scala, Json4s. The first 2 of these could be used like this:

import cats.syntax.functor._ // for Decoder widen
import io.circe.Decoder
import io.circe.generic.extras.Configuration
import io.circe.generic.extras.semiauto._
import io.circe.parser

import play.api.libs.json._

sealed trait Tree extends Product with Serializable {
  val nodeid: Int
}
object Tree {
  
  // Circe config for derivation
  private implicit val circeConfig: Configuration =
    Configuration.default.withSnakeCaseMemberNames
  
  // PlayJson config for derivation
  private implicit val playJsonConfig: JsonConfiguration =
    JsonConfiguration(JsonNaming.SnakeCase)
  
  final case class Node(
    nodeid: Int,
    depth: Int,
    split: Int,
    splitCondition: Float,
    yes: Int,
    no: Int,
    children: List[Tree]
  ) extends Tree
  object Node {
    
    implicit def nodeCirceDecoder: Decoder[Node] =
      deriveConfiguredDecoder
    
    implicit def nodePlayJsonDecoder: Reads[Node] =
      Json.reads[Node]
  }
  
  final case class Leaf(
    nodeid: Int,
    leaf: Float
  ) extends Tree
  object Leaf {
    
    implicit val leafCirceDecoder: Decoder[Leaf] =
      deriveConfiguredDecoder
    
    implicit val leafPlayJsonDecoder: Reads[Leaf] =
      Json.reads[Leaf]
  }

  // combined manually because no discriminator field
  
  implicit val treeCirceDecoder: Decoder[Tree] =
    Decoder[Leaf].widen[Tree] or Decoder[Node].widen[Tree]
  
  // normally I'd use orElse on Reads but it's not by-name (lazy)
  implicit val treePlayJsonDecoder: Reads[Tree] = Reads[Tree] { json =>
    implicitly[Reads[Leaf]].widen[Tree].reads(json) orElse implicitly[Reads[Node]].widen[Tree].reads(json)
  }
}

val input = """{ "nodeid": 30, "depth": 6, "split": 65, "split_condition": -9.53674316e-07, "yes": 43, "no": 44, "missing": 43 , "children": [
              |  { "nodeid": 43, "leaf": 0.000833201397 },
              |  { "nodeid": 44, "leaf": -0.00145038881 }
              |]}""".stripMargin

io.circe.parser.decode[Tree](input)

Json.fromJson[Tree](Json.parse(input))

(scastie example)

  • use macros to derive codecs (your case is specific because you have recursive data structure with subtypes and no discrimination fields - with discriminator fields there would be no need to combine codecs manually)
  • put them into companion objects so that you wouldn't have to import them into the scope
  • run parsing code which returns Either[SomeError, YourType]

(Example uses both libraries to demonstrate how to use them but you need to pick only one).

Since objects are build at once or at all, there is no reason to make their field mutable and initialized with nulls.

I would also suggest to use Doubles, or even better BigDecimals, instead of Floats if you are using such precise values.

If you want to do it the Java way, with nulls, vars, annotations and runtime reflection (Jackson), then it's better to ask this question with Java tag as Scala developers actively avoid these.