Convert multiple columns in pyspark dataframe into one dictionary
I have created a PySpark DataFrame like this one:
df = spark.createDataFrame([
    ('v', 3, 'a'),
    ('d', 2, 'b'),
    ('q', 9, 'c')],
    ["c1", "c2", "c3"]
)
df.show()
+---+---+---+
| c1| c2| c3|
+---+---+---+
|  v|  3|  a|
|  d|  2|  b|
|  q|  9|  c|
+---+---+---+
I want to create a new column like this:
+--------------------------+
| c4 |
+--------------------------+
|{"c1":"v","c2":3,"c3":"a"}|
|{"c1":"d","c2":2,"c3":"b"}|
|{"c1":"q","c2":9,"c3":"c"}|
+--------------------------+
I want c4
to be of type MapType
, not StringType
. Also, I want to keep the types of the values as they are (keep 3, 2 and 9 as integers, not strings).
Solution 1:
Use struct
+ to_json
like this if you want JSON strings:
import pyspark.sql.functions as F

df1 = df.select(
    F.to_json(
        F.struct(*[F.col(c) for c in df.columns])
    ).alias("c4")
)
df1.show(truncate=False)
#+--------------------------+
#|c4 |
#+--------------------------+
#|{"c1":"v","c2":3,"c3":"a"}|
#|{"c1":"d","c2":2,"c3":"b"}|
#|{"c1":"q","c2":9,"c3":"c"}|
#+--------------------------+
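to_json produces one JSON string per row, and because JSON distinguishes numbers from strings, c2 stays a number inside that string. A minimal sketch with plain Python's json module shows the same serialization (the dict below just mirrors the first row and is purely illustrative):

```python
import json

# Plain dict standing in for the first row's struct.
row = {"c1": "v", "c2": 3, "c3": "a"}

# Compact separators match the style of Spark's to_json output.
s = json.dumps(row, separators=(",", ":"))
print(s)  # {"c1":"v","c2":3,"c3":"a"}

# Round-tripping shows the integer survives inside the JSON string.
assert json.loads(s)["c2"] == 3
```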
EDIT
If you want a MapType column, use the create_map
function:
from itertools import chain

df1 = df.select(
    F.create_map(
        *chain(*[[F.lit(c), F.col(c)] for c in df.columns])
    ).alias("c4")
)
df1.show(truncate=False)
#+---------------------------+
#|c4 |
#+---------------------------+
#|{c1 -> v, c2 -> 3, c3 -> a}|
#|{c1 -> d, c2 -> 2, c3 -> b}|
#|{c1 -> q, c2 -> 9, c3 -> c}|
#+---------------------------+
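The chain(*[...]) idiom interleaves each column name with its column expression, producing the alternating key1, value1, key2, value2, ... argument list that create_map expects. With plain strings standing in for the Column objects (the "col(...)" names here are illustrative), the flattening looks like this:

```python
from itertools import chain

columns = ["c1", "c2", "c3"]

# Each column contributes a [key, value] pair ...
pairs = [[c, f"col({c})"] for c in columns]

# ... and chain flattens the pairs into one alternating key/value sequence.
flat = list(chain(*pairs))
print(flat)
# ['c1', 'col(c1)', 'c2', 'col(c2)', 'c3', 'col(c3)']
```

One caveat: a Spark map has a single value type, so mixing the string columns c1/c3 with the integer column c2 makes Spark coerce the integers, giving a map<string,string> rather than keeping c2 as an integer. If preserving the original types matters, the struct/to_json approach above (or keeping the struct column itself) is the safer option.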