How to connect Pyspark with Teradata? [duplicate]
I am trying to connect to Teradata from PySpark. Can you tell me how to do that? I tried looking online but couldn't find anything.
I have the following jars:
tdgssconfig-15.10.00.14.jar, teradata-connector-1.4.1.jar
You need the Teradata JDBC driver JAR (download the one matching the Teradata version you are connecting to), and then use something like this:
sc.addJar("yourDriver.jar")
val jdbcDF = sqlContext.read.format("jdbc").options(Map(
  "url" -> "jdbc:teradata://<server_name>/TMODE=TERA",
  "user" -> "my_user",
  "password" -> "*****",
  "dbtable" -> "schema.table_name",
  "driver" -> "com.teradata.jdbc.TeraDriver")).load()
Step 1: Find the appropriate JDBC driver for the version of Teradata you are using: https://downloads.teradata.com/download/connectivity/jdbc-driver
Step 2: Go through the Spark JDBC data source tutorial here:
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
Sample code:
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "<jdbc_connection_string>")
  .option("driver", "com.teradata.jdbc.TeraDriver")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .load()
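Since the question asks about PySpark specifically and both answers show Scala, here is a minimal PySpark sketch of the same approach. The server name, credentials, table name, and jar paths are placeholders you must replace; it assumes the Teradata JDBC driver jars (e.g. terajdbc4.jar and tdgssconfig) are available locally and passed to the session via `spark.jars` (or `spark-submit --jars`):

```python
# PySpark sketch for reading a Teradata table over JDBC.
# All server names, credentials, and jar paths below are placeholders.

def teradata_url(server: str, tmode: str = "TERA") -> str:
    """Build a Teradata JDBC URL: connection parameters follow the
    host after a '/' and are comma-separated."""
    return f"jdbc:teradata://{server}/TMODE={tmode}"

def read_teradata(spark, server, table, user, password):
    """Return a DataFrame for the given Teradata table.
    `spark` is an existing SparkSession whose classpath includes
    the Teradata JDBC driver jars."""
    return (spark.read.format("jdbc")
            .option("url", teradata_url(server))
            .option("driver", "com.teradata.jdbc.TeraDriver")
            .option("dbtable", table)
            .option("user", user)
            .option("password", password)
            .load())

if __name__ == "__main__":
    # Import here so the helpers above stay usable without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("teradata-example")
             # Placeholder jar paths; alternatively use spark-submit --jars.
             .config("spark.jars", "terajdbc4.jar,tdgssconfig-15.10.00.14.jar")
             .getOrCreate())
    df = read_teradata(spark, "<server_name>", "schema.table_name",
                       "my_user", "*****")
    df.show()
```

The driver class name `com.teradata.jdbc.TeraDriver` and the `jdbc:teradata://host/PARAM=value,...` URL shape come from the Teradata JDBC driver; everything else (function names, jar filenames) is illustrative.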