How to connect Pyspark with Teradata? [duplicate]
I am trying to connect to Teradata from PySpark. Can you tell me how to do that? I tried looking online but couldn't find anything.
I have the following jars:
tdgssconfig-15.10.00.14.jar, teradata-connector-1.4.1.jar
You need the Teradata JDBC driver JAR (download the one matching the Teradata version you are connecting to), and then use something like this:
sc.addJar("yourDriver.jar")
val jdbcDF = sqlContext.read.format("jdbc").options(Map(
  "url" -> "jdbc:teradata://<server_name>/TMODE=TERA",
  "user" -> "my_user",
  "password" -> "*****",
  "dbtable" -> "schema.table_name",
  "driver" -> "com.teradata.jdbc.TeraDriver")).load()
Step 1: Find the appropriate JDBC driver for the version of Teradata you are using: https://downloads.teradata.com/download/connectivity/jdbc-driver
Step 2: Go through the Spark JDBC data source tutorial here:
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
Sample code:
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "<jdbc_connection_string>")
  .option("driver", "com.teradata.jdbc.TeraDriver")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .load()
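Since the question asks about PySpark specifically and both answers show Scala, here is a minimal PySpark sketch of the same approach. The server name, credentials, table name, and jar paths are placeholders you must replace; it assumes the Teradata JDBC driver jars (e.g. terajdbc4.jar and tdgssconfig) are available locally and passed to the session via `spark.jars` (or `spark-submit --jars`):

```python
# PySpark sketch for reading a Teradata table over JDBC.
# All server names, credentials, and jar paths below are placeholders.

def teradata_url(server: str, tmode: str = "TERA") -> str:
    """Build a Teradata JDBC URL: connection parameters follow the
    host after a '/' and are comma-separated."""
    return f"jdbc:teradata://{server}/TMODE={tmode}"

def read_teradata(spark, server, table, user, password):
    """Return a DataFrame for the given Teradata table.
    `spark` is an existing SparkSession whose classpath includes
    the Teradata JDBC driver jars."""
    return (spark.read.format("jdbc")
            .option("url", teradata_url(server))
            .option("driver", "com.teradata.jdbc.TeraDriver")
            .option("dbtable", table)
            .option("user", user)
            .option("password", password)
            .load())

if __name__ == "__main__":
    # Import here so the helpers above stay usable without pyspark installed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("teradata-example")
             # Placeholder jar paths; alternatively use spark-submit --jars.
             .config("spark.jars", "terajdbc4.jar,tdgssconfig-15.10.00.14.jar")
             .getOrCreate())
    df = read_teradata(spark, "<server_name>", "schema.table_name",
                       "my_user", "*****")
    df.show()
```

The driver class name `com.teradata.jdbc.TeraDriver` and the `jdbc:teradata://host/PARAM=value,...` URL shape come from the Teradata JDBC driver; everything else (function names, jar filenames) is illustrative.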