How to look for updated rows when using AWS Glue?

Solution 1:

You can use the query to find the updated records by filtering data at source JDBC database as shown below example. I have passed date as an argument so that for each run I can fetch only latest values from mysql database in this example.

query= "(select ab.id,ab.name,ab.date1,bb.tStartDate from test.test12 ab join test.test34 bb on ab.id=bb.id where ab.date1>'" + args['start_date'] + "') as testresult"

datasource0 = spark.read.format("jdbc").option("url", "jdbc:mysql://host.test.us-east-2.rds.amazonaws.com:3306/test").option("driver", "com.mysql.jdbc.Driver").option("dbtable", query).option("user", "test").option("password", "Password1234").load()