How to iteratively SELECT by hour for averages by user ID?

I'm trying to average some metrics by user ID and by hour in a SELECT statement.

I'd love some help understanding the best approach to doing this without iterating over hourly parameters in a Python script :P

This is the current query:

SELECT
    user_id,
    AVG(sentiment) as sentiment,
    AVG(magnitude) as magnitude,
    SUM(sentiment) as total_sentiment,
    SUM(magnitude) as total_magnitude,
    MAX(sentiment) as max_sentiment,
    MIN(sentiment) as min_sentiment,
    COUNT(user_id) as count
FROM
    sentiments
WHERE
    created BETWEEN %s and %s
GROUP BY
    user_id;

This is PostgreSQL, by the way, and I'm sending the query with pandas via Python as a test :)

Update, trying the suggested answer:

import pandas as pd
import db_conn  # connection helper module (not shown)

sql = """
    SELECT
        user_id,
        AVG(sentiment) as sentiment,
        AVG(magnitude) as magnitude,
        SUM(sentiment) as total_sentiment,
        SUM(magnitude) as total_magnitude,
        MAX(sentiment) as max_sentiment,
        MIN(sentiment) as min_sentiment,
        COUNT(user_id) as count
    FROM
        sentiments
    GROUP BY
        user_id,
        date_part('hour', created);
"""

conn = db_conn.main()
# pd.read_sql runs the query on the connection itself; no cursor is needed
df = pd.read_sql(sql, con=conn)
df

This returns the following error:

function date_part(unknown, integer) does not exist
LINE 15: date_part('hour', created);
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.

Thanks in advance,


Solution 1:

The easiest way to truncate a timestamp to the hour is the DATE_TRUNC function:

SELECT DATE_TRUNC('hour', created), user_id,
... 
GROUP BY DATE_TRUNC('hour', created), user_id;

But if created contains a number that's a Unix timestamp (which the date_part(unknown, integer) error suggests), you first need to convert it to a timestamp with TO_TIMESTAMP:

SELECT DATE_TRUNC('hour', TO_TIMESTAMP(created)), user_id,
... 
GROUP BY DATE_TRUNC('hour', TO_TIMESTAMP(created)), user_id;
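To make the grouping concrete, here is a minimal pandas sketch of the same per-user, per-hour aggregation done client-side on a few rows. This is a swapped-in technique (pandas instead of SQL), not what Postgres executes; the sample rows and the hourly_stats helper are hypothetical, and it assumes created holds Unix seconds. pd.to_datetime(..., unit="s") stands in for TO_TIMESTAMP, and .dt.floor("h") stands in for DATE_TRUNC('hour', ...).

```python
import pandas as pd

# Hypothetical sample rows standing in for the `sentiments` table.
# Assumption: `created` holds Unix timestamps in seconds.
sentiments = pd.DataFrame({
    "user_id":   [1, 1, 1, 2],
    "sentiment": [0.5, -0.25, 1.0, 0.75],
    "magnitude": [1.0, 2.0, 3.0, 4.0],
    "created":   [1_600_000_000, 1_600_000_900, 1_600_003_600, 1_600_000_100],
})


def hourly_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Per-user, per-hour aggregates, mirroring
    GROUP BY DATE_TRUNC('hour', TO_TIMESTAMP(created)), user_id."""
    # TO_TIMESTAMP(created)  -> pd.to_datetime(..., unit="s", utc=True)
    # DATE_TRUNC('hour', ..) -> .dt.floor("h")
    hour = pd.to_datetime(df["created"], unit="s", utc=True).dt.floor("h")
    return (
        df.assign(hour=hour)
        .groupby(["user_id", "hour"])
        .agg(
            sentiment=("sentiment", "mean"),
            magnitude=("magnitude", "mean"),
            total_sentiment=("sentiment", "sum"),
            total_magnitude=("magnitude", "sum"),
            max_sentiment=("sentiment", "max"),
            min_sentiment=("sentiment", "min"),
            count=("sentiment", "size"),  # row count per group, like COUNT(user_id)
        )
        .reset_index()
    )


print(hourly_stats(sentiments))
```

For a real table, doing the grouping in SQL with DATE_TRUNC is usually preferable, since only the aggregated rows are sent back to Python; a pandas version like this is mainly handy for checking results on a small sample.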

Solution 2:

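In PostgreSQL you can group by year, month, day and hour using the following (this assumes created is a timestamp; if it holds Unix seconds, use to_timestamp(created) in place of created):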

group by
 date_part('year', created),
 date_part('month', created),
 date_part('day', created),
 date_part('hour', created)
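All four components are needed because grouping by date_part('hour', created) alone, as in the query in the question, puts the same hour of every day into one bucket. A small Python sketch with hypothetical timestamps, using datetime attributes in place of date_part (the key functions are illustrative names, not part of any API):

```python
from datetime import datetime, timezone


def hour_of_day_key(ts: datetime) -> int:
    # Like GROUP BY date_part('hour', created): hour of day only
    return ts.hour


def calendar_hour_key(ts: datetime) -> tuple[int, int, int, int]:
    # Like GROUP BY date_part('year'/'month'/'day'/'hour', created)
    return (ts.year, ts.month, ts.day, ts.hour)


# Hypothetical timestamps: same hour of day, different days
a = datetime(2020, 9, 13, 12, 26, tzinfo=timezone.utc)
b = datetime(2020, 9, 14, 12, 5, tzinfo=timezone.utc)

print(hour_of_day_key(a) == hour_of_day_key(b))      # True: merged into one group
print(calendar_hour_key(a) == calendar_hour_key(b))  # False: kept separate
```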

Or do some math with the epoch:

group by floor(date_part('epoch', created)/3600)

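Alternatively, you could use EXTRACT, e.g. extract(epoch from created). If created already stores Unix seconds, it is the epoch value itself, so floor(created / 3600.0) works without date_part at all.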
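The epoch arithmetic in floor(date_part('epoch', created)/3600) can be checked in plain Python: floor-dividing Unix seconds by 3600 gives an hour-bucket number, and multiplying back by 3600 gives the start of that hour. A sketch assuming created is Unix seconds; hour_bucket and bucket_start are hypothetical helper names. Because the epoch counts from 1970-01-01 00:00 UTC, these buckets are UTC hours.

```python
from datetime import datetime, timezone


def hour_bucket(epoch_seconds: int) -> int:
    """Same bucketing as floor(epoch / 3600)."""
    # Python's // floors, so it agrees with SQL floor() even for negative values
    return epoch_seconds // 3600


def bucket_start(bucket: int) -> datetime:
    """Turn a bucket number back into the UTC start of that hour."""
    return datetime.fromtimestamp(bucket * 3600, tz=timezone.utc)


print(hour_bucket(1_600_000_000))                # 444444
print(bucket_start(hour_bucket(1_600_000_000)))  # 2020-09-13 12:00:00+00:00
```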