Gnuplot: How to load and display single numeric value from data file

My data file has this content

# data file for use with gnuplot
# Report 001
# Data as of Tuesday 03-Sep-2013 
total   1976
case1   522 278 146 65  26  7
case2   120 105 15  0   0   0
case3   660 288 202 106 63  1

I am making a histogram from the case... lines using the script below - and that works. My question is: how can I load the grand total value 1976 (next to the word 'total') from the data file and either (a) store it into a variable or (b) use it directly in the title of the plot?

This is my gnuplot script:

reset
set term png truecolor
set terminal pngcairo size 1024,768 enhanced font 'Segoe UI,10'
set output "output.png"
set style fill solid 1.00
set style histogram rowstacked
set style data histograms
set xlabel "Case"
set ylabel "Frequency"
set boxwidth 0.8
plot for [i=3:7] 'mydata.dat' every ::1 using i:xticlabels(1) with histogram \
notitle, '' every ::1 using 0:2:2 \
with labels \
title "My Title"

For the benefit of others trying to label histograms, in my data file, the column after the case label represents the total of the rest of the values on that row. Those total numbers are displayed at the top of each histogram bar. For example for case1, 522 is the total of (278 + 146 + 65 + 26 + 7).

I want to display the grand total somewhere on my chart, say as the second line of the title or in a label. I can pass a variable to sprintf in the title, but I have not figured out the syntax to load a "cell" value ("cell" meaning the intersection of a row and a column) into a variable.

Alternatively, if someone can tell me how to use the sum function to total up 522+120+660 (read from the data file, not as constants!) and store that total in a variable, that would obviate the need to have the grand total in the data file, and that would also make me very happy.

Many thanks.


Solution 1:

Let's start with extracting a single cell at (row,col). For a single value, you can use the stats command to extract it. The row and column are specified with every and using, as in a plot command. Comment lines are skipped, so the 'total' line is row 0. In your case, to extract the total value, use:

# extract the 'total' cell
stats 'mydata.dat' every ::::0 using 2 nooutput
total = int(STATS_min)
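
The same pattern can be generalized to any cell. The following is a minimal sketch, not part of the original answer: the variable names row, col and cell are hypothetical, and it assumes the data file from the question. Rows are counted from 0 among the data lines (comments are skipped) and columns from 1.

```gnuplot
# Hypothetical helper variables, chosen for illustration only.
row = 2      # zero-based data row: 0 = total, 1 = case1, 2 = case2, ...
col = 4      # one-based column number

# Restrict stats to exactly one point: start_point = end_point = row.
stats 'mydata.dat' every ::row::row using col nooutput

# With a single point selected, min, max and mean all equal the cell value.
cell = STATS_min
print cell
```

Because only one point is selected, STATS_max or STATS_mean would serve equally well; STATS_min is used here only to match the command above.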

To sum up all values in the second column, skipping the 'total' row, use:

stats 'mydata.dat' every ::1 using 2 nooutput
total2 = int(STATS_sum)

And finally, to sum up all values in columns 3:7 in all case rows (i.e. the same as the previous command, but without using the saved totals), use:

# sum all values from columns 3:7 from all rows
stats 'mydata.dat' every ::1 using (sum[i=3:7] column(i)) nooutput
total3 = int(STATS_sum)

These commands require gnuplot 4.6 or later.

So, your plotting script could look like the following:

reset
set terminal pngcairo size 1024,768 enhanced
set output "output.png"
set style fill solid 1.00
set style histogram rowstacked
set style data histograms
set xlabel "Case"
set ylabel "Frequency"
set boxwidth 0.8

# extract the 'total' cell
stats 'mydata.dat' every ::::0 using 2 nooutput
total = int(STATS_min)

plot for [i=3:7] 'mydata.dat' every ::1 using i:xtic(1) notitle, \
     '' every ::1 using 0:(s = sum [i=3:7] column(i), s):(sprintf('%d', s)) \
     with labels offset 0,1 title sprintf('total %d', total)

which gives the following output:

[Image: the resulting stacked histogram]
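
The question also asks for the grand total as the second line of the title or in a label. A minimal sketch of both, assuming the variable total has already been set by the stats command and that these lines come before the plot command; the title text and the label position are illustrative choices, not from the original answer:

```gnuplot
# Assumes `total` was set earlier, e.g. total = int(STATS_min).
# Double quotes are needed so that \n is interpreted as a line break.
set title sprintf("My Title\nGrand total: %d", total)

# Alternative: put the value in a label. The position (in graph
# coordinates, 0..1 on each axis) is an arbitrary illustrative choice.
set label 1 sprintf("Grand total: %d", total) at graph 0.02, graph 0.95
```

With either of these in place, the title of the labels plot could be replaced by notitle so the total is not shown twice.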

Solution 2:

For Linux and similar systems (requires awk).

If you don't know the row number where your data is located, but you know it is in the n-th column of a row where the value of the m-th column is x, you can define a function

get_data(m,x,n,filename)=system('awk "\$'.m.'==\"'.x.'\"{print \$'.n.'}" '.filename)

and then use it, for example, as

y = get_data(1,"case2",4,"datafile.txt")

Using the data provided in the question,

print y

should print 15.
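
Note that system() returns its output as a string. A minimal sketch of using the result as a number, assuming the same datafile.txt as above; the name y_num and the title text are hypothetical:

```gnuplot
# Definition repeated from above so the snippet is self-contained.
get_data(m,x,n,filename)=system('awk "\$'.m.'==\"'.x.'\"{print \$'.n.'}" '.filename)

y = get_data(1,"case2",4,"datafile.txt")   # y holds a string

# Adding 0 forces gnuplot to convert the string to a number.
y_num = int(y + 0)

set title sprintf("case2, column 4: %d", y_num)
```

If no row matches, awk prints nothing and y is an empty string, so it may be worth checking strlen(y) > 0 before converting.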