In academic writing, is it correct to refer to "the data itself", given that data is a plural noun and itself is a singular pronoun?

Welcome to EL&U. I would not be surprised if a quick search in the search box on this page turned up several questions similar to yours.

Nevertheless, in academic writing the jury is still out on the "data is or data are" controversy. Some academics insist that data is a plural, countable noun. Other academics--probably younger academics, I suspect--treat the word as a singular noun that includes not one datum but many, all lumped together into a monolithic structure: "The data points unequivocally to X being the answer."

If you are in academia and are writing a paper for academics, you might simply ask a teacher (or two or three) in your field whether he or she considers data to be singular or plural. That should settle the question for you.

As for me, an older academic, I still use datum as a singular noun with a singular verb, and data as a plural noun with a plural verb. I suspect, however, that my preference is quickly becoming--if it isn't already--a thing of the past.

Essentially, this comes down to "It's plural if you want it to be." I never liked that answer, either.

However, people really like arguing about this.

Etymologically, data comes from Latin. This is well-known. Unfortunately, in Latin, its plurality was defined by devices that exist in English only in a far lesser capacity: gender and noun case.

In the Latin nominative case, data could be either the neuter plural or the feminine singular of datum. The neuter singular was datum, the masculine singular datus, the feminine plural datae, and the masculine plural dati.

Use of data as a plural in English (the earlier form) comes from a suggestion that we should incorporate the words closest in Latin meaning to how they will be used in English: the neuter singular datum and the neuter plural data.

However, data could also function as the feminine singular in Latin, which I conjecture led to its commonplace use as a mass noun in English.

I enjoy using these words as they were used in Latin: in a survey of male students, I might say "After the dati were collected, each outlying datus was removed." In a survey of female students, I might say "After the datae were collected, each outlying data was removed." In a survey of pineapples, I might say "After the data were collected, each outlying datum was removed."

Most people do not enjoy this. The first two usages are by no means commonplace (possibly even unattested outside random tangents on the internet). The third, with data used as a plural, is occasionally seen as archaic but is often accepted or even preferred.

It is more common today, however, to use data as a mass noun; that is, "the data was collected," not "each data was collected." For a single item, as in the latter case, datum remains typical: "each datum was collected."