How to detect a file over the Internet using ping or a similar command?
I have a shell script to download some of my stuff over the Internet. How can I know whether a file exists on a remote server? Let's say I want to know whether http://192.168.1.1/backup/01012011.zip exists or not. I have tried using the ping command, but it shows an error; I guess this is because of the / character.
Can anyone help me? Or is there another way?
You can use the --spider option of wget, which does not actually download the file, but just checks whether it's there. In your example:
wget --spider http://192.168.1.1/backup/01012011.zip
This will either print a message containing 200 OK if the file is there, or an error such as 404 Not Found if it's not there, or 403 Forbidden if you don't have permission to get it.
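If you want to use this from a script rather than read the message, you can test wget's exit status instead. A minimal sketch, relying on the fact that wget exits with 0 on success and a non-zero code (8 for a server error such as 404) otherwise:
#!/bin/sh
# Check for the file without downloading it; -q suppresses wget's output.
if wget --spider -q http://192.168.1.1/backup/01012011.zip; then
    echo "file exists"
else
    echo "file does not exist (or the server could not be reached)"
fi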
There certainly is another way - but this requires understanding what actually happens when a request is made over the Internet. When you visit a page in your web browser, data is transferred using a protocol called HTTP (yes, this is why you'll often see http:// at the beginning of URLs).
HTTP is a text-based protocol. Information is exchanged between the client and the server by sending headers followed by the body of the request. The headers contain a lot of status information about the request and the data being transferred. The "header" that will help you with your problem isn't really a header at all - it's the very first line transferred, and it contains a three-digit number called the status code. If a request was successful, the code is usually 200 (not always - there are exceptions).
One thing is for sure - if the file you have requested does not exist on the web server, the server should reply with a status code of 404. This indicates that the resource could not be found. (For the curious, here is a list of HTTP status codes and their meaning.)
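For illustration, a HEAD request for the file from the question might look like this on the wire (simplified):
HEAD /backup/01012011.zip HTTP/1.1
Host: 192.168.1.1
...and, if the file is missing, the first line of the server's reply would be:
HTTP/1.1 404 Not Found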
Well, enough theory. Let's see how we can do this in the terminal. A great tool for making HTTP requests that also lets us examine the status code is cURL, which is available in the Ubuntu repositories. You can install it with:
sudo apt-get install curl
Once you have it installed, you can invoke it like so:
curl [website]
...and the content of the given URL will be printed to the terminal. This is the information that your web browser sees when it visits that URL. How does this help us? Well, take a close look at the flags for the curl command. If we pass the --head parameter, cURL will return only the headers of the response. Try it with a URL. You'll get a list of lines of the form:
header-name: header-value
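For example, for an existing ZIP file the output might look something like this (the exact headers vary from server to server):
HTTP/1.1 200 OK
Date: Sat, 01 Jan 2011 12:00:00 GMT
Content-Type: application/zip
Content-Length: 1048576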
Notice, of course, that the very first line looks nothing like this. Remember the status code we talked about earlier? You'll find it in that first line as the three-digit number. What we need to do now is extract it using Perl - and we can do that right in the terminal using Perl's -e flag, which lets us pass Perl code directly to the Perl interpreter. We'll also need to add an extra flag to cURL (--silent) to keep it from displaying a progress bar and messing up our Perl script.
Here is what we need... it's quite complicated due to the need to escape a lot of it from the shell:
perl -e "\$s=\`curl [URL] --head --silent\`; \$s=~m/(\\d{3})/;print \$1"
What this is basically doing is fetching the URL with cURL and running it through a Perl regular expression that extracts the status code and prints it out.
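As an aside, if your URL contains no single quotes, you can avoid most of that escaping by single-quoting the Perl code instead - the backticks are then interpreted by Perl rather than by the shell:
perl -e '$s = `curl [URL] --head --silent`; $s =~ m/(\d{3})/; print $1'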
Now all you need to do is put in the URL of the file you are checking for and compare the printed code to '404'. If you get '404', you can assume the file does not exist.
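In a shell script that comparison could look like this (a sketch; [URL] is a placeholder for your file's URL):
status=$(perl -e '$s = `curl [URL] --head --silent`; $s =~ m/(\d{3})/; print $1')
if [ "$status" = "404" ]; then
    echo "file does not exist"
fi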
Of course, this one-liner is unwieldy to type in the terminal, so you can write a small script that is not only easier to understand, but also easier to execute:
#!/usr/bin/perl
use strict;
use warnings;

# Get the URL from the command line
my $url = $ARGV[0];

# Fetch the headers only (quote the URL in case it contains shell metacharacters)
my $header = `curl "$url" --head --silent`;

# Try to find the status code (the first three-digit number in the headers)
my ($code) = $header =~ m/(\d{3})/;

# Return the result: 0 if the server said 404, 1 otherwise
exit(0) if defined($code) && $code == 404;
exit(1);
Simply copy and paste that into a file. For this example, I'll call the file url_check. Then make the file executable with:
chmod 755 url_check
Then you can check any file with the following simple command:
./url_check [URL]
The return value will be '0' if the server returned a 404 and '1' otherwise. You can then chain this command in the shell just like you would any other command.
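For example, using the URL from the question (remember the inverted convention - exit status 0 means the file is missing):
if ./url_check http://192.168.1.1/backup/01012011.zip; then
    echo "the file does not exist on the server"
else
    echo "the file exists (or the check failed in some other way)"
fi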
wget http://192.168.1.1/backup/01012011.zip
A result code of 0 means the file exists (and was downloaded); anything else means it does not exist or could not be fetched. You can check the result code inside your script with the $? variable.
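For example, a sketch using the URL from the question:
wget http://192.168.1.1/backup/01012011.zip
if [ $? -eq 0 ]; then
    echo "download succeeded - the file exists"
else
    echo "download failed - the file probably does not exist"
fi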