How to block Baidu from indexing MP3 files?

Using Apache, I want to prevent people from directly downloading music via URL. I just want the Flash player to play it. However, Baidu MP3 found the files and is linking directly to my music for download. Is there a way I can prevent this with .htaccess?


You basically have two options:

  1. Tell the crawlers not to index your MP3 files
  2. Prevent direct access to your MP3 files by anyone not coming from your website

For the first option, you have to create a robots.txt file at the root of your website. It will include something like:

User-agent: *
Disallow: /path/to/mp3s

As for the second option, you have to use mod_rewrite and create a .htaccess file, or add something like this to your server config:

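RewriteEngine On
RewriteBase /
# Only look at requests under the MP3 directory
RewriteCond %{REQUEST_URI} ^/path/to/mp3s/
# Let requests with no Referer header through
RewriteCond %{HTTP_REFERER} !^$
# Let requests referred by example.com or one of its subdomains through
RewriteCond %{HTTP_REFERER} !^https?://([^/]+\.)?example\.com(/|$) [NC]
# Everything else asking for an .mp3 gets 403 Forbidden
RewriteRule \.mp3$ - [F,NC]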

It returns 403 Forbidden for any request to your MP3 files whose Referer header points at another site (i.e., no deep linking). Requests that send no Referer header at all are still let through.

I'd recommend using both methods together. Asking search engines not to index your files doesn't stop other crawlers (the ones that ignore robots.txt) from finding them, and the rewrite rule doesn't stop crawlers from fetching your files; it only blocks visitors arriving from another site, such as a search results page.


Turning off directory listing might do it:

<Directory /path/to/mp3s>
  Options -Indexes
</Directory>

This way the crawler can't discover the MP3 URLs from a directory listing; it would have to read the Flash file, which it almost certainly doesn't do.
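If you can only use .htaccess, note that a <Directory> block is not allowed there. A minimal sketch of the equivalent, assuming the file sits inside the MP3 directory itself and that your host's AllowOverride setting permits changing Options:

```apache
# Save as /path/to/mp3s/.htaccess (needs AllowOverride Options or All)
# Turns off the auto-generated file listing for this directory and below
Options -Indexes
```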


This assumes Baidu respects robots.txt, which it may or may not. Other dubious search agents may choose to ignore it.
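If it turns out not to respect robots.txt, you can refuse the crawler by user agent instead. A rough sketch, assuming the crawler identifies itself with the "Baiduspider" token (Baidu's main crawler does; whether the MP3 search uses the same string is something you would need to check in your access log):

```apache
RewriteEngine On
# Apply only to the MP3 directory
RewriteCond %{REQUEST_URI} ^/path/to/mp3s/
# Match the "Baiduspider" token anywhere in the User-Agent header
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
# Answer 403 Forbidden instead of serving the file
RewriteRule \.mp3$ - [F,NC]
```

A crawler that lies about its user agent will get past this, so treat it as a filter for honest bots only.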

You can also block particular IP addresses (or ranges):

order allow,deny
deny from 127.0.0.1
deny from 127.0.0.2
deny from 127.0.0.3
allow from all 
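The Order/Allow/Deny directives are the Apache 2.2 syntax. On Apache 2.4 the same thing is normally written with Require. A sketch, assuming you want to block a whole range; 192.0.2.0/24 is a documentation-only placeholder, so substitute the addresses you actually see in your logs:

```apache
<Directory /path/to/mp3s>
  <RequireAll>
    # Allow everyone...
    Require all granted
    # ...except this (placeholder) range
    Require not ip 192.0.2.0/24
  </RequireAll>
</Directory>
```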

== Outside the scope of your question :

You said that you have a Flash player that plays the music. If you can modify the player to request the URLs with an additional query string, say "?flashaccess=true", you could redirect all requests to your MP3 directory to a PHP file that checks for that argument and, if it is present, returns the contents of the MP3 file.
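A minimal sketch of the redirect half of that idea, assuming the rule lives in an .htaccess file at your document root. The script name serve-mp3.php and its "file" parameter are made up for illustration, and the script itself (checking flashaccess and sending the file) is not shown:

```apache
RewriteEngine On
# Hypothetical gatekeeper: /serve-mp3.php and its "file" parameter are placeholders.
# In a document-root .htaccess the pattern is matched without the leading slash.
# QSA keeps the original query string, so ?flashaccess=true still reaches the script.
RewriteRule ^path/to/mp3s/(.+\.mp3)$ /serve-mp3.php?file=$1 [QSA,L,NC]
```

Whatever script you write must validate the "file" value before opening anything. Also keep in mind that anyone who inspects the player's requests can copy the query string, so this raises the bar rather than locking the files down.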

Back to .htaccess only: you can also have the Flash app make its requests with a particular user agent and block or redirect all others. [http://blamcast.net/articles/block-bots-hotlinking-ban-ip-htaccess an example]

It's not as daunting as it sounds.
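For instance, the user-agent check could look roughly like this. "MyFlashPlayer/1.0" is an invented string, and the whole thing assumes your player is able to send its own User-Agent header, which a player embedded in a browser may not be allowed to do:

```apache
RewriteEngine On
# Apply only to the MP3 directory
RewriteCond %{REQUEST_URI} ^/path/to/mp3s/
# Refuse anything that does not send the (hypothetical) player user agent
RewriteCond %{HTTP_USER_AGENT} !^MyFlashPlayer/1\.0$
RewriteRule \.mp3$ - [F,NC]
```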