• Resolved mywebmaestro

    (@mywebmaestro)


    I seem to have many sites that are bleeding bandwidth (upwards of 20GB a month and more) from traffic that isn’t getting caught by Blackhole. In Awstats it’s listed as “crawler” (when I tried adding a modsec rule for that string, it seemed to block multiple bots, so I’m not sure whether the Awstats info is reliable). In any case, while I do get occasional notices from the plugin that a bot has been blocked, it seems to miss a lot. Is the only protection based on a bot following a link it’s told not to? Is there something I’m missing in how to set this up effectively?
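    [A modsec/user-agent rule like the one the poster describes can be sketched as an Apache .htaccess fragment. This is a hypothetical illustration, not the poster’s actual rule: matching on the bare substring “crawler” is exactly why it catches multiple bots at once, since many legitimate crawlers include that word in their user-agent string.]

    ```
    # Hypothetical .htaccess rule blocking any request whose User-Agent
    # contains the substring "crawler" (case-insensitive).
    # NOTE: this is intentionally broad and will also match legitimate bots,
    # which likely explains the over-blocking seen with the modsec rule.
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} crawler [NC]
    RewriteRule .* - [F,L]
    </IfModule>
    ```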

Viewing 6 replies - 1 through 6 (of 6 total)
  • Plugin Author Jeff Starr

    (@specialk)

    The first and most important question: is there any *page caching* happening on the site? As explained in the docs, page caching prevents dynamic plugins like Blackhole from working correctly. So that would be the first thing to check.

    Thread Starter mywebmaestro

    (@mywebmaestro)

    I have Hummingbird installed (from WPMUDEV) but page caching is turned off. I have browser and gravatar caching enabled. https://wpmudev.com/docs/wpmu-dev-plugins/hummingbird/#caching

    Am I right, though, in understanding that in order to get blocked, a “bad” bot has to break the robots.txt rule and try to index the forbidden link? Are there any other protection options based on other behavior?

    Plugin Author Jeff Starr

    (@specialk)

    That is correct, as explained in the plugin docs.
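    [The honeypot pattern confirmed here works roughly as follows. This is a generic sketch of the technique, not the plugin’s exact markup or trap URL; `/trap/` is a hypothetical path.]

    ```
    # robots.txt — compliant crawlers are told to stay out of the trap:
    User-agent: *
    Disallow: /trap/
    ```

    ```
    <!-- Hidden link placed in the page. A bot that ignores robots.txt and
         follows it reveals itself and gets its IP banned. Humans never see
         or click the link. -->
    <a href="/trap/" rel="nofollow" style="display:none">Do not follow</a>
    ```

    The key implication is the one confirmed above: a bot that never requests the trap URL is never blocked, no matter how much bandwidth it consumes elsewhere.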

    Thread Starter mywebmaestro

    (@mywebmaestro)

    Are there any plans to add the ability to combat the AI training scraping that’s going on? I seem to be seeing a lot of bot traffic from that, which doesn’t obey robots.txt and doesn’t always identify itself consistently.
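    [In the meantime, one common stopgap is a server-level user-agent block. A hypothetical .htaccess sketch, separate from the plugin: the listed tokens (GPTBot, CCBot, ClaudeBot, Bytespider) are published AI-crawler user-agent strings, but the list is illustrative and goes stale, and — as noted above — scrapers that spoof or omit their user agent will slip past it.]

    ```
    # Hypothetical .htaccess fallback for AI-training crawlers that ignore
    # robots.txt. User-agent tokens are examples; update the list over time.
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (GPTBot|CCBot|ClaudeBot|Bytespider) [NC]
    RewriteRule .* - [F,L]
    </IfModule>
    ```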

    Plugin Author Jeff Starr

    (@specialk)

    It’s a good idea and something that I hope to implement soon.

    Plugin Author Jeff Starr

    (@specialk)

    Hey @mywebmaestro, I hope you got this sorted. It’s been a while with no reply, so I’m going to go ahead and mark this thread as resolved to help keep the forum organized. Feel free to post again with any further questions or feedback. Thank you.


The topic ‘Blocking “crawler”’ is closed to new replies.