Bug 150 - bugzilla robots.txt blocking web crawlers such as archive.org
Status: CONFIRMED
Alias: None
Product: Libre-RISC-V Website
Classification: Unclassified
Component: website
Version: unspecified
Hardware: All All
Importance: --- normal
Assignee: Luke Kenneth Casson Leighton
URL:
Depends on:
Blocks:
 
Reported: 2019-12-22 01:24 GMT by Jacob Lifshay
Modified: 2019-12-22 02:05 GMT

See Also:
NLnet milestone: ---
total budget (EUR) for completion of task and all subtasks: 0
budget (EUR) for this task, excluding subtasks' budget: 0
parent task for budget allocation:
child tasks for budget allocation:
The table of payments (in EUR) for this task; TOML format:


Attachments

Description Jacob Lifshay 2019-12-22 01:24:13 GMT
We should switch robots.txt to allow more things. A good template to use would be Mozilla's Bugzilla robots.txt:
https://bugzilla.mozilla.org/robots.txt
Comment 1 Luke Kenneth Casson Leighton 2019-12-22 02:00:24 GMT
Yep, this is apparently quite common: web-crawling of Bugzilla can be pretty heavy, so Mozilla set up a default that banned pretty much everything. I'm not so bothered, so I have set it to "Allow: /".
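
For anyone wanting to check the effect of such a change, here is a minimal sketch using Python's standard urllib.robotparser. The two robots.txt bodies, the crawler name and the bug URL are all assumptions made up for illustration; they are not the contents of this site's actual file or the real user agent of any archive crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt bodies, NOT the actual files from this site.
RESTRICTIVE = """\
User-agent: *
Disallow: /
"""

PERMISSIVE = """\
User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical crawler name and bug URL, used only for illustration.
AGENT = "ExampleArchiveBot"
URL = "https://bugs.example.org/show_bug.cgi?id=150"

print(can_crawl(RESTRICTIVE, AGENT, URL))  # False: everything is disallowed
print(can_crawl(PERMISSIVE, AGENT, URL))   # True: everything is allowed
```

To test a live file instead, RobotFileParser also offers set_url() followed by read(), which fetches the robots.txt over the network before can_fetch() is called.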
Comment 2 Luke Kenneth Casson Leighton 2019-12-22 02:01:42 GMT
(In reply to Jacob Lifshay from comment #0)
> a good template to use
> would be Mozilla's bugzilla robots.txt:
> https://bugzilla.mozilla.org/robots.txt

just copied it entirely, just... because :)
Comment 3 Jacob Lifshay 2019-12-22 02:05:52 GMT
(In reply to Luke Kenneth Casson Leighton from comment #2)
> (In reply to Jacob Lifshay from comment #0)
> > a good template to use
> > would be Mozilla's bugzilla robots.txt:
> > https://bugzilla.mozilla.org/robots.txt
> 
> just copied it entirely, just... because :)

Thanks, sounds good to me!

If you have more time, it would be nice to also fix bug #149.