Nishkriya down and probably staying down

    I just wanted to inform the community that Nishkriya is no longer scraping the forums.

    The short summary is that the method used to retrieve the latest topics is being blocked. I think it's being blocked for the scraping engine specifically. I was able to determine that in other cases (for example, from another machine) it works as expected.

    If it's being blocked, then obviously it was enough of a hindrance to the forums that they had to implement a block of this type. Scraping is already a gray area, and I do not want to test the limits of Onyx Path's patience. They have been kind enough so far. Technically I can find ways to work around this, but that's not in the spirit of being a good community member. It's better to just not scrape the forums than to play games finding hacks and workarounds. I'll leave the data up for a little while until I wind down the account and hosting environment.

    Are you sure it wasn't an extension of the general forum problems that we were all having?

      I'd message them directly to attempt to figure out if the block is intentional or accidental. I would sorely miss the utility of the site you've created!

        I’m almost certain that it’s not an intentional attempt to stop the service. The forums have been experiencing a lot of trouble lately.

          This is awful. Let's be honest, these forums are an absolute dinosaur--they don't even support HTTPS--and Nishkriya is a much better interface for extracting information from them than this one.

          I'm not the only one who thinks so.


            Fair questions. I debugged the system both on the server and locally. It worked as expected locally, but the results were different on the server. Specifically, we retrieve the list of topics to scrape by sending a search query to the forum. From the local environment, the response had all the expected topics and parsed as expected. On the server, the same query returned a response with 0 results/topics. I output and reviewed the HTML responses to compare them. I've accounted for everything else I can think of that could differ, and I don't believe anything else in the environment is interfering. I could be wrong, but I think it's unlikely.
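
            For anyone who wants to repeat that comparison, here is a minimal sketch of the idea, not the actual Nishkriya code. It assumes the two HTML responses have already been saved as strings, and that topic links can be recognised by a CSS class; the class name `topic-title` and the helper names are placeholders I made up for illustration, so adjust them to whatever the real search results page uses.

```python
from html.parser import HTMLParser


class TopicLinkCollector(HTMLParser):
    """Collect hrefs of <a> tags that carry the (assumed) topic class."""

    def __init__(self, topic_class="topic-title"):
        super().__init__()
        self.topic_class = topic_class  # hypothetical class name
        self.topics = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.topic_class in classes and attr_map.get("href"):
            self.topics.append(attr_map["href"])


def extract_topics(html, topic_class="topic-title"):
    """Return the topic links found in one saved search response."""
    parser = TopicLinkCollector(topic_class)
    parser.feed(html)
    parser.close()
    return parser.topics


def compare_responses(local_html, server_html):
    """Summarise how the server response differs from the local one."""
    local = set(extract_topics(local_html))
    server = set(extract_topics(server_html))
    return {
        "local_count": len(local),
        "server_count": len(server),
        "missing_on_server": sorted(local - server),
    }
```

            Feeding it the response saved locally and the one saved on the server gives the topic count for each and the list of topics the server never saw.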

            I emailed Rich to inquire, but haven't received a response.

            Something changed approximately 2 months ago. At that time I had a workaround in place for the dev thread, to make sure it was included in the parsing: because of the way we read the topics from the search query above, the devs topic had not been included. The workaround made it appear that everything was working, but in reality the scraper was only retrieving the dev thread because of that workaround. I had not picked up on the fact that it was missing other topics until a user messaged me to point out the discrepancy.
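
            To show how a workaround like that can hide an empty search, here is a rough sketch under my own assumptions; it is not the real scraper code, and the function names and the "ask-the-devs" identifier are made up for illustration. The point is that the emptiness check has to look at the raw search results before the pinned topic is merged in.

```python
def merge_topics(search_results, pinned):
    """Return search results followed by pinned topics, duplicates removed, order kept."""
    return list(dict.fromkeys([*search_results, *pinned]))


def plan_scrape(search_results, pinned):
    """Build the list of topics to scrape and flag a suspiciously empty search.

    The warning is based on the raw search results, so adding pinned
    topics cannot make a failed search look healthy.
    """
    warnings = []
    if not search_results:
        warnings.append(
            "search returned 0 topics; only pinned topics will be scraped"
        )
    return merge_topics(search_results, pinned), warnings


# Hypothetical identifiers, for illustration only.
topics, warnings = plan_scrape([], ["ask-the-devs"])
```

            With a check like this, the scrape would still pick up the dev thread, but the empty search would be reported instead of going unnoticed.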

            Maybe I'll give it a shot and test it again in a few days or a week. If anything changes, I'll post here.


              Originally posted by Nishkriya2
              I emailed Rich to inquire, but haven't received a response.
              The "Contact Us" button? Yes?

                I was doing some searching myself the other day. I too was getting zero results for things I knew had results. I think it's whatever's afflicting the forums currently.


                  Ian tells me that whatever in the forum code is blocking your action is deeper than we have access to. It’s not deliberate on our part.


                    I know nothing about coding, so I'm speaking from a place of ignorance, but is it possible that Ask the Devs has just grown larger than the forum software is equipped to deal with? I know Giant in the Playground's forums, which are also on a version of vBulletin, close all threads after 50 pages, and Ask the Devs is already more than 18x that length.


                      Thanks for clarifying, Rich.

                      I'll check the forums' status periodically and keep an eye out for changes or updates. I won't take anything down, and if I can get it working again soon I will do so and post an update here.


                        I do wonder, if that thread is that long, whether we're simply hitting timeout conditions on the server. If it were multiple queries failing, I'd suggest dropping the rate, but given that even a single query seems bad, I wonder if we could scrape a _different_ thread as a test... say, this one, and see if it catches RichT's post, or if the problem is general across the whole SW stack.


                          Nishkriya appears to be working again, at least for the time being!