This post has been closed. You can still view previous posts, but you can't post any new replies.

Downtime on 12/4: Post-Mortem

1449502130

Edited 1449508297
Riley D.
Roll20 Team
On Friday afternoon/evening, we experienced some unexpected downtime. It affected all 30 of our real-time shards for approximately 5-10 minutes, and two shards in particular were then affected for an extended period (off and on over the course of about 2 hours).

First off, we want to say that downtime is always something we strive to avoid. We know that games on Roll20 are often scheduled weeks in advance, and getting a group together can be hard enough without piling another difficulty on top of it in the form of your tools not working. We take any downtime very seriously. We think it's important to let you know when it happens, why it happened, and what we're doing to prevent it from happening in the future.

The cause of this downtime was our real-time service provider, Firebase. They provide the infrastructure that allows us to have 10,000 players online playing tabletop games together at the same time. Most of the time, this service works well (99.95% uptime is the goal, and nearly every month that goal is met). Sometimes, there are minor hiccups. Unfortunately, on Friday night there was a major outage for those two shards, which affected approximately 7% of all games.

In addition to this outage affecting these games, the entire site was slowed down by requests piling up because they could not reach Firebase via its REST API. This is what we use to do things like create a new game, add a player to a game, or copy things between games using the Transmogrifier. This led to some 504 (timeout) errors, as well as in-game problems such as the image library search responding slowly or not at all.

By 6:30 PM, approximately 5 minutes after the downtime started, our technical team had been alerted and was responding. By 6:40 PM, we had made some changes on our end to alleviate the strain of the two shards being down, so the 504 errors stopped and the image library became responsive again.
In addition, 28 of our 30 shards were back online and operating normally. We continued to work with Firebase throughout the rest of the evening to get the remaining two shards online, and by 8:10 PM service had been restored to all shards. Later in the evening, the two shards experienced a slowdown from approximately 9:30 PM to 10:00 PM; however, the changes we had made earlier prevented this slowdown from affecting the rest of the site.

We did our best to communicate these issues on our Twitter feed, @roll20app, which is always the best source of information about downtime and site-wide issues. You can also check on how we're doing at any time on our status page, http://status.roll20.net.

Here are a few things that we're doing now to help keep this particular issue from happening again:

We're working with Firebase to get better advance notice of planned downtime, even brief downtime. The initial 5 minutes or so of downtime across all shards was due to a planned database restart on their end that we were not made aware of in advance. Our goal is to always know about planned downtime in advance and communicate it to you so you can plan accordingly.

We're re-tooling parts of the Image Library search so that it will hopefully be more responsive and place less strain on the site as a whole: not only should queries return more quickly, but the image search should be able to stay online if there are technical issues.

We're investigating other ways to communicate more quickly and clearly with the whole community when there are issues and what's currently being done to handle them.
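The post doesn't say what the 6:40 PM changes actually were. As a general illustration of how a site can keep one unreachable backend from dragging everything else down, here is a minimal circuit-breaker sketch in Python: after a few consecutive failed calls to a shard, further calls fail immediately for a cooldown period instead of each one waiting for a timeout. The names (`ShardBreaker`, `failure_threshold`, `cooldown_s`, `call_shard`) are hypothetical, and this is not Roll20's code.

```python
import time


class ShardBreaker:
    """Hypothetical per-shard circuit breaker (illustrative only).

    After `failure_threshold` consecutive failures the breaker "opens" and
    rejects calls immediately for `cooldown_s` seconds, so requests to a
    downed shard fail fast instead of tying up workers until they time out.
    """

    def __init__(self, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        """Return True if a request to this shard should be attempted."""
        if self.opened_at is None:
            return True
        # Half-open: once the cooldown has elapsed, let requests through again.
        # A failure re-opens the breaker; a success closes it.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def call_shard(breaker, request_fn):
    """Run request_fn() through the breaker; raise immediately if it is open."""
    if not breaker.allow():
        raise RuntimeError("shard unavailable (breaker open); failing fast")
    try:
        result = request_fn()
    except Exception:
        breaker.record_failure()
        raise
    breaker.record_success()
    return result
```

Failing fast trades an immediate error for games on the affected shard against keeping the rest of the site responsive; the cooldown lets the breaker start trying again once the shard recovers.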
On a personal note, I just happened to be out of town on vacation when all of this happened, so I'd like to thank the members of the Roll20 team (in particular, Steve and Stephanie) who were able to be around to start fixing things and to communicate to the community what was happening until I could get back to my computer to help. As we continue to expand the Roll20 team in the future, this type of coverage will only get better for us, which is a great thing for the entire community. Finally, I'd like to again apologize to anyone whose games were disrupted in any way; our goal is to bring people together to enjoy tabletop gaming, and we're always striving to find ways to make sure we can meet that goal with the highest standard possible.
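For a sense of scale, the 99.95% uptime goal mentioned in the post leaves only a small monthly downtime budget. Assuming a 30-day month:

$$
(1 - 0.9995) \times 30 \times 24 \times 60\ \text{min} = 0.0005 \times 43{,}200\ \text{min} = 21.6\ \text{min}
$$

So the goal allows roughly 22 minutes of downtime per month, and the initial 5-10 minute outage across all shards alone used a large share of it.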
Would this explain the lack of voice chat that my group experienced on Sunday between 3 pm and 9 pm EST? We're all living on the east coast between DC and Pittsburgh.
1449504844

Edited 1449506582
Riley D.
Roll20 Team
Alexander B. said: Would this explain the lack of voice chat that my group experienced on Sunday between 3 pm and 9 pm EST? We're all living on the east coast between DC and Pittsburgh.
No, sorry, that was probably a different issue. We're hearing reports from some folks that the built-in A/V (the WebRTC/TokBox one) is suddenly not working. We're trying to figure out exactly what's going on, and it helps to have all the reports consolidated in one place. Please head to this forum thread and share the information requested to help. Thanks!
I had originally posted a topic about searching for images on the web using the 'everything' field. This problem seems persistent, and I just wanted to make sure it was known that it still seems to be a lingering effect. It likely started around Saturday night, but as of last night and in my testing today there still appear to be issues. I know there is another post from someone running a Star Wars campaign with much the same issue. Examples: I have searched for "ancient warrior" before and now it yields no results; "fantasy courier" is much the same. I seem to be getting better results on some of my searches than in previous days, but most are still either more limited or blocked out entirely. I know you said you were re-tooling the image library, and I wasn't sure whether anything for this was being tested or already in effect. Thanks for the update and all the great work!
Still does not work. https://app.roll20.net/forum/post/2662618/infinite-game-loading/?pageforid=2685250#post-2685250
1449521584
Stephen Koontz
Forum Champion
Marketplace Creator
Sheet Author
API Scripter
Compendium Curator
We've recently pushed out some optimizations to library searches. Let us know how it's impacting your results and if you're still having the same or any new issues.
Thanks for taking the issue seriously and working to fix it, but I do want to point out that this is the second or third weekend game session of ours in recent memory during which we've run into this issue. Each time, the outage lasted for hours, basically through our whole game. Part of the power of Roll20 is improvising with custom tokens on the fly, but that's kind of impossible if the search doesn't come back quickly. Searches are coming back fast now, but it's not peak game time, which seems to be weekend evenings. Please monitor this as we approach next weekend.
TheWebCoder said: Thanks for taking the issue seriously and working to fix it, but I do want to point out that this is the second or third weekend game session of ours in recent memory during which we've run into this issue. Each time, the outage lasted for hours, basically through our whole game. Part of the power of Roll20 is improvising with custom tokens on the fly, but that's kind of impossible if the search doesn't come back quickly. Searches are coming back fast now, but it's not peak game time, which seems to be weekend evenings. Please monitor this as we approach next weekend.
When you say that there was an outage for hours, do you mean the whole site wasn't working, or specifically that the image library search wasn't working?
Vladimir R. said: Still does not work. https://app.roll20.net/forum/post/2662618/infinite-game-loading/?pageforid=2685250#post-2685250
Hmm. It seems you may be experiencing a different issue. What happens when you click this link directly? https://roll20-5.firebaseio.com/pingdomcheck.json
Asset search is working fine for me atm, but I'll wait and see as we close in on the weekend again. Thanks for the timely update, you guys rock. :D
Riley D. said: Vladimir R. said: Still does not work. https://app.roll20.net/forum/post/2662618/infinite-game-loading/?pageforid=2685250#post-2685250 Hmm. It seems you may be experiencing a different issue. What happens when you click this link directly? https://roll20-5.firebaseio.com/pingdomcheck.json
Nothing happens. Infinite loading.
Adam U. said: Asset search is working fine for me atm, but I'll wait and see as we close in on the weekend again. Thanks for the timely update, you guys rock. :D
Based on our internal metrics, the overall speedup is around 2-3x (so requests are now completing in roughly one-half to one-third of the time they took before). Like you said, we're eager to see whether this holds up under weekend traffic, but we have no reason to think it won't!
Vladimir R. said: Nothing happens. Infinite loading.
In that case, something is preventing your computer from contacting Firebase, most likely a DNS issue. Are you behind any sort of corporate firewall? You might try using Google's Public DNS and see if that makes a difference: https://developers.google.com/speed/public-dns/?hl=en
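The troubleshooting above splits the problem into two checks: whether the Firebase hostname resolves via DNS at all, and whether an HTTPS request to it actually completes. A minimal Python sketch using only the standard library that runs the two checks separately; the function names (`resolve`, `check_url`, `diagnose`) are hypothetical, and the `roll20-5.firebaseio.com` hostname and `/pingdomcheck.json` path come from this 2015 thread and may no longer be live.

```python
import socket
import urllib.error
import urllib.request


def resolve(host):
    """Return the sorted IP addresses `host` resolves to, or [] if DNS fails."""
    try:
        infos = socket.getaddrinfo(host, 443, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def check_url(url, timeout_s=5.0):
    """Fetch `url` with a timeout; return the HTTP status code or an error string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        # The server answered, just not with a 2xx status.
        return e.code
    except OSError as e:
        # URLError, timeouts, and TLS errors are all OSError subclasses.
        return f"error: {e}"


def diagnose(host, path="/pingdomcheck.json"):
    """Report whether `host` resolves and, if so, what an HTTPS request returns."""
    addrs = resolve(host)
    if not addrs:
        return "DNS lookup failed: check DNS settings, firewall, or proxy"
    status = check_url(f"https://{host}{path}")
    return f"resolved to {addrs}; HTTPS request returned {status}"
```

For example, `print(diagnose("roll20-5.firebaseio.com"))` prints either a DNS failure message or the resolved addresses plus the request result. A DNS failure points at resolver, firewall, or proxy settings; a hostname that resolves but a request that errors out or times out points somewhere further along the network path.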
Didn't help. I'm using a home computer. About 4 days ago it worked perfectly (before I reinstalled the OS); then I reinstalled Windows, and now Roll20 is the only thing that doesn't work. I'm using the same ISP and didn't change any settings (except the DNS settings you recommended). I've tried both addresses (8.8 and 4.4, for both IPv4 and IPv6).
app.roll20.net/:12 Refused to execute inline script because it violates the following Content Security Policy directive: "script-src 'self' 'unsafe-eval' https://*.googlesyndication.com https://*.doubleclick.net https://*.googlesyndication.com https://www.googletagservices.com https://*.googlesyndication.com https://www.google-analytics.com https://*.googlesyndication.com https://d3clqjduf2gvxg.cloudfront.net https://*.googlesyndication.com https://*.firebaseio.com https://*.googlesyndication.com https://*.opentok.com https://*.googlesyndication.com http://www.google-analytics.com". Either the 'unsafe-inline' keyword, a hash ('sha256-315t8IDUpS+DqpIJX04cb/QzsdvKjh9JsJcf0BDyKt8='), or a nonce ('nonce-...') is required to enable inline execution.
app.roll20.net/:13 Refused to execute inline script because it violates the following Content Security Policy directive: "script-src 'self' 'unsafe-eval' https://*.googlesyndication.com https://*.doubleclick.net https://*.googlesyndication.com https://www.googletagservices.com https://*.googlesyndication.com https://www.google-analytics.com https://*.googlesyndication.com https://d3clqjduf2gvxg.cloudfront.net https://*.googlesyndication.com https://*.firebaseio.com https://*.googlesyndication.com https://*.opentok.com https://*.googlesyndication.com http://www.google-analytics.com". Either the 'unsafe-inline' keyword, a hash ('sha256-v19duySXMwXd3zUU660hZgcRTRW/BoLT6eLuziV0Xdk='), or a nonce ('nonce-...') is required to enable inline execution.
app.js:29 70
app.js:30 TOUCH SUPPORTED: false
app.js:30 USING WEBGL ACCELERATION...
app.js:30 WEBGL STARTUP SUCCESS
app.js:25 select
app.js:25 Switch mode to select
app.js:40 Initializing new dice engine with randomness...
app.js:40 Using random entropy
app.js:44 window resize
app.js:30 Final set zoom!
app.js:30 UPDATE GL SIZE!
app.js:30 Final set zoom!
tutorial_tips.js:7 tuts loaded
app.js:36 Final page load.
app.js:44 Refresh jukebox List!
app.js:36 Scan for new plays!
app.js:44 window resize
app.js:30 Final set zoom!
app.js:30 UPDATE GL SIZE!
app.js:30 Final set zoom!
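For context on the two console errors above: the page's Content Security Policy `script-src` list contains neither 'unsafe-inline' nor a matching hash or nonce, so the browser refused to run two inline scripts. Nobody in the thread establishes whether these errors are connected to the loading problem. A CSP hash-source is `sha256-` followed by the base64-encoded SHA-256 digest of the inline script's exact UTF-8 text, which is where the `sha256-...` values in the messages come from. A minimal Python sketch of that computation plus a simplified allow/deny check; the function names are hypothetical, only SHA-256 hash matching is implemented, and real browsers also evaluate nonces, other hash algorithms, and 'strict-dynamic'.

```python
import base64
import hashlib


def csp_sha256_source(script_text):
    """Return a CSP hash-source ('sha256-<base64 digest>') for an inline script.

    The digest covers the exact text between <script> and </script>, so even a
    whitespace change produces a different value.
    """
    digest = hashlib.sha256(script_text.encode("utf-8")).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


def inline_allowed(script_text, sources):
    """Simplified check: would this inline script run under these script-src sources?"""
    has_hash_or_nonce = any(
        s.startswith(("'sha256-", "'sha384-", "'sha512-", "'nonce-")) for s in sources
    )
    # Since CSP Level 2, 'unsafe-inline' is ignored when a hash or nonce is present.
    if "'unsafe-inline'" in sources and not has_hash_or_nonce:
        return True
    return f"'{csp_sha256_source(script_text)}'" in sources
```

Under a policy like the one in the log ('self', 'unsafe-eval', and a list of hosts), `inline_allowed` returns False for any inline script, which matches the "Refused to execute inline script" messages.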
Update: I tried to load a campaign using Tor Browser. It loaded, but I got a popup informing me that the browser was incompatible with Roll20. So this may be a Chrome issue. Maybe I need to turn something else on or off?
1449602388
Stephen Koontz
Forum Champion
Marketplace Creator
Sheet Author
API Scripter
Compendium Curator
Vladimir R. said: Update: I tried to load a campaign using Tor Browser. It loaded, but I got a popup informing me that the browser was incompatible with Roll20. So this may be a Chrome issue. Maybe I need to turn something else on or off?
Do you have antivirus software running? If so, can you try pausing or disabling it temporarily?
Yes, I did. It did not help. My first test was done without any antivirus software whatsoever, as I mentioned in the previous thread I linked to (which was closed for some reason).
Playing again today, I found that the built-in image search still returns far fewer results than it used to for fairly common searches. Almost all search queries now return no results, or very few results that are barely related. I'm not sure what has changed, but I currently can't find tokens as easily as I could a week or so ago. I figured I'd give an update, since I have now tried a number of searches, and many of them, even ones I've run successfully before, still turn up no results.
1449683349
Stephen Koontz
Forum Champion
Marketplace Creator
Sheet Author
API Scripter
Compendium Curator
MrTBurr said: Playing again today, I found that the built-in image search still returns far fewer results than it used to for fairly common searches. Almost all search queries now return no results, or very few results that are barely related. I'm not sure what has changed, but I currently can't find tokens as easily as I could a week or so ago. I figured I'd give an update, since I have now tried a number of searches, and many of them, even ones I've run successfully before, still turn up no results.
Can you give us some examples?
What about my problem?
1449768992
Stephen Koontz
Forum Champion
Marketplace Creator
Sheet Author
API Scripter
Compendium Curator
Vladimir R. said: What about my problem? Are you having the issue in Firefox as well?
Installing the Fox of Fire did not help. Same problem.
1449775346
Stephen Koontz
Forum Champion
Marketplace Creator
Sheet Author
API Scripter
Compendium Curator
Vladimir R. said: Installing the Fox of Fire did not help. Same problem.
Your issue is different from those caused by the server downtime. Can you please start a new thread in the Solving Technical Issues / Bug Reports forum? Run through these steps, and PM me the bug report if they don't help.
I'm afraid I'm having similar issues with any sort of "everything" search in the art library. It only works with very simple searches; it rarely works with more than one word in the search, or with anything overly complicated. =(
1450382524
Silvyre
Forum Champion
Hey, Hannah L.! I would definitely recommend starting a new thread to post your responses to the troubleshooting steps linked by Steve K.