As a Los Angeles-based company, we know about the environmental challenges of Southern California – from traffic to earthquakes to fires. This week, the fires are taking center stage. We can literally see them burning from our offices in Burbank.
It reminds us of the fires we all experience in technology as well: the kind that break out just when the systems are tightly constructed, the technology platform is stable, and the entire team is functioning at peak performance. So we wanted to offer our top four fire-preparedness tips for your mobile app and website outage preparation:
1. Backups / Redundancy – First and foremost, it’s vital that you are prepared for system shutdowns, especially for technology that supports large audiences. As the organization grows, adding redundancy to key systems allows for a quick switchover during the unexpected sparks that inevitably happen in every business. We suggest setting up a lower-cost, lower-performing duplicate of your environment on a different host that can simply hum in the background. Releases can be deployed to it at the same time they go to your standard development environment. While it may not match the peak performance of your main environment, it works like a spare tire on a car: it gets the basic job done and ensures minimal downtime when issues arise. Last year, there was red all across the AWS uptime monitor (https://status.aws.amazon.com/), and many websites across the US were, figuratively speaking, on fire. That was exactly the moment to have a backup like this in place.
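To make the spare-tire idea concrete, here is a minimal Python sketch of how a switchover decision could work. It is an illustration under assumptions, not a description of any particular hosting setup: the function names `http_health_check` and `pick_active_host` and the example URLs are hypothetical, and a real failover would normally be handled by your DNS provider or load balancer.

```python
import urllib.request
from typing import Callable, Optional, Sequence


def http_health_check(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses.
        return False


def pick_active_host(
    hosts: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first healthy host in priority order, or None if all are down."""
    for host in hosts:
        if is_healthy(host):
            return host
    return None


# Hypothetical endpoints: the main environment first, the low-cost duplicate second.
HOSTS = [
    "https://www.example.com/health",
    "https://standby.example.net/health",
]

# Example call (performs real network requests, so it is left commented out):
# active = pick_active_host(HOSTS, http_health_check)
```

Listing the hosts in priority order keeps the rule simple: traffic stays on the main environment whenever it is healthy and only moves to the duplicate when the main check fails.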
2. Security Audits – Many of the issues that can cause a mobile app or website outage come down to security breaches. From code injections to server takeovers to whatever else you can think of, there are numerous ways for hackers or site enemies to wreak havoc on your environment. For most of our clients, we like to do a thorough Security Audit at the end of key releases, where we actually try to mimic the ways in which someone could hack your environment and shut things down. Working through this checklist forces you to think through your vulnerabilities in advance and preempt any takedowns. This is sort of like clearing the brush from around your house so that if a wildfire does erupt, you are giving yourself the best possible chance to limit the damage to your property or IP.
3. The Emergency Process List – This is an obvious one, but rarely enacted. We always suggest developing a plan for when outages happen. Most of the time, outages catch teams off guard and there is a significant scramble to figure out what to do. A pre-written process that settles the points below will help you handle fires in a calm and clear manner, with a straightforward reporting process for communicating how things will be handled. Some things you want to have on the list include:
- Who is the point person for a fire that erupts?
- How will you deliver the initial communication of what is happening?
- How will you deliver ongoing updates to the team?
- Do you have a standard structure for these updates? This includes: Current status, known issues, projected time to recovery, key point people handling problems (net ops, developer, project lead), and how you are handling the immediate traffic downtime.
- Are you prepared with phone numbers for key people working on it?
- Can you immediately open a chat session with the key technical firefighters to provide updates?
- Are you able to track start and stop times of the fire?
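As a rough illustration of the standard update structure and the start and stop tracking described in the list above, here is a small Python sketch. The class names `Incident` and `StatusUpdate`, their fields, and the sample values are our own assumptions for the example; adapt them to whatever your team actually reports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class Incident:
    """Tracks who owns the fire and when it started and stopped."""
    point_person: str
    started_at: datetime
    resolved_at: Optional[datetime] = None

    def resolve(self, when: datetime) -> None:
        self.resolved_at = when

    def duration(self) -> Optional[timedelta]:
        """Total downtime, or None while the incident is still open."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.started_at


@dataclass
class StatusUpdate:
    """One update in the standard structure from the list above."""
    current_status: str
    known_issues: List[str]
    projected_recovery: str
    point_people: Dict[str, str]  # role -> name
    traffic_handling: str

    def render(self) -> str:
        people = ", ".join(f"{role}: {name}" for role, name in self.point_people.items())
        return "\n".join([
            f"Current status: {self.current_status}",
            f"Known issues: {'; '.join(self.known_issues) or 'none yet'}",
            f"Projected time to recovery: {self.projected_recovery}",
            f"Point people: {people}",
            f"Traffic handling: {self.traffic_handling}",
        ])


# Sample values below are made up for illustration.
incident = Incident("Project lead", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc))
update = StatusUpdate(
    current_status="Site down, failing over to backup host",
    known_issues=["Primary database unreachable"],
    projected_recovery="30 minutes",
    point_people={"net ops": "A. Example", "developer": "B. Example"},
    traffic_handling="Maintenance page served from backup host",
)
print(update.render())
incident.resolve(datetime(2024, 1, 1, 12, 45, tzinfo=timezone.utc))
```

Having every update come out in the same shape means nobody has to decide what to include while the fire is still burning.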
4. Alerts and API Monitoring – You should really track your uptime. There are a variety of services that check your site to make sure key areas are up and running. These tools alert you to downtime across the environment or on specific services. They are incredibly valuable for knowing exactly when the fire starts… it’s your alarm bell. Most have free and paid versions, with better alert mechanisms as you invest. Here are a few services you might want to look at:
- Pingdom – https://www.pingdom.com
- Uptrends – https://uptrends.com/
- Uptime.com – https://uptime.com/
- Uptime Robot – https://uptimerobot.com/
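To show the basic idea behind these alarm bells, here is a simplified Python sketch that turns a stream of up/down check results into alerts only when the state changes. It does not use or imitate the API of any service listed above; the `UptimeTracker` class and the example URL are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


class UptimeTracker:
    """Remembers the last known state per URL and reports state changes."""

    def __init__(self) -> None:
        self._last_state: Dict[str, bool] = {}
        self.events: List[Tuple[datetime, str, str]] = []

    def record(self, url: str, is_up: bool, at: datetime) -> Optional[str]:
        """Store one check result; return an alert message if the state changed."""
        previous = self._last_state.get(url)
        self._last_state[url] = is_up
        if previous is None and is_up:
            return None  # First check and everything is fine: nothing to report.
        if previous == is_up:
            return None  # No change since the last check.
        label = "UP" if is_up else "DOWN"
        self.events.append((at, url, label))
        return f"{at.isoformat()} {url} is {label}"


tracker = UptimeTracker()
site = "https://www.example.com/"  # hypothetical URL
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
t1 = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 1, 12, 20, tzinfo=timezone.utc)

tracker.record(site, True, t0)          # healthy, no alert
alert = tracker.record(site, False, t1)  # the fire starts
recovery = tracker.record(site, True, t2)  # the fire is out
```

Alerting only on changes keeps the noise down, and the recorded events give you the start and stop times of each outage.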
Additionally, if you have third-party connections, do yourself a favor and set up API monitoring on connection quality. If a key content partner goes down, you can have an alert sent to you. This monitoring should live on a hidden admin page on your site that pulls from the logs and gives you insight into the status of your connections.
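Here is a minimal Python sketch of the kind of summary such an admin page could show. It assumes your logs can be reduced to simple (partner, succeeded) records, which is our assumption for the example; the `connection_status` function, the partner names, and the 90% threshold are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def connection_status(
    records: Iterable[Tuple[str, bool]],
    min_success_rate: float = 0.9,
) -> Dict[str, dict]:
    """Summarize API call results per partner and flag unhealthy connections."""
    totals = defaultdict(lambda: [0, 0])  # partner -> [successes, attempts]
    for partner, ok in records:
        totals[partner][1] += 1
        if ok:
            totals[partner][0] += 1

    summary = {}
    for partner, (successes, attempts) in totals.items():
        rate = successes / attempts
        summary[partner] = {
            "attempts": attempts,
            "success_rate": rate,
            "healthy": rate >= min_success_rate,
        }
    return summary


# Made-up log records: (partner name, did the call succeed?)
log_records = [
    ("news-feed", True),
    ("news-feed", False),
    ("video-cdn", True),
    ("video-cdn", True),
]
status = connection_status(log_records)
needs_alert = [name for name, info in status.items() if not info["healthy"]]
```

Any partner that lands in `needs_alert` is a candidate for the alert described above.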
To learn more about the services we offer that can help your organization meet its security needs, dive deeper here.