This shows you the differences between two versions of the page.
| tamiwiki:internal:networks:tami_sre [2023/03/05 00:11] – 444b | tamiwiki:internal:networks:tami_sre [2023/05/26 21:51] (current) – removed corshunov | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== Tami Site Reliability Engineering ====== | ||
| - | This page details our efforts to keep our systems online and reliable. | ||
| - | ---- | ||
| - | |||
| - | Currently, we have a Raspberry Pi in the space that runs a real-time status page from UptimeRobot. | ||
| - | The status page for our services is [[https:// | ||
| - | |||
| - | |||
| - | |||
| - | |||
| - | ===== Troubleshooting steps ===== | ||
| - | The first step is to identify the nature of the root cause and whether it is related to the network or the infrastructure. | ||
| - | Use the [https:// | ||
| - | * If everything is down (including Tami's IP, 82.80.54.64), | ||
| - | * If Tami's IP address is reachable (ping and telnet) but the YunoHost services are down, it's likely just an infra issue. | ||
| - | * If things are still not working after trying all these steps, reach out to someone from the [[tamiwiki/ | ||
| - | |||
| - | ==== Network ==== | ||
| - | Relevant Link: [[tamiwiki: | ||
| - | |||
| - | ==== Infra ==== | ||
| - | Relevant Link: [[tamiwiki: | ||
| - | === If there is an issue with a single service === | ||
| - | * The first step is to see if you can log into [[https:// | ||
| - | * Then check the service at [[https:// | ||
| - | * Review the logs, restart the service if necessary, and consider sharing the logs via yunopaste in a relevant group on Tami's communication channels. | ||
| - | === If there is an issue with multiple services === | ||
| - | * Attempt the steps above for each service. If all services are affected, the problem may lie with YunoHost itself or with the device it runs on. | ||
| - | * Try to SSH into the YunoHost server. The password is your YunoHost SSO password. | ||
| - | * ssh < | ||
| - | * Check the status output of the following services: | ||
| - | * sudo systemctl status nginx.service (for website issues) | ||
| - | * sudo systemctl status mautrix_telegram.service (for telegram bridge issues) | ||
| - | * If you see errors, or the service is otherwise broken, try restarting it: | ||
| - | * sudo systemctl restart < | ||
| - | * Failing this, look for more logs. Look up any error messages and follow them wherever they lead: | ||
| - | * sudo journalctl -u < | ||
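
The troubleshooting flow above can be sketched as a few shell functions. This is a minimal sketch, not an official Tami tool: the function names (`classify_outage`, `ip_reachable`, `service_report`) are hypothetical, the `ping -W` timeout assumes Linux `ping` (iputils), and the "network" branch is an assumption, since the bullet describing the everything-is-down case is cut off on this page.

```shell
#!/bin/sh
# Hypothetical triage helpers; the names are illustrative and not part of the wiki.

TAMI_IP="82.80.54.64"   # Tami's public IP, as given on this page

# classify_outage IP_REACHABLE SERVICES_UP
# Both arguments are "yes" or "no". Prints where to look first.
# Assumption: an unreachable IP points at the network side.
classify_outage() {
    if [ "$1" != "yes" ]; then
        echo "network"
    elif [ "$2" != "yes" ]; then
        echo "infra"
    else
        echo "ok"
    fi
}

# ip_reachable IP
# Prints "yes" if one ping gets a reply within 3 seconds, otherwise "no".
ip_reachable() {
    if ping -c 1 -W 3 "$1" >/dev/null 2>&1; then
        echo "yes"
    else
        echo "no"
    fi
}

# service_report UNIT
# Run on the YunoHost machine: shows the unit's status and its last 50 log lines.
service_report() {
    sudo systemctl status "$1" --no-pager
    sudo journalctl -u "$1" -n 50 --no-pager
}
```

For example, `classify_outage "$(ip_reachable "$TAMI_IP")" no` prints which side to investigate when the status page shows the services as down, and `service_report nginx.service` gathers the output you would review or share when asking for help.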