Merged
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -42,6 +42,9 @@ nav:
- Operational procedures: how-to/administrate/operations.md
- Restund (TURN): how-to/administrate/restund.md
- Investigative tasks (e.g. searching for users as server admin): how-to/administrate/users.md
- Support:
- Triaging Issues: how-to/support/triaging_issues.md
- Collecting information with the Web Inspector: how-to/support/inspector.md
- Reference:
- Architecture Overview: understand/overview.md
- Single Sign-On and User Provisioning: understand/single-sign-on/README.md
73 changes: 73 additions & 0 deletions src/how-to/support/inspector.md
@@ -0,0 +1,73 @@
<a id="inspector"></a>

# Examining Wire issues using the Web Inspector

The Web Inspector in your web browser can be a very handy tool when debugging issues that people see in the Wire web application.

## Conference Calling

### Pulling a Calls Config

End user complaint: some conference calls are not working, and the end user cannot get logs.

This procedure should get us:

* The calls configuration of the backend.
* The HTTP error code that the SFT server may be returning.

Procedure:

From the machine of one of the affected webapp users:

* Right-click anywhere in the web application, and open the inspector.
* The browser should now show a new window or pane. This is the browser's inspector, showing you the code for whatever you clicked on.
* Select the 'Network' tab of the inspector, and return to Wire without closing the inspector window.
* In the Wire application, select a channel/group where others have successfully been placing conference calls, and where there are NO federated users.
* Place a call. You should see files loading into the 'Network' tab of the inspector. The outcome of the call does not matter, but let the call either succeed (and then hang up!) or fail.
* In the inspector, click on the 'File' or 'Name' column header once. This should sort the requests that were sent.
* If there is not a 'Method' column shown, please right-click on 'Name' or 'File', and select 'Method' to make it visible.
* Look for a file named either 'v2' or 'v2?limit=3'. This is the calling configuration that your Wire backend gives to the client.
* There will be at least two shown. In the 'Method' column, the files will have a method of 'GET', 'GET + Preflight', or 'OPTIONS'. Click on the one labeled 'GET'.
* A new pane will have popped open with 'Headers', 'Response', 'Timings', and other tabs.
* Click on the 'Response' tab.

You should now see the 'calls config'.

Your calls config is a JSON document made of several sections, telling clients which credentials to use and where to find the calling servers.

* Please stay in the inspector, save a copy of the calls config, and give a copy to your support team.

* Examine the calls configuration for the 'sft_servers' section (not 'sft_servers_all').
* There should be a single URL in there, pointing to `https://<YOUR_SFT_SERVER_HERE>/`.
* There should also be entries for each TURN server in your environment.

* Your inspector should still have the 'Network' tab open.
* Close the portion of the inspector that shows the selected request. This is the part that has 'Headers', 'Response', and 'Timings' tabs.
* You should now see the list of requests and responses again.
* If there is not a 'Url' column shown, please right-click on 'Name' or 'File', and select 'Url' to make it visible.
* Click on the 'Url' column header once. This should sort the requests that were sent by Url.
* Find the requests that have the same URL as you found in the 'sft_servers' section of the calls config.
* Screenshot them. Ensure that the 'Name' or 'File' is readable, the 'Url' is fully visible, the 'Method' is visible, and the 'Status' is visible.
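
If you saved a copy of the calls config to a file, you can also run the same checks from a shell. The sketch below is one way to do that with standard tools. It assumes that `sft_servers` holds a list of objects, each with a `urls` list, and that TURN servers appear as `turn:` or `turns:` URLs. The helper names `sft_urls` and `turn_urls` and the file name `calls-config.json` are hypothetical. If `jq` is installed, it reads JSON more robustly than `sed` and `grep` do.

```shell
#!/bin/sh
# Minimal sketch, assuming a calls config saved from the inspector's 'Response' tab.
# Assumed layout: "sft_servers": [{"urls": ["https://..."]}, ...], with TURN
# servers listed as turn:/turns: URLs elsewhere in the document.

# Print the URLs under "sft_servers". The pattern does not match "sft_servers_all".
sft_urls() {
  tr -d ' \t\r\n' < "$1" |
    sed -n 's/.*"sft_servers":\[\({[^}]*}\(,{[^}]*}\)*\)\].*/\1/p' |
    grep -o 'https://[^"]*'
}

# Print every TURN URL found anywhere in the calls config.
turn_urls() {
  tr -d ' \t\r\n' < "$1" | grep -oE 'turns?:[^"]*'
}

# Usage (calls-config.json is a placeholder file name):
#   sft_urls calls-config.json    # expect exactly one https:// URL
#   turn_urls calls-config.json   # expect entries for each TURN server
```

Both helpers strip whitespace first, so they work on the pretty-printed JSON that browsers usually display as well as on a single-line response.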

## File Sharing

### Examining file Upload/Download Problems

End User Complaint:

File upload/download is not working. Someone uploaded a file, but I can’t download it. I can try to send things, but they never upload. I’m on the webapp.

Procedure:

From the machine of one of the affected webapp users:

* Right-click anywhere in the web application, and open the inspector.
* The browser should now show a new window or pane. This is the browser's inspector, showing you the code for whatever you clicked on.
* Select the 'Network' tab of the inspector, and return to Wire without closing the inspector window.
* In the Wire application, select a channel/group where others have successfully been uploading files, and where there are NO federated users.
* Attempt to download a file by clicking on the three dots next to the file, and selecting ‘Download’. You should see traffic in the 'Network' tab of the inspector. The results of your attempt do not matter.
* In the inspector, click on the 'File' or 'Name' column header once. This should sort the requests that were sent.
* If there is not a 'Method' column shown, please right-click on 'Name' or 'File', and select 'Method' to make it visible.
* Look for a file whose name starts with '3-4'.
* There will be at least two shown. In the 'Method' column, the files will have a method of 'GET + Preflight' or 'OPTIONS'.
* After the two '3-4' files, there will be a 'GET' request for the same file name, minus the '3-4-' at the beginning.
* Screenshot the three files. Ensure that the 'Name' or 'File' is readable, the 'Url' is fully visible, the 'Method' is visible, and the 'Status' is visible.
154 changes: 154 additions & 0 deletions src/how-to/support/triaging_issues.md
@@ -0,0 +1,154 @@
<a id="triaging_issues"></a>

# Triaging Issues with your On-Prem Wire Deployment

## Introduction

To help our users and their support staff help themselves, we are providing some general guidance for first-line troubleshooting of your on-prem Wire installation.


## This is only for individuals who have accounts on an On-prem (not cloud) Wire Deployment!

If you're having a problem with your account on the 'wire.com' domain (the Wire Cloud), please check https://status.wire.com/, verify you are running a recent client (https://wire.com/en/app-download), and contact [Wire support](https://support.wire.com/hc/en-us/request/new).

## Being Prepared

If you are supporting the Wire product deployed in your own network (not wire.com), there are many things you can do before an incident to help resolve issues quickly.

Please read and understand the content in https://docs.wire.com/latest/understand/overview.html.

### Know your Wire Deployment

In an outage, there are many key questions you will need quick answers to in order to properly triage and troubleshoot an issue. Having these facts either memorized or written down clearly in a central location will expedite response times.

Quick Facts:

* Who can administer your Wire installation, and how do you contact them?
* Is your Wire calling infrastructure hosted in a separate DMZ (recommended by Wire), hosted alongside your Wire install, or are you using our Cloud Calling offering?
* What does the network path look like between your users and your Wire installation?
* Is there anything "out of the ordinary" about how your Wire installation is configured?
* Have there been any major changes or failures recently, inside your network or in the wider Internet? (think: Cloudflare, AWS, etc.)

### Know your Infrastructure

All products have dependencies; Wire is no different. Whether it is Internet connectivity, power, or something like an SSO provider, dependencies break, and knowing what you depend on gets you to a solution quickly.

What to know:

* What domain names are part of your Wire installation? How do end users resolve those domains?
* Who is your Internet Service Provider?
* What DNS service does your Wire install use?
* What network time source (NTP service) do your Wire servers depend on?
* What load balancers and firewalls are in use around your Wire deployment?
* What infrastructure does your Wire service run on?

### Know your Users

Knowing what your users use, how they use it, and what they value in it helps you quickly restore the parts of the service they value most.

Quick Facts:

* What platforms are the users using, and in what proportion? (Web / Android / iOS / Windows / Mac / Linux / ...)
* What is the network path between your users and your Wire services?
* For mobile platforms:
    * How do your users receive notifications? (APNS / FCM / WebSockets)
    * Are you managing Wire on your users' mobile devices with a Mobile Device Management (MDM) product?
* How do your users find your Wire installation?
* How do your users log in to Wire? What infrastructure does that depend on? (SSO, SCIM, LDAP, etc.)
* What do your users use Wire for? Mostly messaging, mostly calling, or file sharing?
* How do your users get connected to your Wire backend? Do you use a deep link, or do your users get redirected from wire.com?

### Take Backups

Both the Wire backend and the Wire clients have backup and restore procedures. Familiarize yourself with them, and ensure that backups are taken regularly.

## When a User Reports a Problem

When a user reports a problem with your Wire service, the first thing you need to determine is the severity and the urgency of their report. If a user reports an icon failing to draw correctly at midnight, that might not be so bad. If that icon is the 'call' button, that may change things, depending on whether calling is a must-have feature or a nice-to-have for your end users. Urgency can also depend on whether everyone is busily using the product or everyone is on holiday.

Take a moment to understand severity and urgency separately before deciding how to react to an error report.

Each of the diagrams on https://docs.wire.com/latest/understand/overview.html shows a different view of your platform. Let's go through each, with a few example problems and how to troubleshoot them.

### DMZ Split

![image](../../understand/img/architecture-server-simplified.svg)

Your Wire install is distributed across many physical computers, possibly in a datacenter. Wire recommends deploying Wire in two clusters: one cluster for "calling", which is placed in your DMZ, and one cluster for "everything else", which lives in your secure hosting location. If a user is complaining about calling issues, knowing where your calling infrastructure is located becomes important.

If you or the user have access to the web client (not the desktop app; it has to be a real web browser), you or the end user can download your calling server configuration as the backend serves it, following the [calls config retrieval procedure](inspector.md#pulling-a-calls-config). This shows where your backend tells clients to connect when they want to place a call.

### Client Communications

![image](../../understand/img/architecture-client_communications.svg)

Users rely on their client devices connecting to their Wire backend properly, otherwise the application cannot function.

Your Wire backend has many domains which must be resolvable by your end users. These domains most likely point to load balancers in your environment, like pictured above.

If the problem only affects calling, make sure the calling domains are reachable.

If a user has a problem receiving notifications of new messages, they may be having trouble with their cell phone tower (on mobile), or perhaps with their WebSocket connection. Remember that the user's problem is in front of them / in their hand: don't check that YOU can resolve the host, check that THEY can.
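
On a Linux machine in the affected user's network, one quick way to see whether that machine can resolve your Wire domains is a small loop around `getent hosts`. This command uses the same name-service configuration (`/etc/hosts`, DNS) as most applications on the machine. The sketch below assumes such a machine is available. `check_resolves` is a hypothetical helper name, and the host names in the usage line are placeholders for the domains of your own deployment. On macOS or Windows, `nslookup` serves a similar purpose.

```shell
#!/bin/sh
# Sketch: report which host names this machine can resolve.
# Run it on (or next to) the affected user's machine, not on your admin host.
check_resolves() {
  for host in "$@"; do
    if getent hosts "$host" > /dev/null; then
      echo "OK   $host"
    else
      echo "FAIL $host"
    fi
  done
}

# Usage (placeholder domains; substitute the ones from your deployment):
#   check_resolves nginz-https.example.com nginz-ssl.example.com assets.example.com
```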

#### Routing

Once traffic has made it into your Wire environment, your load balancers and firewalls have to hand that traffic over to your Wire install. Here you can see a more detailed view of how traffic enters your cluster.

Looking at this diagram, you can see that calling services are normally completely separated from the rest of the backend. Assuming this is the case (you do know your install, yes?), you can rule out problems with the load balancer for the Wire backend and focus directly on your calling firewalls and infrastructure.

### Health Checks

When your users complain about a service, the first instinct is to check whether the service is online yourself. This is the first form of health check. Which health checks you perform, and in what order, should be based on the user's complaint, not just on what we expect to see.

See the health checks section of the operational procedures: https://docs.wire.com/latest/how-to/administrate/operations.html?h=health#health-checks


### What does healthy look like?

As an example, this is the result of running the `kubectl get pods --namespace wire` command to list all pods in the `wire` namespace of a typical cluster:

```shell
NAME                                                      READY   STATUS      RESTARTS   AGE
account-pages-54bfcb997f-hwxlf                            1/1     Running     0          85d
brig-58bc7f844d-rp2mx                                     1/1     Running     0          3h54m
brig-index-migrate-data-s7lmf                             0/1     Completed   0          3h33m
cannon-0                                                  1/1     Running     0          3h53m
cargohold-779bff9fc6-7d9hm                                1/1     Running     0          3h54m
cassandra-ephemeral-0                                     1/1     Running     0          176d
cassandra-migrations-66n8d                                0/1     Completed   0          3h34m
demo-smtp-784ddf6989-7zvsk                                1/1     Running     0          176d
elasticsearch-ephemeral-86f4b8ff6f-fkjlk                  1/1     Running     0          176d
elasticsearch-index-create-l5zbr                          0/1     Completed   0          3h34m
fake-aws-s3-77d9447b8f-9n4fj                              1/1     Running     0          176d
fake-aws-s3-reaper-78d9f58dd4-kf582                       1/1     Running     0          176d
fake-aws-sns-6c7c4b7479-nzfj2                             2/2     Running     0          176d
fake-aws-sqs-59fbfbcbd4-ptcz6                             2/2     Running     0          176d
federator-6d7b66f4d5-xgkst                                1/1     Running     0          3h54m
galley-5b47f7ff96-m9zrs                                   1/1     Running     0          3h54m
galley-migrate-data-97gn8                                 0/1     Completed   0          3h33m
gundeck-76c4599845-4f4pd                                  1/1     Running     0          3h54m
nginx-ingress-controller-controller-2nbkq                 1/1     Running     0          9d
nginx-ingress-controller-controller-8ggw2                 1/1     Running     0          9d
nginx-ingress-controller-default-backend-dd5c45cf-jlmbl   1/1     Running     0          176d
nginz-77d7586bd9-vwlrh                                    2/2     Running     0          3h54m
redis-ephemeral-master-0                                  1/1     Running     0          176d
spar-8576b6845c-npb92                                     1/1     Running     0          3h54m
spar-migrate-data-lz5ls                                   0/1     Completed   0          3h33m
team-settings-86747b988b-5rt45                            1/1     Running     0          50d
webapp-54458f756c-r7l6x                                   1/1     Running     0          3h54m
```

This cluster is running one instance of most services. If you are deployed for high availability, you will see three of each.
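
When the list is long, it can help to filter it down to the pods that differ from the healthy picture above. The sketch below does this with `awk`. It assumes the default `kubectl get pods` columns (NAME, READY, STATUS, RESTARTS, AGE), and `unhealthy_pods` is a hypothetical helper name. It treats `Completed` pods as expected, since the migration and index-creation jobs in the example above end in that state.

```shell
#!/bin/sh
# Sketch: print pods whose state differs from the healthy example above.
# Expects the default `kubectl get pods` columns: NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk '$1 == "NAME" { next }                 # skip the header line
       $3 == "Completed" { next }            # finished jobs (migrations, index creation) are expected
       {
         split($2, ready, "/")               # READY is "ready/total"
         if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
       }'
}

# Usage:
#   kubectl get pods --namespace wire | unhealthy_pods
```

A pod is flagged if it is not `Running`, or if it is `Running` but not all of its containers are ready (for example `1/2`).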

### Historic Issues

#### Cassandra NTP Sync

Summary:
If your NTP service fails and the clocks on your Cassandra database nodes are allowed to drift, your Cassandra nodes may refuse quorum writes.

Symptomology:
There are errors in the brig logs about Cassandra, referring to quorum.

User Visible Problems:
Brig is the first service to fail, causing problems logging people in.

Action:
Ensure that time is set correctly on your Cassandra database servers. We recommend running an NTP service on all nodes of your cluster to prevent these situations.
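
To spot drift before it causes an outage, you can compare the clocks of your Cassandra nodes from an admin host. The sketch below assumes SSH access to each node and uses placeholder node names. `max_drift` is a hypothetical helper name. The check has one-second resolution and includes SSH latency, so treat it as a rough indicator. On the nodes themselves, `timedatectl` (systemd) or `chronyc tracking` (chrony) reports whether the clock is synchronized.

```shell
#!/bin/sh
# Rough sketch: print the spread (max - min) of epoch timestamps read on stdin,
# one per line. Assumes the timestamps were collected at about the same moment.
max_drift() {
  awk '{ v = $1 + 0
         if (NR == 1 || v < min) min = v
         if (NR == 1 || v > max) max = v }
       END { if (NR > 0) print max - min }'
}

# Usage (cassandra01..03 are placeholder host names):
#   for node in cassandra01 cassandra02 cassandra03; do
#     ssh "$node" date +%s
#   done | max_drift
```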