(Internal) Email reply tracking bugs

Audience

Internal

Displayed Description

Page Type

Article

Product	Gmail & Outlook email reply tracking
Expert(s)	David Zhou (CRM team)
Slack channel
This article was last verified on	05/05/2024

🔍 Articles in This Section

Please use the following list to see additional internal articles regarding email reply tracking:

(Internal) Email reply tracking: Overview
(Internal) Email reply tracking bugs (📍you are here)

This is a playbook for debugging cases where a sequence continues to send after a candidate replies. (In other words, the sequence was incorrectly not marked as replied.)

This is one of the most serious categories of bugs, but at the same time, these bug reports are user error the vast majority of the time. It’s imperative that you very quickly determine whether this is user error or a real bug, and escalate to engineering as soon as possible if you think there’s even a small chance that this is a real bug.

If a Gem employee reports that a sequence isn’t marked as replied, but the report comes in before the next stage is scheduled to send, this may not be a real issue. See “Gem-specific reply tracking issue” below for more details.

When this playbook applies

This playbook was originally written with Gmail in mind. (That is, the sending user/the person with the ZenSourcer account is on Gmail.) If the sending user is on Outlook, refer to the Outlook section below.

Another warning: if the sender of the sequence was a ZenSourcer employee, then everything is slightly different and this playbook won’t apply (because of how prod and beta interact with each other and our push cursors).

Our reply-detection logic

A received email message will mark a candidate as replied if:

The email is not from bot@zensourcer.com
The email is not from the user’s email address, or any address they have as an alias in Gmail
One of the following is true:

The in-reply-to header of the reply matches the sent_message_id of a message in our sent_message table for the current user†.
The Gmail threadId of the reply matches the sent_gmail_thread_id of a message in our sent_message table for the current user†.
The from header of the reply matches (case-insensitive, but dots do matter) the to_email of a message in our sent_message table for the current user†.
The x-original-sender header of the reply matches (case-insensitive) the to_email of a message in our sent_message table for the current user†.

†For the current user means that sent_on_behalf_of_user_id is None and user_id matches, or that sent_on_behalf_of_user_id matches.

It’s worth noting that once a message is categorized as a “reply”, we mark the candidate as replied. This distinction is meaningful because we could have detected a reply based on an old sequence’s message id or thread id, but even in this case, we will still mark the candidate as replied if there happens to be an active sequence from the user to the candidate. In other words, any reply from candidate C detected in the inbox of user U will stop sequences between U and C.
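
The rules above can be sketched in Python as follows. This is a minimal illustration, not the production logic in PushCursorSyncer.py: the SentMessage class, the function names, the order in which rules are checked, and the X_ORIGINAL_SENDER label are all assumptions made for the example. It also assumes header keys are lower-cased and addresses are bare (no display names).

```python
from dataclasses import dataclass
from typing import Iterable, Optional

BOT_ADDRESS = "bot@zensourcer.com"


@dataclass
class SentMessage:
    # Hypothetical container; field names mirror the sent_message columns.
    user_id: int
    sent_on_behalf_of_user_id: Optional[int]
    to_email: str
    sent_message_id: str
    sent_gmail_thread_id: str


def is_for_user(msg: SentMessage, user_id: int) -> bool:
    """The dagger rule: sent by the user directly, or on the user's behalf."""
    if msg.sent_on_behalf_of_user_id is None:
        return msg.user_id == user_id
    return msg.sent_on_behalf_of_user_id == user_id


def reply_reason(headers: dict, thread_id: str, user_id: int,
                 user_addresses: set, sent: Iterable[SentMessage]) -> Optional[str]:
    """Return the name of the rule that matched, or None if this is not a reply.

    user_addresses holds the user's own address and Gmail aliases, lower-cased.
    """
    sender = headers.get("from", "").lower()
    if sender == BOT_ADDRESS or sender in user_addresses:
        return None
    original_sender = headers.get("x-original-sender", "").lower()
    for msg in sent:
        if not is_for_user(msg, user_id):
            continue
        to_email = msg.to_email.lower()
        if headers.get("in-reply-to") == msg.sent_message_id:
            return "IN_REPLY_TO"
        if thread_id == msg.sent_gmail_thread_id:
            return "THREAD_ID"
        if sender == to_email:  # case-insensitive, but dots are not normalized
            return "FROM_EMAIL"
        if original_sender and original_sender == to_email:
            return "X_ORIGINAL_SENDER"  # hypothetical label
    return None
```

Walking a suspect reply through a function like this, using the sent_message rows and the mail headers you collect during the investigation, makes it easier to see which rule should have fired.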

Support playbook

Because we want to debug these issues as quickly as possible, we recommend doing most of the digging yourself, rather than waiting a long time for responses from the user.

Support should:

Ask the user for:

the sequence name
the person’s name and email
the email address they replied from
who they replied to (sometimes the user who contacts support is not the same user!)
the date and time of the reply

Next up, look up the following information:

Use this to look up the sequence_id and person_id (you can get these by assuming the user, looking at the URLs, and base64-decoding the IDs you find there)
Use this to look up the person_sequence_info_id (from the person_sequence_info table in Numeracy)
Confirm that replied_timestamp is null. If it’s not, then the user might be wrong, and maybe we did catch the reply. (See “replied_timestamp is not null,” below.)
Next, use Numeracy to find all sent_message rows matching this person_sequence_info_id. This will tell you how many messages we sent, when we sent them, and their to_email/sent_gmail_id/sent_gmail_thread_id/sent_message_id/user_id/sent_on_behalf_of_user_id (which you’ll need when evaluating our reply-detection rules, above).
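
For the base64-decoding step in this list, a small helper like the one below may be handy. It is only a sketch: it assumes the IDs in the URL are URL-safe base64 with the trailing "=" padding possibly stripped, and that the decoded payload is UTF-8 text. The function name and the sample value are hypothetical, and the real payload format may differ.

```python
import base64


def decode_url_id(encoded: str) -> str:
    """Decode a base64 ID copied from a URL, tolerating stripped '=' padding."""
    # base64 input length must be a multiple of 4; restore any missing padding.
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# Hypothetical example value, not a real ID:
print(decode_url_id("MTIzNDU"))  # prints 12345
```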

[Don’t block on this step!] If it would be helpful, ask the user for the full mail headers of the reply. (In Gmail, click “…” and then “Show original,” and have them copy/paste that entire page.) But again, it’s better to get minimal information quickly rather than complete information more slowly.
Use the “What happened to this message” query on the Push Cursor board in Honeycomb to search for the message ID and determine which processing step occurred
Use GraphQL to look up the mail headers, and Gmail’s ID for this email:

Be VERY CAREFUL before running this! This is a very sensitive GraphQL endpoint, and we want to make sure nobody abuses it.
The GraphQL query is:

query {
  messagesByUser(userId: XXX, queryString: "XXX")
}

* Common query strings might be:
  * from:foo@bar.com if all you know about the email is the from address
  * if you know the Message-ID header, you can specifically pull up that email with the query string rfc822msgid:xxx@yyy.com (if the message ID has <angle brackets> around it, remove those first)
* If this finds a matching message: This will give all the mail headers (which we can compare against our reply rules, above), and also the first two items should be id and threadId. These are not headers, but instead references that the Gmail API gives us.
  * Note: our convention is to use gmail_message_id/sent_gmail_message_id to refer to this id from the Gmail GraphQL endpoint, gmail_thread_id/sent_gmail_thread_id to refer to the threadId from this Gmail endpoint, and message_id/sent_message_id to refer to the actual Message-ID email header.
* If this does not find a matching message: go back to the user and ask for more details about the reply. (Maybe they gave you the wrong from email address? Or it was actually detected in someone else’s inbox, and not their own? Or they deleted the message from Gmail?)
  * Plug in a candidate email address and a team_id, and this helpful Numeracy query will show all emails from that address, sent to the team:

select * from all_email_metadata where id in (select all_email_metadata_id from email_address_to_email_mapping where email='XXX' and team_id=XXX and role='SENDER') order by id asc;

Warning: this table returns base 10-encoded gmail_id/gmail_thread_id, while everything else expects base 16-encoding, so either convert to hex before querying Honeycomb/etc, or use the message_id column of the results instead.
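
The base 10 to base 16 conversion mentioned in the warning above is a one-liner in Python. This is a minimal sketch with hypothetical function names; it assumes the hex form should be lower-case with no 0x prefix, which is how Gmail IDs usually appear.

```python
def gmail_id_to_hex(decimal_id) -> str:
    """Convert a base-10 gmail_id/gmail_thread_id (int or str) to lower-case hex."""
    return format(int(decimal_id), "x")


def gmail_id_to_decimal(hex_id: str) -> int:
    """Convert a hex Gmail ID back to the base-10 form stored in all_email_metadata."""
    return int(hex_id, 16)


print(gmail_id_to_hex("255"))  # prints ff
```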
Once you have the message, note its history_id and check it against this user’s gmail_push_cursor information. If the message’s history_id is less than last_push_history_id, then our processes would never have attempted to sync this message, which means we are not getting Gmail push notifications for this user. We currently have no definitive explanation for why this happens.
Search Honeycomb for this user ID and gmail_message_id. To do this, go to https://ui.honeycomb.io/zensourcer/datasets/heroku-logs/ and set the following search parameters:

At the very top, change the time window from “Last 2 hours” to something that will cover the date the email was received (perhaps “Last 7 days”).
In the “Filter” box, add:

event_type = push_cursor_syncer_run
user_id = XXX
gmail_message_id = XXX

Then click “Run query”
There should generally be 3 rows of results, with push_cursor_type set to DEFAULT, GREENHOUSE, and METADATA. We care only about the DEFAULT row.
For the DEFAULT row, look at result_category and result_description.
The possible categories are: error if we errored, noop if we think this message is not a sequence reply, and reply if we think it is a sequence reply.
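
If you export the Honeycomb results, the DEFAULT-row check can be expressed as a short Python sketch like the one below. It assumes each result row is available as a dict keyed by the Honeycomb field names; the function names and the wording of the summaries are hypothetical.

```python
from typing import Optional

# Meanings of result_category, as described in this playbook.
CATEGORY_MEANING = {
    "error": "we errored while processing this message",
    "noop": "we decided this message is not a sequence reply",
    "reply": "we decided this message is a sequence reply",
}


def default_row(rows: list) -> Optional[dict]:
    """Return the row whose push_cursor_type is DEFAULT, if there is one."""
    return next((r for r in rows if r.get("push_cursor_type") == "DEFAULT"), None)


def summarize(rows: list) -> str:
    """Describe what the default push cursor syncer did with the message."""
    row = default_row(rows)
    if row is None:
        return "no DEFAULT row: the default syncer may not have processed this message"
    category = row.get("result_category")
    meaning = CATEGORY_MEANING.get(category, "unrecognized category")
    return f"{category}: {meaning} ({row.get('result_description')})"
```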

Try to figure out why the reply wasn’t detected. The vast majority of the time, this is user error. (For instance, if the candidate replies in a brand-new thread without in-reply-to headers from a different email address, we can’t catch it.)

If you can’t figure it out, escalate to the eng oncall immediately.
Because of how high-priority this type of bug is, make sure that the oncall acknowledges the issue. Simply pinging them on Slack isn’t enough — make sure they see your message, and respond to it to confirm that they’re investigating. (If you’re having trouble getting the attention of the oncall, feel free to page them by using the /page-oncall slash-command in the #eng channel in Slack.)
Even if you think you’ve discovered the reason, you should send the results of your investigation to the eng oncall for a quick confirmation before we close out the issue as user error.

TODO: document some common classes of user error, and what they look like

Situation: replied_timestamp is not null

Convert the timestamp to a human-readable time to figure out when we detected a reply.
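
One way to do the conversion, sketched in Python. This assumes replied_timestamp is stored as a Unix epoch value; whether it is in seconds or milliseconds is not confirmed here, so the helper guesses from the magnitude. The function name is hypothetical.

```python
from datetime import datetime, timezone


def to_utc(ts: float) -> datetime:
    """Convert a Unix epoch timestamp (seconds or milliseconds) to a UTC datetime."""
    # Assumption: a value this large can only be milliseconds.
    if ts > 1e11:
        ts = ts / 1000
    return datetime.fromtimestamp(ts, tz=timezone.utc)


print(to_utc(1714867200).isoformat())  # prints 2024-05-05T00:00:00+00:00
```

Compare the result against the time the user contacted support, keeping the user's time zone in mind.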

If the timestamp is after they emailed support, it’s possible that they forwarded the sequence thread to support@zensourcer.com, and then we replied (or Intercom automatically replied), and our logic above (in-reply-to header) caused the sequence to be marked as replied.

You can also check the tracking_event table in Numeracy, searching by person_sequence_info_id. Look for an event_type of REPLIED; the event_payload column should say what happened (one of IN_REPLY_TO, THREAD_ID, FROM_EMAIL, or if the user manually “marked as replied” then USER_MANUAL).

Situation: sequence was sent from an alias, but reply was received by a different user

If a user has a sequence that is sending as someone else via a Gmail alias, but they aren’t receiving emails to that alias in their own mailbox, we will not be able to track replies correctly. We do try to alert users to this situation when they pick an alias in the sequence wizard, but our detection is not perfect. You can recognize this case when the Honeycomb logs show that we processed a reply from the candidate but the result was a noop.

Situation: all emails in a sequence bounced

The most likely cause is that the user added a CC or BCC address that bounced, and we marked the replies as bounced. Check one of the sent_message rows for a CC/BCC address, then query for that address using the message query above. If you see bounced replies from the CC/BCC address, ask the user to remove that address and attempt to resend the sequence.

Eng playbook

TODO: expand this section

Some more debugging tools available to eng:

The relevant code to look at is PushCursorSyncer.py
scripts/showthread.py will query the Gmail API for the entire thread, given a person_sequence_info_id
scripts/showmessage.py will query the Gmail API for a message, given its gmail_message_id
You can also query the all_email_metadata table, but querying the Gmail API directly (via GraphQL, see above) is probably better because it gives all headers, and because the default push cursor syncer and the metadata push cursor syncers are different, so it’s possible only one of them processed the message. (Honeycomb logs should tell you if this happened; see above.)

Gem-specific reply tracking issue

Due to users who work at Gem having accounts on both the beta and production tiers, there is an issue that occasionally comes up where pushes from Gmail are received by the beta tier, and not the production tier. This results in the sequence not being marked as replied until the next time something forces that user’s push cursor to be synced (perhaps due to sending another sequence). This will not result in sequences continuing to send even after a reply is received, since when we try to send the next stage we will sync the user’s push cursor, even if we haven’t received Gmail pushes yet, which will process the reply.

If you look up the gmail_message_id in Honeycomb and the corresponding row where push_cursor_type is DEFAULT also has an env of beta, that is an indication that this is the root cause of the reply tracking bug you are investigating.

Outlook playbook

The details in the support playbook above generally apply to Outlook as well as Gmail. Some deviations are called out below:
When looking at Honeycomb logs, filter by event_type = msft_push_cursor_syncer_run and narrow down results with user_id. Some helpful fields to look at:

time_diff_between_processing_and_msg lets you know the lag between when replies are received and when they’re processed by Gem
msft_thread_id matches sent_message.sent_gmail_thread_id and can be used to query the Microsoft API for the full conversation thread (more on this below)
msft_message_id is the message id that can be used to query the Microsoft API for details on that specific email

Once you know the thread/message IDs, you can query the Microsoft API using scripts/debug/msft_graph_api.py. Sample queries:

Tip: remember to escape filter inputs using urllib.parse.quote
python scripts/debug/msft_graph_api.py --user-id gem_user_id --api-url "beta/me/messages?\$filter=sender/emailAddress/address+eq+'escaped_email_address'" -p

Returns all messages from the email address to the Gem user. This lets you verify whether the user actually received the reply that they claim is missing. If they did not receive the reply, they might have an inbox misconfiguration issue.

python scripts/debug/msft_graph_api.py --user-id gem_user_id --api-url "beta/me/messages?\$filter=conversationId+eq+'escaped_msft_thread_id'" -p

Returns all messages in the conversation thread. Use the value from sent_message.sent_gmail_thread_id
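
To illustrate the escaping tip, here is a rough Python sketch that builds the two --api-url values shown above with urllib.parse.quote. The helper names are hypothetical, and how msft_graph_api.py handles the URL internally is not covered here. Passing safe="" makes quote also escape "/", which can appear in Microsoft conversation IDs.

```python
from urllib.parse import quote


def messages_from_sender_url(email: str) -> str:
    """Build the --api-url value that lists messages from one sender."""
    escaped = quote(email, safe="")  # e.g. '+' becomes %2B, '@' becomes %40
    return f"beta/me/messages?$filter=sender/emailAddress/address+eq+'{escaped}'"


def conversation_url(msft_thread_id: str) -> str:
    """Build the --api-url value that lists every message in a conversation."""
    escaped = quote(msft_thread_id, safe="")
    return f"beta/me/messages?$filter=conversationId+eq+'{escaped}'"


print(messages_from_sender_url("jane+doe@example.com"))
```

When pasting the result into a shell command, wrap it in double quotes and escape the dollar sign so the shell does not expand $filter.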

If you are able to fetch the missing reply via the Microsoft API, the next step may be to verify whether server/MsftPushCursorSyncer.py is processing it correctly.
