(Internal) Data refresh overview

Audience
Internal
Displayed Description

Page Type
Article
Product
Data Refresh
Expert(s)
Novices: Cynthia Day, Sandi Rail (CRM team)
Slack channel
This article was last verified on
07/02/2024

🔍 Articles in This Section

Please use the following list to see additional internal articles regarding Data Refresh:

  • (Internal) Data refresh overview (📍you are here)

📖 Customer-facing Resources

  • Data Refresh

As of Nov 2023, we’re now using MixRank

Data Refresh keeps Gem profiles up to date by periodically “refreshing” basic info like company, title, and school. It’s especially valuable for Prospect Search, since past prospects change companies/roles over time. We’re moving from an expensive vendor to a more cost-effective one that has Legal’s stamp of approval.

This transition brings a few changes to our functionality, mostly (but not entirely) positive:

Core Data Refresh: Person profile freshness

  • Previously, profiles were refreshed every ~45 days with live data.
  • Now, some profiles will be refreshed more often, but most will be refreshed every 90-120 days.
  • This will improve over time as we build more functionality. MixRank refreshes some profiles more often than others, and our contract comes with the ability to “prioritize” 100k profiles per month to fill in the gaps - we just have to build support for it.

1-click sourcing on LI

  • Previously, we powered 1-click sourcing with just our “cache” of data-refreshed profiles (10 million total). For all other profiles users had to download the LinkedIn PDF.
  • Now we’ll use both our cache and MixRank’s DB and live API, potentially offering access to 400+ million profiles.
  • Not every profile will be available, and we’re being conservative to make sure we don’t surface incomplete profiles (since users can always fall back to the LI PDF).
  • Because 1-click sourcing may issue a live API lookup, it can sometimes take a few seconds; I added a loading indicator to improve the UX for this.

Gem Forms / Typeform enrichment

  • Previously, Persons added from Gem forms / Typeform were queued for enrichment with live profile data; this could take between 10 seconds and 8 hours. (However this has been broken for the past 9 months!)
  • Now we’ll enrich Persons from Gem forms / Typeform within a few seconds of them being added, using cached data from our profiles + MixRank DB and the live API as a backup. We’ll use a cached profile if it’s <90 days old, otherwise we’ll hit the live API for a fresh profile. Note that not every profile can be enriched; for example some people don’t have their LI set to publicly visible.
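The cache-or-live decision described in the last bullet can be sketched as follows. This is a minimal illustration, not the production code: the function name, the `"cache"` / `"live_api"` labels, and the treatment of a missing cached profile are assumptions; only the 90-day threshold comes from the text above.

```python
from datetime import datetime, timedelta
from typing import Optional

# From the article: a cached profile is used only if it is <90 days old.
MAX_CACHE_AGE = timedelta(days=90)


def choose_profile_source(cached_at: Optional[datetime], now: datetime) -> str:
    """Return which source to enrich from (hypothetical labels).

    cached_at is when the cached profile was last fetched, or None if
    there is no cached profile at all (assumed to fall through to the
    live API).
    """
    if cached_at is not None and now - cached_at < MAX_CACHE_AGE:
        return "cache"
    return "live_api"
```

For example, with `now = datetime(2023, 11, 15)`, a profile cached on 2023-10-01 would be served from the cache, while one cached on 2023-01-01 would trigger a live lookup.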

Who is this live for/who is eligible?

  • General data refresh: likely switching to the new system early next week for all teams with Data Refresh enabled
  • 1-click sourcing & forms: live for Gem team today; launching to all teams next week if all goes well

Operia

tl;dr

Every night, we send our 3rd party provider (Operia) 300k LinkedIn urls to refresh. Operia sends back the refreshed data for profiles with public work histories once it’s ready. This can take anywhere from a few minutes to a few hours, with a guaranteed maximum delay of 24 hours. Once we have the data for a particular LI profile, we update all Gem prospects, across all teams, who have that LI url.

Which profiles in a Gem instance get refreshed?

Only Gem profiles that have a LinkedIn URL are eligible to be refreshed. This means if you manually created a one-off prospect with only name and email, or maybe created a profile from GitHub but it doesn’t have a LinkedIn URL, those won’t be sent for data refresh since Operia can’t find information on them. We also aren’t able to refresh profiles that don’t have public work histories.

What fields are refreshed and how do they get updated?

Company, Title, School, Location, Experience section, Education section

For fields company, title, school and location, data refresh overwrites the original value if there are no manual edits. If a user has manually edited one of the fields, that is given highest priority, effectively turning off data refresh for that field. A user can “turn on” data refresh for that field again by manually clearing out the value.

For the Experience and Education sections, data refresh never deletes or modifies any of the existing items — we only add new items.
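The priority rule for company, title, school and location can be sketched like this. It is an illustration under assumptions: the function name and arguments are hypothetical, a “cleared” manual edit is modeled as `None` or an empty string, and keeping the original value when the refresh returns nothing is also an assumption.

```python
from typing import Optional


def resolve_field(
    original: Optional[str],
    manual_edit: Optional[str],
    refreshed: Optional[str],
) -> Optional[str]:
    """Pick the value to show for one refreshable field."""
    # A non-empty manual edit has highest priority and effectively
    # turns off data refresh for this field.
    if manual_edit:
        return manual_edit
    # No manual edit (or it was cleared): refreshed data overwrites
    # the original value.
    if refreshed:
        return refreshed
    # Assumption: with nothing new from the provider, keep the original.
    return original
```

With these rules, `resolve_field("Acme", "Globex", "Initech")` keeps the user's `"Globex"`, and clearing the manual edit (`resolve_field("Acme", "", "Initech")`) lets the refreshed `"Initech"` through again.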

De-dupe criteria:

  • Work history: company, title, start date
  • Education history: school, field of study, degree

If an entry in Gem’s work history has the exact same company, title and start date as an entry provided by data refresh, they are de-duped and we’ll only show one. Similarly, if a Gem education item shares the exact same school, field of study and degree as a data-refreshed education item, they are de-duped. Otherwise, we’ll show them as 2 separate entries.

Note: If no month is provided in work or education history (either from LinkedIn when a user originally added that profile to Gem, or from our data refresh provider), we automatically assume the month to be January.
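The work-history de-dupe and the January default can be sketched as below. This is a minimal illustration, not Gem's implementation: the `WorkItem` type and function names are hypothetical, and start dates are modeled as a year plus an optional month. Education items would be handled the same way with a key of (school, field of study, degree).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class WorkItem:  # hypothetical shape of a work-history entry
    company: str
    title: str
    start_year: int
    start_month: Optional[int] = None  # None when no month was provided


def work_key(item: WorkItem) -> Tuple[str, str, int, int]:
    """De-dupe key: exact company, title and start date."""
    # A missing month is assumed to be January, per the note above.
    month = item.start_month if item.start_month is not None else 1
    return (item.company, item.title, item.start_year, month)


def merge_work_history(
    existing: List[WorkItem], refreshed: List[WorkItem]
) -> List[WorkItem]:
    """Append refreshed items; never delete or modify existing ones."""
    merged = list(existing)
    seen = {work_key(item) for item in existing}
    for item in refreshed:
        key = work_key(item)
        if key not in seen:  # only exact matches are de-duped
            merged.append(item)
            seen.add(key)
    return merged
```

Because matching is exact, a refreshed item titled “Senior Engineer” would be added alongside an existing “Sr. Engineer” entry for the same job rather than replacing it.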

Why are there similar work or education items showing under Experience or Education that refer to the same job/schooling?

To be safe, all work and education items are shown unless there is an exact match of the de-dupe criteria. For work items, only items with an exact match on company, title and start date are de-duped. For education items, there must be an exact match on school, field of study and degree.

Why is this Gem prospect not refreshed?

Our data refresh provider, Operia, is only able to refresh LI profiles and information set to “public,” meaning the information is accessible to a non-logged-in user.

image

LinkedIn public profile settings

See what your own profile’s public settings look like here: https://www.linkedin.com/public-profile/settings

Users have quite a bit of control over what appears on the public version of their profile, meaning that what Operia has access to can vary.

This also means that Gem profiles with private LinkedIn profiles cannot be refreshed (i.e. the toggle at the top of the screenshot to the right is turned OFF).

To test this out, open a prospect’s LinkedIn profile in an incognito window in Chrome: what you can see on their profile is what would be available to us through data refresh. This may mean, for example, that a profile’s work history won’t be updated and will look empty/incomplete.

At the moment, Gem prospects with partially public LI profiles do get a “partial” refresh (e.g. past experience is hidden but education is available on the public profile, as the toggles indicate in the screenshot to the right), but they will not have a refreshed date. This makes it clear that if a Gem prospect has a “refreshed” date, you can trust all the refreshed fields on the profile.

I just turned on Data Refresh for a new customer. How long will it take for their Gem instance to be updated?

It depends, but since we can only work through 75k profiles at a time, it could take up to a few days.

For external communication purposes, you can say that you’ll turn on data refresh now, and that activation should be complete within a few days.

If customers are wondering how they’ll be able to tell whether data refresh is working for them, you can tell them to check out the “Refreshed” column in Projects, Sequences, and Prospects tables, or to open a profile in the Gem extension and see whether there’s a “Refreshed X days ago” label below the tabs (“Overview,” “Info,” and so on).

How often does data refresh run?

Our stated refresh period changed from 30-45 days to 90-120 days. This is because our old vendor would live-query the profiles we asked for, at a per-profile cost: every month we’d send a giant list of LI urls and get back refreshes. This was fairly expensive, and the cost grew with the size of Gem’s customers’ data. They also refused to work with our Legal department on terms.

The new vendor maintains a database of 600M profiles which they keep refreshed on their own. Any time any of their customers asks for a refresh, they also update this database. The typical refresh frequency for this database (per the vendor) is 90-120 days. We can also ask for live refreshes for a limited number of profiles per month, but we wouldn’t want to proactively message this to customers.

How many unique prospects are refreshed in each nightly batch?

~75k, across all of our customers with Data Refresh turned on. The number is approximate due to data pooling and profiles that are unable to be refreshed. Right now, this means we are able to refresh ~2.25M unique prospects every month (75k per day x 30 days), across all of our customers. As customers source more prospects, and more customers get data refresh, we can scale up the number of requests we send each night.

The profiles we send for refresh each night are prioritized according to the criteria below.

How do we decide which LI urls to send for refresh each night?

The following criteria are used to determine which LI profiles are refreshed:

  • LI url is associated with a person on a team with data refresh
  • LI url for profile is not an empty string or ‘www.linkedin.com/in/’ (i.e. contains no identifying handle)
  • LI url is associated with a person that has a sourced timestamp
  • Last refresh time for this LI url is blank or at least 30 days ago. If a LI profile has never been refreshed, the last refresh time will be blank.

The profiles are then ordered by last refreshed time, and the oldest 75k profiles are selected. If a profile has no last refresh time, it is considered the “oldest.”
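The selection criteria and ordering above can be sketched as follows. This is an illustration under assumptions, not the actual job: the `Candidate` type, its field names, and the function names are hypothetical, each candidate is assumed to represent one unique LI url, and the 30-day check is read as “at least 30 days ago.”

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

BATCH_SIZE = 75_000                    # nightly batch size from the article
MIN_REFRESH_AGE = timedelta(days=30)   # assumed reading of "30 days ago"


@dataclass
class Candidate:  # hypothetical per-URL record
    li_url: str
    team_has_data_refresh: bool
    sourced_at: Optional[datetime]
    last_refreshed_at: Optional[datetime]  # None if never refreshed


def is_eligible(c: Candidate, now: datetime) -> bool:
    """Apply the four criteria listed above."""
    if not c.team_has_data_refresh:
        return False
    if c.li_url in ("", "www.linkedin.com/in/"):  # no identifying handle
        return False
    if c.sourced_at is None:
        return False
    last = c.last_refreshed_at
    return last is None or now - last >= MIN_REFRESH_AGE


def select_for_refresh(
    candidates: List[Candidate], now: datetime, batch_size: int = BATCH_SIZE
) -> List[Candidate]:
    """Pick the oldest eligible profiles; never-refreshed ones sort first."""
    eligible = [c for c in candidates if is_eligible(c, now)]
    eligible.sort(
        key=lambda c: (
            c.last_refreshed_at is not None,   # False (never refreshed) first
            c.last_refreshed_at or datetime.min,
        )
    )
    return eligible[:batch_size]
```

Sorting on a `(has_timestamp, timestamp)` pair is one simple way to treat blank refresh times as the “oldest” without inventing a sentinel date in the data itself.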

What is data pooling?

We use “data pooling” to update multiple prospect profiles across multiple teams at the same time because they all have the same LI URL. This is possible because multiple teams may have sourced the same person.
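As a rough sketch of the idea, one refreshed payload for a LI url can be fanned out to every prospect that shares it, regardless of team. The dictionary shape, key names, and function names here are hypothetical, and the sketch ignores the manual-edit priority rules for brevity.

```python
from collections import defaultdict
from typing import Dict, List


def pool_by_li_url(prospects: List[dict]) -> Dict[str, List[dict]]:
    """Group prospect records (from any team) by their LI url."""
    pools: Dict[str, List[dict]] = defaultdict(list)
    for prospect in prospects:
        pools[prospect["li_url"]].append(prospect)
    return dict(pools)


def apply_refresh(prospects: List[dict], li_url: str, refreshed: dict) -> int:
    """Apply one refreshed payload to every prospect sharing li_url.

    Returns how many prospect records were updated.
    """
    pool = pool_by_li_url(prospects).get(li_url, [])
    for prospect in pool:
        prospect.update(refreshed)  # simplified: no manual-edit checks
    return len(pool)
```

If two teams have both sourced the same person, a single refreshed record for that url updates both teams' prospects, which is why one nightly request can cover more than one Gem profile.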
