Product | Data Refresh |
Expert(s) | Novices: Cynthia Day, Sandi Rail (CRM team) |
Slack channel | |
This article was last verified on | 07/02/2024 |
Articles in This Section
Please use the following list to see additional internal articles regarding Data Refresh:
- (Internal) Data Refresh Overview (you are here)
Customer-facing Resources
- Data Refresh
As of Nov 2023, we're now using MixRank
Data Refresh keeps Gem profiles up to date by periodically "refreshing" basic info like company, title, and school. It's especially valuable for Prospect Search, since past prospects change companies/roles over time. We're moving from an expensive vendor to a more cost-effective one that has Legal's stamp of approval.
This transition brings a few changes to our functionality, mostly (but not entirely) positive:
Core Data Refresh: Person profile freshness
- Previously, profiles were refreshed every ~45 days with live data.
- Now, some profiles will be refreshed more often, but most will be refreshed every 90-120 days.
- This will improve over time as we build more functionality. MixRank refreshes some profiles more often than others, and our contract comes with the ability to "prioritize" 100k profiles per month to fill in the gaps; we just have to build support for it.
1-click sourcing on LI
- Previously, we powered 1-click sourcing with just our "cache" of data-refreshed profiles (10 million total). For all other profiles, users had to download the LinkedIn PDF.
- Now we'll use both our cache and MixRank's DB and live API, potentially offering access to 400+ million profiles.
- Not every profile will be available, and we're being conservative to make sure we don't surface incomplete profiles (since users can always fall back to the LI PDF).
- Because 1-click sourcing may issue a live API lookup, it can sometimes take a few seconds; I added a loading indicator to improve the UX for this.
Gem Forms / Typeform enrichment
- Previously, Persons added from Gem forms / Typeform were queued for enrichment with live profile data; this could take between 10 seconds and 8 hours. (However this has been broken for the past 9 months!)
- Now we'll enrich Persons from Gem forms / Typeform within a few seconds of them being added, using cached data from our profiles + MixRank DB, with the live API as a backup. We'll use a cached profile if it's <90 days old; otherwise we'll hit the live API for a fresh profile. Note that not every profile can be enriched; for example, some people don't have their LI set to publicly visible.
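The cache-or-live decision described above can be sketched roughly as follows. This is an illustration only, not Gem's actual implementation: `pick_profile`, the `fetched_at` key, and the `fetch_live` callback are hypothetical names, and the only detail taken from the text is the 90-day threshold.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

# "<90 days old" threshold from the text above.
CACHE_MAX_AGE = timedelta(days=90)


def pick_profile(
    cached: Optional[dict],
    now: datetime,
    fetch_live: Callable[[], Optional[dict]],
) -> Optional[dict]:
    """Return the cached profile if it is fresh enough, else try the live lookup.

    `cached` is a hypothetical dict with a `fetched_at` datetime.
    `fetch_live` is a hypothetical callback standing in for the live API.
    """
    if cached is not None and now - cached["fetched_at"] < CACHE_MAX_AGE:
        return cached
    # The live lookup may return None, e.g. when the LI profile is not public.
    return fetch_live()
```

Passing the live lookup in as a callback means the API is only called when the cache cannot be used, which matches the "live API as a backup" behavior.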
Who is this live for/who is eligible?
- General data refresh: likely switching to the new system early next week for all teams with Data Refresh enabled
- 1-click sourcing & forms: live for Gem team today; launching to all teams next week if all goes well
Operia
tl;dr
Every night, we send our 3rd party provider (Operia) 300k LinkedIn urls to refresh. Operia sends back the refreshed data for profiles with public work histories once it's ready. This can take anywhere from a few minutes to a few hours, with a guaranteed maximum delay of 24 hours. Once we have the data for a particular LI profile, we update all Gem prospects, across all teams, who have that LI url.
Which profiles in a Gem instance get refreshed?
Only Gem profiles that have a LinkedIn URL are eligible to be refreshed. This means if you manually created a one-off prospect with only name and email, or maybe created a profile from GitHub but it doesn't have a LinkedIn URL, those won't be sent for data refresh since Operia can't find information on them. We also aren't able to refresh profiles that don't have public work histories.
What fields are refreshed and how do they get updated?
Company, Title, School, Location, Experience section, Education section
For the company, title, school and location fields, data refresh overwrites the original value if there are no manual edits. If a user has manually edited one of the fields, that edit is given highest priority, effectively turning off data refresh for that field. A user can "turn on" data refresh for that field again by manually clearing out the value.
For the Experience and Education sections, data refresh never deletes or modifies any of the existing items; we only add new items.
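As a rough illustration of the overwrite rule for the four scalar fields, here is a minimal sketch. The function name and the dict shapes are assumptions made for the example, not Gem's data model; the only behavior taken from the text is that a manual edit wins and that clearing it re-enables refresh.

```python
SCALAR_FIELDS = ("company", "title", "school", "location")


def apply_refresh(profile: dict, manual_edits: dict, refreshed: dict) -> dict:
    """Return a copy of `profile` with refreshed scalar fields applied.

    `manual_edits` (hypothetical shape) maps field name -> user-entered value.
    A missing or empty value means there is no manual edit for that field,
    which is how clearing a value "turns on" data refresh again.
    """
    result = dict(profile)
    for field in SCALAR_FIELDS:
        manual = manual_edits.get(field)
        if manual:
            result[field] = manual  # manual edit has highest priority
        elif field in refreshed:
            result[field] = refreshed[field]  # no manual edit: refresh overwrites
    return result
```

For example, if a user manually set the title but not the company, a refresh updates the company and leaves the title alone.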
De-dupe criteria:
- Work history: company, title, start date
- Education history: school, field of study, degree
If an entry in Gem's work history has the exact same company, title and start date as an entry provided by data refresh, they are de-duped and we'll only show one. Similarly, if a Gem education item shares the exact same school, field of study and degree as a data-refreshed education item, they are de-duped. Otherwise, we'll show them as 2 separate entries.
Note: If no month is provided in work or education history (either from LinkedIn when a user originally added that profile to Gem, or from our data refresh provider), we automatically assume the month to be January.
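The work-history de-dupe rule, including the January default, can be sketched like this. The field names (`start_year`, `start_month`) and helper names are hypothetical; the matching is exact, as described above, so even a small difference in title produces two entries.

```python
def work_key(item: dict) -> tuple:
    """De-dupe key for a work item: company, title, and start date."""
    # A missing month is assumed to be January, per the note above.
    month = item.get("start_month") or 1
    return (item["company"], item["title"], item["start_year"], month)


def merge_work_history(existing: list, refreshed: list) -> list:
    """Keep every existing item; append refreshed items that are not exact matches."""
    seen = {work_key(item) for item in existing}
    merged = list(existing)  # existing items are never deleted or modified
    for item in refreshed:
        key = work_key(item)
        if key not in seen:
            merged.append(item)
            seen.add(key)
    return merged
```

Education items would follow the same pattern with a key of school, field of study, and degree.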
Why are there similar work or education items showing under Experience or Education that refer to the same job/schooling?
To be safe, all work and education items are shown unless there is an exact match of the de-dupe criteria. For work items, only items with an exact match on company, title and start date are de-duped. For education items, there must be an exact match on school, field of study and degree.
Why is this Gem prospect not refreshed?
Our data refresh provider, Operia, is only able to refresh LI profiles and information set to "public," meaning the information is accessible to a non-logged-in user.

LinkedIn public profile settings
See what your own profile's public settings look like here: https://www.linkedin.com/public-profile/settings
Users have quite a bit of control over what appears on the public version of their profile, meaning that what Operia has access to can vary.
This also means that Gem profiles with private LinkedIn profiles cannot be refreshed (i.e. the toggle at the top of the screenshot to the right is turned OFF).
To test this out, open a prospect's LinkedIn profile in an incognito window in Chrome: what you can see on their profile is what would be available to us through data refresh. This may mean, for example, that a profile's work history won't be updated and will look empty/incomplete.
At the moment, Gem prospects with partially public LI profiles do get a "partial" refresh (e.g. past experience may be hidden while education is available on the public profile, as the toggles in the screenshot to the right indicate), but they will not have a refreshed date. This makes it clear that if a Gem prospect has a "refreshed" date, you can trust all the refreshed fields on the profile.
I just turned on Data Refresh for a new customer. How long will it take for their Gem instance to be updated?
It depends, but since we can only work through 75k profiles at a time, it could take up to a few days.
For external communication purposes, you can say that you'll turn on data refresh now, and that activation should be complete within a few days.
If customers are wondering how they'll be able to tell whether data refresh is working for them, you can tell them to check out the "Refreshed" column in Projects, Sequences, and Prospects tables, or to open a profile in the Gem extension and see whether there's a "Refreshed X days ago" label below the tabs ("Overview," "Info," etc.) on the "Info" tab.
How often does data refresh run?
Our stated refresh period changed from 30-45 days to 90-120 days. This is because our old vendor would live-query the profiles we asked for, at a per-profile cost. So every month we'd send a giant list of LIs and get back refreshes. This was fairly expensive, and the cost grew with the size of Gem's customers' data. Plus, they refused to work with our legal dept on terms.
The new vendor maintains a database of 600M profiles, which they keep refreshed on their own. Any time any of their customers asks for a refresh, they also update this database. The typical refresh frequency for this database (per the vendor) is 90-120 days. We can also ask for live refreshes for a limited number of profiles per month, but we wouldn't want to proactively message this to customers.
How many unique prospects are refreshed in each nightly batch?
~75k, across all of our customers with Data Refresh turned on. The number is approximate due to data pooling and profiles that are unable to be refreshed. Right now, this means we are able to refresh ~2.25M unique prospects every month (75k per day x 30 days), across all of our customers. As customers source more prospects, and more customers get data refresh, we can scale up the number of requests we send each night.
The profiles we send for refresh each night are prioritized according to the criteria below.
How do we decide which LI urls to send for refresh each night?
The following criteria are used to determine which LI profiles are refreshed:
- LI url is associated with a person on a team with data refresh
- LI url for profile is not an empty string or "www.linkedin.com/in/" (i.e. contains no identifying handle)
- LI url is associated with a person that has a sourced timestamp
- Last refresh time for this LI url is blank or at least 30 days ago. If a LI profile has never been refreshed, the last refresh time will be blank.
The profiles are then ordered by last refreshed time, and the oldest 75k profiles are selected. If a profile has no last refresh time, it is considered the "oldest."
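The selection criteria and ordering above can be sketched as a filter followed by a sort. This is a simplified illustration: the record keys (`li_url`, `team_has_data_refresh`, `sourced_at`, `last_refreshed`) are hypothetical, and it reads the 30-day criterion as "last refreshed at least 30 days ago."

```python
from datetime import datetime, timedelta

BATCH_SIZE = 75_000
MIN_AGE = timedelta(days=30)  # assumption: "30 days ago" means at least 30 days
EMPTY_URLS = {"", "www.linkedin.com/in/"}


def select_batch(candidates: list, now: datetime, batch_size: int = BATCH_SIZE) -> list:
    """Return the LI urls to send for refresh, oldest (or never refreshed) first."""
    eligible = {}  # li_url -> last refresh time, one entry per unique url
    for c in candidates:
        last = c["last_refreshed"]
        if not c["team_has_data_refresh"]:
            continue
        if c["li_url"] in EMPTY_URLS:
            continue
        if c["sourced_at"] is None:
            continue
        if last is not None and now - last < MIN_AGE:
            continue
        eligible[c["li_url"]] = last
    # Never-refreshed urls (None) sort first, then ascending by last refresh time.
    ordered = sorted(
        eligible,
        key=lambda url: (eligible[url] is not None, eligible[url] or datetime.min),
    )
    return ordered[:batch_size]
```

Keying on the url rather than on the prospect means a person sourced by several teams takes up only one slot in the nightly batch.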
What is data pooling?
We use "data pooling" to update multiple prospect profiles across multiple teams at the same time because they all have the same LI URL. This is possible because multiple teams may have sourced the same person.
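A minimal sketch of the pooling idea: index prospects by LI URL once, then apply a single refreshed payload to every prospect that shares that URL, regardless of team. The function names and record shape are hypothetical, and the sketch ignores the manual-edit rule for brevity.

```python
from collections import defaultdict


def index_by_li_url(prospects: list) -> dict:
    """Group prospect records (from any team) by their LI url."""
    by_url = defaultdict(list)
    for prospect in prospects:
        if prospect.get("li_url"):
            by_url[prospect["li_url"]].append(prospect)
    return by_url


def apply_pooled_refresh(by_url: dict, li_url: str, refreshed: dict) -> int:
    """Apply one refreshed payload to every prospect sharing `li_url`.

    Returns the number of prospects updated.
    """
    matches = by_url.get(li_url, [])
    for prospect in matches:
        prospect.update(refreshed)
    return len(matches)
```

One refresh of a URL therefore updates every team's copy of that person, which is why the nightly batch is counted in unique prospects.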