LinkedIn API Python Library

here is what we will be covering in this article:

why get data from LinkedIn with python in the first place?
introduction & advantages
download and setup
retrieving profile information
retrieving company details
remove connections
get conversation details
disadvantages and possible alternatives
conclusion & project ideas

why get data from LinkedIn with python in the first place?

well, I'm glad you asked! getting data from LinkedIn with python has many benefits. first of all, it's fast. REALLY fast compared to a human being. let's say you're an HR manager at a company and you have to retrieve information from 100+ profiles and create a database. this would take forever if a person did it by hand. on top of that, there is a lot of room for human error: a single mistype or invalid entry can compromise the whole database. that's why automating this process will both save you time and protect you from potential errors!

introduction & advantages

we will be using the linkedin-api python library. it has a couple of advantages right away:

it just needs the credentials to your LinkedIn account to work
it is very easy to set up and has a variety of endpoints, which we will discuss further in this article

download and setup

to download this library, we can simply run this pip command in our terminal:

pip install linkedin-api

now we will create a creds.py file in our project to store our login information!

#use the information associated with your linkedin account
email = '********@gmail.com'
password = '**********'

now we can make a main.py file. we will also be making all of our functions in this file

from linkedin_api import Linkedin
from creds import password, email

api = Linkedin(email, password)

over here we first import the password and email which we saved in the creds.py file and then use that information to log in to LinkedIn!
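
if you would rather not keep your password in a source file, one option is to read it from environment variables instead. this is only a sketch: the variable names LINKEDIN_EMAIL and LINKEDIN_PASSWORD and the load_credentials helper are made up for this example and are not part of the linkedin-api library.

```python
import os


def load_credentials(env=os.environ):
    """Return (email, password) read from environment variables.

    LINKEDIN_EMAIL and LINKEDIN_PASSWORD are hypothetical names chosen
    for this example; use whatever names suit your setup.
    """
    email = env.get("LINKEDIN_EMAIL")
    password = env.get("LINKEDIN_PASSWORD")
    if not email or not password:
        raise RuntimeError("set LINKEDIN_EMAIL and LINKEDIN_PASSWORD first")
    return email, password


# usage (needs the linkedin-api package and real credentials):
# from linkedin_api import Linkedin
# api = Linkedin(*load_credentials())
```

the env parameter is only there so the helper can be tried with a plain dictionary before touching your real environment.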

retrieving profile information

before we start retrieving data from this endpoint, we need to get the public id of the LinkedIn account whose information we want to scrape. this is the value between the last 2 slashes in the url of the profile. for example, in the case of 'https://www.linkedin.com/in/hardik-singh-0b9ab4225/', the public id would be 'hardik-singh-0b9ab4225'. we can now use this to get some data.

info = api.get_profile('hardik-singh-0b9ab4225')
print(info)

this gives us a dictionary that contains useful information like country, birthday, first & last name, url to the profile picture, education, skills, and work experience.
the whole list of available keys in this dictionary is given below:

dict_keys(['lastName', 'student', 'geoCountryUrn', 
'geoLocationBackfilled', 'entityUrn', 'headline', 
'industryName', 'locationName', 'geoCountryName',
 'elt', 'profilePictureOriginalImage', 'birthDate', 
 'industryUrn', 'firstName', 'profilePicture', 
 'geoLocation', 'geoLocationName', 'location', 
 'backgroundPicture', 'backgroundPictureOriginalImage', 'displayPictureUrl',
  'profile_id', 'experience', 'skills', 'education'])

you can then pull out specific info by looking up any of these keys. for example, to retrieve the education of the user, we can do this:

print(info['education'])

#output:
[{'entityUrn': 'urn:li:fs_education:(ACoAADiU3o0BpIMjtuPu3LEwZgu6XmN8_zdNY3s,785025469)', 'school': {'objectUrn': 'urn:li:school:3244736', 'entityUrn': 'urn:li:fs_miniSchool:3244736', 
'active': True, 'schoolName': 'Delhi Public School - India', 'trackingId': 'skI/DaQYTW23ZkqbA4FrPA==', 
'logoUrl': 'https://media-exp2.licdn.com/dms/image/C510BAQFJWBCrV-fjMg/company-logo_'}, 
'timePeriod': {'endDate': {'month': 3, 'year': 2024}, 'startDate': {'month': 3, 'year': 2009}}, 
'schoolName': 'Delhi Public School - India', 'schoolUrn': 'urn:li:fs_miniSchool:3244736'}]

as you can see, it returns everything stored for that field: the school, its logo url, and the start and end dates.
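
each education entry is a nested dictionary, so a small helper can flatten it into something easier to read. this is a minimal sketch, assuming every entry looks like the output above. the keys schoolName and timePeriod come from that output, summarize_education is a name invented here, and other profiles may be missing some fields, hence the .get calls.

```python
def summarize_education(education):
    """Turn the raw education list into (school, start_year, end_year) tuples."""
    rows = []
    for entry in education:
        period = entry.get("timePeriod", {})
        start_year = period.get("startDate", {}).get("year")
        end_year = period.get("endDate", {}).get("year")
        rows.append((entry.get("schoolName"), start_year, end_year))
    return rows


# a trimmed copy of the output shown above
sample = [{
    "schoolName": "Delhi Public School - India",
    "timePeriod": {"endDate": {"month": 3, "year": 2024},
                   "startDate": {"month": 3, "year": 2009}},
}]
print(summarize_education(sample))
# [('Delhi Public School - India', 2009, 2024)]
```

with a real profile you would call summarize_education(info['education']) instead of passing the sample.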

retrieving company details

we can also retrieve information for any company. this is very similar to retrieving information from user profiles.

company = api.get_company('google')
print(company)

this will also give us a large dictionary. you can get information such as the industries the company is involved in, staff count, headquarters, funding data, and many more parameters. the available keys are as follows:

dict_keys(['staffingCompany', 'companyIndustries', 'callToAction', 
'staffCount', 'adsRule', 'companyEmployeesSearchPageUrl', 
'viewerFollowingJobsUpdates', 'staffCountRange', 'permissions', 'logo', 
'claimable', 'affiliatedCompaniesResolutionResults', 'specialities', 
'confirmedLocations', 'followingInfo', 'viewerEmployee', 
'affiliatedCompaniesWithEmployeesRollup', 'lcpTreatment', 'affiliatedCompaniesWithJobsRollup', 
'name', 'tagline', '$recipeType', 'fundingData', 'overviewPhoto', 'multiLocaleTaglines', 'description', 
'entityUrn', 'headquarter', 'showcasePagesResolutionResults', 'paidCompany', 'universalName',
 'viewerPendingAdministrator', 'companyPageUrl', 'viewerConnectedToAdministrator', 'affiliatedCompanies', 
 'dataVersion', 'companyType', 'coverPhoto', 'associatedHashtags', 'groups', 'url', 'showcasePages', 
 'claimableByViewer', 'jobSearchPageUrl', 'showcase', 'autoGenerated', 'backgroundCoverImage'])
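
since the dictionary is so large, it usually helps to keep only the fields you care about. here is a rough sketch, assuming the keys listed above are present. pick_company_fields is a hypothetical helper, and the sample values below are placeholders, not real data.

```python
def pick_company_fields(company, keys=("name", "staffCount", "tagline", "companyPageUrl")):
    """Return a smaller dict with only the requested keys.

    Missing keys map to None instead of raising KeyError.
    """
    return {key: company.get(key) for key in keys}


# placeholder values for illustration only
fake_company = {"name": "Example Corp", "staffCount": 42, "adsRule": "STANDARD"}
print(pick_company_fields(fake_company))
```

with the real client you would pass the result of api.get_company('google') in place of fake_company.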

remove connections

let's say you have a huge list of people you want to un-connect with. this can be done by the following snippet:

api.remove_connection('profile id')

if you have a huge database of profile urls, you can loop over them and remove your connection with each one using the above snippet. note that the profile id is the public id, i.e. the value between the last 2 slashes in the profile url, as mentioned earlier in this article.
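
the loop described above could be sketched like this. the helper names public_id_from_url and remove_connections are invented for this example. the only library call used is api.remove_connection, and api is assumed to be the logged-in client from main.py.

```python
from urllib.parse import urlparse


def public_id_from_url(profile_url):
    """Return the last path segment of a profile url (the public id)."""
    path = urlparse(profile_url).path
    segments = [part for part in path.split("/") if part]
    if not segments:
        raise ValueError(f"no public id found in {profile_url!r}")
    return segments[-1]


def remove_connections(api, profile_urls):
    """Call api.remove_connection for each url and return the ids processed."""
    processed = []
    for url in profile_urls:
        public_id = public_id_from_url(url)
        api.remove_connection(public_id)
        processed.append(public_id)
    return processed


# usage (this really removes connections, so double check the list first):
# remove_connections(api, ['https://www.linkedin.com/in/hardik-singh-0b9ab4225/'])
```

taking api as a parameter keeps the helper easy to try out with a stand-in object before pointing it at your real account.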

get conversation details

you can also retrieve ALL your conversations with everyone you have talked to on LinkedIn. this returns a huge dictionary with many elements, and each element contains the data for one conversation in your inbox.

print(api.get_conversations())
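
printing the whole thing is a lot to read. here is a small sketch for walking the result one conversation at a time. it assumes the conversations sit in a list under an 'elements' key, which matches the description above but is worth checking against your own output. iter_conversations is a made-up helper name.

```python
def iter_conversations(conversations):
    """Yield each conversation dict from a get_conversations() result.

    Assumes the conversations live in a list under the 'elements' key.
    """
    for conversation in conversations.get("elements", []):
        yield conversation


# stand-in data with the assumed shape, not real LinkedIn output
fake_result = {"elements": [{"id": 1}, {"id": 2}]}
print(len(list(iter_conversations(fake_result))))  # 2
```

with the real client you would pass api.get_conversations() in place of fake_result.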

disadvantages and possible alternatives

however easy to use and cool this library sounds, it still has a couple of disadvantages, which are listed below:

sometimes LinkedIn might stop you with a captcha test. a solution to this issue is to log out of LinkedIn in your browser and log back in
this happens because of two-factor authentication or rate limiting issues. the rough limit is said to be around 800 requests within a single session

a possible alternative is to make your own scraper using selenium. this would require a lot of work and experimentation, but it would also offer customizability and extra features that are not available in this library. since you are the creator of the scraper, you can add anything of your choice! again, it all comes down to the end result you want to achieve, and picking the tools accordingly.
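
if you stick with the library, spacing out your requests is one simple way to be gentler on the rate limit mentioned above. the sketch below is only an illustration: throttled is a hypothetical helper, the 2 to 5 second delays are arbitrary guesses rather than documented values, and nothing here guarantees that you will avoid a captcha.

```python
import random
import time


def throttled(calls, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Run each zero-argument callable in turn, pausing randomly in between."""
    results = []
    for index, call in enumerate(calls):
        if index > 0:
            # wait a random amount of time before every call except the first
            sleep(random.uniform(min_delay, max_delay))
        results.append(call())
    return results


# usage (assumes `api` is the logged-in client from main.py):
# ids = ['hardik-singh-0b9ab4225']
# profiles = throttled([lambda i=i: api.get_profile(i) for i in ids])
```

the sleep parameter exists so the pauses can be swapped out when testing, without actually waiting.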

conclusion & project ideas

this library can be handy for a wide range of problems. it will save you loads of time, it is very easy to use, and you can set it up and start playing around with it in a matter of minutes! you can make cool projects like comparing famous people/profiles as a data science project, since there is so much data available on LinkedIn. you can also try to make an AI model which can predict whether a person would get a job based on their profile. the model can be trained on employee data, which can also be scraped from LinkedIn itself. these were some fun project ideas. have fun coding :)