I'm new to BeautifulSoup and decided to try it after reading such great things about it, but I frustratingly hit a wall very early.
My idea was to pull data from MarketWatch and Google Finance and compare what should be the same metric on both sites. My first case was the EPS estimates and actuals for the past 4 years/quarters and the coming quarter/year. I started with MarketWatch, where the data is on this page (for AAPL):
https://www.marketwatch.com/investing/stock/aapl/analystestimates
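For context, this is roughly the parsing step I was hoping to reach once I had the page HTML. It is only a sketch: the table markup and the numbers in `SAMPLE_HTML` are hypothetical placeholders (I have not confirmed MarketWatch's real structure or class names), and `parse_table` is a made-up helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with placeholder numbers, standing in for an
# estimates table; the real MarketWatch structure may differ.
SAMPLE_HTML = """
<table>
  <tr><th></th><th>Q1</th><th>Q2</th></tr>
  <tr><td>Estimate</td><td>1.50</td><td>1.60</td></tr>
  <tr><td>Actual</td><td>1.52</td><td>1.58</td></tr>
</table>
"""


def parse_table(html: str) -> dict:
    """Map each row label to {period: value} for the first <table> in html."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return {}
    rows = table.find_all("tr")
    if not rows:
        return {}
    # First row: an empty corner cell, then one header cell per period.
    periods = [c.get_text(strip=True) for c in rows[0].find_all(["th", "td"])][1:]
    result = {}
    for tr in rows[1:]:
        cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
        if not cells:
            continue
        # First cell is the row label; the rest line up with the periods.
        result[cells[0]] = dict(zip(periods, cells[1:]))
    return result


if __name__ == "__main__":
    print(parse_table(SAMPLE_HTML))
```

The built-in `"html.parser"` is used here so the sketch does not depend on `lxml` being installed.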
However, even after setting the User-Agent header to my actual Chrome User-Agent string, the request returns a page saying I need to upgrade my browser. I then blindly copied all of my Chrome request headers into the request and tried again, with the same result.
    from bs4 import BeautifulSoup
    import requests

    url = "https://www.marketwatch.com/investing/stock/inmd/analystestimates"

    # All request headers copied from Chrome, including the session cookie
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'en-US,en;q=0.9',
        'Cache-Control': 'max-age=0',
        'Cookie': 'letsGetMikey=enabled; refresh=off; letsGetMikey=enabled; refresh=off; gdprApplies=false; ab_uuid=55e2945f-a2ec-43a0-a847-e8816e3afdda; dnsDisplayed=undefined; ccpaApplies=false; signedLspa=undefined; _pubcid=fa0bf288-7d62-4110-8372-1b024d6b4f7d; _sp_su=false; ccpaUUID=326cb90b-0dd7-4910-8d9d-5ff61c05bae5; permutive-id=af5bf5ac-6fdf-42cf-93e8-f723fa40f521; vcdpaApplies=false; regulationApplies=gdpr%3Afalse%2Ccpra%3Afalse%2Cvcdpa%3Afalse; usr_bkt=K8ZnHJ5Uyn; _mfuuid_=cc89c62a-e0bd-47b4-b5e1-b69f5ec44604; djvideovol=1; AMCVS_CB68E4BA55144CAA0A4C98A5%40AdobeOrg=1; _pnvl=false; pushly.user_puuid=SgW5vS3oF2lfhrcel0zRmnFvK4SgPd6F; _rdt_uuid=1697043746155.909bd6b4-3c1f-4ede-ae69-19c3fbcdaa48; s_cc=true; _pcid=%7B%22browserId%22%3A%22lnm012x2m39y5ji8%22%7D; cX_P=lnm012x2m39y5ji8; _pctx=%7Bu%7DN4IgrgzgpgThIC4B2YA2qA05owMoBcBDfSREQpAeyRCwgEt8oBJAEzIE4AmHgZi4CsvAIwB2DqIAMADkHTRvEAF8gA; _ncg_domain_id_=63b14ff9-3254-4fd6-bbb3-54f2adb4bf7a.1.1697043745806.1760115745806; _dj_sp_id=6daf2311-13b7-45df-b86c-b136409b9df8; _ncg_g_id_=2a1dc274-c24c-4613-bc6c-85cbe03b9692.1.1697043745.1760115745806; cX_G=cx%3A1v0qwnukyiwjo3v0qptcgcbksq%3A1si5pep8z172j; _pnlspid=11018; _ncg_id_=63b14ff9-3254-4fd6-bbb3-54f2adb4bf7a; _pnss=blocked; letsGetMikey=enabled; wsjregion=na%2Cus; _cls_v=bc22f7d5-a749-4df7-b90a-5b281bd95466; _cls_s=8c158253-143e-4031-8bce-ca64366bc44a:0; cls_e=8c158253-143e-4031-8bce-ca64366bc44a:244350746520436; s_tp=4367; _dj_id.cff7=.1697043747.44.1704301559.1704246489.1dc1df74-9643-41fb-80f4-eed69d609e2a; _ncg_sp_id.f57d=63b14ff9-3254-4fd6-bbb3-54f2adb4bf7a.1697043747.44.1704301559.1704246490.82d9d318-c56b-464a-ac34-10e617082e7c.55b6e9a4-459c-4a61-b6aa-bf5b1b137ef7.e7a54b7a-0e3d-405e-9ccd-475dd2a18017.1704301133757.6; s_ppv=MW_Summaries_Economy%2520%2526%2520Politics_U.S.%2520Economic%2520Calendar%2C36%2C27%2C1554; fullcss-home=site-37758705d2.min.css; refresh=off; fullcss-quote=quote-4f7c97120b.min.css; kayla=g=9b6d321ca05b40eb84182ab4e98ab83e; mw_loc=%7B%22Region%22%3A%22MI%22%2C%22Country%22%3A%22US%22%2C%22Continent%22%3A%22NA%22%2C%22ApplicablePrivacy%22%3A0%7D; icons-loaded=true; fullcss-section=section-4063dd6ae2.min.css; recentqsmkii=Stock-US-INMD|Stock-US-HPE|Stock-US-UNIT|Stock-US-MBLY|Stock-US-A|Stock-US-AEHR|Stock-US-AAPL|CloseEndFund-US-FTHY|CloseEndFund-US-ECAT|CloseEndFund-US-FSCO|PreferredStock-US-MGR|Stock-CA-LTHM|Stock-US-CVX; mw_bulletins=SE58ay; _pubcid_cst=kSylLAssaw%3D%3D; DJSESSION=country%3Dus%7C%7Ccontinent%3Dna%7C%7Cregion%3Dmi; usr_prof_v2=eyJpYyI6NH0%3D; spotim_visitId={%22creationDate%22:%22Wed%20Jan%2010%202024%2017:15:19%20GMT-0500%20(Eastern%20Standard%20Time)%22%2C%22duration%22:0}; utag_main=v_id:018b1fb0868d0007c45f252153a90506f003506700a83$_sn:47$_ss:1$_st:1704926728972$vapi_domain:marketwatch.com$ses_id:1704924928972%3Bexp-session$_pn:1%3Bexp-session$_prevpage:MW_Company%20Analyst%20Estimates%3Bexp-1704928528979; AMCV_CB68E4BA55144CAA0A4C98A5%40AdobeOrg=1585540135%7CMCIDTS%7C19733%7CMCMID%7C89889684677145527779162982137623154110%7CMCAID%7CNONE%7CMCOPTOUT-1704932129s%7CNONE%7CvVersion%7C4.4.0%7CMCAAMLH-1704905931%7C7%7CMCAAMB-1704924927%7Cj8Odv6LonN4r3an7LhD3WZrU1bUpAkFkkiY1ncBR96t2PTI%7CMCSYNCSOP%7C411-19733',
        'Dnt': '1',
        'Referer': 'https://www.marketwatch.com/investing/stock/inmd/analystestimates',
        'Sec-Ch-Ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
        'Sec-Ch-Ua-Mobile': '?0',
        'Sec-Ch-Ua-Platform': '"Windows"',
        'Sec-Fetch-Dest': 'document',
        'Sec-Fetch-Mode': 'navigate',
        'Sec-Fetch-Site': 'same-origin',
        'Sec-Fetch-User': '?1',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    }

    # Fetch the page with requests, then parse the returned HTML
    html_content = requests.get(url, headers=headers).text
    soup = BeautifulSoup(html_content, "lxml")
    print(soup)
which prints:

    ...<p class="text">This browser is no longer supported at MarketWatch. For the best MarketWatch.com experience, please update to a modern browser.</p></div><div class="group group--buttons"><a class="btn btn--primary" href="https://www.google.com/chrome/">Chrome</a><a class="btn btn--primary" href="https://support.apple.com/downloads/safari">Safari</a><a class="btn btn--primary" href="https://www.mozilla.org/en-US/firefox/">Firefox</a><a class="btn btn--primary" href="https://www.microsoft.com/en-us/windows/microsoft-edge">Edge</a>...
:sound of brakes screeching:
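Because that interstitial carries a distinctive message, the script can at least detect it explicitly instead of dumping the whole soup. A minimal sketch, assuming the message keeps appearing inside a `<p class="text">` element as in the output above; `is_browser_block_page` is a hypothetical helper name:

```python
from bs4 import BeautifulSoup

# Phrase taken from the "unsupported browser" page shown in the output above.
BLOCK_PHRASE = "This browser is no longer supported"


def is_browser_block_page(html: str) -> bool:
    """Return True if html looks like MarketWatch's 'unsupported browser' page."""
    soup = BeautifulSoup(html, "html.parser")
    # In the captured output the message sat in a <p class="text"> element.
    for p in soup.find_all("p", class_="text"):
        if BLOCK_PHRASE in p.get_text():
            return True
    return False
```

For example, calling `is_browser_block_page(html_content)` right after the `requests.get` line and raising an error when it returns `True` stops the script before it tries to parse a page that contains no estimates.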
Am I missing an important step here? Is BeautifulSoup the wrong tool for the job?
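One thing worth separating: BeautifulSoup only parses the HTML string it is handed; it never makes the request. In the code above the page is fetched by `requests.get`, so the "upgrade your browser" page is what the server sent back to `requests`, before BeautifulSoup is involved. To check the parsing side on its own, one option is to save the page from Chrome (for example with Ctrl+S) and parse the local file, assuming the estimates are present in the saved HTML. A sketch; `load_saved_page` and `saved_page.html` are hypothetical names:

```python
from pathlib import Path

from bs4 import BeautifulSoup


def load_saved_page(path: str) -> BeautifulSoup:
    """Parse an HTML file saved from the browser; no network request is made."""
    # errors="replace" keeps parsing going if the saved file has odd bytes.
    html = Path(path).read_text(encoding="utf-8", errors="replace")
    return BeautifulSoup(html, "html.parser")
```

For example, `soup = load_saved_page("saved_page.html")` gives a soup that can be searched with the same `find`/`find_all` calls as one built from a live response.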