Parsing HTML files that may not comply with HTML DOM

$10-30 USD

Closed
Posted about 5 years ago

Paid on delivery
I have 1 TB of .htm files, collected from the same 30 websites since 2011. When I try to write a parser for them (Python, NLP, BeautifulSoup, decruft, etc.), large portions of text that I know are on the pages do not appear in the output. On review, I noticed that some pages put the actual page content into JSON strings inside JavaScript elements. This makes the parsing process very cumbersome for me. One example of this behavior is the current [login to view URL] index page.

This project will have one deliverable: a conversation with me to discuss modern technologies and techniques that can be used to parse these pages, store the results, and make them available for searching. After our conversation, a second project may be opened on Freelancer exclusively for the winning bidder, to carry out further work on parsing the HTML based on the suggestions they provided in our discussion on this project.
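When page text lives in JSON inside script elements, one option is to pull the script bodies out and decode them, instead of relying on the visible HTML. The following is a minimal sketch, not a recommended final design: it uses Python's standard-library html.parser in place of BeautifulSoup so that it runs without extra packages, it assumes the JSON makes up the whole body of a script element (as with type="application/json" blocks), and the names ScriptCollector, iter_strings and json_text_from_html are hypothetical.

```python
import json
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the raw text of every <script> element in a page."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # one string per <script> element
        self._in_script = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            self.scripts.append("".join(self._buf))
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks; buffer them.
        if self._in_script:
            self._buf.append(data)


def iter_strings(node):
    """Yield every string value in a decoded JSON structure, depth-first."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def json_text_from_html(html):
    """Return the text strings found in <script> bodies that parse as JSON."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    texts = []
    for body in parser.scripts:
        try:
            data = json.loads(body)
        except ValueError:
            continue  # ordinary JavaScript (or empty), not a bare JSON document
        texts.extend(iter_strings(data))
    return texts
```

Script bodies that wrap the JSON in a JavaScript statement, such as a variable assignment, fail json.loads and are skipped here; those need an extra step that locates where the JSON value starts and ends.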
Project ID: 18626886

About the project

5 proposals
Remote project
Active 5 yrs ago

5 freelancers are bidding on average $43 USD for this job
This can be done by combining different techniques, including regular expressions. I have extensive experience parsing HTML files and am ready to start immediately. Please contact me with details if you are interested. Thank you, zeke.
$30 USD in 1 day
4.5 (103 reviews)
7.3
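Regular expressions, as this bid suggests, are better suited to locating embedded JSON than to matching it, since a regex cannot reliably balance nested braces. A minimal sketch, assuming the JSON is assigned to a JavaScript variable (for example window.__DATA__ = {...};, a hypothetical name): the regex only finds where each value starts, and Python's json.JSONDecoder.raw_decode parses from that point and reports where the value ends. ASSIGNMENT and json_assignments are illustrative names, not part of any library.

```python
import json
import re

# Hypothetical pattern: a JavaScript assignment such as
#   window.__DATA__ = {...};   or   var pageData = [...];
# The regex matches only up to the opening brace or bracket (via lookahead);
# the JSON decoder then finds where the value ends.
ASSIGNMENT = re.compile(r"(?P<name>[A-Za-z_$][\w$.]*)\s*=\s*(?=[\[{])")


def json_assignments(script):
    """Return (variable name, decoded value) for each JSON literal assigned in a script."""
    decoder = json.JSONDecoder()
    results = []
    pos = 0
    while True:
        match = ASSIGNMENT.search(script, pos)
        if match is None:
            break
        try:
            value, end = decoder.raw_decode(script, match.end())
        except ValueError:
            # Valid JavaScript but not strict JSON (e.g. unquoted keys): skip it.
            pos = match.end()
            continue
        results.append((match.group("name"), value))
        pos = end  # resume scanning after the decoded value
    return results
```

Object literals that are valid JavaScript but not strict JSON (unquoted keys, single quotes, trailing commas) make raw_decode raise an error, so they are skipped rather than guessed at.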
Hi, I am a web scraping expert and have worked extensively on NLP. I think you need a combination of both to parse your data correctly. Let's discuss the details further in chat. I will be glad to work with you. Thanks, Shubham Sharma
$100 USD in 1 day
4.7 (24 reviews)
4.6
I'm an expert Python developer and I've done web scraping, so I think I'm the best candidate for this work... Have you tried running the JavaScript in the HTML? This is a good solution, but it could be slower...
$30 USD in 1 day
3.2 (4 reviews)
3.2

About the client

Troy, United States
5.0
12
Payment method verified
Member since May 31, 2006

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)