
Parsing Text File (Python) (1. Locate Table Based on Keywords, 2. Extract Table Info)

$15-25 USD / hour

Completed
Posted over 7 years ago

I would like a program that extracts a specific table from text files. Most of the files are HTML; the rest are plain text. The task has two parts:

1) Locate the table that I want. The table I want is the "Security Ownership For certain beneficial owners" table. However, the name of the table can change, so the program needs to search for ("ownership" and "security") or ("ownership" and "stock") to locate it. The keywords ownership, security/stock/securities, and beneficial/beneficiary sometimes do not appear in the same row.

2) Extract the table data to CSV, preferably using Python. You could do it manually as well, but there will be about 2000 files.

I have attached 5 text files as well as the output file. Please see the attachment. The output for the 5 text files is also pasted below:

1st Example [login to view URL]
none

2nd Example [login to view URL]
none

3rd Example Input
Name and Address | Number of Shares of Common Stock Beneficially Owned | Shares which may be Acquired within 60 Days(1) | Percent Owned(1),(2)
Genstar Capital LLC(3) | 3,534,074 | 1,335,000 | 31.5 %
Jean-Pierre L. Conte(4) | 3,473,407 | 1,311,000 | 31.1 %
Oxford BioScience Partners IV L.P.(5) | 717,293 | ? | 7.3 %
Bio-Rad Laboratories, Inc.(6) | 665,639 | ? | 6.7 %
Gabelli Asset Management Inc.(7) | 537,521 | ? | 5.4 %
Terrance J. Bieker | 160,498 | 142,498 | 1.6 %
Kevin J. Reagan | 116,832 | 111,331 | 1.2 %
John L. Zabriskie, Ph.D. | 60,500 | 45,500 | *
David J. Moffa, Ph.D.(8) | 56,350 | 48,500 | *
John R. Overturf, Jr. | 43,600 | 36,000 | *
Alan I. Edrick | 40,916 | 35,416 | *
Robert J. Weltman(9) | 27,333 | 24,000 | *
All directors and executive officers as a group (eight persons)(10) | 3,979,436 | 1,754,245 | 34.2 %

4th Example [login to view URL]
Name and Address of Beneficial Owner | Number of Shares | Percentage of Class(1)
Larry S. Flax(2) | 2082053 | 8.1 %
Richard L. Rosenfield(3) | 2118017 | 8.3 %
Leslie E. Bider(4) | 8852 | 0 %
Marshall S. Geller(5) | 16152 | 0.1 %
Charles G. Phillips(6) | 133378 | 0.5 %
Alan I. Rothenberg(7) | 46520 | 0.2 %
Thomas P. Beck(8) | 138750 | 0.6 %
Susan M. Collyns(9) | 463203 | 1.9 %
Sarah A. Goldsmith-Grover(10) | 160211 | 0.6 %
Steven E. Rich(11) | 18794 | 0.1 %
BlackRock Inc.(12) | 1723416 | 7 %
    40 East 52nd Street New York, NY 10022
Fisher Investments(13) | 1249015 | 5.1 %
    13100 Skyline Boulevard Woodside, CA 94062-4527
The TCW Group, Inc.(14) | 2144619 | 8.7 %
    865 South Figueroa Street Los Angeles, CA 90017
Thompson, Siegel & Walmsley, LLC(15) | 1685519 | 6.9 %
    6806 Paragon Place, Suite 300 Richmond, VA 23230
All directors and executive officers as a group (10 persons)(16) | 5185930 | 18.9 %
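The keyword rule in part 1 could be sketched as follows. This is only an illustration under stated assumptions: the function names `is_heading_match` and `find_heading_line` and the three-line window are hypothetical choices, not part of the posting. The window is there because the description says the keywords may not share a row.

```python
import re

# Keyword groups taken from the project description.
OWNERSHIP = re.compile(r"\bownership\b", re.IGNORECASE)
SECURITY_OR_STOCK = re.compile(r"\b(security|securities|stock)\b", re.IGNORECASE)


def is_heading_match(text):
    """True if text has "ownership" plus one of security/securities/stock."""
    return bool(OWNERSHIP.search(text) and SECURITY_OR_STOCK.search(text))


def find_heading_line(lines, window=3):
    """Return the index of the first line that starts a matching heading.

    A line qualifies if it contains at least one keyword itself and the
    `window` lines starting there, joined together, satisfy the full rule.
    Returns None when nothing matches.
    """
    for i, line in enumerate(lines):
        if not (OWNERSHIP.search(line) or SECURITY_OR_STOCK.search(line)):
            continue
        if is_heading_match(" ".join(lines[i:i + window])):
            return i
    return None


sample = [
    "ITEM 12.",
    "SECURITY OWNERSHIP OF CERTAIN",
    "BENEFICIAL OWNERS AND MANAGEMENT",
    "Name and Address    Number of Shares    Percent",
]
print(find_heading_line(sample))  # 1
```

A wider window finds more split headings but also raises the chance of matching unrelated text, so the value would need tuning against the real files.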
Project ID: 11086582

About the project

35 proposals
Remote project
Active 8 yrs ago

Awarded to:
Hello Sir, please give me the full details of the project. If you need, I can show a sample now. I am waiting for your message and hope we can talk here. Thanks.
$25 USD in 10 days
5.0 (130 reviews)
6.5
35 freelancers are bidding on average $20 USD/hour for this job

Hi, I have gone through the files. I am good at data entry and Excel, and I can do this via data entry by copying and pasting the tables. Looking forward to working on this.
$20 USD in 10 days
5.0 (878 reviews)
8.0

A proposal has not yet been provided
$21 USD in 10 days
4.9 (243 reviews)
7.9

Experienced Python expert freelancer here to work on your project. Let's discuss further and finalize the project and cost. Feel free to ask me any questions. I look forward to working with you. You can also contact me through Skype. Have a good day and stay well :-) Sincere regards, Jubair
$20 USD in 10 days
4.9 (327 reviews)
7.8

Hello, I am very interested in your project. I am an expert in Python, scraping, Excel, math, and algorithms, and I am well experienced in these kinds of jobs. Let's go ahead; I would like to work for you on an ongoing basis. Thanks
$21 USD in 10 days
5.0 (39 reviews)
7.0

Hi, I have a team of 8 members who are experts in web scraping and Excel work. I understand the requirements of your project and can assure you of completion with the desired quality of work. I have good skills and experience in ♦ web scraping, ♦ finding contact information, ♦ phone and e-mail searching through "GOOGLE OR SOCIAL MEDIA OR GIVEN URL". I can do this project for you quickly and successfully. I'll work for the lowest price because I want to build a reputation on freelancer.com. Please give me a chance to show my quality and help me build a good reputation for my future jobs. I am a new freelancer, but I have long experience with Microsoft Office (Word, Excel), data mining, web search, etc.
$22 USD in 10 days
5.0 (196 reviews)
7.0

All 5-star reviews for Python projects, with years of experience in Python.
$15 USD in 5 days
4.9 (61 reviews)
5.7

Hey there... I had a look at your examples and the corresponding output tables. I can do this in Java or C# (not Python!). Let's agree to a fixed price instead of hourly? $120 in 3 days... Deal? Please reply. We can discuss further and hopefully get it started soon. Thank you!
$15 USD in 10 days
4.8 (157 reviews)
5.8

Hi, I am a Python developer with proven and extensive experience writing Python scripts that parse HTML markup, with demonstrated quick turnaround. This is normally done as part of web scraping projects using the Beautiful Soup library (Python).

My APPROACH
I can write Python code that can:
1. Locate the relevant table based on the given keywords:
-- Case 1 (HTML files - 3 & 4): Use paragraphs (elements with tag <p>) to search for keywords (so that the search is done on text instead of table rows). Then locate the next sibling table.
-- Case 2 (Text files - 5): Use elements with tag <PAGE> to search for keywords. Then locate the element with tag <TABLE> inside.
2. Extract table info:
-- Case 1 (HTML file): Regardless of table format or number of columns, there is a consistent structure inside each <tr> element (table row) for both column names and data rows (i.e. the same number of <td> elements). The script will exclude the non-breaking space ("&nbsp;") character.
-- Case 2 (Text file): Read data rows line by line.
3. Generate an Excel sheet with the relevant rows as output.

The deliverable is a Python script that can be run on schedule or on demand.
Hours of work: 8 Hr
Project Duration: Max. 3 days
Total Cost: 190 USD

Look forward to hearing from you.
Kind regards, Yordan B
$25 USD in 25 days
4.9 (24 reviews)
5.9
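Case 1 of the approach above (find a keyword paragraph, take the table that follows, drop non-breaking spaces) might look roughly like the sketch below. It is not the bidder's code: it swaps Beautiful Soup for the standard library's `html.parser` so it runs without extra packages, the class name `TableAfterKeyword` is hypothetical, and it assumes the heading sits in a <p> element and that tables are not nested.

```python
from html.parser import HTMLParser

STOCK_WORDS = ("security", "securities", "stock")


class TableAfterKeyword(HTMLParser):
    """Collect rows of the first <table> after a <p> containing the keywords."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_p = False
        self.p_text = []
        self.armed = False      # a matching paragraph has been seen
        self.in_table = False
        self.done = False       # the target table has been fully read
        self.rows = []
        self.row = None
        self.cell = None

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "p" and not self.in_table:
            self.in_p, self.p_text = True, []
        elif tag == "table" and self.armed and not self.in_table:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.row = []
        elif self.in_table and tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == "p" and self.in_p:
            self.in_p = False
            text = " ".join(self.p_text).lower()
            if "ownership" in text and any(w in text for w in STOCK_WORDS):
                self.armed = True
        elif self.in_table and tag in ("td", "th") and self.cell is not None:
            # Replace non-breaking spaces, then collapse runs of whitespace.
            text = " ".join("".join(self.cell).replace("\xa0", " ").split())
            self.row.append(text)
            self.cell = None
        elif self.in_table and tag == "tr" and self.row is not None:
            if any(self.row):   # skip rows whose cells are all empty
                self.rows.append(self.row)
            self.row = None
        elif self.in_table and tag == "table":
            self.in_table, self.done = False, True

    def handle_data(self, data):
        if self.in_p:
            self.p_text.append(data)
        if self.cell is not None:
            self.cell.append(data)


html = """
<p>Some unrelated paragraph</p>
<table><tr><td>skip</td></tr></table>
<p>SECURITY OWNERSHIP OF CERTAIN BENEFICIAL OWNERS</p>
<table>
<tr><th>Name</th><th>Shares</th><th>Percent</th></tr>
<tr><td>Terrance J. Bieker</td><td>160,498&nbsp;</td><td>1.6&nbsp;%</td></tr>
<tr><td>&nbsp;</td><td>&nbsp;</td><td>&nbsp;</td></tr>
</table>
"""
parser = TableAfterKeyword()
parser.feed(html)
print(parser.rows)
```

The sample markup is made up for illustration; real filings would likely need extra handling for headings that are not inside <p> elements.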

**Fast & Efficient Delivery** Greetings! I'm a computer science graduate with more than 2 years of experience in application development. I've read all the details and the files that you attached (input and output). I will do this task by first extracting data from the files, then searching each entire HTML file for the required table and for each account entry, removing duplicates if found. After that I will write the data to an XLS file. I will do this task in C#; the application interface can be a desktop or console application. Note: I've already worked on document file parsing, so this will be an easy task for me. My work will speak for itself. Looking forward to your consideration.
$15 USD in 15 days
4.9 (34 reviews)
5.0

My name is Mike and I'm from the UK. I work with individual clients and also provide outsourcing services for a number of UK- and USA-based agencies. Your project description sounds interesting to me, and I have the skills and experience required to complete this project. I can show you some examples of my work. Please contact me to discuss your project.
$22 USD in 10 days
5.0 (1 review)
3.2

This looks like a fun project. All tables seem to be within some HTML code. The files do contain some extra text, which does not seem to be relevant. The plan would be:
1. Strip extra text using regex
2. Convert HTML to a tree using lxml
3. Use XPath to locate tables
4. Extract table information using XPath
5. Use the csv module to write to a CSV file (one for each processed file)
6. Merge all files into one (if necessary)
7. Convert the final file to xlsx (if necessary)

Milestone 1: Result for first 100 files
Milestone 2: Result for all files
All files to be provided.

Best, Tammo
$27 USD in 10 days
5.0 (3 reviews)
2.8
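Steps 1, 5 and 6 of the plan above can be sketched with the standard library alone. This is a rough illustration, not the bidder's implementation: the lxml/XPath steps (2 to 4) are left out because lxml is a third-party package, the helper names are hypothetical, and the regex assumes the useful markup lies between the first <table> and the last </table>.

```python
import csv
import io
import re

# Assumption: everything outside the first <table ...> .. last </table> is extra text.
HTML_SPAN = re.compile(r"<table\b.*</table>", re.IGNORECASE | re.DOTALL)


def strip_extra_text(raw):
    """Step 1: keep only the span from the first <table> to the last </table>."""
    match = HTML_SPAN.search(raw)
    return match.group(0) if match else ""


def rows_to_csv(rows):
    """Step 5: serialise one file's rows to CSV text with the csv module."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def merge_csv(named_csv):
    """Step 6: merge per-file CSV texts, prefixing each row with its source name."""
    merged = io.StringIO()
    writer = csv.writer(merged, lineterminator="\n")
    for name, text in named_csv.items():
        for row in csv.reader(io.StringIO(text)):
            writer.writerow([name] + row)
    return merged.getvalue()


one_file = rows_to_csv([["Bio-Rad Laboratories, Inc.(6)", "665,639", "6.7 %"]])
print(one_file)
print(merge_csv({"a.txt": "x,1\n", "b.txt": "y,2\n"}))
```

Letting the csv module do the quoting matters here because names and share counts in the sample output contain commas.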

Hello, I have read your description very carefully. I am very good at parsing and Python scripting, and I can deliver the result as per your requirements. Price for the whole project (both tasks included): 200 USD. Let's discuss more over chat. Looking forward to working with you. -Viral Parekh
$16 USD in 10 days
5.0 (2 reviews)
2.8

I can get this done quickly using Python's pandas tools. You can count on me to deliver quickly and efficiently.
$22 USD in 10 days
5.0 (1 review)
2.2

Happy to help
$15 USD in 20 days
5.0 (2 reviews)
2.2

You pay me after checking the work. Hi, I have read all the details given in your project, and I am fully capable of delivering this project with 100% accuracy. I have completed many related projects in the past. Why not message me here for further details? You can release the milestone after checking the work.
$16 USD in 10 days
5.0 (4 reviews)
2.2

Hello, I'm a recent graduate about to begin a program in data science. For the past year I have been working extensively in Python, performing a lot of research analysis. This required me to learn to parse text files and extract the information I need both quickly and cleanly. Using these skills, combined with some regex and my familiarity with HTML, I could easily do this job. I look forward to hearing from you, Charlie
$15 USD in 5 days
5.0 (1 review)
0.4
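For the plain-text files, regex-based row parsing of the kind mentioned above could be sketched as follows. This is a hedged illustration, not any bidder's code: the pattern and the function name `parse_row` are hypothetical, and it assumes every data row ends with at least two value tokens (a number, a percentage, or a "?" or "*" placeholder, as in the sample output in the project description).

```python
import re

# A value token: a number with optional thousands separators and decimals,
# optionally followed by a percent sign, or a "?" / "*" placeholder.
VALUE = r"(?:\d[\d,]*(?:\.\d+)?(?:\s?%)?|[?*])"

# A data row: a name, then two or more value tokens running to the end of the line.
ROW = re.compile(rf"^(?P<name>.+?)\s+(?P<values>{VALUE}(?:\s+{VALUE})+)\s*$")


def parse_row(line):
    """Split a flattened table row into [name, value, ...], or return None."""
    match = ROW.match(line.strip())
    if not match:
        return None
    return [match.group("name")] + re.findall(VALUE, match.group("values"))


print(parse_row("Genstar Capital LLC(3) 3,534,074 1,335,000 31.5 %"))
print(parse_row("40 East 52nd Street New York, NY 10022"))
```

Address lines such as the second call return None because they do not end in two value tokens, so they can be attached to the preceding row or skipped.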

I have almost 5 years of experience writing engineering tools in Python. During this time I had to parse many files, so I am well acquainted with your problem. Thanks
$16 USD in 10 days
0.0 (0 reviews)
0.0

I code in Python regularly. Python is a great choice for this kind of text processing task, and I would like to give it a try.
$22 USD in 10 days
0.0 (0 reviews)
0.0

Please see the summary about myself. I am very easy to work with and detail-oriented. I don't need a lot of guidance, mostly just an outline of what needs to be done. At my last job, which I left after getting a new job, all I did was write scripts in Python, and I did things similar to what this project requires. For my hourly wage, I put down what I was previously paid for this kind of work, and I am more experienced now than before.
$21 USD in 10 days
0.0 (0 reviews)
0.0

I have been working as a Quantitative Researcher in the finance industry for 7 years and have done lots of projects like this. For example, I worked on an ETF strategy and had to scrape ETF websites, i.e. parse the HTML file, locate the data table, extract the data, and finally store the data in our database. I'm a Python expert and very skillful with libraries like pandas, requests, beautifulsoup, and so on. I will deliver effectively and efficiently.
$22 USD in 10 days
0.0 (0 reviews)
0.0

About the client

Tempe, United States
5.0
8
Payment method verified
Member since Sep 28, 2012

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)