Machine Learning / TensorFlow / OCR Document Classification and Data Extraction

€750-1500 EUR

Completed
Posted over 7 years ago

Paid on delivery
Looking for an experienced machine learning specialist to help us build an advanced document categorisation and data extraction application using Google's ML tools, and TensorFlow if it proves suitable. We process tens of thousands of documents each day from a library of 100 different categories, each of which could have 1,000 different variations. We currently use traditional expression-based document categorisation and template-based data extraction, which is cumbersome to set up and manage.

As this is a pilot project, it will be built in stages, starting with two basic identity documents: a driver's licence and a passport. The application will receive an image of the document together with the unstructured text file extracted from it by OCR. From these, it should determine the document type and then structure the text into an organised JSON format. Ideally, the application should be able to receive any document, determine whether it is an identity document, determine which type of identity document it is, and extract the text into a structured format.

Please outline your experience with machine learning and TensorFlow to help with candidate selection for this project. If the pilot project is successful, there may be an opportunity to build a larger project.

Questions to respond to in your proposal:
- How would you approach this project?
- What roadblocks do you see in this project?
Project ID: 11290414
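To make the requested input/output contract concrete (OCR text in, document type out, then structured JSON), here is a minimal sketch. It is deliberately a rule-based baseline, close to what the client wants to move away from, and it ignores the image and uses only the OCR text. The keyword lists, field labels such as `passport no`, and the function names `classify`, `extract` and `process` are hypothetical; in the application described, a trained classifier could take the place of `classify`.

```python
import json
import re
from typing import Dict, Optional

# Hypothetical keyword lists per document type (illustrative, not learned from data).
KEYWORDS = {
    "passport": ["passport", "nationality", "place of birth"],
    "drivers_licence": ["driving licence", "driver license", "licence no"],
}

# Hypothetical field labels to look for in the OCR text, per document type.
FIELD_LABELS = {
    "passport": {"surname": "surname", "given_names": "given names", "document_number": "passport no"},
    "drivers_licence": {"surname": "surname", "licence_number": "licence no"},
}


def classify(ocr_text: str) -> str:
    """Return the document type with the most keyword hits, or 'other' if none match."""
    text = ocr_text.lower()
    scores = {doc_type: sum(kw in text for kw in kws) for doc_type, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def extract(doc_type: str, ocr_text: str) -> Dict[str, Optional[str]]:
    """Pull 'Label: value' pairs for the fields defined for this document type."""
    record: Dict[str, Optional[str]] = {"document_type": doc_type}
    for field, label in FIELD_LABELS.get(doc_type, {}).items():
        # Case-insensitive label, optional ':' or '.', then the rest of that line.
        match = re.search(rf"{re.escape(label)}\s*[:.]?\s*(.+)", ocr_text, re.IGNORECASE)
        record[field] = match.group(1).strip() if match else None
    return record


def process(ocr_text: str) -> str:
    """Classify the document, extract its fields and serialise the result as JSON."""
    return json.dumps(extract(classify(ocr_text), ocr_text))
```

For example, `process("PASSPORT\nSurname: SMITH\nPassport No: 123456789")` yields JSON with `document_type` set to `"passport"` and `given_names` set to `null`, because no such label is present. Documents that match no keywords come back as `"other"` with no fields.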

About the project

8 proposals
Remote project
Active 8 yrs ago

Awarded to:
I registered at Freelancer specifically in response to this posting. I'm an entrepreneur in the last stages of a self-funded fintech project and need to generate income to push through the last few months of development. I've been writing empirical algorithms for data collection, feature extraction and analysis since 2010. My first major project was a smart OCR utility for screen-scraping online gaming clients, parsing the text and writing it to a database in real time. Currently I'm working on an end-to-end learner for futures-market portfolio optimization by swarming recurrent neural networks. My everyday work revolves around deep learning and statistical optimization. In most cases I prefer Keras, an ML library wrapping TensorFlow and Theano. It's an approachable, well-documented and well-maintained library with lots of power, and its framework can easily be extended when circumstances require non-standard approaches. Your pilot project needs convolutional neural nets for document classification, sequence-to-sequence (bidirectional) recurrent neural nets for parsing the text files, and Bayesian hyper-parameter search with Mint to find the best general settings for heterogeneous data. Your main obstacle is document standardization, workflow and preprocessing; scikit-learn provides readily available open-source facilities for this purpose. I'd be happy to discuss further and provide code samples with tests on request. Best of luck with this project, Josh W.
€1,111 EUR in 15 days
5.0 (1 review)
4.3
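Josh W. names document standardization and preprocessing as the main obstacle. As a rough illustration of one such step, here is a minimal sketch that normalises raw OCR text before it reaches any classifier or parser. It uses only the Python standard library rather than scikit-learn or Keras. The function name `normalise_ocr_text` and its particular rules (NFKC folding, dropping control characters, collapsing spaces) are assumptions for illustration, not part of the proposal.

```python
import re
import unicodedata


def normalise_ocr_text(raw: str) -> str:
    """Standardise raw OCR output before classification or parsing (illustrative rules)."""
    # NFKC folds compatibility characters, e.g. no-break spaces and the ligature U+FB01 into 'fi'.
    text = unicodedata.normalize("NFKC", raw)
    # Treat tabs as spaces so the control-character filter below does not glue words together.
    text = text.replace("\t", " ")
    # Drop control/format characters (Unicode category C*) but keep newlines, which carry layout.
    text = "".join(ch for ch in text if ch == "\n" or not unicodedata.category(ch).startswith("C"))
    # Collapse runs of spaces, trim each line and discard empty lines.
    lines = (re.sub(r" +", " ", line).strip() for line in text.split("\n"))
    return "\n".join(line for line in lines if line)
```

Keeping line breaks matters here: on identity documents, the label and its value usually share a line, so a later extraction step can rely on line boundaries.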
Hello. I have a master's degree in AI and have worked on image-processing algorithms and tools for more than 7 years. I'm the developer of this app: [login to view URL], and I have worked with the latest algorithms and tools for OCR, image detection, feature extraction, SIFT, SURF, deep learning, ... As you mentioned, I think we can use TensorFlow and also Tesseract OCR in this project. Microsoft also has a good open-source OCR tool; this API uses it: [login to view URL]. I think we can first train a classifier to detect the type of document, using features such as image features, the raw text from the OCR algorithm, ..., and then parse the OCR text for each type and convert it to JSON. I assume that you want to build something like this: [login to view URL]. I have worked with it in one of my projects before. One of the issues is that OCR tools and algorithms are not 100% accurate, especially on noisy images. To fix this, we can apply filters to the different fields to make sure that the output is valid. For example, we can require that all characters of a field be digits, or that the field be exactly 10 characters long, ... Please let me know if you have any questions. Thanks, Helmot
€777 EUR in 20 days
4.8 (148 reviews)
7.7
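Helmot's field filters ("all characters of a field should be digits", "length of the field should be 10") can be sketched as small per-field rules. The sketch below is one possible shape, not his implementation. The `FieldRule` class, the function names, and the letter-to-digit repair table `DIGIT_LOOKALIKES` are hypothetical. The repair step in particular is an added assumption about common OCR confusions (O/0, I/1, and so on) and should be tuned on real error data.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical map of letters that OCR often confuses with digits; tune it on real errors.
DIGIT_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


@dataclass
class FieldRule:
    """Constraints on one extracted field, e.g. 'digits only' or 'exactly 10 characters'."""
    digits_only: bool = False
    length: Optional[int] = None


def check_field(value: str, rule: FieldRule) -> List[str]:
    """Return the list of rule violations for a value (empty if it passes)."""
    problems = []
    if rule.digits_only and not re.fullmatch(r"[0-9]+", value):
        problems.append("non-digit characters")
    if rule.length is not None and len(value) != rule.length:
        problems.append(f"expected length {rule.length}, got {len(value)}")
    return problems


def repair_and_validate(record: Dict[str, str], rules: Dict[str, FieldRule]) -> Dict[str, List[str]]:
    """Fix look-alike letters in digit-only fields (in place), then report what still fails."""
    errors: Dict[str, List[str]] = {}
    for name, rule in rules.items():
        value = record.get(name, "")
        if rule.digits_only:
            value = value.translate(DIGIT_LOOKALIKES)
            if name in record:
                record[name] = value
        problems = check_field(value, rule)
        if problems:
            errors[name] = problems
    return errors
```

With `rules = {"id_number": FieldRule(digits_only=True, length=10)}`, an OCR reading such as `"12345O789I"` is repaired to `"1234507891"` and passes. A reading such as `"12X4567890"` is still flagged, so it can be routed to manual review instead of being silently accepted.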
8 freelancers are bidding an average of €1,196 EUR for this job
I have an MS in Software Engineering. I took a course on data engineering and artificial intelligence. I know all data mining techniques (prediction & classification) and data analysis techniques. I have worked on k-means, ID3, Bayes' theorem, confusion matrices, the Hungarian algorithm and so on. My research was on rough set theory. The tools I use are Weka, MATLAB, RapidMiner, SPSS, Java, R and Excel. Please see my profile and reviews as well. Thanks
€1,000 EUR in 20 days
4.9 (206 reviews)
7.2
Hi, have a good day. My interest in this project is to gain experience in data analysis through this platform. All my experience so far has been in industrial analysis, but I am now looking for a new challenge that gives me the opportunity to expand my knowledge in this area. I don't see any roadblocks in this project: my limited experience is a challenge for me, and my ability to learn quickly is an advantage in tackling it. At present I am only studying for my master's degree in Advanced Mathematics and Computing, which gives me the opportunity and time to take this job. I hope you take the time to get to know me. Regards.
€1,250 EUR in 10 days
0.0 (0 reviews)
0.0
Hello, I understand the initial scope of this project. However, I would like to discuss the job further in order to prepare the final concept for it. After a complete discussion by call or chat, I will prepare the following for you:
- Technical project proposal
- Flow chart for this project
- Execution plan (a step-by-step procedure explaining how and when we will execute each task)
€1,764 EUR in 40 days
0.0 (0 reviews)
0.0
Hi, I am about to finish my Data Science certification, along with my master's from Harvard University, and have done a significant number of ML-related projects and coursework along the way. In addition, I have 7+ years of programming experience and would be able to handle your project efficiently. I have done projects related to text classification and NLP, in both supervised and unsupervised learning scenarios, and I am confident that I can solve this problem. Here are my answers:

- How would you approach this project?
Without going into too much detail, here is a brief summary of the approach. The first step is to classify the document. Without having seen the raw data, I assume it would be easier to classify the document from the OCR text rather than from the image. We can try supervised or unsupervised approaches here, depending on exactly which datasets you have available. We will of course have to clean and normalize the data and identify features before moving forward. The next step is converting the data into a structured format; I would really need to look at sample datasets before giving an opinion on that. One approach could be to use NLP to tokenize the text and then identify certain key tokens. Finally, the code infrastructure should be modular, so that more document types can be added in future without breaking the existing ones.

- What roadblocks do you see in this project?
Converting the unstructured data into structured data (char limit reached)
€1,666 EUR in 20 days
0.0 (0 reviews)
0.0
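The last proposal suggests tokenizing the text, identifying key tokens, and keeping the code modular so that new document types can be added "without messing everything up". A registry, in which each document type is a self-contained parser registered under its key tokens, is one way to get that modularity. Here is a minimal sketch under that assumption. The tokenizer is deliberately crude and lower-cases everything, and all names (`register`, `value_after`, `parse_passport`, and so on) are hypothetical.

```python
import re
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

Parser = Callable[[List[str]], Dict[str, Optional[str]]]

# Registry: document type -> (key tokens that signal it, parser for its fields).
REGISTRY: Dict[str, Tuple[FrozenSet[str], Parser]] = {}


def tokenize(text: str) -> List[str]:
    """Split text into lower-case alphanumeric tokens (a deliberately crude tokenizer)."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def register(doc_type: str, key_tokens: Set[str]) -> Callable[[Parser], Parser]:
    """Decorator that adds a parser for a new document type without touching existing ones."""
    def wrap(parser: Parser) -> Parser:
        REGISTRY[doc_type] = (frozenset(key_tokens), parser)
        return parser
    return wrap


def value_after(tokens: List[str], key: str) -> Optional[str]:
    """Return the token that follows `key`, or None if `key` is absent or last."""
    if key in tokens and tokens.index(key) + 1 < len(tokens):
        return tokens[tokens.index(key) + 1]
    return None


@register("passport", {"passport", "nationality"})
def parse_passport(tokens: List[str]) -> Dict[str, Optional[str]]:
    return {"surname": value_after(tokens, "surname"), "nationality": value_after(tokens, "nationality")}


@register("drivers_licence", {"driving", "licence"})
def parse_licence(tokens: List[str]) -> Dict[str, Optional[str]]:
    return {"surname": value_after(tokens, "surname")}


def parse(text: str) -> Dict[str, Optional[str]]:
    """Pick the registered type sharing the most key tokens with the text, then run its parser."""
    tokens = tokenize(text)
    present = set(tokens)
    best: Optional[str] = None
    best_hits = 0
    for doc_type, (keys, _) in REGISTRY.items():
        hits = len(keys & present)
        if hits > best_hits:
            best, best_hits = doc_type, hits
    if best is None:
        return {"document_type": "unknown"}
    return {"document_type": best, **REGISTRY[best][1](tokens)}
```

To support another document type, such as a national ID card, you would add one more `@register(...)` function. The existing parsers stay unchanged, which is the property the proposal asks for.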
Hello,
► We follow Agile Scrum.
► We can help you with this project.
► Please get back to me to discuss the roadblocks.
► For your information, we have a team of 85+ in-house developers skilled in all major technologies, including RoR, PHP, .NET, Android and iOS, along with designers and SEO specialists.
❰1❱ 2000+ successful project deliveries.
❰2❱ 85+ in-house developers & CSM / CSPO.
❰3❱ Offices in USA / Canada / Ireland / UK.
❰4❱ Execution methodology: Agile Scrum.
Looking forward to your reply ASAP.
Thanks & regards
€1,000 EUR in 20 days
0.0 (1 review)
0.0

About the client

Bucuresti, Romania
5.0
7
Payment method verified
Member since Apr 2, 2011

Client verification

Other jobs from this client

XSLT file transformation
$30-250 USD
Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)