We are exploring Azure Text Analytics as an option for analyzing existing PDF documents to identify entities such as case number, names of the plaintiff and defendant, court name, judge name, and important keywords in the documents. Many documents share a similar format, but many others don't, so it's important to be able to highlight or change the list of important keywords in Azure so that future analysis of these documents produces better results. We don't have hundreds of thousands of documents for training, so this is a supervised machine learning engine. Here is the flow that we would like to accomplish:
- Identify keywords and patterns to recognize entities such as "Magistrates' Court", "District Court", "High Court" and the "Court of Appeal"; or "Coram: Kamala Ponnampalam".
- Manage clusters of similar keywords. Similar keywords are keywords that appear together in the same paragraph.
- Import a document
- See a list of keywords and entities based on the predefined keywords and patterns. Also see nouns that are extracted by the engine.
- Save the document and the extracted keywords and patterns to cloud storage for future comparison.
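As a rough illustration of the first two steps of the flow, here is a minimal Python sketch of pattern-based entity recognition and same-paragraph keyword co-occurrence. It runs locally and does not call Azure. The pattern names, function names, and the sample text are all hypothetical, and it assumes paragraphs are separated by blank lines and that "Coram:" is followed by the judge's name on the same line.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical patterns built from the examples in the list above.
COURT_PATTERN = re.compile(
    r"\b(Magistrates' Court|District Court|High Court|Court of Appeal)\b"
)
# Assumption: "Coram:" introduces the judge's name, which runs to end of line.
CORAM_PATTERN = re.compile(r"Coram:\s*([^\n]+)")


def extract_entities(text):
    """Return court names and judge names found by the predefined patterns."""
    courts = sorted(set(COURT_PATTERN.findall(text)))
    judges = [name.strip() for name in CORAM_PATTERN.findall(text)]
    return {"court": courts, "judge": judges}


def cooccurring_keywords(text, keywords):
    """Count keyword pairs that appear together in the same paragraph."""
    pairs = Counter()
    # Assumption: paragraphs are separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", text):
        lowered = paragraph.lower()
        # Simple substring match, so "appeal" also matches "appealed".
        present = sorted(k for k in keywords if k.lower() in lowered)
        pairs.update(combinations(present, 2))
    return pairs


# Made-up sample text, for illustration only.
sample = (
    "In the District Court\nCoram: Kamala Ponnampalam\n\n"
    "The plaintiff filed an appeal. The defendant opposed the appeal.\n\n"
    "The matter was transferred to the High Court."
)
print(extract_entities(sample))
print(cooccurring_keywords(sample, ["plaintiff", "defendant", "appeal"]))
```

The pair counts could serve as a starting point for grouping keywords into clusters, for example by keeping pairs that co-occur above some threshold.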
What we need:
- Propose a solution using Azure to analyze imported PDF documents to produce the result above.
- Set up the proposed solution in Azure.
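One possible starting point for such a solution is sketched below in Python, using the `azure-ai-textanalytics` client library. It assumes the text has already been extracted from the PDF by some other step, and that `endpoint` and `key` belong to an existing Azure Language resource. The helper names, the 5000-character chunk size, and the batch size of 5 are assumptions chosen conservatively; they should be checked against the current service limits. The prebuilt entity categories are general ones such as Person and Organization, so role-specific labels like plaintiff or judge would likely still need predefined patterns on top.

```python
def chunk_text(text, max_chars=5000):
    """Split text into chunks of at most max_chars, preferring paragraph breaks."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that exceeds the limit on its own.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def analyze(text, endpoint, key, batch_size=5):
    """Return (entities, key_phrases) for already-extracted document text."""
    # Imported lazily so chunk_text can be used without the Azure SDK installed.
    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential

    client = TextAnalyticsClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )
    chunks = chunk_text(text)
    entities, key_phrases = [], set()
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        for result in client.recognize_entities(batch):
            if not result.is_error:
                entities.extend((e.text, e.category) for e in result.entities)
        for result in client.extract_key_phrases(batch):
            if not result.is_error:
                key_phrases.update(result.key_phrases)
    return entities, sorted(key_phrases)


# Local demonstration of the chunking helper only (no Azure call is made).
print(chunk_text("ab\n\ncd\n\nef", max_chars=6))
```

Calling `analyze(text, endpoint, key)` requires valid credentials and network access, so it is left out of the demonstration line.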
Please specify your solution in your proposal: what steps will you take to complete this project, and how can I test the app after you have finished it? You can find some sample files in the attachments.
I am a lead .NET software engineer responsible for creating web and desktop applications using different languages and technologies. If you are interested, just send me a message. Thank you and good luck.