The Raw data:
My required format:
I had five files in total, with about 440 questions between them. Typing them out by hand was out of the question, so I decided to go with Sublime Text’s search and replace, which supports regex search and substitution. This entire exercise has been a real eye-opener for me.
I hope this post serves as a rough guideline. The chances my exact code will be useful in any other case are very slim.
Step 1 : Copy from PDF to a text file
First, I copied everything in the PDF to a plain-text file: Ctrl+A, Ctrl+C, Ctrl+V.
There seems to be a rough order, with the question and each option on its own line. Questions and options are preceded by a number or a letter. This is only a small snapshot of the file. The format was more or less consistent throughout, with the occasional aberration; I have not included those here for simplicity’s sake. And now, to start writing my regular expression!
Step 2 : Target the pattern
I’m still a rookie when it comes to regexes. This is a simple project, but it still took a considerable amount of trial and error. After much playing around on two regex testing sites, Regex 101 and RegExr, I finally got the required regular expressions.
If you aren’t sure what these cryptic characters mean, here’s an explanation:
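As a rough sketch (in Python rather than Sublime, and assuming a hypothetical line format of `12. Question text` for questions and `a) Option text` for options), the pieces of such a pattern can be annotated one by one. The names `QUESTION` and `OPTION` and the sample strings are made up for illustration.

```python
import re

# Hypothetical line format (an assumption): "12. Question text" / "a) Option text".
QUESTION = re.compile(r"""
    ^            # start of the line
    (\d+)        # group 1: one or more digits, the question number
    \.           # a literal dot; unescaped, a dot matches any character
    \s+          # one or more whitespace characters
    (.+)         # group 2: the rest of the line, the question text
    $            # end of the line
""", re.VERBOSE)

OPTION = re.compile(r"""
    ^            # start of the line
    ([a-d])      # group 1: a single option letter from a to d
    \)           # a literal closing parenthesis
    \s+          # one or more whitespace characters
    (.+)         # group 2: the option text
    $            # end of the line
""", re.VERBOSE)

m = QUESTION.match("12. What is the capital of France?")
print(m.group(1), "|", m.group(2))

o = OPTION.match("b) Paris")
print(o.group(1), "|", o.group(2))
```

The `re.VERBOSE` flag is only there so each token can carry a comment; the same pattern written on one line behaves identically.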
Step 3 : Replace!
Now this is where the magic happens. Regex substitution lets you use the matched ‘groups’ in your substitution. I am going to make use of this feature to generate my structured JSON file.
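To make the idea concrete, here is a minimal Python sketch of group-based substitution. The sample text, the `a)` to `d)` option format, and the JSON keys `question` and `options` are all assumptions for illustration, not my actual data or output format.

```python
import json
import re

# Hypothetical sample; the line format is an assumption.
raw = """1. What is 2 + 2?
a) 3
b) 4
c) 5
d) 6
"""

# One question line followed by four option lines.
pattern = re.compile(
    r"^(\d+)\. (.+)\na\) (.+)\nb\) (.+)\nc\) (.+)\nd\) (.+)$",
    re.MULTILINE,
)

# \2 to \6 refer back to the captured groups, numbered by opening parenthesis.
# The JSON keys here are hypothetical.
template = r'{"question": "\2", "options": ["\3", "\4", "\5", "\6"]}'

converted = pattern.sub(template, raw)
print(converted)

# Parse the result to confirm that the substitution produced valid JSON.
record = json.loads(converted)
```

A plain template like this does no escaping, so a question containing a double quote or a backslash would yield invalid JSON and would need fixing up by hand.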
Putting it together
Mashing the individual pieces together to form the final search and replace expressions gives a very satisfying feeling. The search expression is poorly constructed and can definitely be improved: I repeated the same term four times because I could not get the grouping construct to work.
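One common stumbling block with grouping, which may or may not be what bit me here, is that a capturing group with a quantifier keeps only its last repetition. The Python sketch below shows that behaviour and two possible workarounds; the sample text and the `a) text` option format are assumptions.

```python
import re

# Hypothetical sample; the "a) text" option format is an assumption.
block = "a) 3\nb) 4\nc) 5\nd) 6\n"

# A quantified group matches all four lines but remembers only the last capture.
repeated = re.match(r"(?:[a-d]\) (.+)\n){4}", block)
print(repeated.groups())     # ('6',)

# Workaround 1: write the term once and let the code repeat it four times.
term = r"[a-d]\) (.+)\n"
spelled_out = re.match(term * 4, block)
print(spelled_out.groups())  # ('3', '4', '5', '6')

# Workaround 2: match a single option and collect every occurrence.
options = re.findall(r"^[a-d]\) (.+)$", block, re.MULTILINE)
print(options)               # ['3', '4', '5', '6']
```

In a pure search-and-replace box neither workaround is available, so spelling the term out four times is a reasonable way to give each option its own group.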
Here’s how it looks, in action:
And that’s the way it’s done :neckbeard: