A co-worker needed to get some data from our production database.
So he sent a sql query to the database administrators. As the result
he got the output of sqlplus. Neither he nor the database
administrator had specified a custom format for the output. As the
query had a couple of columns as output the result was that the rows
were broken into two lines. My co-worker asked me to prepare the
output so that it could be analyzed in Excel.
I've first
suggested to do the query again but this time with some formatting
options enabled. Unfortunately this wasn't possible as the data had
changed.
Merging the lines to the original rows is easy.
Opening in Excel showed that this wasn't sufficient as there were no
clear separator/column size for splitting the row into columns. Using
Notepad++ confirmed that there is no obvious pattern. I've done some
research and testing but it didn't help. After some time I've
accidentally opened the original output file in Windows Notepad.
There it was nicely formatted. The headings started at the same
position like the underlines and the data columns. No data column was
larger than the underline of the the column header.
Sounds
strange but it turned out that Notepad uses tab to move the text
position after a multiple of eight. When you have entered one
character and press tab you have to replace the tab with seven
whitespaces. When you have entered five characters you just need to
add three whitespaces and when you have already eight characters you
need to add another eight characters to replace a tab. If you use
this rule to replace the tabs you end up with a nicely formatted text
in Notepad++.
This almost allows you to split the text via
column length. First, how to get the column length. For each column
header an underline is created by hyphens. The number of hyphens is
the number of the characters that the data of the column will have.
Space at the end not used by the data is filled with tab and white
space. So parse the underline of the header and you get the column
length.
This works great except for the column at the end of
the line. Well, to be precise, at the end of a line but not at the
end of the row. At the end of the line there are no trailing
whitespaces. When you now merge the lines to the row the content of
the next column gets into the column at the end of the line. To fix
this you either fill the column with whitespaces until you've reached
column length. This is tricky as you don't have the size for the
first header row. You may also use a hack in that you fill each line
up to a value that is greater than any line in the output. Then after
splitting the columns you have to trim the output. This isn't as bad
as it sounds that thanks to padding of the sqlplus output you have to
trim anyway.