The dangers of relying upon scanned text searches

With so many documents now online, it is tempting to skip the sitting-in-a-library-and-reading-miles-of-microfilm approach to newspaper research and rely instead on text searches of digital newspapers, texts and other scanned material.

There is some merit to that: it saves time, microfilm readers can be annoying to use, and you can do your research at home in your underwear.

But before you rely too heavily on scanned text, remember one thing: character recognition software can be deeply flawed, especially when it is trying to recognize characters from texts printed over a hundred years ago.

Case in point: I’m reading through legislative discussions as reproduced in the British Colonist online. (You can find a link to it on the Digital Newspapers page, above.)

On 18 March 1881, the BC legislature briefly discussed amendments to the 1881 Poison Act.

Here is what it looked like in the paper:

British Colonist, 19 March 1881, p. 3

Here is how the character recognition software read it:

House wont Into Committee on silo of
Poisons Regulation Bill , 1ItW . Ucown
in the cir
Olauso 1 was altered by suhstitutin
” medical practitioner
” instead of ” apothecary
” ‘
Clause 5 , for mixing colored fluid with
poison , was struck out . ,
Schedule A was slightly amended tniljo :
: )
third line so as to road\ : Strychnine nnd1’
all ; poisonous vegetable alkaloids and their
511.ItsJan striking out aconite :
\ and its
preparations ,

. ‘
Sohodulo B calls for name and address
of 1sttichiasor.atsddata of sale in addition \

to name and quantity of poison , purpose
for wJi.lyh.iiaiuliHU

; \
qitilRigiiilt.IfIii \
, ;
! of put

; ”
‘ .l\l1d of person introducing it.
, .
f Committee ! rose and reported Bill com¬
plete with amendments
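
To see what this does to a text search, here is a small Python sketch. It is purely illustrative: the sample is a handful of lines copied from the OCR output above, and the helper name `exact_hits` and the list of search terms are my own assumptions, not how any newspaper site's search actually works. It simply checks whether each term appears literally in the scanned text.

```python
# A few lines copied verbatim from the OCR output above.
ocr_text = "\n".join([
    "House wont Into Committee on silo of",
    "Olauso 1 was altered by suhstitutin",
    "Clause 5 , for mixing colored fluid with",
    "poison , was struck out . ,",
    "Sohodulo B calls for name and address",
])


def exact_hits(text, terms):
    """Return {term: True/False} for a case-insensitive literal search.

    Hypothetical helper for illustration only.
    """
    lowered = text.lower()
    return {term: term.lower() in lowered for term in terms}


# Terms a researcher might reasonably search for in this passage.
terms = ["Clause 1", "Clause 5", "Schedule B", "Committee"]
hits = exact_hits(ocr_text, terms)

for term, found in hits.items():
    print(f"{term!r}: {'found' if found else 'MISSED'}")
```

In this sample, "Clause 5" and "Committee" are found, but "Clause 1" and "Schedule B" are missed, because the software read them as "Olauso 1" and "Sohodulo B". The words are in the newspaper; the search just cannot see them.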