PetterSandvik VIP
Total posts: 110
22 Mar 2018 01:44

Hello,

Is it possible, and if so how, to make PDF content searchable in the site search? I am thinking of creating a Cobalt section of downloadable PDFs of old newspapers, but they need to be searchable.

Last Modified: 06 Apr 2018


pepperstreet VIP
Total posts: 3,837
22 Mar 2018 18:15

Hello Petter! ;)
To my knowledge there are some Joomla extensions which can parse and index PDFs and some other formats.
- OS PDF Indexer (commercial, JED link)
- JiFile (free, JED link)


Some years ago I used OS PDF Indexer with Joomlatools DOCman. It worked pretty well.
I can't remember the configuration, but it should let you select a folder, and it also parses sub-folders.

Not sure about JiFile though. It's free, so you might try it for yourself.

I believe they work with J! global search.
No Smart search, no special integration with Cobalt.


If there are not too many newspapers, you might also create a detailed listing manually and make that searchable, either for J! Smart Search or even Cobalt. Just thinking out loud:

- Parse and index the PDFs with one of the tools above, or with a desktop app like "FileJuicer" or "Acrobat Pro". They can extract the content and save it in text files.
- Then create J! articles or even Cobalt records with the extracted contents, either as simple content for full-text search or split into separate fields.
- Then attach the original PDF as a download.
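
The steps above could be prepared outside Joomla with a small script. This is only a minimal sketch in Python, assuming the text has already been extracted into one `.txt` file per PDF with the same base name. The folder layout, the column names (`title`, `fulltext`, `attachment`) and the idea of importing the resulting CSV are all assumptions for illustration, not features of Cobalt or of any extension mentioned here.

```python
import csv
from pathlib import Path


def build_records(text_dir, pdf_dir):
    """Pair each extracted .txt file with the PDF of the same base name."""
    records = []
    for txt_path in sorted(Path(text_dir).glob("*.txt")):
        pdf_path = Path(pdf_dir) / (txt_path.stem + ".pdf")
        if not pdf_path.exists():
            # Skip text files that have no matching PDF to attach.
            continue
        text = txt_path.read_text(encoding="utf-8", errors="replace")
        records.append({
            "title": txt_path.stem,
            # Collapse whitespace so the full text fits in one field.
            "fulltext": " ".join(text.split()),
            "attachment": pdf_path.name,
        })
    return records


def write_csv(records, out_path):
    """Write the records to a CSV file for a later import step."""
    fieldnames = ["title", "fulltext", "attachment"]
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

Calling `write_csv(build_records("texts", "pdfs"), "records.csv")` would give one row per newspaper, which could then be turned into articles or records by whatever import route is available.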

Hope this helps.


PetterSandvik VIP
Total posts: 110
28 Mar 2018 10:37

Hello,

Thanks for the tip. I will check it out, as I also need Emerald to handle subscriptions to this archive. It's about 9,500 newspaper copies.


PetterSandvik VIP
Total posts: 110
28 Mar 2018 16:11

I have tested a bit now. It's not exactly perfect, but it will do if I don't find a better way. The old newspapers, going back to 1941, should only be downloadable by subscribers, so some restriction is needed. I also want everyone to be able to search.

The system indexes OK, so the missing part is to restrict the direct download link on search results.

Also, Mika, I'm still missing a reply from you.


pepperstreet VIP
Total posts: 3,837
29 Mar 2018 06:03

PetterSandvik The system indexes OK,

Curious about your solution. Which route and extension did you choose?

so the missing part is to restrict the direct download link on search results.

Joomla search results page?
Maybe an override to check for current user/ACL ...
and/or active subscription...

BTW, does the Emerald plugin syntax work inside an HTML override?


Also, Mika, I'm still missing a reply from you.

Yes! I just found your last e-mail in my overflowing Apple Mail client.
This is the second time in the last 12 months that I have had serious issues with my local mail database.
It's still not perfect and clean, but I could finally save and read the most recent entries.

Yep, we have to finalize it ;) Should be possible in April.


PetterSandvik VIP
Total posts: 110
30 Mar 2018 13:46

I am thinking of OS PDF Indexer with OS EDocman or Joomlatools DOCman. I have to test a bit more to find out what I need.

I have only tested JiFile so far, but it indexed OK, so I think this is possible. The solution I have worked with for a long time is the JoomPlace HTML5 viewer, but uploading the files one by one is a huge job, so it's impossible with 9,000 newspapers. The JP HTML5 viewer works great for viewing a PDF like a digital newspaper, and I think users will be fine with it being used only for new newspapers.

I have to check that the extension I choose verifies the user's subscription before download. But I hope that should be possible with some small modifications if the extension doesn't support it.

Good!


PetterSandvik VIP
Total posts: 110
03 Apr 2018 15:23

I've chosen Joomlatools DOCman. The only problem is handling the integration with Emerald.

I have used these parameters:

URL Parameter: view
Condition: =
Parameter value: list

I can't search in J! search unless a subscription is active.

URLs in DOCman look like https://friheten.no/index.php?option=com_docman&view=list&layout=gallery&slug=papirutgaven&own=0&Itemid=666

To download a file:

https://friheten.no/index.php?option=com_docman&view=download&alias=152-friheten-13-29-mai&category_slug=friheten-2012&Itemid=666

Do you have any tip on what I should use to restrict only the documents, and not the whole component?


pepperstreet VIP
Total posts: 3,837
04 Apr 2018 14:20

PetterSandvik Do you have any tip on what I should use to restrict only the documents, and not the whole component?

If you check the URL for view=list, you restrict the entire listing.
Check the single file URL instead. It should have another, specific view value.
Your example URL says: download

Is this a direct download call? Or is there another full-view URL for a single item?
It's been a while since I worked with DOCman... ;) and the demo has SEF URLs only.
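
To illustrate the difference between the two `view` values, here is a rough sketch in Python of the kind of check such a URL rule amounts to. It is not Emerald's code or API; the function name `needs_subscription` and the `restricted_views` parameter are made up for this example. Only the two URLs are taken from the post above.

```python
from urllib.parse import urlparse, parse_qs


def needs_subscription(url, restricted_views=("download",)):
    """Return True if the URL targets com_docman with a restricted view."""
    query = parse_qs(urlparse(url).query)
    # Only DOCman URLs are considered; everything else stays open.
    if query.get("option", [""])[0] != "com_docman":
        return False
    # Restrict by the value of the "view" parameter only.
    return query.get("view", [""])[0] in restricted_views


list_url = (
    "https://friheten.no/index.php?option=com_docman&view=list"
    "&layout=gallery&slug=papirutgaven&own=0&Itemid=666"
)
download_url = (
    "https://friheten.no/index.php?option=com_docman&view=download"
    "&alias=152-friheten-13-29-mai&category_slug=friheten-2012&Itemid=666"
)

print(needs_subscription(list_url))      # False: the listing stays public
print(needs_subscription(download_url))  # True: the file download is restricted
```

With the rule on `view=download` instead of `view=list`, the listing stays open to everyone while the file itself is held back.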


Apart from that, you might also use the ACL/usergroups feature.


DOCman 3 seems to have its own indexer feature, but it is tied to a commercial online service.


PetterSandvik VIP
Total posts: 110
04 Apr 2018 19:22

Thanks, I'll try!

The commercial online service costs about $100 a year, and that's easier, as it also makes thumbnails!


Sergey
Total posts: 13,748
05 Apr 2018 04:11

PetterSandvik Thanks, I'll try!

The commercial online service costs about $100 a year, and that's easier, as it also makes thumbnails!

Commercial services are the way to go. Those services use core OS OLE components to do the conversion. PHP will never convert as well.


PetterSandvik VIP
Total posts: 110
05 Apr 2018 19:43

I used download and then it works, but the Joomla! site search doesn't give results unless there is an active subscription in Emerald. Searching in the component's own search does work. I'll check with the component maker.

Thanks for all the answers!


pepperstreet VIP
Total posts: 3,837
06 Apr 2018 11:34

PetterSandvik but the Joomla! site search doesn't give results unless there is an active subscription in Emerald. Searching in the component's own search does work. I'll check with the component maker.

Do you mean an issue related to the restriction?
Or the search in general?


To my knowledge DOCman 3 supports 3 different search features:
- DOCman 3's own search
- Joomla global search
- Joomla Smart Search / Finder

Remember to enable the plugins.
Smart Search requires its own index.
