libreoffice calc freezing with large xls file from exel(97-2003)

A new install of libreoffice in W7pro is slow when opening/saving or resequencing a large((25M 64Kline) .xls file.

Takes forever to simply clear a column.

It also crashes/freezes when copying and pasting lines between sheets, within the same file. (you can't insert copied or cut lines - you have to 'paste', with a shift down provision.)

After the freeze (program terminated externally), the 'recovered' file shows partly disassociated lines at the end of reopened sheets that have no association with work at the time of the freeze.

Has anyone had similar issues handling .xls files with 'calc'?

I'm only doing this because the older Excel sw had a 64kline limit. I could use Excel2010, but that program is already flashing 'end of support' messages and I despise its GUI's appearance and disorganization. ...........................

Calc also seems to have an issue recognizing the system clip-board. When copying and pasting simple text from external applications, it sees an empty clipboard, two times out of three, when the paste is attempted.

It will even lose cut text, prior to pasting, WITHIN the application. PFUT, the text is gone . . . . Can I remeber what it was . . . .?

I'm ignoring the fact that the first .xls file save reduced the file size by 10%. I have to assume (hope) these were empty rows or columns being expunged.

RL

Reply to
legg
Loading thread data ...

Can you watch (process monitor, task manager, etc.) to see if it is CPU bound or I/O bound? I.e., it could be you are seeing a flakey disk (or portions thereof) reflected through the operation.

You can also guesstimate the "size" of the operation wrt the amount of memory on-hand (note that even "clear" can cost memory if the software needs to be able to undo the operation)

Perhaps go back to Excel and export it as a CVS? You could then do a cursory examination of the contents to convince yourself that you've got it all (number of lines, number of columns, cursory examination of leftmost column contents, ...rightmost column contents)

I.e., you don't know what's actually been transfereed (is everything UTF-16???)

Can you verify that there actually *are* items on the clipboard? E.g., paste into some other "benign" application?

Presumably, you've saved a copy of the "original" that you can examine and use on a subsequent "reload"?

I'd hate to suggest it but it sure seems like a reinstall might be the first step. (Or, better yet, do this on another machine to rule out the disk issue)

I will also offer the idea that you might, instead, want to be putting all of this stuff in a DBMS, instead. And using little scripts/SQl to do any processing that you want. I.e., if you are trying to *use* Excel as a database, it's likely the wrong tool for the job (though often the one folks "go to" first!)

Reply to
Don Y

Bunch of tale of woe trimmed

I would second that. Too many people misuse Excel or Calc when they should go to a data base. My take is when you get to 100's of lines it is probably time to consider a data base.

MySQL and MariaDB are fairly easy to install and are commercial grade. One or the other is in the distribution for many Linux variants.

SQLite is another possibility which is even easier to install. I haven't used it for "large" databases so I am not sure of the limits. (Hmm - just looked at their site and max size DB is about 281 terabytes - so more than enough to handle a bloated spreadsheet :-).

All the above can import CSV data and there are GUi tools to simplify use.

(Full disclosure - I have spent a lot of time in the data base arena and have converted unwieldy Excel files to a data base.)

Reply to
Dennis

On Sun, 25 Jul 2021 17:46:10 -0700, Don Y snipped-for-privacy@foo.invalid wrote: <snip>

I have a task manager application running at all times. The only program that seems to forcibly grab processor capacity (to interfere with even the simplest keyboard/mouse operations) is a skype window. Skype runs 4 processes simultaniously, only one of which is an issue. This rogue process can be terminated whenever its activity is noted, without affecting operation of the program, in the whole, as a com terminal. I suspect that skype is selling machine processor cycles, a la distributed computing apps, but with less gentility.

There is processor activity during calc slow saves or column clearances, but nothing that inhibits other functions. RAM runs about 50%, regardless. Any freezing activity blocks most applications, including the task manager, so you can't tell what's going on. <snip>

It's all text, as far as I know, recognizable in notepad, though carriage returns or line breaks tend to get misinterpreted, identically to the spreadsheet - suggesting that the source was a bit more than WYS.

Haven't had this issue with any other program, performing identical functions.

I save regularly with date-coded file names. So far calc has been chewing up its own copy of a specific file, saved with different file names at different times. These can be examined with the versions of excel, while their line count remains below the tripping point. (within 500 lines of the 64k limit - I hope this isn't a number schmoozed by Sales to describe something that is effectively smaller).

Will be reinstalling libreoffice before going any further with it. I doubt that updates address this behavior.

If a database can produce a similar transferable file, with similar visual quality ie just a long number of lines in columns of text that can be user-resequenced and searched, I'd be interested.

The .xls file format has seemed, in the past, to be the go-to version, if you want the info to be easy for end-users to access and manipulate. They can do what they want with it. There are many uses - I see it (and tell-tales) showing up on html- formatted websites all the time. Once frozen into that or other forms, the end-user access and flexibility is severely limited. ...............

One fault I've found in excel is its tendency to apply arbitrary text formatting rules that 'correct' user input, despite all user settings selected for the contrary. Once it finally accepts an aAAAA input, it seems to preseve it, without changing it to Aaaaa, as it will try to do during the text entry phase.

I'd like to disable formulas completely, as ' - ' is a common text prefix requirement, and cell formats seem to revert from the text setting at random. All empty cells in valid columns and lines are currently occupied by this character, as a place-holder.

RL

Reply to
legg

If you're loading a .xls spreadsheet into Calc, then LibreOffice has to convert everything into whatever Calc uses internally, probably ODF (open document format). The conversion might be a bit slow on both open and save in .xls format.

When you save the Calc spreadsheet, you have the choice of saving it as: Open Document Format .ods (default) Excel 2007-365 .xlsx Excel 2007-365 (with macros) .xlsm Open Office XML .xlsx Excel 97-2003 .xls Unified Office Format .uos

Try a little benchmarking and save your large spreadsheet in each of these formats and time how long it takes to save and load. I suspect you'll find some differences. Also, check your saved file sizes.

Google finds quite a few complaints about large Calc files being slow to open and save:

formatting link
There does seem to be some weirdness involved. For example, changing the font also changes the load/save speed:
formatting link

Drivel. Be careful with large spreadsheets: "Missing 16,000 coronavirus tests glitch 'caused by large Excel spreadsheet file'"

formatting link

Reply to
Jeff Liebermann

Don Y snipped-for-privacy@foo.invalid wrote in news:sdl0kr$i4n$ snipped-for-privacy@dont-email.me:

Excel works great storing and accessing small data sets.

Reply to
DecadentLinuxUserNumeroUno

Dennis snipped-for-privacy@none.none wrote in news:sdl3v5$4k8$ snipped-for-privacy@dont-email.me:

Excel works fine with large data sets. Modern Excel on a modern machine running a modern OS.

Trying to open a large data set file on an old unsupported dead OS is likely the problem here. Probably only has 4GB RAM or some other barely runnable hardware setup as well.

Look at the size of this "database" captured in excel...

formatting link
Download that xls file and see that it opens fine and such. Trying to move columns is an operation that takes up resources.

Reply to
DecadentLinuxUserNumeroUno

What about disk activity? If the system is sitting in a "wait" pending results from the physical drive, CPU utilization will appear low -- but the disk will be "lit" continuously.

Don't put it past Excel to "embelish" the data that it manages.

So, you're not copying from Excel to <whatever> but, rather, the other way around (and, thus, assume that it gets *onto* the clipboard but Excel somehow munges it)

Good boy. Getting f*cked is a great learning tool! :>

I recall some <generic> DBMS-based application that would crap out "somewhere around" 16K records. But, it wasn't predictably "16K" so folks learned to trim the dataset any time they got even close!

In their case, the application would just lock up -- nothing in or out (so, if you didn't trim the dataset BEFORE that happened, you had to roll back to an earlier copy of the dataset -- losing all of the changes you'd made since then -- and trim it BEFORE you got into trouble).

A lot of applications don't know their own "limits".

Agreed.

In general, you might want to invest the time to learn about running VMs under <whateverOS>. This lets you tinker with installs, reinstalls, etc. without messing up your "real" work environment.

And, has the delightful benefit that you can just COPY a "file" (the VM disk image) and effectively have backed up the entire "system"!

[E.g., you create one file that contains a virgin system -- just the OS. Then, COPY that (under your native/host OS) and you have a new "system" into which you can install whatever apps you choose. Lather, rinse, repeat.]

The database will give you whatever "objects" (text, in your case; but you can also store images, etc.) in a list that can be massaged (by some GUI-based tool that sits atop it).

Note that with a *relational* database (as MSAccess and the LibreOffice likely work-alike), you can tie tables together (relations).

So, you could create a table of "countries" (I'm assuming this is related to your ongoing covid reporting) with records that are: {country_id, country_name, total_population, data_source} with entries like: {IL, Israel, 9227700, http://somewhere} {US, United States, 331449281, http://someotherplace} ...

Then, in another "covid_cases" table, you could have: {country, report_date, total_cases} with entries like (made up numbers): {IL, 7/3/2021, 23555532} {IL, 7/1/2021, 23542342} {US, 6/15/2021, 311} {IL, 7/7/2021, 23560944} {US, 6/12/2021, 9008} [Note that the entries need not be grouped by country *or* recorded in chronological order!]

You can then "ask" the database to determine the number of cases per 129,000 (!) population ordered by report date per country -- by consulting the "countries" table to get the total_population (which would be divided by 129,000) against which the total_cases would be compared.

At a later date, if you wanted to make note of the male/female population mix in a country, you could add a "males" column to the countries table (and letting females be "total_population - males") Thereafter, any queries can draw on this additional information to further qualify your "reports" (e.g., if you had cases reported by gender, you could show infection rates by gender, per 129,000 population)

Yeah, I think there'a a lot of "I know better what the user wants than he does" that goes into many applications! I'm fighting with one, now, that inconsistently wants to convert (most!) everything I type into lowercase (despite having a configuration option that tells it NOT to!)

Reply to
Don Y

If you are going to use Calc then you are much better off saving in the native format of its file storage. .XLS will involve the Calc program going through contortions to accept the tangled morass that is .XLS.

File that size from Excel that have been in use for some time likely have orphaned metadata lurking in them. Most non-Mickeysoft applications are better at opening Office files than the Mickeysoft products! (always worth a try if you have a valuable damaged file)

One way would be save formulas as CSV or save as HTML and import the thing into Calc from that neutral format. That should eliminate most of the accumulated dead wood and dross in the Excel version.

More likely orphaned metadata. Excel excels at ending up with drag and drop images hidden in its binary format. No longer referenced but going along for the ride. The wrong combination of Excel versions of .XLS in a corporate environment with pre XL2007 templates still in use can create files which pretty much double in size each time they are saved.

The export to HTML and re-import will usually fix them. You can find ancient images lurking in amongst the rubble.

Reply to
Martin Brown

I went through one exercise, running the troublesome libreoffice install in 'Safe libreoffice' mode - an option provided by libreoffice.

I opened and closed a specimen file. I resaved it (excel 97-2003).(different filename) I resequenced it and shifted column order and resaved it (new file name). Then I exited the application.

Never more than 50% of machine processing consumed. RAM rock solid on %50%, during that time.

I had background distributed computing projects running - these choked themselves back when other processes used machine capacity.

After exiting the calc program, the taskmanager showed this process to still be running, with previously indicated ~50% machine processing capacity.

After a few minutes the RAM started to fill up, gradually, until it hit the recording limit and the machine became slow to respond to user keyboard/mouse input.

I think they call that a memory leak? Hard to characterize against previous day's issues, except that no RAM overloading had been evident via the process monitor, during that time. The mouse/keyboard input could have been increasingly sluggish, but I wasn't using either until after the process monitor showed all but system idle processes had ceased.

An uninstal process doesn't appear to be on the menu.

RL

Reply to
legg
<snip>

No, not covid related. This is an SMD marking code spreadsheet.

I thought I'd get through the current rev under the 64kline limit, but Ablic and Analog Devices put paid to that.

How many LDO and POR variations do you really need, for

8)#%!*^&'s sake? I'd already expunged a lot of Maxim sales vaporware numbers. Looks like they're not the only culprits.

RL

Reply to
legg
<snip>

w/r to 'running excel 97-2003 into the ground, it didn't take long (though any work seems to eat days from the life)

. . . . got to 65150 lines before Excel refused to save the file, parking it as an unspecified file, without a file extension, in the working directory. A baker's 64k. Unfortunately it also disappeared the last correctly-saved file, with the day's work in it.

With the unsavable file still open, I could copy and paste the specifically-long worksheet into a new spreadsheet.

The unspecified file remenant, when assigned an .xls file extension, could be opened normally in excel2010, or libreoffice calc.

The .csv file of the 65150-line sheet from the remenant and one produced from the cherry-picked worksheet were identical, as compared by Notepad++. Apparently the day's work survived, if it remained in the unsavable form, last seen on the screen.

...........

Continuing the work with 'calc' is proving to be increasingly annoying - there is a two or three-chimpanzee delay between keystrokes and the GUI displayed on the screen. You can't tell whether a keystroke or mouse-click has landed squarely until much later.

Even scrolling the screen is hazardous, as a 'selected' cell suddenly turns out to be the one located where the intended cell 'used to be', before the gui decided to cough up the last click of the mouse wheel.

Beckspaces become deletions, in that case . . .

In order to 'undo' the deletion, you have to return to the affected cell.

Things go much quicker if you ignore the screen and just pile on the keystrokes (operations dependant on user memory) -

- backspace the hanging spaces in the next five vertical cells is an example - the machine seems to wait it's usual

3 chimpanzees and then just does the whole shebange . . . so long as the cell selection doesn't get screwed up, there are no full cell deletions.

This is the exact opposite of established typing procedure - where you're supposed to ignore the keyboard and concentrate on the target.

These hanging spaces are not displayed in excel, but affect alpha-numerical sequencing. 124a_ is apparently larger and places later than 124a. ( _ used here to illustrate ' ')

My machine just don't like calc, it seems.

Any software that ignores human input from the operator, for whatever reason, is defective. It takes so little machine capacity to keep up on communications with the slow human, it really is inexcusable.

RL

Reply to
legg

Hear, hear!

Jeroen Belleman

Reply to
Jeroen Belleman

Perhaps try Gnumeric instead of Calc, there used to be a windows version, or maybe it will run under WSL.

Reply to
Jasen Betts

OK What gives? Excel 2010 just told me I'd reached the end of a worksheet - 'cannot insert new line without moving non-zero cells off of worksheet'.

Line count 65536.

This wasn't supposed to be an issue with later versions of excel.

?

RL

Reply to
legg

so not an .xls file.

has to be ..xlsx

RL

Reply to
legg

Maybe save as a never version of excel, you could be hitting a compatability mode.

OTOH perhaps 2010 is too old.

Reply to
Jasen Betts

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.