Changes in Google Groups - sources posted to Usenet lost forever?

A few Usenet groups allowed users to post their source code as shar archives.
The Google Groups website supported access to those groups, including viewing
a message in its original (raw) format and unpacking the sources.
Unfortunately, the latest update of Google Groups has dropped the ability
to access the original of a Usenet post.
The "formatted" (in fact corrupted) version of the message does not
allow the (now damaged) shar archive to be unpacked.
Does this mean that all sources that were posted to Usenet are now
lost to us forever?
Is there any other way to access old Usenet messages in their original
form?

TIA & Regards,
Wojtek



Re: Changes in Google Groups - sources posted to Usenet lost forever?
On Thursday, November 26, 2020 at 6:11:13 PM UTC-5, Wojciech Zabołotny wrote:

I think you are referring to the indication Google gives that you can't view
the "original message" because of email protections or something similar.
Seems very goofy, but there it is.

But bear in mind that Google Groups is not usenet. However, retention of
usenet posts varies with the access provider and is seldom "forever".

--  

Rick C.

- Get 1,000 miles of free Supercharging
Re: Changes in Google Groups - sources posted to Usenet lost forever?
I'll just quote the thread: https://support.google.com/groups/thread/61391913?hl=en&msgid61%725204

"Google bought the Usenet archive from Dejanews. Having now "banned" these groups there is no way to access the historically important posts spanning decades.
Pointing at another Usenet service that provides access to current posts is irrelevant.
If Google are unwilling to continue to host the archive they should donate it to the internet archive or a similar group.
Erasing history is not acceptable."

With best regards,
Wojtek

Re: Changes in Google Groups - sources posted to Usenet lost forever?
On Thursday, November 26, 2020 at 6:35:07 PM UTC-5, Wojciech Zabołotny wrote:

Google may have bought "a" usenet archive, but my understanding is there is
no one archive of usenet.

https://www.fastusenet.org/blog/what-is-dejanews-where-did-it-go.html

They provide 12 years of retention. Seems like it should be easy to not
delete anything, but I guess the use of usenet is not one of those things
that is growing exponentially making the previous usage small in comparison.
  

--  

Rick C.

+ Get 1,000 miles of free Supercharging
Re: Changes in Google Groups - sources posted to Usenet lost forever?
On 11/26/2020 4:11 PM, Wojciech Zabołotny wrote:

I don't understand what you mean by "corrupted"?  Do you have a
pointer to an example that I can examine (without a google login)?

Are you sure the "corruption" can't be stripped from the post
with a filter (script)?


Re: Changes in Google Groups - sources posted to Usenet lost forever?

On Thu, 26 Nov 2020 19:01:33 -0700, Don Y wrote:


The junk is HTML formatting.  The worry is that things like C++ source
legitimately may contain angle bracket delimited text.  You'd need a
smart filter that understands HTML tags.

And there may be a *lot* of it. I've seen usenet messages sent (or
forwarded) from Google Groups with ... not kidding! ... ~10,000 lines
of deeply nested HTML surrounding ~10 lines of text.
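
A rough sketch of such a filter, in Python: it removes only tags whose names appear on a whitelist of HTML elements and leaves every other angle-bracketed token alone. The `KNOWN_TAGS` set and the `strip_known_tags` name are made up for illustration, and the markup Google actually emits may look different.

```python
import html
import re

# Hypothetical whitelist: only these names are treated as HTML markup.
KNOWN_TAGS = {
    "html", "head", "body", "div", "span", "p", "br", "a", "b", "i",
    "font", "table", "tr", "td", "blockquote", "pre",
}

# Matches an opening or closing tag and captures the tag name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^<>]*>")


def strip_known_tags(text):
    """Remove whitelisted HTML tags, keep other <...> tokens, decode entities."""
    def replace(match):
        name = match.group(1).lower()
        return "" if name in KNOWN_TAGS else match.group(0)

    return html.unescape(TAG_RE.sub(replace, text))


print(strip_known_tags('<div><span style="x">#include <vector></span></div>'))
print(strip_known_tags("<p>std::vector&lt;int&gt; v;</p>"))
```

This is only a heuristic: an expression such as `a<b && c>d` would still lose text, because `b` is also an HTML tag name, so the output needs a manual check.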



Since Google has removed the option to see the raw message, the only
way to get things unmangled is from some other source.

Unfortunately few NNTP servers go back further than about 10 years,
and ftp.uu.net (the original usenet archive) is no longer operating.

You can try  
  https://usenetarchives.com/ or
  https://www.crunchbase.com/organization/the-usenet-archive .

Many (most?) of the historically popular groups are available, and that
includes pretty much everything in the comp.* and sci.* hierarchies.
But searching is not easy, and if you're looking for something
esoteric you may not find it.

George

Re: Changes in Google Groups - sources posted to Usenet lost forever?
On 11/26/2020 9:14 PM, George Neuner wrote:

Hi George!

Have not heard from you in a while -- was beginning to think that you
may have been coviderated!  Hopefully, that's not the case (?)


Or, scrape the posts manually?  E.g., highlight text in browser,
copy, paste?

If posted as an "image" of text (to deliberately hinder capture),
a screen capture program feeding an OCR... and manual touch-up.

Though, having seen Wojciech's example, it appears that there is
more involved than just eliding HTML tags!  I've not actively studied
the (apparent) transformation to try to codify the rules that may
have been applied...


For a small-ish post, I'd wager you could scrape (as above) and
manually edit the resulting text to something that's faithful to
the original intent.  Tedious and potentially error prone but
denies "lost forever".


Some of the better known sources are also available on FTP servers.
E.g., I think Vixie's cron(8) is available like this.

Re: Changes in Google Groups - sources posted to Usenet lost forever?
On Sat, 28 Nov 2020 00:53:43 -0700, Don Y wrote:


Nope. I had a viral flu in early 2018 that had eerily similar symptoms
to what is claimed for Covid-19: I was really sick with respiratory
problems for ~5 weeks, and it was ~14 weeks before I really felt well
again.  I was never hospitalized, so that virus was never identified,
but I'm hoping that was a coronavirus because some studies in Europe
found that prior exposure to other coronaviruses *may* give some
increased resistance to this one.

In any event, I don't have your current email.



Laborious if someone posted a long program.



Yuck! On average OCR still makes ~1 mistake per line.



The problem there is Python. For almost any other language, your idea
of scraping it manually would work.  For Python, you have to
understand the logic to reinstate the required indentation.
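
To illustrate why the indentation cannot be reinstated mechanically, here is a small sketch. The two snippets and the `flatten` and `compiles` helpers are invented for this example and are not taken from the damaged post. Two programs that differ only in indentation become identical once leading whitespace is stripped, and the flattened text is no longer valid Python.

```python
# Two programs that differ only in where the last statement is indented.
SRC_INSIDE = "for i in range(3):\n    total += i\n    count += 1\n"
SRC_OUTSIDE = "for i in range(3):\n    total += i\ncount += 1\n"


def flatten(src):
    """Strip leading whitespace from every line of the source text."""
    return "\n".join(line.lstrip() for line in src.splitlines())


def compiles(src):
    """Return True if the text parses as Python (nothing is executed)."""
    try:
        compile(src, "<post>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


print(flatten(SRC_INSIDE) == flatten(SRC_OUTSIDE))
print(compiles(SRC_INSIDE), compiles(flatten(SRC_INSIDE)))
```

Once both variants collapse to the same text, only a reader who understands the intended logic can tell which one the author wrote.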

I have always been opposed to significant whitespace in a language.


George

Re: Changes in Google Groups - sources posted to Usenet lost forever?
On 11/29/2020 1:57 AM, George Neuner wrote:

<frown>  I was evaluating lawyers and their ilk (good use
of that word in that context) a few months back and "consumed"
several email addresses in the process -- giving them out
"temporarily" and then canceling the accounts once I'd made
up my mind to cut off further communication from the
"undesirables" (Q:  are ANY of them "desirables"  :> )

I thought I'd picked accounts that I wasn't actively using.
But, may have screwed up.  I'll check my mail archive to
see what you were using to see if it was affected.

In either case, you should have a couple of addresses for me (?)


Of course.  My point was that the "content" isn't really "lost",
just less easily accessed!

(I had to resort to scans of much of my earliest work to get
them back into electronic form)


I've not seen that sort of problem with good images.  Much worse
with scanned stuff (esp if scanned at too low resolution).

In any case, it appears that much of the delimiters that SHAR introduces
are arbitrarily removed from those posts.  Perhaps google thinking
a leading nonspace character is indicative of an indent level
in quoting?  (you can specify which character to use in many MUAs)



Re: Changes in Google Groups - sources posted to Usenet lost forever?


Here you are: https://groups.google.com/g/alt.sources/c/YeeAV3fBAVc/m/AZgPoFxS4NYJ
The indentation of the Python code has been completely removed.


No, the indentation spaces are simply removed. There is no way to recover
them.


Re: Changes in Google Groups - sources posted to Usenet lost forever?
On 11/27/2020 5:40 AM, Wojciech Zabolotny wrote:

Indentation and whitespace /tend/ to be insignificant to the operation
of the code.  Of course, presence in string literals is a different
story -- where even replacing tabs with spaces is a hazard.


From a quick look, it seems like the problem goes beyond that.
Note that the leading 'X' is stripped from most -- but not all -- lines
of "encoded" files.
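
One way to gauge the damage is to scan the here-documents of a shar for body lines that have lost their prefix. The sketch below assumes the common layout in which each file is written by a `sed 's/^X//'` command reading a here-document, with every body line starting with `X`. Real shar variants differ in detail, and the `missing_prefix_lines` name and the sample text are hypothetical.

```python
import re

# Assumed layout: a sed command that reads a here-document, e.g.
#   sed 's/^X//' > 'file' << 'SHAR_EOF'
HEREDOC_RE = re.compile(r"sed\b.*<<\s*\\?['\"]?(\w+)['\"]?")


def missing_prefix_lines(shar_text, prefix="X"):
    """Return (line_number, line) pairs inside here-documents lacking the prefix."""
    damaged = []
    delimiter = None  # None means we are outside a here-document
    for number, line in enumerate(shar_text.splitlines(), start=1):
        if delimiter is None:
            match = HEREDOC_RE.search(line)
            if match:
                delimiter = match.group(1)
        elif line.strip() == delimiter:
            delimiter = None
        elif not line.startswith(prefix):
            damaged.append((number, line))
    return damaged


sample = "\n".join([
    "sed 's/^X//' > 'hello.py' << 'SHAR_EOF'",
    "Xdef hello():",
    "print('hi')",  # prefix and indentation both lost
    "X",
    "SHAR_EOF",
])
print(missing_prefix_lines(sample))
```

A report like this only locates the damaged lines; whatever followed the stripped `X` still has to be restored by hand.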


As you appear to be the owner of the file (and presumably have another
copy stashed away), you might try reposting it as a SHAR but uuencoded,
first.  I would suspect that would be more robust wrt whatever pretty-printing
algorithm google is trying to impose.
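
One reason a uuencoded payload might hold up better: every encoded line starts with a length character rather than with the file's own leading whitespace. A minimal sketch using Python's standard `binascii` module follows. The helper names and the sample source are invented here, and whether Google's formatter leaves such lines alone is an untested assumption.

```python
import binascii


def uu_lines(data):
    """Encode bytes as uuencoded body lines (at most 45 input bytes per line)."""
    return [binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)]


def uu_decode(lines):
    """Decode a list of uuencoded body lines back into bytes."""
    return b"".join(binascii.a2b_uu(line) for line in lines)


source = (
    b"if ready:\n"
    b"    if verbose:\n"
    b"        print('go')\n"
    b"    else:\n"
    b"        print('.')\n"
)

encoded = uu_lines(source)
for line in encoded:
    print(line.decode("ascii"), end="")

# The indentation survives the round trip because it is inside the payload.
print(uu_decode(encoded) == source)
```

Spaces can still occur inside uuencoded lines, so this reduces the exposure to whitespace mangling rather than removing it.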

Or, just keep a copy on some other public archive.

Re: Changes in Google Groups - sources posted to Usenet lost forever?
On Sat, 28 Nov 2020 00:47:00 -0700, Don Y wrote:


In Python, indentation is required syntax: in general, it is an error
for code in the same scope not to be vertically aligned.

However, with a nested 'if-else', logic actually depends on the
indentation:

  if <expr1>:
    <statements1>
    if <expr2>:
      <statements2>
    else:
      <statements3>

is very different from

  if <expr1>:
    <statements1>
    if <expr2>:
      <statements2>
  else:
    <statements3>

In C the 'else' goes to the nearest unmatched 'if' regardless of
whitespace.  In Python, the 'else' goes to the nearest 'if' with which
it is vertically aligned.
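
The difference can be checked with a runnable version of the two layouts. The function names and return strings below are placeholders chosen for this sketch.

```python
def inner_else(a, b):
    """The 'else' is aligned with the inner 'if', so it pairs with 'if b'."""
    if a:
        if b:
            return "a and b"
        else:
            return "a, not b"
    return "no branch taken"


def outer_else(a, b):
    """The 'else' is aligned with the outer 'if', so it pairs with 'if a'."""
    if a:
        if b:
            return "a and b"
    else:
        return "not a"
    return "no branch taken"


# Same inputs, same tokens, different results purely because of indentation.
for args in [(True, False), (False, False)]:
    print(args, inner_else(*args), "|", outer_else(*args))
```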


Significant whitespace sucks!
George

Re: Changes in Google Groups - sources posted to Usenet lost forever?

You'll love my new language "Point Blank".  Its file extension is a
space character.

There is also my Haskell dialect for embedded microprocessors.  It is
called Control-H.  Its file extension is a backspace.

;-)

Re: Changes in Google Groups - sources posted to Usenet lost forever?
On 11/29/2020 2:16 AM, George Neuner wrote:

Sorry, I didn't even examine the "content" of the archive; rather,
concentrated on the "SHAR wrapper" as it was quite obviously
corrupted.


Yes.  I dislike Python as my naming and coding styles rely on long
logical lines.  I prefer to let a pretty-printer clean up my
code to my own coding standards (indents, braces, function templates,
etc.) than to let the language dictate what my code HAS TO look like.

[I most often don't write in an IDE so can't rely on the "editor"
to "correct" formatting for me if, for example, I prepend an "if"
to a block of code or wrap it into some other explicit block]


There are still places where a space is not a space and you have to
deal with it.  I frequently find tabs and spaces interchanged for
each other when cutting and pasting across systems; the machine
sees things that the human doesn't care about.  Try CONCLUSIVELY
sorting out whether you're looking at " \t", "     " or "\t " (or
variations thereof) from a paper printout!

But, there are also annoyances with things as banal as typefaces
that needlessly confound.

Or, displays that have opted to use particular glyphs that
can't readily be resolved as being rightside up or upside
down.  Is "529" five hundred and twenty nine?  Or, six hundred
and twenty five?
