From: rosera**At_Symbol_Here**COMCAST.NET
Subject: Re: [DCHAS-L] Chemical Safety headlines from Google (14 articles)
Date: Sat, 9 Jul 2016 02:12:13 +0000
Reply-To: DCHAS-L <DCHAS-L**At_Symbol_Here**MED.CORNELL.EDU>
Message-ID: 38531478.58501165.1468030333008.JavaMail.zimbra**At_Symbol_Here**comcast.net
In-Reply-To: <151CC2E0-EE44-4549-A2D5-85E85BB4FABF**At_Symbol_Here**ilpi.com>


Rob,

Interesting look at the "sausage making process" you have devised!  Makes me glad that we only have to view the finished product!

BTW, concerning the second quibble, I took a look at the video from a link on the CSB website, and it gives a decent view of the water curtains which the emergency responders used to keep the emissions from the spill onsite:  http://www.fios1news.com/newjersey/edison-chemical-spill-jul-7-2016#.V4BbiTeH_dk.

Regards, Richard Rosera
Rosearray EHS Services LLC


From: "ILPI Support" <info**At_Symbol_Here**ILPI.COM>
To: DCHAS-L**At_Symbol_Here**MED.CORNELL.EDU
Sent: Friday, July 8, 2016 8:35:46 PM
Subject: Re: [DCHAS-L] Chemical Safety headlines from Google (14 articles)

It's great to know that folks are still paying close attention to this ongoing effort.  Just as background on this see http://www.sciencedirect.com/science/article/pii/S1871553213006026.

Let me address the first quibble. The MD versus NJ tag was auto-generated by the logic routine I wrote to scan the article and try to guess where it may have happened.  This is an exceedingly difficult task to achieve from an artificial intelligence (and hack programmer) perspective.  Consider:

1. Many on-line news sources do not use bylines (and they call themselves news!).  And even when they do, they often omit the state, as this one clearly did.
2. In fact, many times, the news site does not clearly identify which state the source itself is even in.  In this case it does, but it's not immediately obvious.
3. Neither "New Jersey" nor "NJ" appears anywhere in the article itself.
4. "Maryland" appears twice in the article.

The logic routine starts by looking at the byline. If a state is found there, the article is assigned to that state (although other possible states will appear as alternates for Ralph to select). Next we look for a state in the text itself (the routine only looks at the text that Ralph has highlighted to include with the article headline). We also check where the domain name is from, relying on a built-in list of over 7,000 web/print outlets and 2,200 TV/radio outlets in the US, plus a selection of influential international media outlets that I've collected over the years. If that's no good, we rely on a country name found in the text itself (if there is one, meaning it's an international incident). If that's no good, it looks for a state abbreviation in the domain name, such as http://www.calepa.ca.gov (where the .ca must not be confused with Canada). If that's no good either, we check the top-level domain (TLD) for a country code, such as .ca for Canada or .fr for France; there are 230 such TLDs.
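To make that ordering concrete, here is a minimal JavaScript sketch of such a cascade. It is not the actual ILPI routine: the function names (suggestLocations, findName), the lookup tables, and tag strings such as "nl" are hypothetical stand-ins for the much larger lists described above. It assumes every step's result is kept in priority order, so the first hit becomes the proposed tag and the rest become alternates for the reviewer.

```javascript
// Sketch of a location-suggestion cascade. All lookup tables are tiny,
// hypothetical stand-ins for the real (much larger) lists.
const STATE_NAMES = { "New Jersey": "us_NJ", "Maryland": "us_MD", "Pennsylvania": "us_PA" };
const OUTLET_STATES = { "burlingtoncountytimes.com": "us_NJ" }; // known media outlets
const COUNTRY_NAMES = { "Netherlands": "nl", "France": "fr", "Canada": "ca" };
const STATE_ABBREVS = { ca: "us_CA", nj: "us_NJ", md: "us_MD" }; // for *.xx.gov domains
const COUNTRY_TLDS = { ca: "ca", fr: "fr", nl: "nl" };            // country-code TLDs

// Return the tag for the first name in `table` that occurs in `text`, or null.
function findName(text, table) {
  for (const [name, tag] of Object.entries(table)) {
    if (text.includes(name)) return tag;
  }
  return null;
}

// Collect suggestions in priority order; the first is the proposed tag and
// the rest are alternates for a human reviewer to choose from.
function suggestLocations({ byline = "", text = "", hostname = "" } = {}) {
  const host = hostname.toLowerCase().replace(/^www\./, "");
  const labels = host.split(".");
  const tld = labels[labels.length - 1];
  const candidates = [
    findName(byline, STATE_NAMES),  // 1. state in the byline
    findName(text, STATE_NAMES),    // 2. state in the highlighted text
    OUTLET_STATES[host],            // 3. home state of a known outlet
    findName(text, COUNTRY_NAMES),  // 4. country named in the text
    // 5. state abbreviation in a domain like calepa.ca.gov; because the TLD
    //    is .gov, the "ca" is read as California rather than Canada
    tld === "gov" && labels.length >= 3 ? STATE_ABBREVS[labels[labels.length - 2]] : undefined,
    COUNTRY_TLDS[tld],              // 6. country-code TLD such as .fr
  ];
  // Drop empty results and duplicates while preserving priority order.
  return [...new Set(candidates.filter(Boolean))];
}
```

For example, suggestLocations({ text: "A team from Maryland and experts from the Netherlands", hostname: "www.burlingtoncountytimes.com" }) returns ["us_MD", "us_NJ", "nl"], while a hostname of "www.calepa.ca.gov" alone yields ["us_CA"].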

It turns out that of the 122 media outlets listed for NJ, the Burlington County Times was not on the list.  It is now, and the routine now auto-tags the article with New Jersey.  I can also add city names to force state tags. For example, Chicago will usually force a tag to IL unless there is something better in the hierarchy.  You have to be careful with that one, as some city names are fairly common - it would be pointless to try and use Springfield as a location tag trigger, for example.  I have Dublin in there and it will trigger both California and Pennsylvania as well as Ireland as location suggestions.  "Joint Base McGuire" is pretty definitive so I threw it into the list of city name triggers, too. 
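A minimal sketch of city-name triggers, assuming a simple lookup table (CITY_TRIGGERS and citySuggestions are hypothetical names, and the tags just follow the us_XX style used in the digest). An ambiguous name such as Dublin simply returns several suggestions, and a too-common name such as Springfield is left out of the table entirely.

```javascript
// Hypothetical city-name triggers. One city may suggest several locations,
// and the reviewer picks the right one.
const CITY_TRIGGERS = new Map([
  ["Chicago", ["us_IL"]],
  ["Dublin", ["us_CA", "us_PA", "ie"]],
  ["Joint Base McGuire", ["us_NJ"]],
  // No "Springfield": it is far too common to be a useful trigger.
]);

// Escape regex metacharacters so a city name is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return every location suggested by a city trigger that appears in `text`.
function citySuggestions(text) {
  const found = [];
  for (const [city, tags] of CITY_TRIGGERS) {
    // \b on both sides: "Dubliner" should not trigger "Dublin".
    const re = new RegExp("\\b" + escapeRegExp(city) + "\\b");
    if (re.test(text)) found.push(...tags);
  }
  return [...new Set(found)];
}
```

Where these hits rank relative to the byline and domain checks is a design choice; since Chicago forces IL only "unless there is something better in the hierarchy," a city trigger presumably sits partway down the cascade rather than at the top.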

So when Ralph runs the script (with the new website and city name added), it returns the following location assignments.  Note that #2 and #3 would not have come up without the changes I just made:

State: Maryland
Joint Base McGuire suggests us_NJ 
The domain name is from New Jersey
The text contains Netherlands (indeed it does!)

Since Ralph's only choices were Maryland and the Netherlands, I'll give him a pass on the mis-assignment here.  He has to manually select the portions of each article he tags, and he does thousands upon thousands of these, so yeah, once in a while his caffeine-deprived eyes are not going to catch that. Which prompts a really big shout-out to Ralph for scanning through dozens of articles three times a week while most of us are still in bed, so we have these summaries to read on MWF mornings!

BTW, characterizing articles with algorithms also takes a lot of careful thought. For example, if you have to classify an article as Discovery/Fire/Explosion/Release/Followup, you can't just look for the keyword "explosion", because the article might actually say "however, there was no explosion".  To handle that I use regular expressions (regex); this one construction filters out most references that are false positives for an actual explosion. As you read this mess, recognize that ? means the character (or parenthesized group) before the ? is optional, and | is a logical OR operator. The expressions for injury, death, and fire are 2-3x longer than this one!

// Phrases that mention an explosion without reporting that one actually happened
const noWant = new RegExp("(not the result of|did not result in) an explosion|not explode|kept from exploding|(possibly|potentially|potential) explosive|fears of an explosion|risk of (an )?explosion|potential (for|to cause an) explosion|like an explosion|(could|might) cause an explosion|in case of explosion|(can|could|might) explode|(prone|susceptible|vulnerable) to explosion|in the event of (an )?explosion|controlled (explosion|detonation)|safely (exploded|detonated)|not (high |concentrated )?enough to cause an explosion|(can|could) (have )?(lead|led) to (an )?explo|blast phone message|explosive (chemical|material)|can be detonat|could have caused an explosion|rule(d)? out a(n| chemical) explosion|prevent(ed)? an explosion|blasting cap(s)?", "gi");
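How such an exclusion list might be applied is sketched below, assuming a "blank out the excluded phrases, then look for explosion words in what is left" strategy. NO_WANT, WANT, and reportsExplosion are hypothetical names; NO_WANT is a heavily shortened pattern in the same style as noWant, with "no explosion" added to cover the example above.

```javascript
// Hypothetical, much-shortened exclusion list in the same style as noWant,
// plus "no explosion" from the example above.
const NO_WANT = /\bno explosion|not explode|risk of (an )?explosion|(could|might) cause an explosion|controlled (explosion|detonation)|prevent(ed)? an explosion/gi;
// Words that do suggest an explosion happened.
const WANT = /\bexplo(de|ded|sions?)\b/i;

// Blank out phrases that merely mention an explosion, then test what remains.
function reportsExplosion(text) {
  const cleaned = text.replace(NO_WANT, " ");
  return WANT.test(cleaned);
}
```

With this approach "Crews prevented an explosion, but a second tank exploded" still counts as an explosion, while "However, there was no explosion." does not. One design note: the sketch applies the global pattern only through String.prototype.replace, which always scans from the start of the string. Calling test() repeatedly on a regex that carries the "g" flag, as noWant does, resumes from the previous match's lastIndex and can silently miss a match in the next string.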

I'll mention one more short trigger, for followup, because that category is especially prone to false positives.  First we look for language such as "last year", but that's usually not enough. So we also look for text containing a month and year that is not current (for example, "March 2015 accident"), because that kind of construction usually means the article is a report on an earlier incident. Other triggers include "Chemical Safety Board", "in the wake of", etc.:

// typeResult 4 = followup
if (/last (week|month|year)|(weeks|months|years) ago|(\bin|last|early|late|the) (January|February|March|April|May|June|July|August|September|October|November|December)( 20\d\d)?|\bin (2007|2008|2009|2010|2011|2012|2013|2014|2015)|the \w*( year)? anniversary|month investigation|investigatory findings|in the wake of the|final report|Chemical Safety Board/i.test(textSelected)) { typeResult = 4; }
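The regex above approximates "not current" with a fixed list of past years. A minimal sketch of an explicit check is below, assuming the date the article was scanned is available; MONTHS and mentionsEarlierMonth are hypothetical names, and String.prototype.matchAll needs a reasonably modern JavaScript engine.

```javascript
const MONTHS = ["january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december"];

// Hypothetical helper: true if `text` names a "Month YYYY" that is earlier
// than the month of `now`, e.g. "March 2015 accident" read in July 2016.
function mentionsEarlierMonth(text, now = new Date()) {
  const re = new RegExp("\\b(" + MONTHS.join("|") + ") (20\\d\\d)\\b", "gi");
  // Count months since year 0 so comparisons work across year boundaries.
  const current = now.getFullYear() * 12 + now.getMonth();
  for (const m of text.matchAll(re)) {
    const mentioned = Number(m[2]) * 12 + MONTHS.indexOf(m[1].toLowerCase());
    if (mentioned < current) return true;
  }
  return false;
}
```

Read on July 8, 2016, "the March 2015 accident" returns true, while "a leak in July 2016" and "cleanup expected by December 2016" return false, so the year list never needs updating.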

All that said, the system works remarkably well.  I add minor tweaks every few months at most now, and Ralph reports that it has held up surprisingly well given how little maintenance it now needs.  Then again, after so many articles over the years, we've tweaked the algorithm enough times that it recognizes most of what we throw at it.   But it still takes Ralph to scan the articles, highlight the pertinent text for us, and manually review and adjust the tag suggestions. Uploading to Pinboard and composing the emails are essentially automated, but he still has to deal with Listserv problems, bounce messages, and more.  I cannot overstate Ralph's commitment of time and mental energy to this project and to the DCHAS community (and beyond).

As to quibble 2 - that's the news media. Par for the course.

Rob Toreki

 ======================================================
Safety Emporium - Lab & Safety Supplies featuring brand names
you know and trust.  Visit us at http://www.SafetyEmporium.com
esales**At_Symbol_Here**safetyemporium.com  or toll-free: (866) 326-5412
Fax: (856) 553-6154, PO Box 1003, Blackwood, NJ 08012




On Jul 8, 2016, at 6:46 PM, rosera**At_Symbol_Here**COMCAST.NET wrote:

A couple of quibbles regarding the classifications associated with two of the Chemical Safety Headlines:

FINAL PREPARATIONS UNDERWAY TO DESTROY CHEMICAL MUNITIONS FOUND ON JOINT BASE
Tags: us_MD, industrial, discovery, environmental, mustard_gas, phosgene
Note that while the destruction & disposal team is from Maryland, Fort Dix/Lakehurst, where the chemical munitions were discovered and will be destroyed prior to disposal, is most definitely in New Jersey.

AIRBORNE HAZMAT LEAK AT EDISON PLANT THURSDAY MORNING
Tags: us_NJ, industrial, release, injury, dye
First, the article states it was a "whitening pigment", not a dye.  From my personal knowledge of operations at this plant, it is undoubtedly titanium dioxide.  The real chemical released, however, was almost certainly titanium tetrachloride, which, when airborne, forms titanium dioxide (visible) and hydrogen chloride gas (not so visible, but MUCH more hazardous).  Interesting that the article (and the company spokesman?) conveniently neglects to mention this!
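For readers who want the chemistry spelled out, the overall hydrolysis of titanium tetrachloride in moist air is commonly summarized by the net equation below (a textbook summary, not a measurement from this incident):

```latex
\mathrm{TiCl_4} + 2\,\mathrm{H_2O} \longrightarrow \mathrm{TiO_2} + 4\,\mathrm{HCl}
```

By this stoichiometry, each mole of TiCl4 that fully hydrolyzes yields one mole of the visible TiO2 and four moles of HCl.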

Richard Rosera
Rosearray EHS Services LLC

From: "Ralph Stuart" <ras2047**At_Symbol_Here**MED.CORNELL.EDU>
To: DCHAS-L**At_Symbol_Here**MED.CORNELL.EDU
Sent: Friday, July 8, 2016 10:12:19 AM
Subject: [DCHAS-L] Chemical Safety headlines from Google (14 articles)

Chemical Safety Headlines From Google
Friday, July 8, 2016 at 8:06:24 AM

A membership benefit of the ACS Division of Chemical Health and Safety
All article summaries and tags are archived at http://pinboard.in/u:dchas

Table of Contents (14 articles)

MH FIREFIGHTERS CLEAN UP CHEMICAL SPILL THURSDAY
Tags: us_AR, industrial, release, response, hydrochloric_acid

HAZMAT CREWS RESPOND TO STRONG ODOR ON TEMPLE UNIVERSITY CAMPUS
Tags: us_PA, laboratory, release, response, unknown_chemical

AIRBORNE HAZMAT LEAK AT EDISON PLANT THURSDAY MORNING
Tags: us_NJ, industrial, release, injury, dye

HAZMAT INCIDENT IN MONROE CONTAINED
Tags: us_NY, public, discovery, response, nitric_acid, sodium

FINAL PREPARATIONS UNDERWAY TO DESTROY CHEMICAL MUNITIONS FOUND ON JOINT BASE
Tags: us_MD, industrial, discovery, environmental, mustard_gas, phosgene

DUPONT ORDERED TO PAY MILLIONS OVER TOXIC CHEMICAL EXPOSURE
Tags: us_OH, public, discovery, environmental, toxics

EXPLOSION INJURES THREE IN HARTFORD
Tags: us_VT, public, explosion, injury, other_chemical

NSF EMPLOYEE ASKS FOR INVESTIGATION OF ROTATING WORKER SYSTEM
Tags: public, discovery, environmental

FDA REQUESTS SAFETY DATA ON HAND SANITIZERS
Tags: industrial, discovery, environmental, drugs, ethanol

REPORT: CHEMICAL SAFETY BOARD MUST CONDUCT MORE INVESTIGATIONS
Tags: industrial, discovery, environmental

6 NEW JERSEY FIREFIGHTERS EXPOSED TO CHEMICALS, HOSPITALIZED
Tags: us_NJ, industrial, release, injury, unknown_chemical

SPRAIN REOPENS AFTER CHEMICAL SPILL
Tags: us_NY, transportation, release, injury, hydrochloric_acid

NO ONE INJURED IN EXPLOSION AT INDUSTRIAL BUILDING
Tags: us_FL, industrial, explosion, response, unknown_chemical

HAWAII LAB EXPLOSION CAUSED BY STATIC DISCHARGE
Tags: us_HI, laboratory, follow-up, injury, biodiesel, hydrogen

(snip)


The content of this page reflects the personal opinion(s) of the author(s) only, not the American Chemical Society, ILPI, Safety Emporium, or any other party. Use of any information on this page is at the reader's own risk. Unauthorized reproduction of these materials is prohibited. Send questions/comments about the archive to secretary@dchas.org.
The maintenance and hosting of the DCHAS-L archive is provided through the generous support of Safety Emporium.