What does the data in various stages of analysis from a particle collision look like?

I've been following the news about the work being done at the LHC particle accelerator at CERN. I am wondering what the raw data used to visualize the collisions looks like. Could someone provide a sample CSV or TXT file?

Edit: In addition to the raw data, it seems I should also be interested in the data at the point where a physicist might begin their analysis, possibly at the "tuple" stage of the data transformation. I'm familiar with RDF tuples; are there any parallels between the two kinds of tuple?

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user opensourcechris

asked Jun 22, 2011 in Experimental Physics by opensourcechris (30 points) [ no revision ]

See also: A reference request for real world experimental data.

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user dmckee

commented Jun 22, 2011 by dmckee (420 points) [ no revision ]

Keep in mind that data from the LHC come in terabytes, and the processing needs have created a whole new way of handling data, called the GRID. This document describes how the data are handled: cdsweb.cern.ch/record/840543/files/lhcc-2005-024.pdf

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user anna v

commented Jun 23, 2011 by anna v (2,005 points) [ no revision ]

2 Answers

Different pieces of equipment will produce somewhat different looking data, but typically it consists of voltages defined as a function of time. In some cases (spark chambers, for example) the "voltage" is digital, and in others it is analog.

Traditionally, the time resolution of the data was coarser than the time the (almost light-speed) particles need to traverse the detector. Thus one effectively had a photograph of a single collision. More modern equipment is faster, but the data is still displayed that way. Here's an LHC example:

[Image: LHC event display]

In the above, the data has been organized for display according to the shape and geometry of the detector. The raw data itself would be digitized and just a collection of zeroes and ones.
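To make "a collection of zeroes and ones" a little more concrete, here is a minimal sketch of how a single detector hit could be packed into a 32-bit word. The layout (channel id, time bin, ADC count) and the names `pack_hit` and `unpack_hit` are invented for illustration; this is not the format of any real LHC detector.

```python
import struct

# Hypothetical 32-bit hit word (NOT a real detector format):
#   bits 31..16  channel id  (which detector element fired)
#   bits 15..12  time bin    (coarse sample index)
#   bits 11..0   ADC count   (digitized pulse height)
def pack_hit(channel: int, time_bin: int, adc: int) -> bytes:
    """Pack one hit into 4 bytes, big-endian."""
    if not (0 <= channel < 2**16 and 0 <= time_bin < 2**4 and 0 <= adc < 2**12):
        raise ValueError("field out of range")
    word = (channel << 16) | (time_bin << 12) | adc
    return struct.pack(">I", word)

def unpack_hit(raw: bytes) -> tuple[int, int, int]:
    """Recover (channel, time_bin, adc) from a packed hit word."""
    (word,) = struct.unpack(">I", raw)
    return word >> 16, (word >> 12) & 0xF, word & 0xFFF

raw = pack_hit(channel=1234, time_bin=3, adc=517)
bits = format(int.from_bytes(raw, "big"), "032b")
print(raw.hex())        # the word as hexadecimal
print(bits)             # the same word as 32 zeroes and ones
print(unpack_hit(raw))  # back to (channel, time_bin, adc)
```

Without knowing the bit layout and which physical element each channel id refers to, the bit string on its own carries no meaning.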

There are typically two types of measurements, "position" and "energy". The position measurements are typically binary, that is, they indicate that a particle either came through that (very small) element or did not. In the above, the yellow lines are position measurements.

Note that some of the yellow lines are curved. In fact, all of them are curved to at least some degree, because the detector sits in a strong magnetic field. The curvature of the particle tracks helps determine what particles they are. For example, given the same speed and charge, a heavier particle will run straighter.

The radius of curvature is given by:
$$r = \frac{p}{qB} = \frac{\gamma m v}{qB}$$ where $\gamma = 1/\sqrt{1-(v/c)^2}$ is the Lorentz factor, $m$ is the mass, $v$ is the speed, $q$ is the charge, $p = \gamma m v$ is the momentum, and $B$ is the magnetic field strength. This helps determine the particle type and energy.
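As a rough illustration of how curvature maps to momentum: for a particle moving perpendicular to the field, the standard relation $p = qBr$ becomes, in practical units, $p\,[\mathrm{GeV}/c] \approx 0.3\, q\,[e]\, B\,[\mathrm{T}]\, r\,[\mathrm{m}]$. The sketch below applies that relation; the function names and the 2 T field value are assumptions chosen for the example, not taken from any experiment's software.

```python
# Conversion factor: speed of light in units that give GeV/c per (tesla * metre)
# for a particle carrying one elementary charge.
GEV_PER_TESLA_METRE = 0.299792458

def momentum_from_radius(radius_m: float, b_field_t: float, charge_e: float = 1.0) -> float:
    """Momentum transverse to the field, in GeV/c, from the track radius."""
    return GEV_PER_TESLA_METRE * abs(charge_e) * b_field_t * radius_m

def radius_from_momentum(p_gev: float, b_field_t: float, charge_e: float = 1.0) -> float:
    """Track radius in metres for a given transverse momentum in GeV/c."""
    return p_gev / (GEV_PER_TESLA_METRE * abs(charge_e) * b_field_t)

B_FIELD = 2.0  # tesla; illustrative value only

for p in (0.5, 1.0, 10.0):
    r = radius_from_momentum(p, B_FIELD)
    print(f"p = {p:5.1f} GeV/c  ->  r = {r:6.2f} m")
```

Higher momentum gives a larger radius, which is why energetic tracks look almost straight in an event display.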

Energy measurements are generally analog. In them, one gets an indication of how much energy was deposited by the particle as it went by. In the above, the light blue and red data are energy measurements. For these measurements, one doesn't get such a precise position, but the amplitude is very accurate.
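Combining the two kinds of measurement is what makes particle identification possible: with momentum from a track and energy from a calorimeter, the relation $E^2 = (pc)^2 + (mc^2)^2$ gives the mass. Below is a minimal sketch in natural units (GeV, with $c = 1$). The function names are hypothetical, and the example uses an idealized proton with no measurement error, which real detectors never deliver.

```python
import math

def energy_from_mass(mass_gev: float, p_gev: float) -> float:
    """Total energy of a particle with the given mass and momentum (c = 1)."""
    return math.sqrt(mass_gev**2 + p_gev**2)

def mass_from_energy_momentum(e_gev: float, p_gev: float) -> float:
    """Mass from E^2 = p^2 + m^2 (c = 1).

    Measurement errors can push E^2 - p^2 slightly below zero,
    so the value is clamped at zero before taking the square root.
    """
    m_squared = e_gev**2 - p_gev**2
    return math.sqrt(max(m_squared, 0.0))

PROTON_MASS = 0.938272  # GeV
p = 1.0                 # GeV/c, illustrative
e = energy_from_mass(PROTON_MASS, p)
print(f"E = {e:.4f} GeV, reconstructed mass = {mass_from_energy_momentum(e, p):.6f} GeV")
```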

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user Carl Brannen

answered Jun 22, 2011 by Carl Brannen (240 points) [ no revision ]

Most voted comments

The reason I ask is to entertain a thought experiment around "How could opening science data to the masses aid the advancement of a given field?" This may be a silly question, as anyone who is passionate about particle collision data is most likely working with it already. Is there any niche in the process from sensor data >> transformation >> analysis >> conclusion that could be filled by a corporation or open-source community? Could there be a role for a for-profit corporation in physics data, where it is mutually beneficial?

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user opensourcechris

commented Jun 22, 2011 by opensourcechris (30 points) [ no revision ]

@dmckee; Yes, my recollection is from the 1980s, I'll correct. @opensourcechris; I think you'd have to talk to someone in the labs. My guess is that most of it is done by academia and they trust themselves more than others.

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user Carl Brannen

commented Jun 23, 2011 by Carl Brannen (240 points) [ no revision ]

@Carl you should add that from the curvature one also gets the momentum which together with energy measurements helps determine the mass of the particle.

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user anna v

commented Jun 23, 2011 by anna v (2,005 points) [ no revision ]

@opensourcechris this would be an exercise in futility. The raw data are useless without the metadata, including the contents of the logs kept by the shift crews babysitting the detectors. The for-profit niches arise when the detectors are built; a lot of that work is outsourced to industry. There is no profit from data gathering to be shared out. The institutes even pay for the publications.

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user anna v

commented Jun 23, 2011 by anna v (2,005 points) [ no revision ]

@opensourcechris I think, generally speaking, the main thing preventing institutions from releasing data is the sheer amount of bandwidth it would take to provide it to everyone. The LHC, for example, produces one petabyte of raw data every second. Automatic filters take out the noise and the data that isn't useful, and only a small fraction is recorded. At the end of these cuts, only 25 petabytes are recorded annually. This is still a huge amount of data; only 20% or so of it is stored at CERN, and the rest is distributed to affiliated organizations.
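For a sense of scale, the figures quoted in this comment can be turned into a rough back-of-the-envelope estimate. The sketch assumes, purely for illustration, that the 1 PB/s raw rate were sustained for a full calendar year. The accelerator does not run continuously, so the kept fraction computed here is a lower bound.

```python
RAW_RATE_PB_PER_S = 1.0       # raw rate quoted in the comment
RECORDED_PB_PER_YEAR = 25.0   # recorded volume quoted in the comment
CERN_SHARE = 0.20             # "20% or so" stored at CERN, per the comment

# Assumption for illustration only: continuous running all year.
SECONDS_PER_YEAR = 365 * 24 * 3600

raw_pb_per_year = RAW_RATE_PB_PER_S * SECONDS_PER_YEAR
kept_fraction = RECORDED_PB_PER_YEAR / raw_pb_per_year
stored_at_cern_pb = RECORDED_PB_PER_YEAR * CERN_SHARE

print(f"raw volume if run continuously: {raw_pb_per_year:.3g} PB/year")
print(f"fraction recorded (lower bound): {kept_fraction:.2e}")
print(f"stored at CERN: about {stored_at_cern_pb:.0f} PB/year")
```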

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user Benjamin Horowitz

commented Jun 24, 2011 by Benjamin Horowitz (195 points) [ no revision ]

Most recent comments

Note that this view isn't even remotely "raw". Considerable reconstruction and tracking has already been done.

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user dmckee

commented Jun 22, 2011 by dmckee (420 points) [ no revision ]

This is just amazing science, thank you. I would still like to see some of the data generated by the sensor elements and also possibly at various reconstruction/aggregation stages. Can you help with that?

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user opensourcechris

commented Jun 22, 2011 by opensourcechris (30 points) [ no revision ]

Years ago, as a grad student in particle physics, I used to work on the PHENIX experiment at BNL. Before I had shown up (I think near the end of run 2) the main data structure used for analysis was called a "tuple". Tuples were pretty much like the lists used today in Python with a bit more structure to make access faster and contained the actual data corresponding to what we called an "event" (something interesting that happened in the detector which was captured by the various subsystems and written eventually into a tuple). Unfortunately tuples were generally just too large and one needed to analyze a smaller subset of the entries in the tuples -- so micro-tuples were born and then shortly afterwards nano-tuples.

There were different types of nano-tuples defined and used by the various working groups on the experiment which had different subsets of the original tuples. Which type of nano-tuple you used depended on the analysis you were trying to do and roughly corresponded to the working group you were in. In my case this was heavy flavor where I was studying charm.

So a nano-tuple might look like this:

(x_1, x_2, ..., x_n)

where the x_i would be all the different quantities of interest associated with the event: transverse momentum, energy deposited in the EM-cal, blah, blah, blah..
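Since the comparison with Python was made above, here is a minimal sketch of what one such entry might look like as a Python `NamedTuple`. The field names (`event_id`, `pt_gev`, `emcal_energy_gev`) and the values are hypothetical; the real nano-tuples had many more entries and their own naming.

```python
from typing import NamedTuple

class NanoTuple(NamedTuple):
    """One event's worth of summary quantities (hypothetical fields)."""
    event_id: int
    pt_gev: float            # transverse momentum
    emcal_energy_gev: float  # energy deposited in the EM calorimeter

event = NanoTuple(event_id=1, pt_gev=2.4, emcal_energy_gev=2.1)

# Entries can be read by position, like the x_i above, or by name.
print(event[1], event.pt_gev)
print(tuple(event))
```

This is a fixed-length record of measured quantities for one event, so the only thing it shares with an RDF triple is the word "tuple".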

In the end the data analysis revolved around the manipulation of these nano-tuples and amounted to:

1. Put in a request with the data guys to get raw data collected by the different subsystems in the form of nano-tuples.
2. Wait a couple of days for the data to show up on disk, since it was a huge set of data.
3. Loop over the events (nano-tuples), filtering out the stuff you weren't interested in (usually events associated with pions).
4. Bin the data in each entry of the tuple.
5. Overlay the theoretical prediction of these distributions on top of what you extracted from the tuple.
6. Make your statement about what was going on (confirmation of theory, conjecture about disagreement, etc.).
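The loop, filter, bin, and compare steps above can be sketched in a few lines of Python. Everything here is made up for illustration: the events, the `"pion"` label (standing in for whatever selection cuts a real analysis applies), the bin width, and the "prediction". The actual analysis code of that era would have looked quite different.

```python
from collections import Counter

# Toy events: (transverse momentum in GeV/c, particle label). Invented values.
events = [
    (0.4, "pion"), (1.2, "electron"), (1.7, "electron"),
    (0.9, "pion"), (2.3, "electron"), (1.1, "electron"),
]

BIN_WIDTH = 1.0  # GeV/c

def bin_index(pt: float) -> int:
    """Index of the histogram bin that a momentum value falls into."""
    return int(pt // BIN_WIDTH)

# Loop over events, dropping the ones we are not interested in.
selected = [pt for pt, label in events if label != "pion"]

# Bin the surviving values into a histogram.
histogram = Counter(bin_index(pt) for pt in selected)

# Overlay a (made-up) prediction for the expected count in each bin.
prediction = {0: 0.0, 1: 2.5, 2: 1.0}
for b in sorted(prediction):
    low, high = b * BIN_WIDTH, (b + 1) * BIN_WIDTH
    print(f"{low:.0f}-{high:.0f} GeV/c: observed {histogram[b]}, predicted {prediction[b]}")
```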

The truth is that we rarely looked at the RAW, raw data streaming out of the detector unless you were on shift and part of the data acquisition system had stopped operating for some reason. But in that case the data was pretty meaningless when you looked at it. You'd be more concerned that the data wasn't flowing. However if you were one of the people responsible for maintaining a subsystem (say, EM-cal) then you'd probably be doing calibration on a regular basis and regularly looking over raw data from your particular subsystem to tune the calibration and make the raw data analyzable.

Mostly the raw data was only meaningful for the subsystem you had a responsibility to and looking at all the raw data from all the subsystems as a whole wasn't really done. I don't think anyone had that kind of breadth across all the different subsystems...

Regarding the data for the visualizations you asked about: I believe these were specially defined nano-tuples which had entries from enough of the subsystems to allow for reconstruction and the final visualization (pretty pictures) but I'm 99% sure the visualizations weren't created from the "raw" data. Rather they were done using these nano-tuples.

If you poke around the PHENIX website you can see some pretty fancy animations (at least fancy for back then) of collisions in the detector. Mostly these pics and movies were part of a larger experiment-wide PR effort. They were made by a guy named Jeffery Mitchell, and you should email him to find out more details on the format of the data he used (mitchell@bnl.gov). Everyone was talking about the LHC back then (2004 or 2005ish) and most of them have long since moved on, so you can probably get more insight into the "raw" data created by the LHC today and used for those visualizations if you ask someone like him directly.

This post imported from StackExchange Physics at 2014-07-18 04:56 (UCT), posted by SE-user unclejamil

answered Apr 20, 2013 by unclejamil (140 points) [ no revision ]
