
  What does the data in various stages of analysis from a particle collision look like?

+ 6 like - 0 dislike

I've been following the news around the work being done with the LHC particle accelerator at CERN. I am wondering what the raw data that is used to visualize the collisions looks like. Maybe someone can provide a sample csv or txt?

Edit: In addition to the raw data, it also seems that I should be interested in the data used at the point where a physicist might begin their analysis, possibly at the "tuple" stage of the data transformation. I'm familiar with RDF tuples, are there any parallels between the two tuples?

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user opensourcechris
asked Jun 22, 2011 in Experimental Physics by opensourcechris (30 points) [ no revision ]
See also: A reference request for real world experimental data.

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user dmckee
Keep in mind that data from the LHC come in terabytes, and the processing needs have created a whole new way of data handling, called GRID. The document at cdsweb.cern.ch/record/840543/files/lhcc-2005-024.pdf describes the handling of the data.

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user anna v

2 Answers

+ 6 like - 0 dislike

Different pieces of equipment will produce somewhat different looking data, but typically it consists of voltages defined as a function of time. In some cases (spark chambers, for example) the "voltage" is digital, and in others it is analog.

Traditionally, the time series for the data is slower than the times required for the (almost light speed) particles to traverse the detector. Thus one had an effective photograph for a single experiment. More modern equipment is faster but they still display the data that way. Here's an LHC example:

[Image: LHC event display]

In the above, the data has been organized for display according to the shape and geometry of the detector. The raw data itself would be digitized and just a collection of zeroes and ones.

There are typically two types of measurements, "position" and "energy". The position measurements are typically binary, that is, they indicate that a particle either came through that (very small) element or did not. In the above, the yellow lines are position measurements.

Note that some of the yellow lines are curved. Actually, all of them are curved at least slightly. This is because there is a strong magnetic field. The curvature of the particle tracks helps determine what particles they are. For example, given the same speed and charge, a heavier particle will run straighter.

The radius of curvature is given by:
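$$r = \frac{\gamma m v}{|q| B} = \frac{p}{|q| B}$$ where $\gamma = 1/\sqrt{1-(v/c)^2}$ is the Lorentz factor, $m$ is the rest mass, $v$ is the speed, $q$ is the charge, $B$ is the magnetic field strength, and $p = \gamma m v$ is the momentum (taken perpendicular to the field). This helps determine the particle type and energy.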
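
As a rough numerical illustration, the standard relation $r = p/(|q|B)$ is often used in the practical form $p\,[\mathrm{GeV}/c] \approx 0.3\,|q|\,B\,[\mathrm{T}]\,r\,[\mathrm{m}]$, with $q$ in units of the elementary charge. The sketch below applies it in both directions. The function names and the sample field and momentum values are illustrative assumptions, not numbers taken from any specific detector readout.

```python
# Conversion constant: p [GeV/c] = K * |q| [e] * B [T] * r [m],
# where K is the speed of light in units of 1e9 m/s.
K = 0.299792458


def momentum_from_radius(radius_m, b_tesla, charge_e=1.0):
    """Momentum transverse to the field (GeV/c) from the track radius."""
    return K * abs(charge_e) * b_tesla * radius_m


def radius_from_momentum(p_gev, b_tesla, charge_e=1.0):
    """Track radius (m) for a given transverse momentum (GeV/c)."""
    return p_gev / (K * abs(charge_e) * b_tesla)


# Illustrative numbers only: a singly charged particle with 10 GeV/c of
# transverse momentum in an assumed 3.8 T field.
r = radius_from_momentum(10.0, 3.8)
print(f"radius of curvature: {r:.2f} m")
print(f"momentum recovered from that radius: {momentum_from_radius(r, 3.8):.2f} GeV/c")
```

With these assumed inputs the radius comes out at roughly 8.8 m, which is why high-momentum tracks look almost straight in an event display.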

Energy measurements are generally analog. In them, one gets an indication of how much energy was deposited by the particle as it went by. In the above, the light blue and red data are energy measurements. For these measurements, one doesn't get such a precise position, but the amplitude is very accurate.

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user Carl Brannen
answered Jun 22, 2011 by Carl Brannen (240 points) [ no revision ]
The reason I ask is to entertain a thought experiment around "How could opening science data to the masses aid in the advancement of a given field?" This may be a silly question, as anyone who is passionate about particle collision data is most likely working with it already. Is there any niche in the process from sensor data >> transformation >> analysis >> conclusion that can be filled by a corporation or open-source community? Could there be a role for a for-profit corporation in physics data, where it is mutually beneficial?

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user opensourcechris
@dmckee; Yes, my recollection is from the 1980s, I'll correct. @opensourcechris; I think you'd have to talk to someone in the labs. My guess is that most of it is done by academia and they trust themselves more than others.

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user Carl Brannen
@Carl you should add that from the curvature one also gets the momentum which together with energy measurements helps determine the mass of the particle.

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user anna v
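
To spell out the relation behind this comment: with the momentum $p$ taken from the track curvature and the energy $E$ from the calorimeter, the rest mass follows from the relativistic energy-momentum relation. The numbers in the example are illustrative assumptions, chosen only to show the arithmetic.

$$m c^2 = \sqrt{E^2 - (pc)^2}$$

For instance, $E = 1.0\ \mathrm{GeV}$ and $pc = 0.99\ \mathrm{GeV}$ give $mc^2 = \sqrt{1.0 - 0.9801}\ \mathrm{GeV} \approx 0.141\ \mathrm{GeV}$, which is close to the charged pion mass.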
@opensourcechris this would be an exercise in futility. The raw data are useless without the metadata, including the contents of logs by the shift babysitting the detectors. The for-profit niches happen when the detectors are built. A lot is outsourced to industry. There is no profit from data gathering to be shared out. The institutes even pay for the publications.

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user anna v
@opensourcechris I think generally speaking the main thing preventing institutions from releasing data is the sheer amount of bandwidth it would take to provide it to everyone. The LHC, for example, produces one petabyte of raw data every second. Automatic filters take out the noise and not-useful data, and only a small fraction is recorded. At the end of these cuts, only 25 petabytes are recorded annually. This is a huge amount of data; only 20% or so of it is stored at CERN and the rest is distributed to affiliated organizations.

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user Benjamin Horowitz
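
A back-of-the-envelope check of the figures quoted in this comment (1 PB/s before filtering, 25 PB recorded per year, about 20% kept at CERN) can be sketched as follows. The assumption that the detectors produce data during every second of the year overestimates the running time, so the computed fraction is only an order-of-magnitude lower bound on what survives the filters.

```python
# Figures quoted in the comment; treated here as given inputs.
raw_rate_pb_per_s = 1.0        # raw data rate before filtering, PB per second
recorded_pb_per_year = 25.0    # data kept after the automatic filters, PB per year
cern_share = 0.20              # approximate share stored at CERN itself

# Assumption: continuous running for a full year (an overestimate).
SECONDS_PER_YEAR = 365 * 24 * 3600

raw_pb_per_year = raw_rate_pb_per_s * SECONDS_PER_YEAR
kept_fraction = recorded_pb_per_year / raw_pb_per_year
stored_at_cern_pb = recorded_pb_per_year * cern_share

print(f"fraction of raw data recorded: {kept_fraction:.1e}")
print(f"stored at CERN per year: {stored_at_cern_pb:.0f} PB")
```

Under these assumptions fewer than one part in a million of the raw data is recorded, and about 5 PB per year stays at CERN.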
Note that this view isn't even remotely "raw". Considerable reconstruction and tracking has already been done.

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user dmckee
This is just amazing science, thank you. I would still like to see some of the data generated by the sensor elements and also possibly at various reconstruction/aggregation stages. Can you help with that?

This post imported from StackExchange Physics at 2014-07-18 04:55 (UCT), posted by SE-user opensourcechris
+ 5 like - 0 dislike

Years ago, as a grad student in particle physics, I used to work on the PHENIX experiment at BNL. Before I had shown up (I think near the end of run 2) the main data structure used for analysis was called a "tuple". Tuples were pretty much like the lists used today in Python with a bit more structure to make access faster and contained the actual data corresponding to what we called an "event" (something interesting that happened in the detector which was captured by the various subsystems and written eventually into a tuple). Unfortunately tuples were generally just too large and one needed to analyze a smaller subset of the entries in the tuples -- so micro-tuples were born and then shortly afterwards nano-tuples.

There were different types of nano-tuples defined and used by the various working groups on the experiment which had different subsets of the original tuples. Which type of nano-tuple you used depended on the analysis you were trying to do and roughly corresponded to the working group you were in. In my case this was heavy flavor where I was studying charm.

So a nano-tuple might look like this:

$$(x_1, x_2, \ldots, x_n)$$

where the $x_i$ would be all the different quantities of interest associated with the event: transverse momentum, energy deposited in the EM-cal, blah, blah, blah..

In the end the data analysis revolved around the manipulation of these nano-tuples and amounted to:

  1. Put in a request with the data guys to get raw data collected by the different subsystems in the form of nano-tuples.
  2. Wait a couple days for the data to show up on disk since it was a huge set of data.
  3. Loop over the events (nano-tuples) filtering out the stuff you weren't interested in (usually events associated with pions)
  4. Bin the data in each entry of the tuple
  5. Overlay the theoretical prediction of these distributions on top of what you extracted from the tuple
  6. Make your statement about what was going on. (confirmation of theory, conjecture about disagreement, etc..)
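
Steps 3 and 4 of the list above (loop over events, filter, bin) can be sketched in a few lines. This is a toy illustration on synthetic data, not the PHENIX software or its file format: the `NanoTuple` fields, the `is_pion` flag, and the random event generator are all hypothetical stand-ins.

```python
import random
from typing import NamedTuple


class NanoTuple(NamedTuple):
    """Hypothetical per-event record; the field names are illustrative only."""
    pt: float            # transverse momentum, GeV/c
    emcal_energy: float  # energy deposited in the EM calorimeter, GeV
    is_pion: bool        # stand-in for a particle-identification decision


def make_fake_events(n, seed=0):
    """Generate purely synthetic events so the example runs on its own."""
    rng = random.Random(seed)
    events = []
    for _ in range(n):
        pt = rng.expovariate(1.0)  # falling spectrum, not a physics model
        events.append(NanoTuple(pt, pt * rng.uniform(0.8, 1.0), rng.random() < 0.7))
    return events


def histogram(values, n_bins, lo, hi):
    """Count values in n_bins equal-width bins over the half-open range [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
    return counts


events = make_fake_events(10_000)

# Step 3: loop over the events and drop the ones we are not interested in.
selected = [e for e in events if not e.is_pion]

# Step 4: bin one entry of the tuple (here the transverse momentum).
pt_counts = histogram([e.pt for e in selected], n_bins=10, lo=0.0, hi=5.0)

for i, count in enumerate(pt_counts):
    print(f"{i * 0.5:.1f}-{(i + 1) * 0.5:.1f} GeV/c: {count}")
```

Step 5 would then overlay a theoretical curve on `pt_counts`; that part is omitted because it depends entirely on the model being tested.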

The truth is that we rarely looked at the RAW, raw data streaming out of the detector unless you were on shift and part of the data acquisition system had stopped operating for some reason. But in that case the data was pretty meaningless when you looked at it. You'd be more concerned that the data wasn't flowing. However if you were one of the people responsible for maintaining a subsystem (say, EM-cal) then you'd probably be doing calibration on a regular basis and regularly looking over raw data from your particular subsystem to tune the calibration and make the raw data analyzable.

Mostly the raw data was only meaningful for the subsystem you had a responsibility to and looking at all the raw data from all the subsystems as a whole wasn't really done. I don't think anyone had that kind of breadth across all the different subsystems...

Regarding the data for the visualizations you asked about: I believe these were specially defined nano-tuples which had entries from enough of the subsystems to allow for reconstruction and the final visualization (pretty pictures) but I'm 99% sure the visualizations weren't created from the "raw" data. Rather they were done using these nano-tuples.

If you poke around the PHENIX website you can see some pretty fancy animations (at least fancy for back then) of collisions in the detector. Mostly these pics and movies were part of a larger experiment-wide PR effort. They were made by a guy named Jeffery Mitchell, and you should email him to find out more details on the format of the data he used (mitchell@bnl.gov). Everyone was talking about the LHC back then (2004 or 2005ish) and most of them have long since moved on, so you can probably get more insight into the "raw" data created by the LHC today and used for those visualizations if you ask someone like him directly.

This post imported from StackExchange Physics at 2014-07-18 04:56 (UCT), posted by SE-user unclejamil
answered Apr 20, 2013 by unclejamil (140 points) [ no revision ]


user contributions licensed under cc by-sa 3.0 with attribution required
