Selecting Data

Selecting data is one of the most important steps of an analysis. Before going into the details of selection, it helps to have a notion of the data flow in LHCb.

Data Flow of LHCb

For more information about the LHCb data flow, the stripping lines, and how they work, check the LHCb Starterkit website.

The data processing at the LHC has to be efficient and fast in order to process the gigantic flow of information from the collisions taking place. At LHCb, there are multiple steps in order to filter the flow of information:

  • Triggering at the detector : This layer consists of low-level (hardware) triggers and high-level triggers; the latter are run by the software Moore.
  • Reconstruction process : The software Brunel is in charge of this step. It takes the signals that triggered in the detector and transforms them into clusters and tracks. The output of this layer is a type of file called a DST.
  • Stripping : Because DST files are huge, the output of the reconstruction process is filtered further. This filtering is divided into so-called streams, where every stream is related to a type of physics that is studied at LHCb.

Stripping

Understanding the stripping process is central to knowing which data you want to use. As previously mentioned, the output of this step is subdivided into streams. The idea is to create smaller output files, categorized by the physics that researchers may be interested in. For instance, common streams include:

  • Bhadron
  • Charm
  • Dimuon
  • Electroweak
  • Leptonic
  • Semileptonic

Streams are further categorized into stripping lines that specify the sequence of algorithms applied to select a particular candidate (e.g. a particular B hadron decay).

Screenshot

Selecting Data at the Open Data Portal

Data collected during Run 1 of LHCb can be found on the CERN Open Data Portal. As you can see, the name of each dataset encodes all the relevant information, including the stream and stripping version.

Screenshot

In this case, the set of stripping lines in the Bhadron stream would address decays coming from B hadrons. In general, when choosing a particular dataset to analyze, it is recommended to check the corresponding stripping line index to see whether one of its stripping lines meets your needs. The link to the index can be found by navigating to a dataset entry (like the one pictured above) and scrolling to the heading How can you use these data? near the bottom of the page.

The datasets are organized by stream, magnet polarity, and stripping version. To process the full dataset for a particular stream and year, both magnet polarities will need to be processed independently.

One will generally find a much larger selection of stripping lines in the earliest stripping version for a particular running year, as further stripping versions represent "restrippings" or reprocessing of the data with additional stripping line selections. Stripping lines are typically not duplicated from one version to another in the stripping line index, so it is recommended to check the index for each version starting with the earliest version until a stripping line is found that suits your analysis needs.

Entries in the stripping line index are organized by stream, and each entry often represents the selection for a particular decay. The stripping lines typically follow a naming convention that puts the top-level decaying particle first in the string, followed by the intermediate and final-state particles, with a "2" standing in for the decay process. You may also notice additional descriptive words or acronyms relating to the selection process. For example, "Prompt" denotes that the particle in question is likely to come from the primary vertex and not from the decay of a particle that travelled far enough to produce a distinguishable secondary decay vertex, while "Resolved" and "Merged" denote that the \(\pi^0\) particles in the decay were reconstructed as two independent electromagnetic calorimeter clusters or as a single calorimeter cluster, respectively.

There are of course exceptions to this naming convention, and the best way to determine how the data is selected is to navigate to a particular entry in the index and inspect the sequence of algorithms applied to the reconstructed data to select candidates. As you scroll through the algorithms, you can see how the candidates are selected starting from collections of different types of identified particles. In certain steps of the sequence, you may see a "DecayDescriptor(s)" entry that shows a string representation of the total or intermediate decay associated with this particular stripping line.
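As a rough illustration of the naming convention, the sketch below splits a line name at the first "2" into the decaying particle and the remainder. The helper split_line_name is hypothetical (it is not part of any LHCb software), and it is only a heuristic: names with several "2" characters, or names that break the convention, will not split cleanly, so the index entry remains the authoritative description.

```python
def split_line_name(name):
    """Heuristically split a stripping line name into (parent, remainder).

    Hypothetical helper for illustration only; it assumes the common
    'Stripping<Parent>2<Daughters>Line' pattern, which not every line follows.
    """
    core = name
    # Drop the conventional prefix and suffix if they are present.
    if core.startswith("Stripping"):
        core = core[len("Stripping"):]
    if core.endswith("Line"):
        core = core[:-len("Line")]
    # The first "2" is read as the decay arrow.
    parent, sep, remainder = core.partition("2")
    if not sep:
        # No "2" found: the name does not follow the convention.
        return core, ""
    return parent, remainder


print(split_line_name("StrippingZ02MuMuLine"))  # ('Z0', 'MuMu')
```

Descriptive words such as "Prompt" or "Resolved" stay inside the remainder and have to be read by eye.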

You must use the stripping version whose line index contains the line you want, or else the ntupling software DaVinci will not run properly. If a stripping line exists in multiple stripping versions, choose only one version to avoid duplicating events and candidates. To decide, navigate to the stripping line entry on the index page of each stripping version and determine which version of the line best suits your analysis needs.

Choosing a stripping line will enable two things:

  • Verifying the appropriate stream and dataset to search for the corresponding candidates
  • Allowing for prefiltering of events in a particular dataset based on whether the stripping line has output

The latter can be achieved by adding the following to your DaVinci options file (see the examples in the following pages for more details):

# Add a prefilter to make the process faster
from PhysConf.Filters import LoKi_Filters
fltrs = LoKi_Filters(STRIP_Code="HLT_PASS_RE('<StrippingLineName+Decision>')")
DaVinci().EventPreFilters = fltrs.filters('Filters')

where <StrippingLineName+Decision> is to be replaced with the name of the stripping line you are interested in analyzing, followed by the string "Decision". This is included in the examples in the following pages for your reference.
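To make the substitution concrete, here is a minimal sketch that builds the filter string from a line name. The helper strip_code is hypothetical and plain Python, so it can be tried outside the LHCb environment; the commented lines at the end show where the result would go in an options file and assume that LoKi_Filters is importable from PhysConf.Filters in your DaVinci setup.

```python
def strip_code(line_name):
    """Build the STRIP_Code expression for one stripping line.

    Hypothetical helper: appends "Decision" unless it is already there.
    """
    if line_name.endswith("Decision"):
        decision = line_name
    else:
        decision = line_name + "Decision"
    return "HLT_PASS_RE('{}')".format(decision)


code = strip_code("StrippingZ02MuMuLine")
print(code)  # HLT_PASS_RE('StrippingZ02MuMuLineDecision')

# Inside a DaVinci options file (LHCb environment required; assumed import path):
# from PhysConf.Filters import LoKi_Filters
# DaVinci().EventPreFilters = LoKi_Filters(STRIP_Code=code).filters('Filters')
```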

Example : Choosing a \(Z^0 \to \mu^{+}\mu^{-}\) decay

Let's say we would like to study the decay of a Z boson into a muon pair. To achieve this, we should follow these steps:

  1. Look for the correct stream in the list of available datasets. In this case, the correct stream would be Electroweak or EW. You can select the stream using the filter option located at the left side of the webpage.

    Screenshot

  2. Let's select the following dataset:

    Screenshot

  3. To make sure the decay of interest is covered by the selected stripping version, click on Stripping lines index for stripping21r1 and check whether the decay is present in the EW stream. In this case, you will see a long list that includes StrippingZ02MuMuLine.

  4. Navigate back to the open data record and scroll down until you find the File and indexes section (it might take a second to load). Then, click on Download index and select the txt format.

    Screenshot

  5. The downloaded file should contain a gigantic list of files. As a reference, here is a fragment of the list from the stripping line that we have used as an example:

    Screenshot

  6. This is the type of file name that you will use when you tell DaVinci which files to process in order to get the ntuples you need.
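As a sketch of how the downloaded index could be turned into a list of input files, the snippet below reads one file name per line and skips blank lines. The helper parse_index and the sample entries are hypothetical placeholders, not real dataset paths, and the one-name-per-line layout is an assumption about the txt format. The commented lines at the end show one common way to hand the list to DaVinci through IOHelper from GaudiConf, which requires the LHCb environment.

```python
def parse_index(text):
    """Return the non-empty lines of an index file as a list of file names.

    Hypothetical helper; assumes the txt index holds one file name per line.
    """
    files = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            files.append(line)
    return files


# Placeholder content standing in for the downloaded txt index.
sample_index = """
root://example.invalid/placeholder/file_0001.dst

root://example.invalid/placeholder/file_0002.dst
"""

input_files = parse_index(sample_index)
print(len(input_files))  # 2

# Inside a DaVinci options file (LHCb environment required):
# from GaudiConf import IOHelper
# IOHelper().inputFiles(input_files[:1], clear=True)  # start with a single file
```

Starting with one file keeps a first test run short; the full list can be passed once the options file works.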