Design of a general purpose data collection module for the NuTel telescope


Department of Physics, National Taiwan University, Taipei 106, Taiwan

Received 15 March 2005; accepted 2 June 2005
Available online 22 July 2005

Abstract

We have developed a Data Collection Module (DCM) to digitize, store and select data from the NuTel telescope, which observes Cherenkov photons from near-horizontal air showers. Multi-anode photo-multiplier tubes (MAPMT) are used as photon-sensitive devices. Each DCM processes 32 input signals from the charge-sensitive pre-amplifiers located close to the MAPMT. The module design uses 40-MHz 10-bit pipeline ADCs and medium-size FPGAs. A programmable gain/attenuation control (\( \times 0.5 \) to \( \times 2 \)) is applied to each channel before the ADC. This eases operation of a multi-channel MAPMT system, because the MAPMT gain varies from channel to channel by about 1:3. The DCM has a flexible on-board trigger implemented in FPGA firmware. The system is built on 32-bit 33-MHz cPCI. Thirty-two DCMs housed in two crates process signals from two telescopes of 512 channels each, which look in the same direction for coincidence.

© 2005 Elsevier B.V. All rights reserved.
PACS: 07.05.Hd; 07.50.-e; 07.50.Qx; 85.60.Ha

Keywords: ADC; Trigger; DAQ; cPCI; MAPMT; FPGA

1. Introduction

The NuTel collaboration [1] is building a Cherenkov telescope to be installed on a mountain site for observing near horizontal air showers emerging from another mountain [2–4] or the ground [5,6]. Cosmic tau neutrinos are the primary source of such showers. This technique will be realized for the first time in the \( \nu_\tau \) energy range of 1–1000 PeV.

To increase the acceptance of the NuTel telescope we need to detect events with energy as low as possible, corresponding to a small number of photoelectrons. However, the night sky background (NSB) photon flux is strong; it has been measured to be \( \sim 2 \times 10^{12} \, \text{photons m}^{-2} \, \text{s}^{-1} \, \text{sr}^{-1} \) near the horizon line (mountain) at wavelengths of 300–400 nm [7,8], and the flux reflected from a mountain is \( \sim 10 \) times lower. To reject NSB we use two identical telescopes located close to each other and looking in the same direction. Fig. 1 shows simulated NSB light (top picture) and light from a 1 PeV electron shower (bottom picture), both within a gate of 25 ns. The simulated conditions are: an effective light collection area of 1 m\(^2\), pixel size of \( 0.5^\circ \times 0.5^\circ \), PMT quantum efficiency of 20% and optical efficiency of 50%.

While NSB is random both in time and direction, a pulse of Cherenkov light is short (10–100 ns) and two telescopes detect it at the same time and from the same direction. The main idea of the background reduction in NuTel is the following:

- **Hardware trigger**: looking for clusters of fired pixels in both telescopes at the same time (in any direction).
- **On-line software trigger** (simultaneously with readout): additionally checking the coincidence in direction of the light pulses at both telescopes.
- **Off-line selection**: more precise reconstruction of the shower direction, looking for a possible source in the sky.

NuTel will start operation in a prototype configuration of two identical telescopes with \( \sim 1 \, \text{m}^2 \) effective light collection area, \( 16^\circ \times 8^\circ \) field of view and \( 0.5^\circ \times 0.5^\circ \) pixel size. The system will consist of \( 2 \times 512 \) channels. The expected event rate in this configuration is \( \sim 1 \) event/year. If one event is observed during two years of operation, the system could be upgraded to a larger one consisting of four telescopes with \( 100^\circ \times 8^\circ \) field of view each.

2. System overview

The DAQ system of NuTel consists of two cPCI crates connected to each other via Ethernet, each with one single board computer (SBC) and 16 Data Collection Modules (DCMs). Each chassis will process information from one telescope (512 channels). A DCM combines the functions of a 32-channel ADC module and a trigger module. The SBC is used for readout/storage and for the on-line software trigger. During daytime the SBC will be used for off-line data processing. Fig. 2 shows the schematics of the NuTel electronics system.

We use the Hamamatsu H7546 MAPMT with \( 8 \times 8 = 64 \) pixels as the photon sensor. The quantum efficiency reaches its maximum at wavelengths of 350–450 nm, a good match to Cherenkov photons. The gain is around \( 3 \times 10^5 \) at 800 V, but is not uniform across pixels: the maximum gain can be five times the minimum gain (typically 3:1). This non-uniformity is reduced in two ways. First, we connect tubes with similar mean gain to the same channel of the high voltage power supply. Second, we adjust the gain inside the DCM before the ADC (programmable, between \( \times 0.5 \) and \( \times 2 \)) channel by channel to compensate for the difference.

The energy range of interest spans from 1 PeV to hundreds of PeV, necessitating a dynamic range of \( \sim 500 \). A 10-bit ADC alone is unable to reach the desired dynamic range. Using a 12-bit ADC instead of a 10-bit one would double the cost of the electronics. One cost-effective way of enlarging the dynamic range is via signal sharing. A 64-channel signal-sharing board (SSB) is plugged onto the MAPMT anode pins. On this board every channel is connected resistively to two other channels with current division 0.9:0.05:0.05 to form a cross-talk network. The signal-sharing channels are about 2.1° apart in the field of view in order to reduce the probability that they are illuminated by the same shower (Fig. 3). With the SSB the dynamic range of the system increases by a factor of about 10, becoming 1000.

Fig. 1. Simulated random background and Cherenkov light.

After the SSB the signals go to the charge-sensitive pre-amplifier board, which converts a charge pulse from the PMT into an output voltage signal. The charge collection capacitor at the pre-amplifier is discharged with a time constant of 390 ns to avoid a base-line shift due to the NSB. This time constant is chosen to allow a simple determination of the number of photoelectrons at each clock interval inside the FPGA (\( \exp(-25\,\text{ns}/390\,\text{ns}) \approx \frac{15}{16} \)).

Signals from the pre-amplifiers are passed to the DCMs via twisted pair cables. Each DCM card accepts 32 analog inputs. One 10-bit pipelined ADC running at 40 MHz digitizes the voltage of one input channel. FPGAs of the Xilinx Spartan-2e family process data from the ADCs. There are two dual-port RAM blocks, each of 256 × 16 bit size, per channel inside the FPGA. One block is used as a cycle memory to keep data for the time needed by the trigger logic operation; the maximal keeping time is 256 × 25 ns = 6.4 μs. The other memory block is used as an event buffer to store 32 triggered events of eight clocks (200 ns) of data each: two clocks before the signal (pedestal + jitter), four clocks of event (10–100 ns) and two clocks after the event.
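
The two memory blocks can be pictured with a small software model. The sketch below is only an illustration in Python, not the firmware: the class name `ChannelMemory`, its methods and the `delay_clocks` parameter are hypothetical, and only the sizes (256 clocks, 8 clocks per event, 32 events) are taken from the text.

```python
from collections import deque

CLOCK_NS = 25        # one system clock at 40 MHz
CYCLE_DEPTH = 256    # depth of the cycle memory in clocks (from the text)
EVENT_CLOCKS = 8     # clocks stored per triggered event (from the text)
MAX_EVENTS = 32      # events held by the event buffer (from the text)


class ChannelMemory:
    """Toy model of one channel: cycle memory plus event buffer (hypothetical)."""

    def __init__(self):
        # A bounded deque drops the oldest sample once 256 are stored.
        self.cycle = deque(maxlen=CYCLE_DEPTH)
        self.events = []

    def push(self, adc_code):
        """Store the ADC code of the current clock."""
        self.cycle.append(adc_code)

    def trigger(self, delay_clocks):
        """Copy the 8 samples whose last one is `delay_clocks` before the newest."""
        if len(self.events) >= MAX_EVENTS:
            return False                      # event buffer is full
        end = len(self.cycle) - delay_clocks
        start = end - EVENT_CLOCKS
        if delay_clocks < 0 or start < 0:
            return False                      # window lies outside stored history
        self.events.append(list(self.cycle)[start:end])
        return True


mem = ChannelMemory()
for code in range(300):
    mem.push(code)
mem.trigger(delay_clocks=10)
print(mem.events[0])                 # eight consecutive samples
print(CYCLE_DEPTH * CLOCK_NS, "ns")  # maximal keeping time of the cycle memory
```

In this model the trigger latency must stay below the depth of the cycle memory, otherwise the samples of interest have already been overwritten.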

One SBC card collects event data from the 16 DCMs located in the same crate through the 33 MHz 32-bit wide compact PCI bus. To maximize the data transfer rate, data are read when the event buffers of the DCMs are half-full. The CPU on the SBC assembles the event and applies a more sophisticated background rejection algorithm to further reduce the data volume. The data are finally written to storage by the SBC. The SBC runs under a Linux operating system.

The method of calculation of the charge passed to the pre-amplifier during each system clock is illustrated in Fig. 4. This method is based on the optimization of discharging time at the charge-sensitive amplifier such that if there is no charge injection (new photoelectrons), the output of the module will be \( \frac{15}{16} \) of the amplitude one clock (25 ns) before. So, the charge injected during every system clock is calculated as the current ADC code minus an expected amplitude, which is \( \frac{15}{16} \) of the previous ADC code. Because there is no negative overshoot after signal, such logic is less sensitive to the previous signals. So, it operates more correctly in the difficult background conditions in comparison with the traditional “simple difference” method.
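
The calculation described above can be sketched in a few lines of Python. This is a floating-point illustration under the stated 15/16 decay per clock, not the FPGA implementation; the function names and the `pedestal` parameter are hypothetical.

```python
def injected_charge(adc_codes, pedestal=0):
    """Per-clock charge estimate: dA[n] = A[n] - (15/16) * A[n-1]."""
    amplitudes = [code - pedestal for code in adc_codes]
    return [cur - prev * 15 / 16
            for prev, cur in zip(amplitudes, amplitudes[1:])]


def simple_difference(adc_codes, pedestal=0):
    """Traditional estimate for comparison: dA[n] = A[n] - A[n-1]."""
    amplitudes = [code - pedestal for code in adc_codes]
    return [cur - prev for prev, cur in zip(amplitudes, amplitudes[1:])]


# One pulse of amplitude 160 followed by pure exponential decay.
samples = [0, 160, 150, 140.625]
print(injected_charge(samples))    # charge only in the first clock
print(simple_difference(samples))  # negative values during the decay
```

For a single pulse followed by undisturbed decay, the first estimate returns the charge in one clock and zero afterwards, whereas the simple difference goes negative while the amplifier discharges.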

3. Hardware design details

3.1. General design

A detailed description of the DCM is provided in Ref. [9]. Fig. 5 shows a general schematic of DCM. The main parts of the hardware design are:

- Thirty-two analog channels, each consisting of a differential receiver amplifier, a programmable potentiometer, an analog switch for calibration purposes and a 40-MHz 10-bit pipelined ADC.
- Four ADC_FPGAs process data from the ADCs. Signals for trigger purposes are calculated inside the ADC_FPGAs; three programmable discriminators (three different thresholds) at every channel are used for this purpose.
- A trigger FPGA collects the discriminator outputs from all 32 channels on board and makes the trigger request according to the logic implemented inside its firmware code.
- LVDS ↔ TTL converters between the front panel connector J3 and the Trigger_FPGA are used to connect the Trigger_FPGAs of different DCMs into the integrated system by a daisy chain mechanism.
- A flash memory IC is used for the storage of the configuration firmware of all FPGAs. Only 3% of the flash memory is used; the remaining memory could be used to store hardware characteristics (gains) and default settings in the future.
- A configuration CPLD controls configuration/initialization of all FPGAs using data from the flash memory.
- A calibration circuit based on a 12-bit DAC is used for the calibration of the ADCs and pre-amplifiers.
- The Control_FPGA controls the data transfer processes through the local bus, the calibration process and the memory (address) distribution for all units inside the module.
- A PLX PCI9054 is used as the adapter between the local bus and the cPCI bus.
- A special IC and MOSFET switches are used to control and limit the large current surge at turn-on time. This feature is very useful in large electronic systems.

Fig. 4. Charge calculation at every clock. (Panel labels: Cherenkov photon pulse; pre-amplifier output decaying as exp(-t/390 ns), with exp(-25/390) ≈ 15/16, delayed by the pipeline ADC; system clock (40 MHz); reconstructed photon pulse, \( \Delta A \sim Q \).)

Fig. 5. Schematics of Data Collection Module.

3.2. Analog part before FPGA

Fig. 6 shows the schematic for one channel. Main parts of this design are: differential receiver amplifier; programmable potentiometer; analog switch; 10-bit 40-MHz pipelined ADC “AD9203ARU”. All components are of SMD-type and are mounted on both sides of PCB.

Because the gain of the MAPMT fluctuates from channel to channel, typically by 1:3, a potentiometer is used to compensate for these fluctuations. The effective gain $G$ of this circuit is

$$G \approx \frac{1K + R_1}{2K - R_1}$$

where $R_1$ varies between $\sim 0\, \Omega$ and $1\, \text{k}\Omega$.

If $R_1 = 0$, gain is 0.5; if $R_1 = 500\, \Omega$, gain is 1; if $R_1 = 1K$, gain is 2. A programmable potentiometer has 256 resistor pairs $\{R_1, R_2\}$ inside. The accuracy of gain setting is

$$\frac{\Delta G}{G} \approx \frac{1}{G} \times \left( \frac{\text{d}G}{\text{d}R_1} \times \Delta R_1 \right)$$

$$= \frac{3K}{(2K - R_1)(1K + R_1)} \times \frac{1K}{255}.$$

The best accuracy, $3/1.5/1.5/255 = 0.52\%$, is obtained when $G = 1$ and the worst accuracy, $3/2/255 = 0.59\%$, when $G = 0.5$ or 2. This precision is sufficient, as the statistical accuracy for 1000 photo-electrons is only 3.2%.
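
The gain formula and its step size can be checked numerically. The Python sketch below uses only the expressions given above; the function names are hypothetical, and the linear mapping from the 8-bit setting to $R_1$ in `setting_for_gain` is an assumption made for illustration rather than a documented property of the potentiometer.

```python
R_FULL = 1000.0   # ohms, full range of R1 (1K in the text)
STEPS = 255       # steps between the 256 potentiometer settings


def gain(r1):
    """Effective gain G = (1K + R1) / (2K - R1)."""
    return (1000.0 + r1) / (2000.0 - r1)


def relative_step(r1):
    """Relative gain change dG/G for one potentiometer step at R1."""
    return 3000.0 / ((2000.0 - r1) * (1000.0 + r1)) * (R_FULL / STEPS)


def setting_for_gain(target):
    """Nearest setting (0..255) for a target gain, assuming R1 is linear in the code."""
    if not 0.5 <= target <= 2.0:
        raise ValueError("gain must lie between 0.5 and 2")
    r1 = (2000.0 * target - 1000.0) / (1.0 + target)   # inverse of gain()
    return round(r1 / R_FULL * STEPS)


for r1 in (0.0, 500.0, 1000.0):
    print(r1, gain(r1), f"{100 * relative_step(r1):.2f}%")
```

The loop prints the gain and the relative step at both ends and at the middle of the range, which can be compared with the values quoted in the text.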

An analog switch is used for ADC calibration (pedestal and gain): it switches the ADC input between the signal from the receiver and the signal from the 12-bit DAC. Digitized data from the ADCs pass to four FPGAs named ADC_FPGA. The input range of the ADC is 2 V, or $\sim 2\, \text{mV/bin}$.

3.3. Calibration circuit

Fig. 7 shows the schematics of the calibration circuit used in the DCM. The principal parts of this design are: a 12-bit DAC; two amplifiers operated with gains $\times 2$ and $\times 1$; a field-effect transistor used to generate a charge pulse to calibrate the pre-amplifier; an analog multiplexer $4 \rightarrow 1$ for DAC calibration by an external digital multimeter.

The output range of the DAC is 4.096 V, which means $\sim 1\, \text{mV/bin}$. The DAC step is therefore half the ADC step, which allows a precise calibration of the ADCs. Because the pre-amplifier is a charge-sensitive device, the DCM generates a charge pulse to calibrate the pre-amplifier by applying a voltage-to-ground step through a capacitor. The DAC can be calibrated by an external digital multimeter (DMM) through the front panel.

3.4. Interconnections between DCMs

DCMs are connected with each other via twisted pair cables to create a stand-alone system. The system consists of two cPCI chassis. Every chassis includes 15 slave DCMs and one master DCM. The two master DCMs (A and B) differ in the “system clock” and “reset” signals only. The two subsystems share the same clock and reset. The front-panel connector is connected to the trigger FPGA via a set of LVDS receivers/transmitters. Each module has six inputs and six outputs. Three kinds of signals are used in the system:

System clock. The “Master A” DCM generates it from the 40 MHz local bus clock. This signal passes to both master DCMs. The lengths of the cables used for this purpose are adjusted to compensate for the delay due to the cable between the two crates. The system clock received back at both masters is distributed to the slave DCMs in both crates.

Reset. “Master B” DCM generates it at the command from SBC. This signal passes to both master DCMs. Then reset signal at both masters is distributed to the slave DCMs in both crates. Such distribution of system clock and system reset signals between two master DCMs is made to minimize the size of the front-panel connector.

Trigger. Any DCM can generate a trigger request according to the logic implemented in the trigger FPGA. A trigger request passes to the master DCM of its crate via the daisy chain mechanism. Two bits are used for transporting trigger signals, which allows operation with up to three different trigger types. Each slave DCM receives a trigger request from the DCM on its left, adds its own by a logical OR and sends the result to the module on its right. The rightmost DCM in the chassis is the master DCM. Both master DCMs send the summarized trigger request from the whole chassis to each other. A final trigger decision is made in both master DCMs after checking the timing coincidence between the two telescopes. The trigger decision is distributed synchronously to all DCMs in both crates.

4. Firmware design details

Most of the DCM functionality is implemented in six FPGAs and one CPLD to take advantage of the flexibility and economy of this highly integrated approach. All FPGAs are of the Xilinx Spartan-2e family. The codes used for the FPGA configuration are stored in the 32M flash memory on board and are automatically downloaded when power is switched on. The VHDL hardware description language is used for all firmware codes. Xilinx ISE-6.2 was used to develop the firmware code and ModelSim II v5.7g for simulations. The resources used inside the FPGAs are the following: \( \sim 3500 \) logic cells (\( \sim 50\% \)) inside each of the four ADC FPGAs; \( \sim 1400 \) cells (\( \sim 60\% \)) inside the trigger FPGA and \( \sim 250 \) logic cells (\( \sim 10\% \)) inside the control FPGA. Such high integration allows the use of a complicated logic design with wide flexibility at the programmable (software) level.

The largest code is developed for the four ADC FPGAs, each of which processes data from eight ADCs. It is an all-in-one project that uses four different user constraint files (pin assignments) for the four FPGAs with the same core part. Four different codes operate inside the trigger FPGA for the four different kinds of DCM: slave, single master, master-A and master-B. The codes for the control FPGA and the configuration CPLD are the same for all kinds of DCM.

4.1. Firmware design of ADC_FPGA

The structure of the core module of the ADC FPGA firmware design is shown in Fig. 8. The charge passed to the pre-amplifier during one system clock, \( \Delta A \), is calculated by the module “one_channel_logic.vhd”. This charge is compared with three programmable thresholds: LL (low level), HL (high level) and VHL (very high level). The outputs of the comparators go to the trigger FPGA for making a trigger request and are delayed inside the cycle RAM, together with the ADC data, for the time needed for the trigger operation. The LL and HL signals are used in searching for event clusters (a cluster sample is shown in the bottom part of Fig. 1) and the VHL signal is used as a logical OR combination from the whole detector. The trigger decision signal from the trigger FPGA starts the recording of information from the cycle RAM into the buffer RAM. Because this process needs eight system clocks (the size of one event), any other trigger signal occurring during this time interval is delayed by the “normalized_trigger.vhd” module. As the format of one event in the buffer RAM is eight 128-bit words while the format of the data on the bus is 80 16-bit words, a reformatting process is needed when data are read from the buffer RAM. This process is controlled by the “buffer_control.vhdl” module. A set of programmable registers with pedestals, thresholds, time intervals, delays, etc. used for the operation of the whole firmware code is located inside the “setting_registers.vhd” module together with the local bus interface.

As 32-bit cPCI is used, two FPGAs are read at the same time: channels 0–7 (16–23) via D0–D15 and channels 8–15 (24–31) via D16–D31. To verify data transmission, the D15 (D31) bit is used as an odd/even (parity) bit for D0–D14 (D16–D30). The size of one event is 160 32-bit words for 32 channels. To reduce the dead time due to the interrupt latency in cPCI, a block of 16 events is read at once. Integrated background charges for all channels are included in the event information to enable monitoring of the stars moving across the telescope’s field of view.
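
The bus word layout with its check bit can be illustrated as follows. This Python sketch assumes even parity over each 16-bit half; the text does not state which convention the firmware uses, and the function names are hypothetical.

```python
def add_parity(word15):
    """Place a 15-bit payload in D0-D14 and an even-parity bit in D15 (assumed convention)."""
    if not 0 <= word15 < (1 << 15):
        raise ValueError("payload must fit in 15 bits")
    parity = bin(word15).count("1") & 1
    return word15 | (parity << 15)


def check_parity(word16):
    """True if the 16-bit half-word has an even number of set bits."""
    return bin(word16 & 0xFFFF).count("1") % 2 == 0


def pack_bus_word(low_payload, high_payload):
    """Combine two half-words: one FPGA on D0-D15, the other on D16-D31."""
    return add_parity(low_payload) | (add_parity(high_payload) << 16)


word = pack_bus_word(0x0001, 0x0003)
print(hex(word))
print(check_parity(word & 0xFFFF), check_parity(word >> 16))

# Event size from the text: four FPGAs times 80 16-bit words, read as 32-bit words.
print(4 * 80 * 16 // 32)
```

A single flipped bit in either half makes `check_parity` return False for that half, which is the kind of transmission error such a bit is meant to flag.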

4.2. Firmware design of Trigger FPGA

Because four kinds of DCM are used in the system (slave, single master, master-A and master-B) and these modules differ only in the firmware code for the trigger FPGA, there are four modules at the top of the hierarchy of the firmware design for the trigger FPGA. All top-level modules include an individual synchronization part: slave and master-B DCMs use an external system clock while single master and master-A DCMs generate it from their own local clock; likewise, slave and master-A DCMs use an external system reset while single master and master-B DCMs generate it on command from the SBC. The firmware module for the slave DCM does not include the “trigger_decision.vhdl” module, as the trigger decision is made inside the master DCM.

Fig. 9 shows schematics of the “mastera_dcm.vhdl” and “masterb_dcm.vhdl” modules. Two master DCMs (“master-A” and “master-B”) are used in the two-telescope NuTel system. The firmware schematics operating in the trigger FPGAs of these modules are very similar; the only difference is in the synchronization part. Each of these modules collects trigger requests from the whole chassis via the daisy-chain mechanism, makes the final trigger decision after checking the timing coincidence between the two telescopes, generates the signal for the slave DCMs and sends an interrupt request to the SBC to start the data readout.

Fig. 10 shows the design of the clocking signals in the “synchro_slave.vhdl” module. A special feature of the Spartan-2e FPGA, the Delay-Locked Loop (DLL), is used for making the clocking signals. There are four DLL units inside every FPGA. The module takes the system clock “sys_clk” from the master DCM, corrects it with a DLL and sends it to all ADCs and FPGAs on the board. Because the same logic is used inside each FPGA in the NuTel system (all trigger and ADC FPGAs), the whole system is correctly synchronized by a single system clock.

4.3. Firmware design of control FPGA and configuration CPLD

The control FPGA controls data transfer processes through the local bus, distributes addresses between the FPGAs, controls the analog switches before the ADCs and controls the DAC and calibration pulses. The firmware for the configuration CPLD is downloaded via JTAG; it controls write/read processes to/from the flash memory IC and the configuration of all FPGAs using the information recorded in the flash memory. A special jumper on board enables or disables the automatic FPGA configuration function. If automatic configuration is disabled, the PLX accesses the flash memory directly, bypassing the control FPGA (which will not be configured). This function is used if a wrong code for the control FPGA was recorded in the flash memory.

5. Trigger design details

The sum of charges within two or three system clocks (40 MHz) is calculated from the ADC data inside the ADC FPGA, which avoids sensitivity to jitter. The choice between two and three clocks will be made after TeV gamma-ray observation. This sum is compared with the three programmable thresholds: LL, HL and VHL. The sensitivity of the electronics is about 10 ADC counts/photon if $HV = 750 \text{ V}$ and gain in DCM of $\frac{1}{\sqrt{2}}$. The total RMS noise in the system has been measured to be between 0.6 and 0.8 ADC counts for different channels, allowing operation with thresholds down to $\frac{1}{3}$ single photon out of $\frac{1}{3}$ range.

The outputs of the comparators pass to the trigger FPGA, which processes signals from 32 pixels ($4 \times 8$ array). Three kinds of trigger request are calculated from this information: VHL trigger (code 3) as the logical OR of the 32 VHL signals; hard trigger (HT, code 2) as the logical OR of the HT cells; soft trigger (ST, code 1) as the logical OR of the ST cells. An HT/ST cell processes signals from a $3 \times 3$ array using simple logic with programmable parameters: HT = HL(central pixel) and (sum of LL hits $\geq N_{HT}$); ST = [HL(central pixel), LL(central pixel)] and (sum of LL hits $\geq N_{ST}$).

Because 12 of the trigger cells are located inside the central part of the $4 \times 8$ pixel array while 16 are near the edge and four cells are in the corners, three kinds of trigger cells are used in the logic: central, edge and corner ones. A central trigger cell processes a $3 \times 3$ pixel array, an edge trigger cell a $3 \times 2$ pixel array and a corner cell only a $2 \times 2$ pixel array. The simulated efficiency of this logic is about 95% of that obtained with $3 \times 3$ pixel cells only, which would necessitate many interconnections between modules.
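
The cell geometry and the cluster condition can be sketched in Python as below. This is an illustration only: the function names are hypothetical, the LL count is assumed to include the central pixel, and the `central_level` switch is one possible reading of the selectable central-pixel condition of the soft trigger.

```python
from collections import Counter

ROWS, COLS = 4, 8   # pixels handled by one DCM


def cell_pixels(r, c):
    """Pixels of the trigger cell centred on (r, c), clipped at the array border."""
    return [(rr, cc)
            for rr in range(max(r - 1, 0), min(r + 2, ROWS))
            for cc in range(max(c - 1, 0), min(c + 2, COLS))]


def cell_size_counts():
    """Number of cells by size: 9 (central), 6 (edge), 4 (corner)."""
    return Counter(len(cell_pixels(r, c))
                   for r in range(ROWS) for c in range(COLS))


def cluster_trigger(ll, hl, n_min, central_level="HL"):
    """OR over all cells of: central pixel above its level AND LL hits >= n_min."""
    central_map = hl if central_level == "HL" else ll
    for r in range(ROWS):
        for c in range(COLS):
            hits = sum(ll[rr][cc] for rr, cc in cell_pixels(r, c))
            if central_map[r][c] and hits >= n_min:
                return True
    return False


ll = [[False] * COLS for _ in range(ROWS)]
hl = [[False] * COLS for _ in range(ROWS)]
for r, c in [(1, 3), (1, 4), (2, 3)]:
    ll[r][c] = True
hl[1][3] = True
print(cell_size_counts())
print(cluster_trigger(ll, hl, n_min=3), cluster_trigger(ll, hl, n_min=4))
```

Clipping the $3 \times 3$ neighbourhood at the border reproduces the three cell types without any signals from neighbouring modules.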

Trigger request signals are collected from the whole telescope (512 channels) via a daisy-chain mechanism through the front panel of the DCM using a logical OR with priority: ST has the lowest priority and the VHL trigger the highest. The last DCM in the crate is the master DCM. If a single telescope is used, the single master module generates trigger decision signals synchronously for all slave DCMs in the chassis. This signal initiates the storage of the event information from the cycle RAM to the buffer RAM. A trigger decision occurs on each VHL trigger request, while HT and ST requests are divided by a programmable divider as $1 : N$, where $N = 1, \ldots, 2^{16} - 1$. Once 16 trigger signals have passed, the module generates an interrupt request to send data to the SBC.
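
A minimal Python sketch of the request collection and of the $1 : N$ divider is given below. It assumes that the priority OR amounts to keeping the highest 2-bit code along the chain; the names `daisy_chain` and `Prescaler` are hypothetical and the model ignores timing.

```python
NONE, ST, HT, VHL = 0, 1, 2, 3   # 2-bit trigger codes from the text


def daisy_chain(requests):
    """Pass requests from left to right; each module keeps the higher-priority code."""
    combined = NONE
    for code in requests:
        combined = max(combined, code)
    return combined


class Prescaler:
    """Accept every N-th request (1:N divider), N = 1 ... 2**16 - 1."""

    def __init__(self, n):
        if not 1 <= n <= 2**16 - 1:
            raise ValueError("N out of range")
        self.n = n
        self.count = 0

    def accept(self):
        self.count += 1
        if self.count == self.n:
            self.count = 0
            return True
        return False


print(daisy_chain([NONE, ST, NONE, VHL, HT]))   # code seen by the master
divider = Prescaler(3)
print([divider.accept() for _ in range(6)])
```

With $N = 1$ every request is accepted, so the divider only reduces the rate of the more frequent HT and ST requests when a larger $N$ is programmed.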

In the two-telescope configuration (master-A and master-B) an additional check of the timing coincidence within $\pm 1$ system clock between the two telescopes is applied inside the trigger decision logic. The final trigger request in this case is a logical AND with priority: VHL (highest), HT and ST (lowest). The trigger decision logic is the same as in the single master DCM. Fig. 11 shows simplified schematics of the distributed trigger system in the NuTel project.

The observed total delay between the light pulse at the MAPMT and the trigger decision signal is about 700 ns if only one DCM is used (single DCM) and 50 ns more for each additional module. For the full system consisting of two telescopes of 512 channels each, the total delay is about 1.5 μs. During distributed trigger operation the data are kept inside the cycle RAM, which is used as a programmable digital delay of up to 6.4 μs.
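
The delay budget can be checked with simple arithmetic. The Python sketch below uses only the figures quoted above; the function names are hypothetical, and the chain length of 16 modules per crate is taken from the system description, without modelling the exchange between the two masters.

```python
import math

CLOCK_NS = 25       # system clock period at 40 MHz
CYCLE_DEPTH = 256   # depth of the cycle RAM in clocks


def trigger_latency_ns(n_modules, single_ns=700, per_extra_ns=50):
    """Latency of a chain: 700 ns for one DCM plus 50 ns per additional module."""
    if n_modules < 1:
        raise ValueError("at least one module is required")
    return single_ns + per_extra_ns * (n_modules - 1)


def delay_setting_clocks(latency_ns):
    """Cycle RAM delay, in clocks, needed to cover a given latency."""
    clocks = math.ceil(latency_ns / CLOCK_NS)
    if clocks > CYCLE_DEPTH:
        raise ValueError("latency exceeds the cycle RAM depth")
    return clocks


chain_ns = trigger_latency_ns(16)
print(chain_ns, "ns for a chain of 16 modules")
print(delay_setting_clocks(1500), "clocks of delay for 1.5 us")
print(CYCLE_DEPTH * CLOCK_NS, "ns maximal delay")
```

The full-system latency therefore uses less than a quarter of the available cycle RAM depth.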

The expected NSB photon flux per pixel is 15 photons/μs when looking at a mountain and 150 photons/μs when looking at the sky near the horizon, which corresponds to 1.5 and 15 pe/μs for a 20% quantum efficiency of the MAPMT and a 50% efficiency of the optical system (mirror, lenses, fibers). The use of a single threshold per channel (VHL) allows detection of air showers with a total number of photons reaching one telescope of ≥200, which corresponds to ~20 photoelectrons. The expected event rate due to NSB in this case is a few hundred Hz. If the cluster trigger logic is used (HT), the event rate due to background is reduced by a factor of ~10 with the same sensitivity limit for air showers of 200 photons.

The event rate from random NSB is reduced dramatically if two telescopes are used: from 100 Hz down to $\sim 10^{-3}$ Hz after the timing coincidence between the signals in the two telescopes (using the hardware/firmware trigger described above). An additional reduction by a factor of ~500 is obtained after checking the coincidence of directions (on-line software trigger). Off-line reconstruction of the shower direction allows the separation of residual background events from the real air showers arriving from the mountain or from the Earth. The expected event flux is ~1 event/year. To verify the system functionality we will use a high-efficiency, high-background ST. We can also store a fraction of the events not passing the timing coincidence.
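
The order of magnitude of the coincidence rate can be reproduced with the standard estimate for accidental coincidences of two uncorrelated random streams, rate ≈ R_A · R_B · Δt. The Python sketch below is an illustration of that textbook estimate, not a calculation taken from the NuTel simulation; it assumes that a window of ±1 clock spans three clock periods, and the function name is hypothetical.

```python
def accidental_rate_hz(rate_a_hz, rate_b_hz, half_window_clocks=1, clock_ns=25.0):
    """Accidental coincidence rate of two uncorrelated trigger streams."""
    # A window of +/- k clocks is taken to cover (2k + 1) clock periods (assumption).
    window_s = (2 * half_window_clocks + 1) * clock_ns * 1e-9
    return rate_a_hz * rate_b_hz * window_s


rate = accidental_rate_hz(100.0, 100.0)
print(f"{rate:.1e} Hz")   # same order as the value quoted in the text
```

Because the accidental rate scales with the product of the two single-telescope rates, lowering the thresholds raises the coincidence background quadratically.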

6. Conclusion

We have developed the electronics of the data acquisition system for the NuTel project to process data from MAPMT-based detectors. The system consists of 32 DCMs (two master DCMs and 30 slave DCMs) located in two cPCI crates together with two single board computers. This system selects events using trigger logic and the timing coincidence between two telescopes, digitizes and collects the data, and sends them to the SBC via the 33-MHz 32-bit wide cPCI bus. The main features of the DCM design are: low cost per channel; high integration (32 channels per module); programmable gain control before the ADC between $\times 0.5$ and $\times 2$; one 40-MHz 10-bit ADC per channel; two memory blocks of 256 × 16 bit size each per channel; up to three programmable discriminators per channel; trigger logic on the module; easy combination of any number of modules into the complete system via the front panel; calibration pulse to the pre-amplifier via the front panel using the 12-bit DAC on board; low power-on current. The use of six FPGAs with a total of about 32000 logic cells provides wide flexibility at both the firmware and software levels, allows the development of a complicated design at the firmware level and makes this module a universal instrument for most scientific applications that process data from various detectors, especially those using MAPMT.

The system consisting of the single master DCM and a set of slave DCMs is working well. Software for the operation of the two-chassis system is currently being developed and debugged. Mass production of modules for the 1024-channel system was completed in August 2004. Debugging and testing were done in November 2004. Integration tests in the laboratory are on-going and are expected to be completed in summer 2005. We are planning to begin operation in fall 2005 with the observation of well-known TeV γ-ray sources in the sky to calibrate our detector. After that, from the beginning of 2006, we will be on duty for the observation of air showers emerging from the mountain.

References