Receipt Leaf Circuit by Zyouell · Pull Request #405 · Lagrange-Labs/mapreduce-plonky2

Zyouell · 2024-11-08T14:14:15Z

This PR adds functionality for verifying leaves of the Receipt MPT in a circuit, together with helper functions for extracting the relevant information about a specific event and extra functionality for extracting values from large arrays in-circuit.

Zyouell · 2024-11-08T14:18:01Z

Removed some unneeded code used for debugging circuits.

nikkolasg

Cool, very nice PR ! Left a bunch of comments but nothing critical

nikkolasg · 2024-11-08T20:38:02Z

+            })
+            .collect();
+
+        // We need to express `at` in base 64, we are also assuming that the initial array was smaller than 64^2 = 4096 which we enforce with a range check.


Why do you need a range check for the length of an Array? Isn't it a const generic, i.e. a constant? A simple assert! would then be sufficient on our side to make sure we're not abusing this API; as long as we keep the constant < 4096 we should be good.

We could just assert, but the paranoid side of me (which comes from years of working on privacy) wanted the actual circuit to enforce that the prover didn't supply an index outside the size of the array, possibly resulting in some unwanted behaviour.

Ah right, the check is done on `at`, but the comment says we enforce it via a range check while talking about the size of the array, so that was confusing for me. Can you quickly rewrite the comment then? All good for the range check on `at`.

nikkolasg · 2024-11-08T20:41:01Z

+        // We need to express `at` in base 64, we are also assuming that the initial array was smaller than 64^2 = 4096 which we enforce with a range check.
+        // We also check that `at` is smaller than the size of the array.
+        let array_size = b.constant(F::from_noncanonical_u64(SIZE as u64));
+        let less_than_check = less_than_unsafe(b, at, array_size, 12);


Why is unsafe safe there? Can you put that in a comment?

Yep, it was because on line 637 I range check `at` to be a 12-bit number anyway, so it seemed pointless to do it twice.
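
To illustrate the point about range checks, here is a toy integer model of an unchecked n-bit comparison. It is only a sketch under assumptions: the function names are hypothetical and this is not the repository's `less_than_unsafe` implementation, just one common way such a comparison can be built. The result is only meaningful when both inputs are already known to fit in `n` bits, which is why a range check on `at` elsewhere in the circuit makes the unchecked variant acceptable.

```rust
/// Toy model (hypothetical, not the library's code): decides `a < b` by reading
/// bit `n` of `a + 2^n - b`, trusting the caller that both inputs fit in `n` bits.
fn less_than_unchecked(a: u64, b: u64, n: u32) -> bool {
    let shifted = a.wrapping_add(1u64 << n).wrapping_sub(b);
    // For a, b < 2^n the value lies in [1, 2^(n+1) - 1] and bit `n` is set exactly when a >= b.
    (shifted >> n) & 1 == 0
}

/// Checked variant: rejects inputs that do not fit in `n` bits before comparing.
/// A circuit would fail a range-check constraint where this returns `None`.
fn less_than_checked(a: u64, b: u64, n: u32) -> Option<bool> {
    let bound = 1u64 << n;
    if a >= bound || b >= bound {
        return None;
    }
    Some(less_than_unchecked(a, b, n))
}
```

With `n = 12`, an input such as `a = 8192` makes the unchecked version report `a < 100`, whereas the checked version rejects it; once `at` is range-checked separately, repeating that check inside the comparison adds nothing.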

nikkolasg · 2024-11-08T20:43:39Z

+        let true_target = b._true();
+        b.connect(less_than_check.target, true_target.target);
+        b.range_check(at, 12);
+        let (low_bits, high_bits) = b.split_low_high(at, 6, 12);


Sorry i'm a bit lost in this function. Can you write an example or just more doc about how you approach the decomposition and how you do the random access and the recombination at the end ?

Yeah sure thing
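
As a rough picture of the decomposition being discussed, the following plain-Rust model (not circuit code) splits the index `at` in base 64, selects one candidate per 64-element chunk using the low digit, then selects among the candidates using the high digit. The function name `two_level_lookup` and the `u8` element type are assumptions made for illustration; only `RANDOM_ACCESS_SIZE` and the base-64 split come from the quoted diff.

```rust
/// Chunk width, mirroring the `RANDOM_ACCESS_SIZE` constant named in the review.
const RANDOM_ACCESS_SIZE: usize = 64;

/// Plain-Rust model of a two-level lookup into an array of at most 64^2 = 4096 elements.
/// Returns `None` where the circuit would instead fail a constraint (index out of bounds).
fn two_level_lookup(arr: &[u8], at: usize) -> Option<u8> {
    if at >= arr.len() || arr.len() > RANDOM_ACCESS_SIZE * RANDOM_ACCESS_SIZE {
        return None;
    }
    // Decompose `at` in base 64: at = 64 * high + low.
    let low = at % RANDOM_ACCESS_SIZE; // low 6 bits
    let high = at / RANDOM_ACCESS_SIZE; // next 6 bits

    // First level: from every chunk take the element at offset `low`,
    // treating missing entries of the last chunk as zero padding.
    let candidates: Vec<u8> = arr
        .chunks(RANDOM_ACCESS_SIZE)
        .map(|chunk| chunk.get(low).copied().unwrap_or(0))
        .collect();

    // Second level: take the candidate that belongs to chunk `high`.
    candidates.get(high).copied()
}
```

For an array of 200 elements and `at = 199`, the split gives `low = 7` and `high = 3`, so the lookup reads offset 7 of the fourth chunk. Each level is a lookup into at most 64 entries instead of a single scan over the whole array.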

nikkolasg · 2024-11-08T20:44:25Z

    }

+    #[test]
+    fn test_random_access_large_array() {


Can you write a quick comparison of gate counts, either in a comment on the function or in the test, to see what the gain actually is? Not asking for a full-fledged benchmark, but at least an idea of the gain.

nikkolasg · 2024-11-08T20:45:17Z

+    /// This is more expensive than [`Self::extract_array`] due to using [`Self::random_access_large_array`]
+    /// instead of [`Self::value_at`]. This function enforces that the values extracted are within the array.


Well, it is more expensive only if the array is small. Can you say more about when it is beneficial to use this function?

This should now be in the code. For the case I tested, the old method took 129 rows and this new method took 15.

nikkolasg · 2024-11-08T21:37:34Z

+};
+use plonky2::field::types::Field;
+
+/// Calculate `metadata_digest = D(key_id || value_id || slot)` for receipt leaf.


The code below doesn't seem to reflect this comment. Can you clarify ?

Apologies, I was using some of the existing code as a starting point and adapting it, and I must have forgotten to update the comment. I'll fix that now.

nikkolasg · 2024-11-08T21:38:03Z

+const H_RANGE: PublicInputRange = 0..PACKED_HASH_LEN;
+/// - `K : [6]F` : Length of the transaction index in nibbles
+const K_RANGE: PublicInputRange = H_RANGE.end..H_RANGE.end + MAX_INDEX_NIBBLES;
+/// `T : F` pointer in the MPT indicating portion of the key already traversed (from 6 → 0)


6 should be a const generic right ? why 6 ?

I was working off the 30 million gas block limit on Ethereum. Since any transaction costs a minimum of 21,000 gas, the theoretical maximum number of transactions in a block is 1428, which is 3 bytes long, so 6 nibbles.

Correction: it's two bytes long, so it should only be 4 nibbles. I forgot how to do maths there for a little bit.

Great, can you write this down in the source code then ? 🙏
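
The arithmetic behind the 4-nibble bound can be written down as a short sketch. The constant and function names below are hypothetical; the figures (30 million gas limit, 21,000 gas minimum per transaction) are the ones quoted in this thread.

```rust
/// Block gas limit assumed in the discussion above.
const BLOCK_GAS_LIMIT: u64 = 30_000_000;
/// Minimum gas consumed by any transaction.
const MIN_TX_GAS: u64 = 21_000;

/// Upper bound on the number of transactions that fit in one block.
const fn max_transactions() -> u64 {
    BLOCK_GAS_LIMIT / MIN_TX_GAS
}

/// Number of nibbles needed to write `value` when it is stored as whole bytes
/// (two nibbles per byte).
const fn nibbles_for(value: u64) -> u32 {
    let bits = 64 - value.leading_zeros();
    let bytes = (bits + 7) / 8; // round up to whole bytes
    bytes * 2
}
```

Here `max_transactions()` is 1428, which needs 11 bits, so 2 bytes and therefore 4 nibbles; 6 nibbles would only be required from 65,536 onwards.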

nikkolasg · 2024-11-13T11:45:49Z

        // Spin up a local node.

-        let rpc = ProviderBuilder::new().on_anvil();
+        let rpc = ProviderBuilder::new().on_anvil_with_config(|anvil| Anvil::block_time(anvil, 1));


Mhhh, why are you introducing time effects now? I don't understand the reason. Is it to have multiple txs inside a block or something?
Couldn't we use multicalls instead, for example?

It was to have multiple transactions in a block. I did not know the multicall functionality existed, so I'll update it to use that instead.

If it's not too much more complicated, that'd be great. The reason is that I personally have very bad experience relying on timing in tests in CI; it often fails randomly.

So it turns out Multicall as a crate didn't play nicely with the version of alloy that we were using. Instead, I remembered that I can just make Anvil mine blocks only when it's told to rather than at set intervals. So we now simply send all the transactions and mine the block after they have all been sent.

Zyouell · 2024-12-02T14:00:45Z

The latest commit does a number of things. Firstly, it rebases the branch onto a more up-to-date version of the main repository. It also makes changes to reflect the documentation as per the Notion page.

We now treat receipt extraction as a subset of value extraction, along with simple and mapping. The APIs in value_extraction/api.rs have been updated to reflect this along with added unit testing for the receipt case.

In block_extraction/circuit.rs we now specify the type of extraction we are performing. This allows us to select which of the three MPT roots we wish to verify against, depending on whether the extraction concerns storage, receipts or transactions. The relevant type is selected using an enum.
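
A minimal sketch of that selection, in plain Rust rather than circuit code, could look as follows. All names here (`ExtractionType`, `HeaderRoots`, `root_for`) are hypothetical and not taken from the PR; the only assumption carried over is that a block header exposes a state root, a transactions root and a receipts root.

```rust
/// Hypothetical enum naming the three kinds of extraction.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ExtractionType {
    Storage,
    Receipt,
    Transaction,
}

/// Hypothetical container for the three MPT roots found in a block header.
struct HeaderRoots {
    state_root: [u8; 32],
    transactions_root: [u8; 32],
    receipts_root: [u8; 32],
}

impl HeaderRoots {
    /// Picks the root that the given kind of extraction must be verified against.
    fn root_for(&self, ty: ExtractionType) -> &[u8; 32] {
        match ty {
            ExtractionType::Storage => &self.state_root,
            ExtractionType::Receipt => &self.receipts_root,
            ExtractionType::Transaction => &self.transactions_root,
        }
    }
}
```

Using an exhaustive `match` means that adding a fourth extraction type later forces every selection site to be revisited at compile time.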

In final_extraction we add a new circuit that does not use BaseCircuit since receipt extraction does not require verifying account storage roots in the state trie root.

Since these commits have made some sizeable changes to the previous state of the PR, it would be much appreciated if @nicholas-mainardi could look over it again to check the logic, as well as @nikkolasg.

nicholas-mainardi

Dense PR but really good work given the complexity of the circuits. Main changes are:

Define methods to compute column ids uniquely for the receipt trie table
Bind transaction index to the MPT key used to extract leaves
Compute values digests coherently with the structure expected by the DB creation circuit after generic struct extraction

nicholas-mainardi · 2024-12-10T11:22:42Z

+        let arrays: Vec<Array<T, RANDOM_ACCESS_SIZE>> = (0..padded_size)
+            .map(|i| Array {
+                arr: create_array(|j| {
+                    let index = 64 * i + j;


64 should be RANDOM_ACCESS_SIZE?

Yep, will change

nicholas-mainardi · 2024-12-10T11:57:35Z

+        let less_than_check = less_than_unsafe(b, at, array_size, 12);
+        let true_target = b._true();
+        b.connect(less_than_check.target, true_target.target);
+        b.range_check(at, 12);


Isn't this range-check redundant? It seems to me that at will be implicitly range-checked inside split_low_high

Yes it is, I forgot that it performed a range check inside split_low_high so I'll remove it

nicholas-mainardi · 2024-12-10T14:05:07Z

+
+        // We know that the RLP encoding of the compact encoding of the key is going to be in roughly the first 10 bytes of
+        // the node, since the node is: list byte, 2 bytes for list length (maybe 3), key length byte (1), key compact encoding (4 max),
+        // so we take 10 bytes to be safe since this won't affect the number of random access gates we use.


AFAIU, the value 10 depends on the maximum number of nibbles we expect for the transaction index, which should be 4. So, shouldn't we compute this constant from the maximum number of nibbles? Otherwise, if we decide to change this upper bound later on, this estimate will no longer hold.
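
One way to derive such a bound from the nibble count is sketched below. This is an illustration under assumptions, not the PR's code: the function names are hypothetical, the byte budget follows the breakdown in the quoted comment (list prefix byte, up to 3 list-length bytes, one key-length byte), and the compact-encoding length uses the hex-prefix rule of one flag byte plus one byte per pair of nibbles.

```rust
/// Maximum number of nibbles in the transaction-index key (4 per the discussion above).
const MAX_INDEX_NIBBLES: usize = 4;

/// Bytes taken by the compact (hex-prefix) encoding of a path of `nibbles` nibbles:
/// one flag byte, which also carries the first nibble when the count is odd,
/// followed by the remaining nibbles packed two per byte.
const fn compact_encoding_len(nibbles: usize) -> usize {
    nibbles / 2 + 1
}

/// Upper bound on the number of leading bytes of a leaf node that can contain
/// the RLP-encoded compact key.
const fn max_key_prefix_len(max_nibbles: usize) -> usize {
    const LIST_PREFIX: usize = 1; // RLP list prefix byte
    const MAX_LIST_LEN_BYTES: usize = 3; // bytes encoding the list length
    const KEY_LEN_BYTE: usize = 1; // RLP string prefix of the key
    LIST_PREFIX + MAX_LIST_LEN_BYTES + KEY_LEN_BYTE + compact_encoding_len(max_nibbles)
}
```

Under these assumptions `max_key_prefix_len(MAX_INDEX_NIBBLES)` evaluates to 8, which sits below the hard-coded 10, and the bound moves automatically if the nibble limit changes.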

nicholas-mainardi · 2024-12-10T14:43:50Z

+
+/// Given a `PartitionWitness` that has only inputs set, populates the rest of the witness using the
+/// given set of generators.
+pub fn debug_generate_partial_witness<


Thanks for adding this debugging utility. So, it looks to me that this method is basically a copy of generate_partial_witness of Plonky2 with some print information, is it correct? If it is, then wdyt about modifying directly the method in our Plonky2 fork (maybe adding a debug input parameter that specifies whether to print out this data or not)? In this way, we don't duplicate the logic. Wdyt?

nicholas-mainardi · 2024-12-10T15:08:22Z

+    /// The topics for this Log
+    topics: [LogColumn; 3],
+    /// The extra data stored by this Log
+    data: [LogColumn; 2],


Shouldn't we use constants in place of integers in this struct?

For data, yes, we probably should. But topics is hard-limited by Solidity at 4 (with the first being the event signature hash), so I think it should be fine to leave this as three rather than add another constant.
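
For comparison, a version with named constants might look like the sketch below. The constant names, the `EventLogInfo` struct name and the fields of `LogColumn` are hypothetical; only the array lengths 3 and 2 and the limit of 4 topics come from the thread.

```rust
/// A log carries at most four topics.
const MAX_LOG_TOPICS: usize = 4;
/// The first topic is the event signature hash, leaving three indexed topics.
const MAX_INDEXED_TOPICS: usize = MAX_LOG_TOPICS - 1;
/// Hypothetical name for the number of data words tracked per event (2 in the quoted struct).
const MAX_DATA_WORDS: usize = 2;

/// Placeholder for the `LogColumn` type in the quoted diff; its fields are assumed here.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct LogColumn {
    rel_byte_offset: usize,
    len: usize,
}

/// Hypothetical struct mirroring the quoted fields, with array sizes taken from constants.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct EventLogInfo {
    /// The topics for this log
    topics: [LogColumn; MAX_INDEXED_TOPICS],
    /// The extra data stored by this log
    data: [LogColumn; MAX_DATA_WORDS],
}
```

Deriving the topic count from `MAX_LOG_TOPICS - 1` records where the number three comes from without introducing a second independent constant.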

nicholas-mainardi · 2024-12-11T11:53:16Z

+        b: &mut CircuitBuilder<GoldilocksField, 2>,
+        block_pi: &[Target],
+        value_pi: &[Target],
+    ) -> ReceiptExtractionWires {


Why do we need to return wires here? build should return wires either if there are input wires that need to be assigned, or if this is a gadget with output wires that need to be employed by other circuit components. But it looks to me like we are in neither of these 2 cases.

nicholas-mainardi · 2024-12-11T12:08:39Z

+            // We also keep track of which log this is in the receipt to avoid having identical rows in the table in the case
+            // that the event we are tracking can be emitted multiple times in the same transaction but has no topics or data.
+            let log_number = b.constant(F::from_canonical_usize(index + 1));
+            let log_no_digest = b.map_to_curve_point(&[one, log_number]);


Same as for gas used digest: I think we should employ a column identifier provided as input and computed off-circuit rather than the hard-coded constant 1

Zyouell requested review from nicholas-mainardi and nikkolasg November 8, 2024 14:14

Zyouell force-pushed the zyouell/receipt-leaf branch from f0c0e28 to 434479c Compare November 8, 2024 14:17

Zyouell force-pushed the zyouell/receipt-leaf branch from 434479c to 4bd1f0a Compare November 8, 2024 14:23

nikkolasg reviewed Nov 8, 2024


Zyouell force-pushed the zyouell/receipt-leaf branch 2 times, most recently from 002ed27 to af6589f Compare November 11, 2024 19:11

nikkolasg reviewed Nov 13, 2024


nikkolasg and others added 11 commits December 2, 2024 13:44

test with receipts encoding

cb98991

wip

0922676

further testing

82f304d

WIP: Receipt Trie leaves

375beb5

Receipt Leaf Circuit added with tests

6a2d2d8

Change Receipt query test

ba702ee

Address review comments

bbf02b0

Value digest computation corrected

13d3cec

Moved receipt value extraction location

64b8c9f

Added unit tests for receipt leaf api

fc59d4d

Rebased onto feat/receipt-trie

f4d4a4b

Zyouell force-pushed the zyouell/receipt-leaf branch from 6a94288 to f4d4a4b Compare December 2, 2024 13:53

Reworked block extraction to extract all three roots

455290f

nicholas-mainardi suggested changes Dec 11, 2024


Zyouell added 2 commits December 17, 2024 16:44

Added testing for final extraction API

d644c70

Addressed review comments

04f6d13

Zyouell force-pushed the zyouell/receipt-leaf branch from f07a164 to 04f6d13 Compare December 17, 2024 16:44

Zyouell requested a review from nicholas-mainardi December 17, 2024 16:45

Fixed tests to pass CI

5a2f7e3
